10-06-2026, 07:41 AM
(09-06-2026, 11:26 PM)Torsten Wrote: Thank you for this analysis. The discrepancy between your distributional distances and my substitution rules is informative rather than contradictory, because I describe context-dependent rules:
"These rules for similar glyphs only apply with some restrictions. For instance 'o' and 'y' can replace each other only as the first or as the last sign. Another example is that 'o' is interchangeable with 'a' before 'l' and 'r' but not after 'q' or before 'k'." ([link], p. 5). A word grid documenting the most frequent substitution relationships across the VMS vocabulary is available at [link] (see also Timm 2014, pp. 66-82).
So "o" and "a" substitute in specific positions — as prefix elements before "l" and "r" — but not in all contexts. Their global distributional distance (0.55 in your measurement) is high because "o" after "q" has no "a" equivalent, pulling the global distances apart. But in the specific positions where they do substitute — "ol"/"al", "or"/"ar", "chol"/"chal" — they are interchangeable.
The same applies to "n" and "r" (0.63 in your measurement). Both appear word-finally, and in that position they substitute — dain/dair, sain/sair, okain/okair etc. But "r" also appears in other positions where "n" doesn't, making their global distributions different.
Your core pairs — ch/sh, k/t, p/f, r/l — show low distances because they substitute freely across many contexts. The pairs with higher distances in your analysis — o/a, o/y, n/r — substitute only in restricted positions, which dilutes their global similarity.
Note: Currier already observed in 1976 that the Voynich glyphs are constructed from shared base strokes: 'you can make up almost any of the other letters out of these two symbols i and e' (see "The Nature of the Symbols" in [link]). Your distributional distances quantify this observation: glyph pairs with low distance are the pairs that share stroke structure.
Thank you for your explanations and remarks. I broadly agree with you, perhaps with a slightly different nuance. For example, I agree there seem to be two different kinds of 'y', a common one at the end of many words and a rarer one at the beginning of some, so it's quite possible that the 'similarity rules' differ between the two cases. In the case of 'n' vs. 'r/m', however, I suspect the difference is not driven by words such as 'raiin', with a non-final 'r', but by common words such as 'ar' and 'dar', where 'r' is final but preceded by 'a', while most final 'n' are preceded by 'i' ('an' and 'dan' being quite rare by comparison). These behaviours might be interesting to explore (I'll see if I can when I get the time).
About the 'shared base strokes': it's true, of course, that VMS symbols are constructed from just a few of them, but I've never been convinced of the significance of this fact, because I'd expect any writing system to re-use the same basic strokes. For example, in ordinary block-letter Latin script, 'o', 'p', 'q', 'b', 'd' and 'l' are all made from a circle (or none) plus a vertical bar (or none), but this does not mean they are related to each other.
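A rough sketch of how the 'n' vs. 'r' question could be checked: tabulate which glyph precedes each word-final 'n' and each word-final 'r', then compare the two profiles. Everything here is an assumption for illustration only. The word list is a tiny made-up sample of EVA-style strings and not real counts, each character is treated as one glyph (so 'ch' and 'sh' are not handled), and total variation distance stands in for whatever distance measure was actually used in the analysis discussed above.

```python
from collections import Counter

# Hypothetical toy sample of EVA-style words; NOT real VMS frequencies.
TOY_WORDS = ["daiin", "aiin", "okaiin", "dain", "ar", "dar", "or", "dair", "ol"]


def preceding_glyph_counts(words, final):
    """Count the character immediately before a word-final `final`."""
    counts = Counter()
    for word in words:
        # Words consisting only of `final` have no preceding glyph.
        if len(word) >= 2 and word.endswith(final):
            counts[word[-2]] += 1
    return counts


def total_variation(counts_a, counts_b):
    """Total variation distance between two count profiles (0 = identical, 1 = disjoint)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both profiles need at least one observation")
    keys = set(counts_a) | set(counts_b)
    # Counter returns 0 for missing keys, so absent contexts count as zero mass.
    return 0.5 * sum(
        abs(counts_a[k] / total_a - counts_b[k] / total_b) for k in keys
    )


if __name__ == "__main__":
    before_n = preceding_glyph_counts(TOY_WORDS, "n")
    before_r = preceding_glyph_counts(TOY_WORDS, "r")
    print("before final n:", dict(before_n))
    print("before final r:", dict(before_r))
    print("distance:", total_variation(before_n, before_r))
```

On a real transliteration, a large distance between the two profiles would be consistent with the idea that the preceding glyph ('i' vs. 'a') drives the difference, though it would not by itself rule out other explanations.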