Similarity of Voynichese glyphs according to their immediate statistical environment

Mauro > 4 hours ago

I posted earlier about the 'similarity' of Voynich glyphs with each other, and I'd like to elaborate further on that. Not sure if it's useful and/or new (I doubt it), but anyway...

For each glyph I calculate the frequency distribution of the preceding glyph: this gives a vector of numbers (adding up to 1) for each glyph. Then I calculate the Euclidean distance between each pair of vectors (the square root of the sum of the squares of the differences). I do the same for the distributions of the following glyph. By construction, each distance is at least zero and at most sqrt(2) ≈ 1.414 (the maximum is reached when each of the two distributions is concentrated on a single neighbouring glyph, and the two neighbours differ).
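The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the code of the actual tool: the function names are made up, the corpus is a toy list of words, and treating the start and end of a word as pseudo-glyphs ('^' and '$') is an assumption the description above does not settle.

```python
from collections import Counter, defaultdict
from math import sqrt

def neighbour_distributions(words, offset):
    """Return {glyph: {neighbour: relative frequency}}.

    offset=-1 uses the preceding glyph, offset=+1 the following one.
    ASSUMPTION: word boundaries count as the pseudo-glyphs '^' and '$'.
    """
    counts = defaultdict(Counter)
    for word in words:
        padded = ["^", *word, "$"]
        for i in range(1, len(padded) - 1):
            counts[padded[i]][padded[i + offset]] += 1
    dists = {}
    for glyph, ctr in counts.items():
        total = sum(ctr.values())
        dists[glyph] = {n: c / total for n, c in ctr.items()}
    return dists

def euclidean(p, q):
    """Euclidean distance between two sparse frequency vectors."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))

# Toy corpus (made up); each character stands for one glyph.
words = ["oka", "ota", "qokan"]
prev = neighbour_distributions(words, -1)
foll = neighbour_distributions(words, +1)
d_prev = euclidean(prev["k"], prev["t"])
d_foll = euclidean(foll["k"], foll["t"])
print("k vs t:", d_prev, d_foll, (d_prev + d_foll) / 2)
```

Two distributions that put all their weight on different neighbours, such as {'a': 1} and {'b': 1}, give the maximum distance sqrt(2).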

What does this mean in practice? If we find that EVA 'k' and EVA 't' have small distances (both previous and following), and, say, we find the sequence 'oka' in the text, then it's probable we'll also find 'ota', and moreover the ratio between 'oka's and 'ota's will probably be similar to the ratio between 'k's and 't's.

These are the most similar glyph pairs, as measured by the average of distance_previous and distance_following, considering the whole RF1a-n transcription and excluding rare glyphs (defined as all the glyphs with a frequency lower than or equal to that of EVA 'g'):

If you also want to consider the rare glyphs, add the following pairs:

For reference (excluding rare characters), the two most dissimilar glyphs are, unsurprisingly, 'q' and 'n'. Average distance = 1.39, previous = 1.39, following = 1.38. Almost maximally orthogonal.
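For anyone who wants to reproduce this kind of ranking, here is a rough sketch of how the pairs could be sorted by average distance with the rare glyphs left out. The function names and the toy corpus are invented for the example, and word boundaries are again assumed to count as pseudo-glyphs; the output has nothing to do with the real RF1a-n figures.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

def distributions(words, offset):
    """{glyph: {neighbour: relative frequency}}; offset -1 or +1.
    ASSUMPTION: '^' and '$' mark word start and end."""
    counts = defaultdict(Counter)
    for w in words:
        padded = ["^", *w, "$"]
        for i in range(1, len(padded) - 1):
            counts[padded[i]][padded[i + offset]] += 1
    return {g: {n: c / sum(ctr.values()) for n, c in ctr.items()}
            for g, ctr in counts.items()}

def distance(p, q):
    return sqrt(sum((p.get(k, 0) - q.get(k, 0)) ** 2 for k in set(p) | set(q)))

def rank_pairs(words, rare_reference):
    """Rank glyph pairs by the average of previous and following distance.
    Glyphs no more frequent than `rare_reference` are dropped as rare."""
    freq = Counter(g for w in words for g in w)
    cutoff = freq[rare_reference]
    kept = sorted(g for g, n in freq.items() if n > cutoff)
    prev, foll = distributions(words, -1), distributions(words, +1)
    rows = []
    for a, b in combinations(kept, 2):
        d_prev = distance(prev[a], prev[b])
        d_foll = distance(foll[a], foll[b])
        rows.append(((d_prev + d_foll) / 2, a, b, d_prev, d_foll))
    return sorted(rows)  # most similar pairs first

# Toy corpus (made up); 'g' plays the role of the rarity threshold.
words = ["oka", "ota", "oka", "ota", "og"]
ranking = rank_pairs(words, "g")
for avg, a, b, d_prev, d_foll in ranking:
    print(f"{a}-{b}: avg={avg:.2f} prev={d_prev:.2f} foll={d_foll:.2f}")
```

The last row of the sorted list is the most dissimilar pair, which is how a result like the 'q' and 'n' one would be read off.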

Note: the above analysis considers 'ch', 'sh', 'ckh', 'cth', 'cph' and 'cfh' to be stand-alone glyphs. This is arbitrary of course (but I think there are good reasons for it). Also, there was some manual work involved in creating the results tables, so please excuse any errors or omissions.
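Treating those six sequences as single glyphs amounts to a tokenization step before any counting. A minimal sketch, assuming a greedy longest-match rule (the rule and the names `COMPOUND_GLYPHS` and `tokenize` are assumptions made for this example, not taken from the actual tool):

```python
# Sequences treated as stand-alone glyphs, as listed in the note above.
COMPOUND_GLYPHS = ("ckh", "cth", "cph", "cfh", "ch", "sh")

def tokenize(word, compounds=COMPOUND_GLYPHS):
    """Split an EVA word into glyphs, preferring the longest compound."""
    ordered = sorted(compounds, key=len, reverse=True)
    glyphs, i = [], 0
    while i < len(word):
        for c in ordered:
            if word.startswith(c, i):
                glyphs.append(c)
                i += len(c)
                break
        else:
            # No compound starts here: take a single character.
            glyphs.append(word[i])
            i += 1
    return glyphs

print(tokenize("chckhy"))
print(tokenize("shol"))
```

The resulting glyph lists can be fed to the distance calculation in place of plain character strings.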

When I can, I'll try to get the same data for each section of the VMS.
RE: Similarity of Voynichese glyphs according to their immediate statistical environment

Jorge_Stolfi > 2 hours ago

I have long suspected that ee is a single glyph in the same class as Ch and Sh (or maybe just an error for Ch); while an e alone is a modifier for the previous k, t, Ch, Sh, CKh, or CTh; and eee is an ee modified by e.

I also believe that the I in Ih,IKh, ITh is an error, and should be C; that CTHh and CKHh should be CThe and CKhe; that ir should be iin; that m is an abbreviation for iin; and that b, u, g are just badly shaped versions of other glyphs.

And finally I suspect that p and f are fancy forms of te and ke, respectively.

Would your analysis be compatible with some or all of these hunches?

All the best, --stolfi
RE: Similarity of Voynichese glyphs according to their immediate statistical environment

Mauro > 2 hours ago

Some more data, and an answer to Stolfi.

I re-did the same distance analysis separately on the Balneological and the Herbal A sections. Besides considering 'ch', 'sh', 'ckh', 'cth', 'cph' and 'cfh' as stand-alone glyphs, this time I also collapsed every multiple occurrence of 'i' and 'e' to a single 'i' and 'e' (I don't think this changed anything in the results).
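The collapsing step is easy to express as a regular expression. This is just one way to do it, and the function name `collapse_runs` is invented for the example:

```python
import re

def collapse_runs(word):
    """Collapse every run of repeated 'i' or 'e' to a single character."""
    return re.sub(r"([ie])\1+", r"\1", word)

for w in ("qokeeedy", "daiin", "chol"):
    print(w, "->", collapse_runs(w))
```

Since none of the compound glyphs contains 'i' or 'e', it should not matter whether this is applied before or after splitting the words into glyphs.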

These are all the most similar pairs: in green those with a distance < sqrt(2)/8 (12.5% of the maximum possible distance), in yellow those with a distance < sqrt(2)/4 (25% of the maximum). Only the average distances are shown, sorry:



Now answering Stolfi:

Quote: I have long suspected that ee is a single glyph in the same class as Ch and Sh (or maybe just an error for Ch); while an e alone is a modifier for the previous k, t, Ch, Sh, CKh, or CTh; and eee is an ee modified by e.

I also believe that the I in Ih,IKh, ITh is an error, and should be C; that CTHh and CKHh should be CThe and CKhe; that ir should be iin; that m is an abbreviation for iin; and that b, u, g are just badly shaped versions of other glyphs.

And finally I suspect that p and f are fancy forms of te and ke, respectively.

Would your analysis be compatible with some or all of these hunches?

From the data above, I would say there's some (weak) support for 'm' being a variant of 'r', but I did not test 'm' vs. 'iin', nor the other cases you pose. In general I can 'easily' apply any kind of substitution, for instance 'iin' = 'X', and then calculate the distance from 'm'. It's just not fully automated: I need to copy two big tables into Excel and then get the final results manually from there, so it takes time and I cannot do it now (and I surely will not code anything for the foreseeable future). So I'm sorry, but you'll need to be patient; I'll check whether 'ee' is close to 'ch/sh', 'r' to 'iin' and 'm' to 'iin'.

Or you can download my software tool from GitHub and do it yourself (ask for directions if needed, but it's easier to do than to explain).
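For illustration only, the substitution test described above ('iin' = 'X', then the distance from 'm') might look like the following in Python. This is not the GitHub tool: every function name is hypothetical, the corpus is a made-up toy, each character is treated as one glyph, and word boundaries are assumed to count as pseudo-glyphs. The printed number says nothing about the manuscript.

```python
from collections import Counter, defaultdict
from math import sqrt

def substitute(word, mapping):
    """Replace each sequence in `mapping` by its placeholder, longest first."""
    for old in sorted(mapping, key=len, reverse=True):
        word = word.replace(old, mapping[old])
    return word

def distributions(words, offset):
    """{glyph: {neighbour: relative frequency}}; offset -1 or +1.
    ASSUMPTION: '^' and '$' mark word start and end."""
    counts = defaultdict(Counter)
    for w in words:
        padded = ["^", *w, "$"]
        for i in range(1, len(padded) - 1):
            counts[padded[i]][padded[i + offset]] += 1
    return {g: {n: c / sum(ctr.values()) for n, c in ctr.items()}
            for g, ctr in counts.items()}

def distance(p, q):
    return sqrt(sum((p.get(k, 0) - q.get(k, 0)) ** 2 for k in set(p) | set(q)))

def average_distance(words, a, b):
    """Average of the previous-glyph and following-glyph distances."""
    prev, foll = distributions(words, -1), distributions(words, +1)
    return (distance(prev[a], prev[b]) + distance(foll[a], foll[b])) / 2

# Toy corpus (made up). 'X' is a placeholder glyph standing for 'iin'.
raw_words = ["daiin", "dam", "oiin", "om"]
words = [substitute(w, {"iin": "X"}) for w in raw_words]
print(words)
print("X vs m:", average_distance(words, "X", "m"))
```

The same pattern would apply to 'te' vs. 'p' or 'ke' vs. 'f': substitute first, then compare the placeholder with the candidate glyph.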
RE: Similarity of Voynichese glyphs according to their immediate statistical environment

Mauro > 1 hour ago

I tested whether 'te' is similar to 'p' and 'ke' to 'f' ('te' and 'ke' were the only substitutions made), on Herbal A. They don't look similar: the average distances are rather high (distance_following and distance_previous are high too). 'te' and 'ke', instead, strongly resemble... each other, and are rather distant from any other character (in Herbal A, at least).