(22-03-2018, 08:33 PM)ReneZ Wrote: Most people will be familiar with the algorithm of Boris V. Sukhotin that identifies which characters in a text are more likely to be vowels and which are more likely to be consonants.
The Russian text was translated into English by Jacques Guy and published in Cryptologia.
There is also a copy of this article online.
Sukhotin has written several more articles, and these were also translated by Jacques Guy. He posted them to the old mailing list in 1997.
I have converted these to HTML and made them available online.
They seem quite interesting, and I am not aware of anyone having tried them out on the Voynich MS text.
Thanks for bringing this to my attention. Sukhotin's algorithm is new to me, and should be easy to implement. I'm surprised it hasn't been done already.
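For reference, here is a minimal sketch of Sukhotin's algorithm as it is usually described: count how often each pair of distinct letters is adjacent, start with every letter classed as a consonant, then repeatedly promote the consonant with the largest positive row sum to a vowel and subtract twice its adjacency counts from the remaining sums. The function name `sukhotin_vowels`, the word-list input, and the alphabetical tie-break are my own assumptions for illustration, not taken from the article.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of letters that Sukhotin's algorithm classes as vowels.

    `words` is an iterable of strings. Adjacency is counted inside words only,
    which is an assumption; spaces could also be treated as a letter.
    """
    # Symmetric adjacency counts between distinct letters (diagonal stays zero).
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Row sums of the adjacency matrix; every letter starts as a consonant.
    sums = {c: sum(adj[c].values()) for c in letters}
    vowels = set()

    while True:
        consonants = [c for c in letters if c not in vowels]
        if not consonants:
            break
        # Highest row sum wins; ties are broken by letter (arbitrary but deterministic).
        best = max(consonants, key=lambda c: (sums[c], c))
        if sums[best] <= 0:
            break
        vowels.add(best)
        # Letters adjacent to the new vowel become less vowel-like.
        for c in consonants:
            if c != best:
                sums[c] -= 2 * adj[c][best]
    return vowels


if __name__ == "__main__":
    print(sorted(sukhotin_vowels(["tomato", "potato", "banana"])))
```

On the toy word list above this classes a and o as vowels. A real test would of course need a full transcription, and the result depends on how the text is split into glyphs.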
A while ago I ran a Principal Component Analysis (PCA) on the glyphs in various texts including the Voynich Manuscript, with the aim of identifying the language, and also which letters in the language the glyphs correspond to. For most of the known languages I tried (Latin, English, German, Italian, and Polish), there was a cluster containing the vowels. The Voynich Manuscript PCA was very different. The problem with PCA, though, is that it's sensitive to the input format. The original analysis used, for each glyph, a vector whose components were the frequencies of the letters that follow it. As these frequencies follow an exponential distribution, the result is dominated by common glyphs. So during the last week or so I repeated the PCA, this time using log frequencies, and included Old Testament (OT) Hebrew, 16th Century Hungarian, Georgian, and Etruscan texts. The plots didn't change radically, but instead of a vowel cluster, vowels were now on one branch, and consonants (usually starting with l, n, r, and s, independently of language) on the other, with space at the root. PCA appears to detect that vowels are usually followed by consonants and vice-versa.
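To make the input format concrete, here is a rough sketch of the log-frequency variant using NumPy. It is not the code behind the plots: the function names are made up for this post, and the add-one smoothing (needed so the logarithm is defined for glyph pairs that never occur) is an assumption, as is treating the space as an ordinary glyph.

```python
import numpy as np


def follower_log_freq_matrix(text):
    """One row per glyph; columns are log relative frequencies of the next glyph."""
    glyphs = sorted(set(text))
    index = {g: i for i, g in enumerate(glyphs)}
    counts = np.zeros((len(glyphs), len(glyphs)))
    for a, b in zip(text, text[1:]):
        counts[index[a], index[b]] += 1
    # Add-one smoothing (an assumption) avoids log(0) for unseen pairs.
    smoothed = counts + 1.0
    freqs = smoothed / smoothed.sum(axis=1, keepdims=True)
    return glyphs, np.log(freqs)


def pca_scores(matrix, n_components=2):
    """Project the rows onto the first principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    glyphs, logf = follower_log_freq_matrix("banana bandana")
    for glyph, (x, y) in zip(glyphs, pca_scores(logf)):
        print(repr(glyph), round(float(x), 3), round(float(y), 3))
```

Dropping the `np.log` call gives the raw-frequency version, where rows for common glyphs dominate the components.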
The Voynich Manuscript glyph PCA plot now looked a lot more like those of the other languages, except that its vowel branch is unclear and appears to be truncated. Among the known languages, the closest fit seems to be Italian (texts by Dante and Machiavelli). The EVA glyphs f, k, p, t and cfh, ckh, cph, cth were placed similarly to the Italian letters f, c, p, t and v, g, b, d. I examined Italian after merging a, o, and u into one vowel, and e and i into another, but the results were inconclusive. I might be able to get further using heuristic search.
Next, I looked at the vowel branch truncation. Unpointed OT Hebrew lacks vowels, but in their place on the PCA plot were the five sofit (word-final) consonants. I also examined Etruscan (Liber Linteus), which supposedly has four vowels (but the Liber Linteus also has y), and Georgian, which has long consonant clusters (e.g. in Mtkvari). Both had separate vowel and consonant branches, as did Hungarian.
The closest match to the Voynich Manuscript, and the reason for my looking at Hungarian, was the Rohonc Codex, which is also undeciphered.
I'll put this analysis on my web site soon. The payoff for this for me is that I'm now able to write Unicode characters on X11 windows, something which is practically undocumented, and which I wasn't able to do before.