The Voynich Ninja

Most people will be familiar with the algorithm of Boris V. Sukhotin that identifies which characters in a text are more likely to be vowels and which are more likely to be consonants.
The Russian text was translated into English by Jacques Guy and published in Cryptologia.

There is also a copy of this article available online.

Sukhotin has written several more articles, and these were also translated by Jacques Guy, who posted them to the old mailing list in 1997.

I have converted these to HTML and made them available online.

They seem quite interesting, and I am not aware of anyone having tried them out on the Voynich MS text.

Sukhotin has authored several books (in Russian). A few that I found by googling:

- Exploring grammar with numerical methods
- Detecting morphemes in texts without spaces
- Optimization methods of language research

He worked in the Russian Language Institute of the USSR Academy of Sciences.

He was also interested in the problems of potential communication with extraterrestrial civilizations. In 1969, he co-authored a book entitled "Extraterrestrial civilizations. Issues of interstellar communication", for which he wrote a chapter on deciphering future messages from extraterrestrials.

Thanks for sharing these, Rene! The articles are very interesting and the techniques described seem very practical.

(22-03-2018, 08:33 PM)ReneZ Wrote: Most people will be familiar with the algorithm of Boris V. Sukhotin that identifies which characters in a text are more likely to be vowels and which are more likely to be consonants.

...

They seem quite interesting, and I am not aware of anyone having tried them out on the Voynich MS text.

Thanks for bringing this to my attention. Sukhotin's algorithm is new to me, and should be easy to implement. I'm surprised it hasn't been done already.
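For readers who have not seen it, here is a minimal sketch of the vowel identification procedure as it is usually described: count how often each pair of different symbols is adjacent, start with every symbol as a consonant, then repeatedly promote the consonant with the highest positive score to vowel and penalise its neighbours. The function name `sukhotin_vowels` and the tie-breaking rule are assumptions made for illustration. This is not code from the thread or from the Cryptologia article.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of symbols that the procedure classifies as vowels."""
    # Symmetric adjacency counts. Pairs of identical symbols are ignored,
    # which corresponds to a zero diagonal in the adjacency matrix.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every symbol starts as a consonant; its score is its row sum.
    sums = {s: sum(adj[s].values()) for s in symbols}
    vowels = set()
    while True:
        consonants = [s for s in symbols if s not in vowels]
        if not consonants:
            break
        # Assumed tie-break: equal scores are resolved alphabetically.
        best = max(consonants, key=lambda s: (sums[s], s))
        if sums[best] <= 0:
            break
        vowels.add(best)
        # Symbols that often sit next to the new vowel become less vowel-like.
        for s in consonants:
            if s != best:
                sums[s] -= 2 * adj[s][best]
    return vowels


print(sorted(sukhotin_vowels(["banana", "tomato", "kimono"])))
```

On a transcription, the input would be the list of space-separated words, which already builds in the assumption that the spaces really are word breaks.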

A while ago I ran a Principal Component Analysis (PCA) on the glyphs in various texts including the Voynich Manuscript, with the aim of identifying the language, and also which letters in the language the glyphs correspond to. For most of the known languages I tried (Latin, English, German, Italian, and Polish), there was a cluster containing the vowels. The Voynich Manuscript PCA was very different. The problem with PCA, though, is that it's sensitive to the input format. The original analysis used vectors whose components were the frequencies of the letters that follow each glyph. As these follow an exponential distribution, the analysis is dominated by common glyphs.

So during the last week or so I repeated the PCA, but this time I used log frequencies, and included Old Testament (OT) Hebrew, 16th Century Hungarian, Georgian, and Etruscan texts. The plots didn't change radically, but instead of a vowel cluster, vowels were now on one branch, and consonants (usually starting with l, n, r, and s, independently of language) on the other, with space at the root. PCA appears to detect that vowels are usually followed by consonants and vice versa.
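The input vectors described above can be sketched roughly as follows. This is only an illustration of the two input formats (raw following-symbol frequencies versus a log scale), not the code used for the plots. The function name `following_vectors` and the use of log(1 + count) to avoid taking the log of zero are assumptions. The resulting rows could then be passed to any PCA routine.

```python
import math
from collections import Counter, defaultdict


def following_vectors(text, log_scale=False):
    """Map each symbol to a vector describing which symbols follow it."""
    alphabet = sorted(set(text))
    follow = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        follow[a][b] += 1

    vectors = {}
    for a in alphabet:
        total = sum(follow[a].values())
        row = []
        for b in alphabet:
            count = follow[a][b]
            if log_scale:
                # Assumed smoothing: log(1 + count) keeps unseen pairs at 0.
                row.append(math.log1p(count))
            else:
                # Relative frequency of b among the symbols that follow a.
                row.append(count / total if total else 0.0)
        vectors[a] = row
    return alphabet, vectors


alphabet, raw = following_vectors("ab ab a")
_, logged = following_vectors("ab ab a", log_scale=True)
print(alphabet, raw["a"], logged["a"])
```

The space is treated as an ordinary symbol here, which matches the description of space sitting at the root of the plot.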

The Voynich Manuscript glyph PCA plot now looked a lot more like the other languages, except that its vowel branch is unclear and appears to be truncated. Among the known languages, the closest fit seems to be Italian (texts by Dante and Machiavelli). The EVA glyphs f, k, p, t and cfh, ckh, cph, cth were placed similarly to the Italian letters f, c, p, t and v, g, b, d. I examined Italian after merging a, o, and u into one vowel, and e and i into another, but the results were inconclusive. I might be able to get further using heuristic search.

Next, I looked at the vowel branch truncation. Unpointed OT Hebrew lacks vowels, but in their place on the PCA plot were the five sofit (word-final) consonants. I also examined Etruscan (Liber Linteus), which supposedly has four vowels (but the Liber Linteus also has y), and Georgian, which has long consonant clusters (e.g. in Mtkvari). Both had separate vowel and consonant branches, as did Hungarian.

The closest match to the Voynich Manuscript, and the reason for my looking at Hungarian, was the Rohonc Codex, which is also undeciphered.

I'll put this analysis on my web site soon. The payoff for this for me is that I'm now able to write Unicode characters on X11 windows, something which is practically undocumented, and which I wasn't able to do before.

(23-03-2018, 05:33 PM)DonaldFisk Wrote: Thanks for bringing this to my attention. Sukhotin's algorithm is new to me, and should be easy to implement. I'm surprised it hasn't been done already.

...

It has been done already. I have linked to some of the Cryptologia articles in which the Sukhotin algorithm has been applied to the Voynich text, along with some commentary.

(23-03-2018, 06:03 PM)-JKP- Wrote: It has been done already. I have linked to some of the Cryptologia articles in which the Sukhotin algorithm has been applied to the Voynich text, along with some commentary.

Yes, there's the additional problem of the identification of glyphs. In my analyses of the VMS I've been treating EVA in, iin, iiin, etc. as single glyphs as well as i on its own. For known languages, you still get separate vowel and consonant branches however you identify glyphs within reason, e.g. for German text I can treat each of ch, ng, qu, and sch as single glyphs, or separate them into letters, without it changing things much. If you have a particular EVA glyph set in mind I can test it on the usual Takahashi transcription.
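As an illustration of what "treating in, iin, iiin as single glyphs" can mean in practice, here is a small sketch of a greedy longest-match tokenizer. The function name `tokenize` and the example glyph lists are assumptions for illustration only, not the glyph set actually used in the analyses above.

```python
def tokenize(word, multi_glyphs):
    """Split a word into glyphs, preferring the longest listed sequence."""
    # Try longer sequences first so that "iiin" wins over "iin" and "in".
    ordered = sorted((g for g in multi_glyphs if g), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for glyph in ordered:
            if word.startswith(glyph, i):
                tokens.append(glyph)
                i += len(glyph)
                break
        else:
            # No multi-character glyph starts here: emit a single character.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("daiin", ["in", "iin", "iiin"]))
print(tokenize("schnell", ["ch", "ng", "qu", "sch"]))
```

Swapping the glyph list is then enough to rerun the same statistics under a different reading of the script.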

There is probably a minor misunderstanding here.

The vowel recognition algorithm of Sukhotin is, as I wrote, rather well known in the Voynich community, and has been experimented with.

It is the set of other algorithms by Sukhotin for which I am not aware of any documented experiments.
I imagine, though, that Jacques Guy would have tried this.

I'm a pretty big fan of computational attacks. I think they are worth pursuing, and I've seen a small number that advance our understanding of the VMS text.

What I see lacking, however, when I look at the history of computational attacks on VMS text, is the acknowledgment that the spaces may not be spaces and the shapes that look like vowels may not be vowels. The possibility of null characters is also often overlooked. Many statistical tests assume one or more of these to be true and don't seem able to break away from this kind of thinking (along with a number of other assumptions).

Thank you, Rene!
I would be curious to see "Partitioning a text into its morphemes" in action on Voynichese and on known languages. I also wonder if it could be adapted to first identify "graphemes"....

I must say I find these articles much more difficult to read than the 1991 paper "Vowel Identification". For instance, this sentence with three negations gives me a hard time :)

Quote: there is no one-letter string which does not have a non-empty intersection with another string

But I hope I will manage to understand enough of it to try and implement something.

The book I referenced above discusses four algorithms for detecting morphemes (I didn't check which one, if any, is the algorithm discussed in the article translated by Guy).

I have a reproduction of the book, but have not read it. At a glance, I noted that none of the algorithms is perfect, and the book discusses some ways to improve them.

ReneZ

Anton

doranchak

DonaldFisk

-JKP-

DonaldFisk

ReneZ

-JKP-

MarcoP

Anton