I've updated my recent article at [link].
Written languages have different properties, manifesting themselves as different distributions of glyphs on a PCA plot. Here, I've plotted Hungarian, Italian, a mystery language, the Voynich Manuscript, Mandarin Chinese encoded as Pinyin, and Latin. I've also presented an argument that the cryptographic techniques available at the time of the VMS's creation wouldn't change a PCA plot significantly (but a transposition cipher would). I've explained how the plots work.
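For readers who want to experiment, here is a minimal sketch of one way a glyph PCA plot could be computed. It is not necessarily the method used in the article. It assumes each glyph is described by how often every other glyph appears immediately before and after it, that word breaks count as a glyph written "*", and that each character is one glyph. The function names (`context_vectors`, `principal_scores`, `glyph_pca`) are hypothetical, and the PCA is done by plain power iteration so that only the standard library is needed.

```python
import math
import random


def context_vectors(text, glyphs):
    """One row per glyph: relative frequencies of its left and right neighbours."""
    index = {g: i for i, g in enumerate(glyphs)}
    n = len(glyphs)
    rows = [[0.0] * (2 * n) for _ in glyphs]
    # Assumption: word breaks are treated as a glyph of their own, written "*".
    # Characters not listed in `glyphs` are simply dropped.
    symbols = [c for c in "*".join(text.split()) if c in index]
    for left, right in zip(symbols, symbols[1:]):
        rows[index[right]][index[left]] += 1.0      # `left` precedes `right`
        rows[index[left]][n + index[right]] += 1.0  # `right` follows `left`
    for row in rows:
        total = sum(row)
        if total:
            row[:] = [x / total for x in row]
    return rows


def principal_scores(rows, k=2, iterations=300, seed=0):
    """Project mean-centred rows onto their first k principal axes.

    Uses power iteration with deflation on the scatter matrix X^T X.
    """
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dims)]
    centred = [[r[j] - means[j] for j in range(dims)] for r in rows]
    scatter = [[sum(r[a] * r[b] for r in centred) for b in range(dims)]
               for a in range(dims)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
        for _ in range(iterations):
            for axis in axes:  # stay orthogonal to the axes already found
                dot = sum(x * y for x, y in zip(v, axis))
                v = [x - dot * y for x, y in zip(v, axis)]
            w = [sum(s * x for s, x in zip(scatter_row, v))
                 for scatter_row in scatter]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        axes.append(v)
    return [[sum(x * y for x, y in zip(r, axis)) for axis in axes]
            for r in centred]


def glyph_pca(text, glyphs):
    """Map each glyph to its (x, y) position on the first two components."""
    scores = principal_scores(context_vectors(text, glyphs))
    return {g: tuple(s) for g, s in zip(glyphs, scores)}


sample = "la casa bella e la cosa bella"
sample_glyphs = sorted(set("*".join(sample.split())))
sample_points = glyph_pca(sample, sample_glyphs)
```

Under this particular construction, a simple substitution cipher only relabels the rows and permutes the columns of the matrix, so the cloud of points should keep its shape up to reflection. A real comparison would of course need a few pages of text rather than one short phrase.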
None of the plots for any language I've tried so far closely matches the Voynichese plot, but perhaps you have a language in mind you think might be worth trying. If so, I need text (preferably in a phonetic alphabetic script) and a list of glyphs. The text doesn't have to be very long -- a few pages is enough. Alternatively, if you can think of an encoding method which might go from (for example) Italian into Voynichese, I could use encoded text as input and see whether that results in a match.
This should expedite the identification/rejection of candidate plaintext languages and encryption methods. Even if the VMS text is meaningless, I would still expect the scribes to have pronounced it, and EVA is certainly pronounceable (but doesn't resemble any known language).
Thank you, Donald! I think the PCA diagrams provide a useful tool to compare different scripts.
You write that the diagrams can be "rotated or reflected" and mean the same: I guess that a 90-degree rotation would swap the first and second components, and I am not sure that would be desirable. For instance, in Italian it is the X axis (1st component, I guess) that separates vowels from consonants. But mirroring one or both axes so that the most frequent letter (excluding *) is in a fixed quadrant (e.g. bottom-right) could help the visual comparison.
In your analysis, you write that some of the plots "vaguely resemble the Voynichese plot". It would be great to have a quantitative similarity measure as an indicator of how close or different two plots are.
I have now identified a language whose glyph PCA, and therefore presumably its phonetic properties, closely match those of the Voynich Manuscript. It is obscure but geographically and historically plausible. More details can be found at [link].
(14-05-2018, 07:32 AM)MarcoP Wrote: Thank you, Donald! I think the PCA diagrams provide a useful tool to compare different scripts.
You write that the diagrams can be "rotated or reflected" and mean the same: I guess that a 90-degree rotation would swap the first and second components, and I am not sure that would be desirable. For instance, in Italian it is the X axis (1st component, I guess) that separates vowels from consonants. But mirroring one or both axes so that the most frequent letter (excluding *) is in a fixed quadrant (e.g. bottom-right) could help the visual comparison.
In your analysis, you write that some of the plots "vaguely resemble the Voynichese plot". It would be great to have a quantitative similarity measure as an indicator of how close or different two plots are.
Mirroring PCA plots is very easy for me to do and I'll ensure that comparable plots are aligned the same way in future.
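As a rough illustration of the alignment suggested above, the sketch below mirrors a plot so that its most frequent glyph (ignoring the word-break symbol "*") always lands in the bottom-right quadrant. The function name `align_plot`, the dictionary layout and the example numbers are assumptions made for illustration, not part of any existing library.

```python
def align_plot(points, counts, boundary="*"):
    """Mirror the axes so the most frequent glyph has x >= 0 and y <= 0.

    points: glyph -> (x, y) position on the first two components
    counts: glyph -> number of occurrences in the text
    """
    anchor = max((g for g in points if g != boundary),
                 key=lambda g: counts.get(g, 0))
    ax, ay = points[anchor]
    sx = -1.0 if ax < 0 else 1.0  # flip the X axis if the anchor is on the left
    sy = -1.0 if ay > 0 else 1.0  # flip the Y axis if the anchor is in the top half
    return {g: (sx * x, sy * y) for g, (x, y) in points.items()}


# Made-up coordinates and counts, purely to show the call.
example_points = {"*": (0.5, 0.5), "a": (-2.0, 1.0), "b": (1.0, -3.0)}
example_counts = {"*": 10, "a": 7, "b": 2}
aligned = align_plot(example_points, example_counts)
```

Only reflections are applied here, never a 90-degree rotation, so the first and second components stay on their own axes.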
Comparing PCA plots is more problematic. There are standard algorithms for matching points in different 3D scenes but these aren't suitable here. It's better to identify plausible languages by eye first (see my other post) and then match up the glyphs by trial and error, while ensuring the output texts share properties (ideally, words). The main concern here is to avoid outputting gibberish. This is something cryptographers will know more about than me, but I'm interested in adding useful algorithms to my text-processing library.
(29-05-2018, 03:08 AM)DonaldFisk Wrote: I have now identified a language whose glyph PCA, and therefore presumably its phonetic properties, closely match those of the Voynich Manuscript. It is obscure but geographically and historically plausible. More details can be found at [link].
Very interesting! I've also had the idea that it's a Northwest Caucasian language, based on the statistical investigations of Emma May Smith and others that suggested to me that there are only two "full" vowels in Voynichese, represented by EVA a (equivalent to y?) and o. Interestingly, during the time when the vellum was made, there was a Genoese trading post in Sukhumi (then called Sebastopolis), now the capital of Abkhazia. There were other trading posts elsewhere in the region. I found more about it at [link], although I don't know how accurate this is, and the author of the site certainly has an agenda: he is trying to prove that the region was originally inhabited by Georgians and that the Abkhazians immigrated later (this is a political issue, as the status of Abkhazia is currently internationally disputed). I remember that Diane also mentioned the city in the context of transmission of content from East to West.
However, while Abkhaz and other Northwest Caucasian languages have only two phonemic vowels, these are pronounced differently depending on the surrounding consonants. A non-native speaker unfamiliar with modern-day phonological theory would probably have written the words as he heard them, so he would have used more than two vowel symbols. So, a phonemic orthography using only two vowels would probably have been invented by a native speaker.
By the way, if you're looking for a larger Abkhaz language corpus, there's also a [link] in the language.
I haven't had a chance to study old Abkhaz, but in modern Abkhaz, the words are longer, on average, than VMS tokens, and the vowel-sounds (assuming "y" is similar to Greek) are often combined two or three in a row.
Donald, this is most interesting! Having not heard of Abkhaz, I looked at the Wikipedia page on the Abkhaz language. This has an example text, the romanisation of which is given as:
Quote:Darbanzaalak auaɥy dshoup ihy daqwithny. Auaa zegj zinlei patulei eiqaroup. Urth irymoup ahshyɥi alamysi, dara daragj aesjei aesjei reiphsh eizyqazaroup.
I perked up when I saw the repeated words "aesjei aesjei" in the third sentence, reminiscent of repeating words in the VMS.
Moreover, the "dara daragj" word sequence with the same stem looks oddly familiar.
But this language is not old enough, is it?
(31-05-2018, 12:18 AM)julian Wrote: Donald, this is most interesting! Having not heard of Abkhaz, I looked at the Wikipedia page on the Abkhaz language. This has an example text, the romanisation of which is given as:
Quote:Darbanzaalak auaɥy dshoup ihy daqwithny. Auaa zegj zinlei patulei eiqaroup. Urth irymoup ahshyɥi alamysi, dara daragj aesjei aesjei reiphsh eizyqazaroup.
I perked up when I saw the repeated words "aesjei aesjei" in the third sentence, reminiscent of repeating words in the VMS.
Moreover, the "dara daragj" word sequence with the same stem looks oddly familiar.
But this language is not old enough, is it?
This, and the text I analysed, are modern Abkhaz. The problem is that, as far as we know, 15th-century Abkhaz was never recorded. The earliest known record of Abkhaz was made in the 17th century by a Turkish traveller called Mehmed Zilli (Evliya Çelebi). You can find it at [link]. Some of the phrases he recorded are vulgar. Different languages change at different rates, but the Abkhaz he recorded is, I think, recognizably similar to that of today, though you have to bear in mind that Turkish might also have changed over the last 400 years (that will be better documented). Other related languages are as deficient in vowels and rich in consonants as Abkhaz, and they might be worth checking too.
The Universal Declaration of Human Rights is available in Abkhaz, so that can be used as a sample.
In the past, educated Abkhaz people used other languages to express themselves in writing: Greek, then Georgian, then also Turkish. The first Abkhaz alphabet dates to the mid-19th century and was developed by a Russian researcher on the basis of the Cyrillic script.
Interestingly, the still-undeciphered Maykop Stone, which dates to before 2000 BC and is the most ancient native writing in the territories of the former USSR, was once associated with the Abkhaz language by Turchaninov, but his theory was not supported by others.
(31-05-2018, 09:50 PM)DonaldFisk Wrote:
This, and the text I analysed, are modern Abkhaz. The problem is that, as far as we know, 15th-century Abkhaz was never recorded. The earliest known record of Abkhaz was made in the 17th century by a Turkish traveller called Mehmed Zilli (Evliya Çelebi). You can find it at [link]. Some of the phrases he recorded are vulgar. Different languages change at different rates, but the Abkhaz he recorded is, I think, recognizably similar to that of today, though you have to bear in mind that Turkish might also have changed over the last 400 years (that will be better documented). Other related languages are as deficient in vowels and rich in consonants as Abkhaz, and they might be worth checking too.
I think that if the glyph PCA is a close match, and the word similarity statistics are also a match, this is very promising. By word similarity, I mean the average edit distance between consecutive pairs of words.
I need to dig up some analysis I did on edit distances in the VMS and apply it to Abkhaz.
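The word similarity statistic described above could be sketched roughly as follows. This is an illustration rather than the analysis mentioned in the post: it assumes plain Levenshtein distance (insertions, deletions and substitutions each cost 1) and treats every character as one glyph, which would need adjusting for EVA groups that stand for a single glyph. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two words, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = cur
    return prev[-1]


def mean_consecutive_distance(words):
    """Average edit distance over each pair of neighbouring words."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        raise ValueError("need at least two words")
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)


# Example call on the third sentence of the Abkhaz sample quoted earlier.
abkhaz = "Urth irymoup ahshyɥi alamysi, dara daragj aesjei aesjei reiphsh eizyqazaroup"
score = mean_consecutive_distance(abkhaz.replace(",", "").split())
```

Running the same function over a VMS transcription and over an Abkhaz sample of similar length would give two numbers to compare, though a single average hides a lot, so the full distribution of distances may be more informative.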