The Voynich Ninja
Comparing the Voynich by Word Position Profiles




Comparing the Voynich by Word Position Profiles - quimqu - 18-09-2025

Hello everyone,

While working on the idea of my [link not available], I tried a small exercise. Imagine that the Voynich is a positional substitution cipher, where each position in a word is encoded independently.

What I did was to take the Voynich tokens in EVA transliteration, look at them position by position, and record the distribution of characters. But I deliberately ignored which character it was. In other words: at position 1 we might have 40% of one character, 30% of another, 20% of a third, and so on. Just the shape of the distribution, not the labels.
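The procedure described above can be sketched roughly as follows. This is only a minimal illustration, not the code actually used for the post: the function name `position_profiles`, the `max_len` cutoff, and the choice to treat every single EVA letter as one character are all assumptions, and the sample tokens are toy data.

```python
from collections import Counter

def position_profiles(tokens, max_len=8):
    """For each word position, return the relative frequencies of the
    characters seen there, sorted in descending order with labels dropped."""
    counters = [Counter() for _ in range(max_len)]
    for token in tokens:
        # Assumption: each EVA letter counts as one character.
        for i, ch in enumerate(token[:max_len]):
            counters[i][ch] += 1
    profiles = []
    for counter in counters:
        total = sum(counter.values())
        if total == 0:
            profiles.append([])  # no token is long enough to reach this position
            continue
        profiles.append(sorted((n / total for n in counter.values()), reverse=True))
    return profiles

# Toy EVA-like tokens, for illustration only (not real corpus counts).
sample = ["daiin", "chedy", "qokeedy", "dal", "chol"]
profiles = position_profiles(sample)
```

Sorting the frequencies is what discards the labels: two corpora with different alphabets give the same profile whenever the shapes of their distributions match.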

Then I repeated the same procedure for several different languages. My working assumption is that if the corpora are large enough, the positional distributions should be similar within texts of the same language. Here is the result I got from the texts I currently have available:

Corpus                                         Distance    Tokens
Alchemical herbal (Latin)                      0.327         6,536
De Docta Ignorantia (Latin)                    0.374        37,121
Tirant lo Blanc (Catalan)                      0.395       419,309
La Reine Margot (French)                       0.396       112,803
Ambrosius Medionalensis (Latin)                0.402       117,734
El Lazarillo de Tormes (Spanish)               0.403        20,060
Simplicius Simplicissimus (German)             0.415       189,804
Romeo and Juliet (English)                     0.451        24,822
The English Physician (Culpepper) (English)    0.460       135,362
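The post does not say which distance measure produced the Distance column. As one possible reading, here is a minimal sketch that assumes the mean total variation distance between rank-sorted positional profiles. The function name `profile_distance` and the example profiles are hypothetical, and the numbers are not expected to reproduce the values in the table.

```python
from itertools import zip_longest

def profile_distance(profiles_a, profiles_b):
    """Mean total variation distance between rank-sorted positional profiles.
    Assumed metric: the original post does not name the one it used."""
    per_position = []
    for pa, pb in zip(profiles_a, profiles_b):
        if not pa or not pb:
            continue  # skip positions that one of the corpora never reaches
        # Pad the shorter profile with zeros so that ranks line up.
        tv = 0.5 * sum(abs(x - y) for x, y in zip_longest(pa, pb, fillvalue=0.0))
        per_position.append(tv)
    if not per_position:
        raise ValueError("no position is populated in both profile sets")
    return sum(per_position) / len(per_position)

# Hypothetical rank-sorted profiles for two word positions, illustration only.
a = [[0.5, 0.3, 0.2], [0.6, 0.4]]
b = [[0.4, 0.4, 0.2], [0.6, 0.4]]
d = profile_distance(a, b)
```

Under this assumed metric, 0 means identical shapes at every position and 1 is the upper bound.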

So what does this mean? In this experiment the texts that came out closest to the Voynich were in Latin (especially the “Alchemical herbal” and “De Docta Ignorantia”), followed by Catalan, French, and Spanish. German and English were clearly further away.

Of course this does not prove the language of the Voynich, but it is interesting that the nearest matches are all Romance or Latin texts, and the Germanic ones sit lower down the ranking. It suggests that, at least under this positional-distribution approach, the Voynich behaves more like Romance/Latin than like Germanic languages.

Note: I used the "Alchemical herbal" transliteration from Marco Ponzi and the German Simplicius Simplicissimus version from Jorge Stolfi.


RE: Comparing the Voynich by Word Position Profiles - Jorge_Stolfi - 18-09-2025

(Today, 11:31 AM) quimqu Wrote: In other words: at position 1 we might have 40% of one character, 30% of another, 20% of a third, and so on. Just the shape of the distribution, not the labels.

That is enough information to compute the entropy of the first character. For the entropy of the other characters one would need the distribution of the first n characters, not just the nth character.
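To make this point concrete, here is a small sketch. It assumes that "entropy of the other characters" means the conditional entropy of character n given the n-1 characters before it, which is one interpretation of the remark. The function names and the toy tokens are hypothetical, and the fourth value in `first_position` (10%) is filled in only to make the example distribution sum to 1.

```python
import math
from collections import Counter

def shannon_entropy(freqs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)

def conditional_entropy(tokens, n):
    """H(character n | characters 1..n-1), in bits, over tokens of length >= n.
    Needs the joint distribution of the first n characters."""
    long_enough = [t for t in tokens if len(t) >= n]
    if not long_enough:
        raise ValueError("no token has n characters")
    prefixes = Counter(t[:n - 1] for t in long_enough)
    extended = Counter(t[:n] for t in long_enough)
    total = len(long_enough)
    return -sum((c / total) * math.log2(c / prefixes[w[:-1]])
                for w, c in extended.items())

# Shape from the quoted post (40%, 30%, 20%), plus an assumed final 10%.
first_position = [0.4, 0.3, 0.2, 0.1]
h1 = shannon_entropy(first_position)

# Reordering (relabelling) the characters does not change the entropy.
h1_shuffled = shannon_entropy([0.1, 0.4, 0.2, 0.3])

# Toy tokens: the second character depends on the first one.
h2_given_1 = conditional_entropy(["ab", "ab", "ac", "ba"], 2)
```

The first function only needs the label-free profile, while the second needs the actual prefixes, which is exactly the information the sorted per-position profiles throw away.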

I don't put much faith in this kind of analysis because the token and lexeme length distributions of Voynichese are quite different from those of "European" languages (Indo-European, Finno-Ugric, Semitic, Turkic, Basque).

Thus the 8th letter of a "European" word would at best correspond to maybe half of the 4th letter of a Voynichese word. But in fact we know that the character entropy of Voynichese words is spread out more evenly along the word than that of "European" words, so that the correspondence is not even linear.

And moreover these statistics are highly dependent on the spelling system, much more than on the language, and even on the nature of the text.  For instance, the Voynichese statistics would be quite different if the qo and y prefixes of Voynichese words were always split off, or always joined to the next word. Or if one spells Arabic "alkitab" as "al-kitab" or "al kitab" or "alktab"...
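A toy illustration of the segmentation effect, under stated assumptions: the helper names `first_char_profile` and `split_prefix` are hypothetical, the four tokens are made up for the example, and only the "qo" case is shown.

```python
from collections import Counter

def first_char_profile(tokens):
    """Label-free distribution of first characters, sorted descending."""
    counts = Counter(t[0] for t in tokens if t)
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True)

def split_prefix(tokens, prefix="qo"):
    """Split a leading prefix off as its own token (a hypothetical
    transcription convention, used here only for comparison)."""
    out = []
    for t in tokens:
        if t.startswith(prefix) and len(t) > len(prefix):
            out.extend([prefix, t[len(prefix):]])
        else:
            out.append(t)
    return out

# Toy tokens, for illustration only.
toy = ["qokeedy", "qokaiin", "chedy", "daiin"]
joined = first_char_profile(toy)                # "qo" kept attached
split = first_char_profile(split_prefix(toy))   # "qo" split off
```

Even on four tokens, the first-position profile changes from three distinct characters to four with different proportions, purely because of where the word breaks are placed.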

All the best, --jorge


RE: Comparing the Voynich by Word Position Profiles - quimqu - 18-09-2025

(11 hours ago) Jorge_Stolfi Wrote: I don't put much faith in this kind of analysis...

Yes, I completely agree that comparing natural languages with a transliteration of Voynichese is risky, for all the reasons you mention: segmentation, transcription conventions, word length distribution, and orthography, among others. My goal is far from claiming a linguistic match; I simply wanted to run the comparison and show the outcome. I found the result curious and thought it might be interesting to share, even if it can only be taken as an exploratory observation.


RE: Comparing the Voynich by Word Position Profiles - quimqu - 18-09-2025

(11 hours ago) Jorge_Stolfi Wrote: I don't put much faith in this kind of analysis...

Do you think we cannot squeeze anything more out of the transliterations?