The Voynich Ninja
Have we ruled out simple substitution unwisely? - Printable Version


RE: Have we ruled out simple substitution unwisely? - asteckley - 09-02-2026

(09-02-2026, 03:51 PM)eggyk Wrote:
(09-02-2026, 03:43 PM)asteckley Wrote:
(09-02-2026, 03:17 PM)eggyk Wrote: Did our choice of transliteration alphabet influence the calculated entropy?



Much of the stuff you are suggesting would tend to increase predictability of glyph sequencing. One would expect the net result to be a lowering of the entropy even further.



I don't understand how suggesting that "aiiin" may be a variety of different combinations of letters increases predictability. It surely makes it less predictable.

Perhaps I'm not understanding your proposed application of different combinations.
In any case, you might test a few of these ideas individually—changing only one aspect of the transliteration system at a time—and then compare the entropy delta before and after each change. (That being said, I recognize that your point is not really about any of those specific suggestions, but about the consequences of overlooking how certain assumptions have become taken for granted.)
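A one-change-at-a-time test of this kind could be sketched roughly as follows. This is only a minimal illustration: the function names, the toy sample string, and the choice of replacing "aiin" with a single symbol are assumptions made for the example, not anything proposed in the thread, and real work would use a full transliteration file.

```python
from collections import Counter
from math import log2

def h1(text):
    """Unigram character entropy, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * log2(c / total) for c in counts.values())

def h2(text):
    """Conditional entropy H(next char | previous char), in bits."""
    pairs = Counter(zip(text, text[1:]))   # adjacent character pairs
    firsts = Counter(text[:-1])            # how often each char starts a pair
    total = len(text) - 1
    return -sum(n / total * log2(n / firsts[a]) for (a, _), n in pairs.items())

def entropy_delta(text, old, new):
    """Change exactly one thing (old -> new) and return (delta h1, delta h2)."""
    changed = text.replace(old, new)
    return h1(changed) - h1(text), h2(changed) - h2(text)

# Toy string for illustration only, not real transliteration data.
sample = "daiin chol chor daiin shol qokaiin chol daiin"
d1, d2 = entropy_delta(sample, "aiin", "A")
print(f"delta h1 = {d1:+.3f} bits, delta h2 = {d2:+.3f} bits")
```

Keeping each change isolated like this makes the sign of the delta attributable to that change alone, which matters because the direction is not always intuitive.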


RE: Have we ruled out simple substitution unwisely? - eggyk - 09-02-2026

(09-02-2026, 05:26 PM)asteckley Wrote: (That being said, I recognize that your point is not really about any of those specific suggestions, but about the consequences of overlooking how certain assumptions have become taken for granted.)

Indeed, the point of the thread is to ask whether there is a potential flaw shared by all of the popular transcription alphabets that causes unusual entropy results, instead of unusual entropy being an inherent trait of the VMS text. 

Obviously much has been tried in many different ways. The potential examples I mentioned were not fully fleshed out or tested (I would make a separate thread for that).


RE: Have we ruled out simple substitution unwisely? - Koen G - 09-02-2026

(09-02-2026, 07:16 PM)eggyk Wrote: Indeed, the point of the thread is to ask whether there is a potential flaw shared by all of the popular transcription alphabets that causes unusual entropy results, instead of unusual entropy being an inherent trait of the VMS text.

I would recommend that anyone interested in this first read my blog post here: [link].

I also wondered what the impact of EVA was on entropy, the impact of the transliteration file used etc. So I drove it to the extreme by gradually replacing common EVA pairs with new letters. Some findings:
  • The impact of the initial transliteration file (TT vs ZL) is low. Don't stress about this.
  • Benched gallows are a big problem for anyone undertaking something like this. You need to make a choice about how to treat them, and that has an impact. No single choice is the obvious correct one.
  • Q13 is a huge outlier (in 2020, the extent of the difference was news to many people)
  • EVA lowers entropy in obvious ways, like by splitting the bench and chopping up "in"-clusters. However, fixing these does not make Voynichese normal by any definition of the word. 
  • In Herbal A, I obtained the best results by "desplitting" the following EVA clusters, in that order: qok, chol, chor, che, chy, ol, cho, or, qot, ar, eey, al, qo. This frankly ridiculous exercise leaves you with barely viable h2 stats and extremely short words.
  • The same can be done for other sections, with some different choices. 

What I learned from this is that EVA does make things look worse than they are, but they are still very bad. The fact that a cluster like [edy] wreaks havoc on your conditional character entropy has nothing to do with the transliteration system, and has everything to do with the Voynichese system.

Now, this was over 5 years ago; maybe someone has a better approach. To me it also looks like the proposals in the opening post wouldn't necessarily lead to an increase in entropy (it's not always intuitive). Such ideas remain speculation until you actually test them.
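For anyone who wants to try the cluster-merging experiment described above, here is a rough sketch. The cluster order is the Herbal A list from the post; everything else is an assumption for illustration: the function names, the use of Unicode private-use characters as the "new letters", and the way spaces are left in the text. How the original experiment handled spaces and benched gallows is not stated here, so treat this as a starting point rather than a reproduction.

```python
from collections import Counter
from math import log2

# Cluster order as listed in the post (Herbal A).
CLUSTERS = ["qok", "chol", "chor", "che", "chy", "ol", "cho",
            "or", "qot", "ar", "eey", "al", "qo"]

def h2(text):
    """Conditional entropy H(next char | previous char), in bits."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = len(text) - 1
    return -sum(n / total * log2(n / firsts[a]) for (a, _), n in pairs.items())

def merge_clusters(text, clusters=CLUSTERS):
    """Merge clusters one at a time, in order, each into one new symbol.

    Yields (cluster, merged_text, h2_of_merged_text) after every step.
    Assumes the input does not already contain private-use characters.
    """
    for i, cluster in enumerate(clusters):
        text = text.replace(cluster, chr(0xE000 + i))
        yield cluster, text, h2(text)

# Toy string for illustration only, not real transliteration data.
sample = "qokol chol chor cheey qotar shol qokal chy or al"
print(f"start: h2 = {h2(sample):.3f}")
for cluster, merged, value in merge_clusters(sample):
    print(f"after merging {cluster!r}: h2 = {value:.3f}, length = {len(merged)}")
```

The order matters because earlier merges consume characters that later clusters would otherwise match (for example, "chol" is merged before "ol" and "cho").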


RE: Have we ruled out simple substitution unwisely? - Jorge_Stolfi - 09-02-2026

(09-02-2026, 07:46 PM)Koen G Wrote: What I learned from this is that EVA does make things look worse than they are, but they are still very bad.

Koen, first, changing the "spelling system" can indeed raise the entropy per character to close to the theoretical limit of log_2(n), where n is the number of distinct letters you are allowed to use in the new spelling.  If the new spelling is (say) capital letters plus the digits 0-9, then you could get close to 5 bits/character.  This is one of the main results of information theory.
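For reference, the bound in question is the standard maximum-entropy bound for an alphabet of n symbols with probabilities p_i, worked out here for the 36-symbol example (26 capital letters plus 10 digits):

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i \;\le\; \log_2 n,
\qquad \text{with equality iff } p_i = \tfrac{1}{n} \text{ for all } i.

n = 26 + 10 = 36 \;\Rightarrow\; \log_2 36 \approx 5.17 \text{ bits/character}.
```

Conditional entropies are bounded by the same quantity, since conditioning cannot increase entropy.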

There are algorithms for finding such efficient "spelling systems" given a large enough sample text.  That is basically how text compressors work.  
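One simple member of this family is greedy pair merging (byte-pair-encoding style): repeatedly replace the most frequent adjacent pair of symbols with a new symbol. The sketch below is a swapped-in illustration of the general idea, not necessarily the kind of algorithm meant above; the function names and the `<k>` naming of new symbols are made up for the example.

```python
from collections import Counter

def merge_most_frequent_pair(symbols, new_symbol):
    """Replace every occurrence of the most frequent adjacent pair.

    Returns (new_symbol_list, merged_pair), or (symbols, None) if there
    are no pairs left to merge.
    """
    pairs = Counter(zip(symbols, symbols[1:]))
    if not pairs:
        return symbols, None
    pair = pairs.most_common(1)[0][0]
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(new_symbol)   # consume both members of the pair
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out, pair

def respell(text, merges):
    """Apply up to `merges` greedy merges; return (symbols, learned pairs)."""
    symbols = list(text)
    learned = []
    for k in range(merges):
        symbols, pair = merge_most_frequent_pair(symbols, f"<{k}>")
        if pair is None:
            break
        learned.append(pair)
    return symbols, learned

# Toy string for illustration only, not real transliteration data.
symbols, learned = respell("daiin daiin chol daiin chor", 3)
print(learned)
print(symbols)
```

Each merge enlarges the alphabet by one symbol and shortens the text, which is the trade-off behind the "ugly" spelling systems mentioned next.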

But the new spelling system produced by those algorithms can be ugly.  So, second, you should try the "elements" that I proposed some time ago: [link].  Those were guessed from looking at the Voynichese words, so I claim that they are "natural".  They still don't take into account the possible role of a/o/y as additional modifiers (like the e modifier but always after it).

However, third, you should not waste time with character (or digraph) entropy, precisely because it is largely determined by the spelling system, not by the language.  You should look instead at word entropy, because that is not affected by the spelling system -- as long as it is consistent (the same word is always spelled the same way) and does not require splitting or joining words compared to some reference spelling.

But even then, word statistics are a property of the text, not of the language.  If the text is very repetitive (like those Alchemist Herbals) then the word entropy will be low too.
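The invariance claim about word entropy is easy to check on a small scale. The sketch below uses hypothetical names and a made-up respelling table; the only requirements it illustrates are the ones stated above, namely that the respelling is consistent and neither splits nor joins words. It also needs distinct words to stay distinct after respelling.

```python
from collections import Counter
from math import log2

def word_entropy(words):
    """Entropy of the word frequency distribution, in bits per word."""
    counts = Counter(words)
    total = len(words)
    return -sum(c / total * log2(c / total) for c in counts.values())

def respell(word, table):
    """Consistently respell a word, character by character."""
    return "".join(table.get(ch, ch) for ch in word)

# Toy word list for illustration only, not real transliteration data.
words = "daiin chol daiin shol chol daiin".split()

# Made-up respelling: one glyph becomes two letters, others are renamed.
table = {"a": "A", "i": "ee", "c": "k"}
respelled = [respell(w, table) for w in words]

print(respelled)
print(word_entropy(words), word_entropy(respelled))
```

The two values agree because the respelling only renames the words; the frequency distribution over words, which is all that word entropy depends on, is untouched.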

All the best, --stolfi


RE: Have we ruled out simple substitution unwisely? - eggyk - 09-02-2026

(09-02-2026, 07:46 PM)Koen G Wrote:
(09-02-2026, 07:16 PM)eggyk Wrote: Indeed, the point of the thread is to ask whether there is a potential flaw shared by all of the popular transcription alphabets that causes unusual entropy results, instead of unusual entropy being an inherent trait of the VMS text.

I would recommend that anyone interested in this first read my blog post here: [link].

I also wondered what the impact of EVA was on entropy, the impact of the transliteration file used etc. So I drove it to the extreme by gradually replacing common EVA pairs with new letters. Some findings:
  • The impact of the initial transliteration file (TT vs ZL) is low. Don't stress about this.
  • Benched gallows are a big problem for anyone undertaking something like this. You need to make a choice about how to treat them, and that has an impact. No single choice is the obvious correct one.
  • Q13 is a huge outlier (in 2020, the extent of the difference was news to many people)
  • EVA lowers entropy in obvious ways, like by splitting the bench and chopping up "in"-clusters. However, fixing these does not make Voynichese normal by any definition of the word. 
  • In Herbal A, I obtained the best results by "desplitting" the following EVA clusters, in that order: qok, chol, chor, che, chy, ol, cho, or, qot, ar, eey, al, qo. This frankly ridiculous exercise leaves you with barely viable h2 stats and extremely short words.
  • The same can be done for other sections, with some different choices. 

What I learned from this is that EVA does make things look worse than they are, but they are still very bad. The fact that a cluster like [edy] wreaks havoc on your conditional character entropy has nothing to do with the transliteration system, and has everything to do with the Voynichese system.

Now, this was over 5 years ago; maybe someone has a better approach. To me it also looks like the proposals in the opening post wouldn't necessarily lead to an increase in entropy (it's not always intuitive). Such ideas remain speculation until you actually test them.

I appreciate the response and the work done in that post. I remember reading it when I first joined the forum but it's worth me going through it again. 

The fact that EVA does lower the entropy, and that changing it has a measurable effect, is something I may pursue. As I mentioned in the OP, there may be aspects of EVA (or other transcription alphabets) that significantly decrease entropy and haven't been addressed. Or there could be tweaks to EVA that would significantly increase the entropy. I think being as exhaustive as humanly possible is worthwhile here.

Out of curiosity, is there a widely accepted tool to calculate and plot the entropy values of text? I remember seeing a website link somewhere here.


RE: Have we ruled out simple substitution unwisely? - nablator - 09-02-2026

(09-02-2026, 09:59 PM)eggyk Wrote: Out of curiosity, is there a widely accepted tool to calculate and plot the entropy values of text? I remember seeing a website link somewhere here.

The one and only.

[link]


RE: Have we ruled out simple substitution unwisely? - oshfdk - 09-02-2026

(09-02-2026, 04:58 PM)eggyk Wrote: I was suggesting this to see its effect on entropy, not as a decoding attempt.

I can't say much about the entropy. I know how it works on paper, but almost every time I tried computing it I overlooked something and got bogus results. So I probably cannot add anything to the entropy discussion.

Simple substitution won't work for many reasons, and for me the entropy is not the most important of them.


RE: Have we ruled out simple substitution unwisely? - ReneZ - 10-02-2026

Just a reminder that the anomalously low entropy was discovered in 1976, and EVA was first introduced in 1999.

For the impact of the transliteration language on entropy, see Figure 12 (with surrounding text) on [link].


RE: Have we ruled out simple substitution unwisely? - Koen G - 10-02-2026

Stolfi: I get that we can simply focus on word entropy, and this is a useful exercise. But if you ignore character entropy, you're basically treating it like a code. Again, that's a viable approach, but not what most people seem to have in mind.

For those who think character entropy issues are not inherent to Voynichese, I can only encourage you to experiment with this yourself. That's the best way to get a feel for the problem.


RE: Have we ruled out simple substitution unwisely? - RadioFM - 10-02-2026

Here are some higher-order [character] conditional entropies for standard EVA plus the character groupings Stolfi mentioned in this thread. Whitespace is included. Herbal VMS typically has a lower [character] conditional entropy than natural languages, at least for 1st-order conditional entropy. At higher orders (3rd and above) the VMS seems to catch up or even rise a little above, though my numbers could be off, so take it with a grain of salt.
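For anyone wanting to cross-check figures like these, a plug-in estimate of the order-k conditional entropy can be sketched as below. The naming is an assumption: "order" here means the number of preceding characters conditioned on, so order 0 is plain character entropy and order 1 is what is often called h2. Whitespace is treated as an ordinary character, as in the post. This is not the code behind the numbers above.

```python
from collections import Counter
from math import log2

def conditional_entropy(text, order):
    """Plug-in estimate of H(next char | previous `order` chars), in bits."""
    n = order + 1
    # Count all overlapping n-grams; spaces count as characters.
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    # Context counts are derived from the same n-grams so that the
    # conditional probabilities within each context sum to 1.
    contexts = Counter()
    for gram, c in grams.items():
        contexts[gram[:-1]] += c
    total = sum(grams.values())
    return -sum(c / total * log2(c / contexts[g[:-1]]) for g, c in grams.items())

# Toy string for illustration only, not real transliteration data.
sample = "daiin chol chor daiin shol qokaiin chol daiin chedy qokedy"
for k in range(4):
    print(f"order {k}: {conditional_entropy(sample, k):.3f} bits")
```

One caution when comparing higher orders: plug-in estimates like this are biased downward when the text is short relative to the number of possible contexts, so values at order 3 and above are only comparable between texts of similar length and alphabet size.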