The concept of entropy is often mentioned in relation to the Voynich text, and rightfully so. Voynichese differs from regular texts in a number of ways, but its entropy values are among the clearest indicators of this difference. The purpose of this post is twofold:
1. To explain entropy intuitively. I am one of those people who dreads numbers and had trouble with mathematics in high school. When people were talking about entropy, I was never sure if I understood correctly what they meant. It was only after Anton's threads about the subject, and asking a bunch of questions, that I became able to play around with entropy myself. Having crossed from one side to the other, I hope I will be able to explain these concepts in a way almost everybody can understand. Still, I am likely to make mistakes, and will welcome corrections and improvements in the comments. Keep in mind that we are talking about one very specific application of entropy here (information theory), and different fields may use different definitions.
2. To demonstrate why entropy is so important in Voynich studies. Statistics people will often shoot down simple substitution solutions on sight, and they have good reasons to do so. A simple substitution solution is one where, simply put, each Voynich glyph corresponds to one letter or sound in your translation. Low entropy is one of the reasons why they don't work. If you want to understand the challenges of the VM text, understanding entropy is crucial. Of course, entropy is not the only problem, but it demonstrates well just how different Voynichese is from regular texts. Most importantly, understanding the low entropy problem may also guide us when looking for a solution.
If you want to continue researching the VM's text without learning about entropy, that is of course fine, and then I will see you in another thread. For me, however, learning to understand entropy was a real eye-opener, and I want to share this with those who may find other explanations too technical.
A Metaphor about Entropy and Drinking Tea
Imagine you are a statistician and I am your roommate, and I like to have a cup of tea every morning at 8 AM. You want to analyze my behaviour statistically, and start to try and predict which type of tea I will select each morning.
Your first bit of information is that there are ten different types of tea in my cupboard. These are the options I can theoretically choose from. When Voynich researchers talk about h0, this is what they mean: the diversity of options (glyphs or words) without any information about their actual use. Maybe I am saving some teas for special occasions, and maybe there are others I drink all the time and keep buying again. Well, h0 does not care about this: it only cares about the number of options, and it increases when there are more options.
In the graph above, we can see how h0 grows when we know about more options (1 to 5 in this case). However, knowing my h0 won't help you much in predicting which tea I will drink: it does not tell you anything about my preferences or other habits that might influence my selection.
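For those who like to tinker, here is a minimal Python sketch of what h0 boils down to: count the distinct options that occur and take the base-2 logarithm. The tea data below is made up purely for illustration.

```python
import math

def h0(observations):
    """Maximum entropy: log2 of the number of distinct options observed."""
    return math.log2(len(set(observations)))

# Ten days of (invented) tea choices, but only three distinct types so far
days = ["mint", "lemon", "mint", "earl grey", "mint",
        "lemon", "mint", "mint", "earl grey", "mint"]
print(h0(days))        # log2(3)  ≈ 1.58
print(h0(range(10)))   # log2(10) ≈ 3.32 once ten distinct options are known
```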
This gives you a new idea: you will observe my selection for 100 days, and add a tally mark for each type of tea I select. This is what we call h1: it takes into account frequency, but no order or patterns. If I consume all ten types of tea equally, you will have ten tally marks next to each type, and my h1 entropy will be high: my tea consumption is unpredictable (again, if the only information you have is the relative frequency of each type). If, however, I drank the same kind of tea 100 times, my behavior is predictable and there is low entropy in my tea-drinking system.
In the above graph, all ten types of tea are selected at least once throughout the 100 data points in each case. But at the left of the graph, I have a strong preference for only one type, so h1 is low: nine out of ten times I will select my favorite type, so I am easy to predict. As we move towards the right, my number of preferred types increases, and so does the entropy (h1) in the system. Note that in the second graph, h0 is equal for all data points, since each type of tea has been drunk at least once, so we know there are ten options. However, h1 is variable since I varied how often each option was selected.
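To make this concrete, here is a small sketch (again with invented tea data) of how h1 could be computed from nothing but the frequency tallies:

```python
import math
from collections import Counter

def h1(observations):
    """Shannon entropy of the frequency distribution, in bits per selection."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Strong preference: nine cups of mint for every cup of lemon -> low h1
print(h1(["mint"] * 90 + ["lemon"] * 10))        # ≈ 0.47 bits

# All ten types in equal amounts -> h1 at its maximum, log2(10) ≈ 3.32 bits
print(h1([f"tea {i}" for i in range(10)] * 10))
```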
Now let us go with the worst case scenario: I drink equal amounts of each of the ten types. Going by frequency alone (h1), you now have no additional information to help you predict which type I will select. As far as you know, I might drink ten different types of tea on ten different days, or I might drink the same type ten days in a row. It's complete chaos.
Luckily, you have a final secret weapon: h2, or conditional entropy. This is Voynich researchers' favorite type of entropy, because it is where the VM really sets itself apart from everything else. What if I have a preference for a certain type of tea today, depending on the type I drank yesterday? Maybe I'm a total weirdo who always drinks tea in the same order. You start noticing a pattern: if I had mint yesterday, I will drink lemon today, and if I had lemon today, I will drink Earl Grey tomorrow. And after Earl Grey, it's always either black tea or green tea. My h1 is still through the roof, because I drink all teas in equal amounts. But luckily for you, my conditional entropy (h2) is very low. You can use yesterday's tea selection to predict what I will pick next.
In the graph above, the left shows a situation where all 100 entries follow the same order (1, 2, 3...). The system is very predictable and there is almost zero entropy. As long as you know what I drank yesterday, you can predict what I will drink today. On the right, I used the same numbers, but shuffled them randomly. This means that h0 (number of options) and h1 (frequency) remain exactly the same, but h2 is pushed close to its maximum. In this case, yesterday's choice has no influence on what I will pick today.
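And finally, a rough sketch of h2: instead of counting single selections, we count pairs of consecutive selections, and measure how predictable the second one is given the first. The tea data is again made up for illustration.

```python
import math
import random
from collections import Counter

def h2(sequence):
    """Conditional entropy H(today | yesterday), in bits, estimated from pairs."""
    pairs = list(zip(sequence, sequence[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (prev, nxt), count in pair_counts.items():
        p_pair = count / total              # p(yesterday, today)
        p_cond = count / prev_counts[prev]  # p(today | yesterday)
        h -= p_pair * math.log2(p_cond)
    return h

teas = list(range(10))

# Always the same fixed rotation: yesterday fully determines today
ordered = teas * 10
print(h2(ordered))    # 0.0: perfectly predictable

# Same teas, same frequencies, but shuffled: yesterday tells you very little
shuffled = ordered[:]
random.seed(0)
random.shuffle(shuffled)
print(h2(shuffled))   # much higher, approaching log2(10)
```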
Character Entropy vs Word Entropy
Voynich researchers will often talk about either word entropy or character entropy. These are two separate things: one will study how predictable words are, the other will look at individual glyphs. The numbers will be different as well. For example, h0 in word entropy is much higher than h0 in character entropy, since any given text may have thousands of different word types, but only a few dozen different characters/glyphs. Both word entropy and character entropy are strange in Voynichese. However, I will only write about character entropy because this is what I know the most about.
Entropy and Information
There is a correlation between a writing system's entropy values and how efficiently it conveys information. We can grasp the basics of VM entropy without understanding the details of this matter, but I mention it anyway because you might see someone write about it. Some examples:
* The word "Voynich" in binary is "01010110 01101111 01111001 01101110 01101001 01100011 01101000". Alphabetic text has a much higher h0 than binary code. In everyday writing, using the alphabet is more efficient: I can get the same information across with fewer characters.
* In English, "q" is usually followed by "u" (bar a few exceptions in loan words like qanat, Iraq....). We can say that the "u" in words like "quest" does not add any information, because it is expected with near-complete certainty. I cannot change my message by toggling this "u". Because "u after q" is predictable, its presence lowers the h2 of English, which lowers how efficiently written English transmits information.
This talk about information density feels a bit too abstract and theoretical to me: after all, most historical writing systems are not designed for optimal efficiency, but have evolved over time.
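Still, if you want to reproduce the binary example from the first bullet above, it only takes a couple of lines (using standard 8-bit ASCII codes):

```python
# "Voynich" written out as 8-bit binary: the same message needs far more symbols
# when the alphabet only offers two options (0 and 1).
word = "Voynich"
print(" ".join(f"{ord(c):08b}" for c in word))
print(len(word), "letters versus", 8 * len(word), "binary digits")
```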
Medieval Manuscripts, the VM and h0
Calculating the true h0 of medieval manuscripts is more difficult than it sounds. What to do with ligatures, abbreviation symbols, positional variation, capitals... What about rare symbols that are used only a few times? We might use a transcription of the text, but this is a cleaned-up, abstract version that does not exist on parchment. Maybe we should assume Voynichese behaves like a cleaned-up version, since it is a novel "code" that might disregard things like capitalization, abbreviation and other scribal conventions?
Apart from that, how many different characters does a section like Herbal A use? It depends on how we count. In EVA, there are 19 characters used in Herbal A. But we can easily increase this number: counting benched gallows as separate glyphs will add four. We might also guess that "in" and "iin" are separate glyphs, and so on.
With some tweaking, it is perfectly possible to get an acceptable h0 value for Voynichese. But I think h0 is also the value that suffers the most from the way we transcribe our text: each manuscript, including the VM, can be described with various degrees of standardization and differentiation between glyph forms, which makes comparing h0 difficult. Moreover, I simply find h0 unreliable overall. If a scribe slaps a novel symbol at the end of a 200-page manuscript, the h0 value of the whole manuscript changes because of this.
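As a small illustration of that last point, here is what happens in a toy Python example (not real manuscript data) when one stray symbol shows up at the very end of an otherwise regular text:

```python
import math

def h0(text):
    return math.log2(len(set(text)))

# A long toy "manuscript" that uses 19 different characters throughout
manuscript = "abcdefghijklmnopqrs" * 1000
print(round(h0(manuscript), 2))        # log2(19) ≈ 4.25

# One novel symbol on the last "page" changes h0 for the entire text
print(round(h0(manuscript + "*"), 2))  # log2(20) ≈ 4.32
```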
For reference, in my corpus of medieval texts, h0 reaches all the way from 4.25 (EVA transliteration of Q13B) to 6.95 (a Greek historical text).
Medieval Manuscripts, the VM and h1
We generally use h1 to get a more accurate "entropy fingerprint" of a text. Manipulating the text to change h2 may also change h1, and vice versa, so usually both are tracked at once. The lowest h1 I have in my corpus is a German text, followed by EVA VM sections and other German texts. However, matching h1 without also matching h2 is not worth much (I would welcome corrections if this statement is inaccurate!).
Why Voynichese has a Character Entropy Problem: h2
The most obvious reason why Voynichese has a huge character entropy problem is conditional entropy, h2. Remember the thing with "qu" in English? Well, Voynichese is kind of like that all the time.
Let's start in EVA. I give you a glyph, you tell me what's next (spaces also count). If you are a bit familiar with VM transliterations, you can do this off the top of your head:
- q: "o" in the vast majority of cases
- a: overwhelmingly "i", then "l", "r", then "m"
- i: another "i" or "n" in most cases
- n: a space in the vast majority of cases
- y: a space in the vast majority of cases
- d: "y" in about half of the cases
- c: "h" in the vast majority of cases
Other glyphs show some more options, but they still tend to be quite restricted.
This is not normal!
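If you want to try this exercise computationally, the sketch below tabulates, for each character, its single most common successor. The sample string here is invented and far too short to mean anything; a real test would run over a full transliteration file.

```python
from collections import Counter, defaultdict

# Invented EVA-like sample, for illustration only (spaces count as symbols too)
sample = "qokeedy qokedy chedy qokaiin shedy daiin chol otedy qokedy daiin"

followers = defaultdict(Counter)
for prev, nxt in zip(sample, sample[1:]):
    followers[prev][nxt] += 1

for glyph in sorted(followers):
    counts = followers[glyph]
    best, n = counts.most_common(1)[0]
    print(f"{glyph!r} is followed by {best!r} in {n} of {sum(counts.values())} cases")
```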
As far as the numbers go, they are easy to remember. VM sections are a bit below or above h2=2. Quire 13 gets really low with a value of 1.8. "Normal" medieval texts, on the other hand, have h2 values above 2.8, usually above 3. Again, this difference is huge, and it blows all other problems out of the water.
Is EVA a problem?
When I wrote my earlier posts on this subject, one thing I wondered was: to what extent does EVA influence these statistics? If the glyph "bench" is always written as "ch", maybe this is enough to mess things up: it turns one glyph into a predictable pair. So what I did was to fix benches and clusters involving "in", doing my utmost to squeeze as much h2 out of it as possible. The result is in the graph below. You can see that the "fixed" Voynichese versions outperform EVA, but they are still waaaay below any normal text.
So is EVA a problem? Well, yes and no. Yes, because maybe there are choices in EVA that should be corrected for before performing certain analyses, since EVA probably does lower h2. But also no, because the biggest problem is certainly not EVA: the biggest problem is Voynichese itself. The reason for this is simple: if I fix the "ch" situation by representing the bench with a single glyph, this new glyph is now also predictable, because all glyphs in the VM are too predictable.
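Here is a rough sketch of what that kind of experiment looks like: estimate h2 on a sample, then merge "ch" into a single placeholder glyph and estimate it again. The sample below is invented and far too short to be meaningful; the real test uses full transliterations, but the principle is the same.

```python
import math
from collections import Counter

def h2(text):
    """Conditional entropy H(next character | current character), in bits."""
    pairs = list(zip(text, text[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    return -sum((c / len(pairs)) * math.log2(c / prev_counts[prev])
                for (prev, _), c in pair_counts.items())

# Invented EVA-like sample, for illustration only
eva = "qokeedy chedy qokedy chol chedy qokaiin chedy qokedy daiin chey"

print("EVA as transcribed:       ", round(h2(eva), 2))
print("bench merged to one glyph:", round(h2(eva.replace("ch", "C")), 2))
```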
People often ask me if I included this or that dialect in my corpus, and my answer is always the same: it does not matter! Differences between entire language families are much smaller than the difference between Voynichese and normal text. Give me any text in any language, and I can almost guarantee you that its h2 will be above 2.8, which absolutely crushes Voynichese.
What's with those Verbose Ciphers?
In the same post, I tried to push my approach further, which took me unwittingly into verbose cipher territory. A verbose cipher is basically a cipher that obfuscates by adding unnecessary stuff. In a very simple example, I could verbosely obfuscate the word "Voynich" by adding a "v" after every letter: "Vvovyvnvivcvhv". If Voynichese is the result of a verbose cipher, I could try to reverse this by rewriting common glyph clusters (bigrams, trigrams) as single glyphs. For example, I could replace "dy" by "&" and run the entropy test again to see what changed. After lots of trial and error, I got almost-but-not-quite-normal entropy values this way. Apparently Rene did better with some method, which I am really looking forward to learning more about. As Rene also noticed, however, the "rewriting n-grams" method has a significant drawback: it makes words really short. As you can see in the "Voynich" to "Vvovyvnvivcvhv" example above, verbose encoding has the effect of lengthening words, and Voynichese words aren't excessively long to begin with.
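To make the verbose cipher idea concrete, here is a toy sketch: the encoding step from the "Voynich" example, and the kind of bigram rewriting one could use to try to reverse it. The "dy" to "&" replacement is the one from the text; the short ciphertext string is made up, and on real data you would pick the most frequent bigrams and iterate.

```python
def verbose_encode(word, filler="v"):
    """Insert a filler letter after every plaintext letter."""
    return "".join(c + filler for c in word)

print(verbose_encode("Voynich"))            # Vvovyvnvivcvhv

# Trying to reverse a suspected verbose cipher: rewrite a common bigram
# as a single new symbol and see how the statistics change.
ciphertext = "qokedy chedy daiin qokedy shedy chedy"   # invented sample
print(ciphertext.replace("dy", "&"))

# The drawback mentioned above: merging bigrams makes words shorter,
# while verbose encoding makes them longer.
print(len("Voynich"), "->", len(verbose_encode("Voynich")), "characters")
```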
What's the takeaway?
If you want to have a chance of solving Voynichese, you must take into account the entropy problem - there is no way around it. Knowing about the entropy issue is also interesting when assessing proposed solutions. They will either:
* Focus on single words. This locks in glyph correspondences, and makes it impossible to expand the system to a paragraph of text. Because Voynichese has abnormally low entropy, it will not convert freely to any reasonable text in any writing system that has been considered so far.
* Add their own step to introduce entropy. This is the infamous "interpretative step". The translator realizes that Voynichese does not provide enough options, so they find ways to increase the entropy. For example, they will say each VM glyph can stand for various plaintext glyphs. This leads to massive problems, which I will not go into right now (basically, it turns the system into a one-way cipher).
But most importantly, knowing about the nature of Voynichese's entropy problems will hopefully help us work towards the right type of solution: one that changes the entropy density of the text without actually inventing information. But that may still be a ways off.
Comments, questions and additions are welcome.