The concept of entropy is often mentioned in relation to the Voynich text, and rightfully so. Voynichese differs from regular texts in a number of ways, but its entropy values are one of the clearest indications. The purpose of this post is twofold:

1. To explain entropy intuitively. I am one of those people who dreads numbers and had trouble with mathematics in high school. When people were talking about entropy, I was never sure if I understood correctly what they meant. It was only after Anton's threads about the subject, and asking a bunch of questions, that I became able to play around with entropy myself. Having crossed from one side to the other, I hope I will be able to explain these concepts in a way almost everybody can understand. Still, I am likely to make mistakes, and will welcome corrections and improvements in the comments. Keep in mind that we are talking about one very specific application of entropy here (information theory), and different fields may use different definitions.
2. To demonstrate why entropy is so important in Voynich studies. Statistics people will often shoot down simple substitution solutions on sight, and they have good reasons to do so. A simple substitution solution is one where, simply put, each Voynich glyph corresponds to one letter or sound in your translation. Low entropy is one of the reasons why they don't work. If you want to understand the challenges of the VM text, understanding entropy is crucial. Of course, entropy is not the only problem, but it demonstrates well just how different Voynichese is from regular texts. Most importantly, understanding the low entropy problem may also guide us when looking for a solution.
If you want to continue researching the VM's text without learning about entropy, that is of course fine, and then I will see you in another thread. For me, however, learning to understand entropy was a real eye-opener, and I want to share this with those who may find other explanations too technical.
A Metaphor about Entropy and Drinking Tea
Imagine you are a statistician and I am your roommate, and I like to have a cup of tea every morning at 8 AM. You want to analyze my behaviour statistically, and start to try and predict which type of tea I will select each morning.

Your first bit of information is that there are ten different types of tea in my cupboard. These are the options I can theoretically choose from. When Voynich researchers talk about h0, this is what they mean: the diversity of options (glyphs or words) without any information about their actual use. Maybe I am saving some teas for special occasions, and maybe there are others I drink all the time and keep buying again. Well, h0 does not care about this, it only cares about the number of options, and it increases when there are more options.
In the graph above, we can see how h0 grows when we know about more options (1 to 5 in this case). However, knowing my h0 won't help you much in predicting which tea I will drink: it does not tell you anything about my preferences or other habits that might influence my selection.
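If you like to see the arithmetic spelled out, here is a minimal Python sketch of h0 as I use it here: the base-2 logarithm of the number of distinct options. The tea names are just made-up labels for the example.

```python
import math

def h0(observations):
    """Maximum entropy: log2 of the number of distinct options."""
    return math.log2(len(set(observations)))

# Ten types of tea in the cupboard: h0 = log2(10), about 3.32 bits,
# no matter how often each type actually gets chosen.
cupboard = ["mint", "lemon", "earl grey", "black", "green",
            "rooibos", "chamomile", "jasmine", "white", "oolong"]
print(round(h0(cupboard), 2))  # 3.32
```

With one option h0 is 0, with two it is 1 bit, and it keeps growing as the number of options grows, which is exactly what the graph shows.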
This gives you a new idea: you will observe my selection for 100 days, and add a tally mark for each type of tea I select. This is what we call h1: it will take into account frequency, but no order or patterns. If I consume all ten types of tea equally, you will have ten tally marks with each type, and my h1 entropy will be high: my tea consumption is unpredictable (again, if the only information you have is the relative frequency of each type). If, however, I drank the same kind of tea 100 times, my behavior is predictable and there is low entropy in my tea-drinking system.
In the above graph, all ten types of tea are selected at least once throughout the 100 data points in each case. But at the left of the graph, I have a strong preference for only one type, so h1 is low: nine out of ten times I will select my favorite type, so I am easy to predict. As we move towards the right, my number of preferred types increases, and so does the entropy (h1) in the system. Note that in the second graph, h0 is equal for all data points, since each type of tea has been drunk at least once, so we know there are ten options. However, h1 is variable since I varied how often each option was selected.
Now let us go with the worst case scenario: I drink equal amounts of each of the ten types. Going by frequency alone (h1), you now have no additional information to help you predict which type I will select. As far as you know, I might drink ten different types of tea on ten different days, or I might drink the same type ten days in a row. It's complete chaos.
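Here is the same idea as a small sketch, assuming we measure h1 as the Shannon entropy of the observed frequencies (in bits). The "picky" and "balanced" sequences are invented to mirror the two extremes described above.

```python
import math
from collections import Counter

def h1(sequence):
    """Shannon entropy of the symbol frequencies; order plays no role."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

teas = ["mint", "lemon", "earl grey", "black", "green",
        "rooibos", "chamomile", "jasmine", "white", "oolong"]

# Strong favourite: 91 cups of mint, every other type once -> low h1.
picky = ["mint"] * 91 + teas[1:]
# All ten types in equal amounts -> h1 at its maximum, log2(10).
balanced = teas * 10

print(round(h1(picky), 2))     # about 0.72 bits
print(round(h1(balanced), 2))  # about 3.32 bits
```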
Luckily, you have a final secret weapon: h2, or conditional entropy. This is Voynich researchers' favorite type of entropy, because it is where the VM really sets itself apart from everything else. What if I have a preference for a certain type of tea today, depending on the type I drank yesterday? Maybe I'm a total weirdo who always drinks tea in the same order. You start noticing a pattern: if I've had mint yesterday, I will drink lemon today, and if I've had lemon today, I will drink Earl Grey tomorrow. And after Earl Grey, it's always either black tea or green tea. My h1 is still through the roof, because I drink all teas in equal amounts. But luckily for you, my conditional entropy (h2) is very low. You can use yesterday's tea selection to predict what I will pick next.
In the graph above, the left shows a situation where all 100 entries follow the same order (1, 2, 3...). The system is very predictable and there is almost zero entropy. As long as you know what I drank yesterday, you can predict what I will drink today. On the right, I used the same numbers, but shuffled them randomly. This means that h0 (number of options) and h1 (frequency) remain exactly the same, but h2 is taken near its maximal potential. In this case, yesterday's choice has no influence on what I will pick today.
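A sketch of h2, under the usual definition of conditional entropy over adjacent pairs: how uncertain the next pick is once you know the previous one. The ordered and shuffled sequences mimic the left and right sides of the graph.

```python
import math
import random
from collections import Counter

def h2(sequence):
    """Conditional entropy: uncertainty about the next symbol,
    given the symbol directly before it."""
    pairs = Counter(zip(sequence, sequence[1:]))
    contexts = Counter(sequence[:-1])
    total = len(sequence) - 1
    return -sum((count / total) * math.log2(count / contexts[prev])
                for (prev, _), count in pairs.items())

teas = ["mint", "lemon", "earl grey", "black", "green",
        "rooibos", "chamomile", "jasmine", "white", "oolong"]

ordered = teas * 10            # always the same cycle of ten teas
shuffled = ordered.copy()
random.shuffle(shuffled)       # same h0 and h1, but the order is destroyed

print(round(h2(ordered), 2))   # 0.0: yesterday fully determines today
print(round(h2(shuffled), 2))  # much higher; approaches log2(10) for long samples
```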
Character Entropy vs Word Entropy
Voynich researchers will often talk about either word entropy or character entropy. These are two separate things: one will study how predictable words are, the other will look at individual glyphs. The numbers will be different as well. For example, h0 in word entropy is much higher than h0 in character entropy, since any given text may have thousands of different word types, but only a few dozen different characters/glyphs. Both word entropy and character entropy are strange in Voynichese. However, I will only write about character entropy because this is what I know the most about.
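The formulas do not change between the two; only the unit of analysis does. A minimal sketch, where the file name is just a placeholder for whatever plain-text sample you want to test:

```python
import math
from collections import Counter

def h1(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

with open("sample_text.txt", encoding="utf-8") as f:  # placeholder file name
    text = f.read().lower()

chars = list(text)    # character-level tokens
words = text.split()  # word-level tokens

print("character types:", len(set(chars)), " word types:", len(set(words)))
print("character h1:", round(h1(chars), 2), " word h1:", round(h1(words), 2))
```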
Entropy and Information
There is a correlation between a writing system's entropy values and how efficiently it conveys information. We can grasp the basics of VM entropy without understanding the details of this matter, but I mention it anyway because you might see someone write about it. Some examples:
* The word "Voynich" in binary is "01010110 01101111 01111001 01101110 01101001 01100011 01101000". Alphabetic text has a much higher h0 th
an binary code. In everyday writing, using the alphabet is more efficient: I c
an get the same informati
on across with fewer characters.
* In English, "q" is usually followed by "u" (bar a few exceptions in loan words like qanat, Iraq...). We can say that the "u" in words like "quest" does not add any information, because it is expected with near-complete certainty. I cannot change my message by toggling this "u". Because "u after q" is predictable, its presence lowers the h2 of English, which lowers how efficiently written English transmits information (the small sketch right after this list shows the effect on a toy sentence).
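Here is that sketch: it simply counts which characters follow a given character. The sample sentence is made up for illustration, but any English text will behave the same way.

```python
from collections import Counter

def followers(text, context):
    """Count which characters come directly after a given character."""
    return Counter(b for a, b in zip(text, text[1:]) if a == context)

sample = ("the question of the quiet queen was quickly answered "
          "with a quote from the king")

print(followers(sample, "q"))  # essentially only 'u': fully predictable, no information
print(followers(sample, "t"))  # several realistic options: 'h', 'i', 'e', ' ', ...
```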
This talk about information density feels a bit too abstract and theoretical to me: after all, most historical writing systems are not designed for optimal efficiency, but have evolved over time.
Medieval Manuscripts, the VM and h0
Calculating the true h0 of medieval manuscripts is more difficult than it sounds. What to do with ligatures, abbreviation symbols, positional variation, capitals... What about rare symbols that are used only a few times? We might use a transcription of the text, but this is a cleaned-up, abstract version that does not exist on parchment. Maybe we should assume Voynichese behaves like a cleaned-up version, since it is a novel "code" that might disregard things like capitalization, abbreviation and other scribal conventions?
Apart from that, how many different characters does a section like Herbal A use? It depends how we count. In EVA, there are 19 characters used in Herbal A. But we can easily increase this number: counting benched gallows as separate glyphs will add four. We might also guess that "in" and "iin" are separate glyphs, and so on.
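To make the "it depends how we count" point concrete, here is a small sketch. The sample is the opening line of f1r in EVA, quoted from memory, so treat it as illustrative only; the placeholder letters C, S and N simply stand in for "bench as one glyph", "sh as one glyph" and "iin as one glyph".

```python
import math

def h0(text):
    """log2 of the number of distinct symbols in the text."""
    return math.log2(len(set(text)))

# Opening words of f1r in EVA, spaces removed (illustrative sample only).
eva = "fachysykalarataiinsholshorycthresykorsholdy"

# Count 1: every EVA letter is its own glyph.
print(len(set(eva)), round(h0(eva), 2))

# Count 2: treat "ch", "sh" and "iin" as single glyphs by mapping them to
# placeholder symbols before counting; the inventory, and h0, changes with it.
merged = eva.replace("iin", "N").replace("ch", "C").replace("sh", "S")
print(len(set(merged)), round(h0(merged), 2))
```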
With some tweaking, it is perfectly possible to get an acceptable h0 value for Voynichese. But I think h0 is also the value that suffers the most from the way we transcribe our text: each manuscript, including the VM, can be described with various degrees of standardization and differentiation between glyph forms, which makes comparing h0 difficult. Moreover, I simply find h0 unreliable overall. If a scribe slaps a novel symbol at the end of a 200-page manuscript, the h0 value of the whole manuscript changes because of this.
For reference, in my corpus of medieval texts, h0 reaches all the way from 4.25 (EVA transliteration of Q13B) to 6.95 (a Greek historical text).
Medieval Manuscripts, the VM and h1
We generally use h1 to get a more accurate "entropy fingerprint" of a text. Tinkering with h2 may change h1 and vice versa, so usually both are tracked at once. The lowest h1 I have in my corpus is a German text, followed by EVA VM sections and other German texts. However, matching h1 without also matching h2 is not worth much (I would be happy with any corrections about this statement if it is inaccurate!).
Why Voynichese has a Character Entropy Problem: h2
The most obvious reason why Voynichese has a huge character entropy problem is conditional entropy, h2. Remember the thing with "qu" in English? Well, Voynichese is kind of like that all the time.

Let's start in EVA. I give you a glyph, you tell me what's next (spaces also count). If you are a bit familiar with VM transliterations, you can do this off the top of your head:
- q: "o" in virtually all cases
- a: overwhelmingly i, then l, r, then m
- i: another "i" or "n", sometimes "r"
- n: a space in the vast majority of cases
- y: a space in the vast majority of cases
- d: "y" in about half of the cases
- c: "h" in the vast majority of cases
Other glyphs show some more options, but they still tend to be quite restricted.
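If you want to check this list yourself, a sketch like the following will do it. The file name is a placeholder for whichever EVA transliteration you have at hand; spaces are kept on purpose, since they count as a "next character" too.

```python
from collections import Counter, defaultdict

def follower_table(text, top=3):
    """For every character, list the characters that most often come next."""
    table = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        table[current][nxt] += 1
    return {char: counts.most_common(top) for char, counts in table.items()}

with open("voynich_eva.txt", encoding="utf-8") as f:  # placeholder file name
    eva = f.read()

for glyph, top_followers in sorted(follower_table(eva).items()):
    print(glyph, top_followers)
```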
This is not normal!
As far as the numbers go, they are easy to remember. VM sections are a bit below or above h2=2. Quire 13 gets really low with a value of 1.8. "Normal" medieval texts, on the other hand, have h2 values above 2.8, usually above 3. Again, this difference is huge, and it blows all other problems out of the water.
Is EVA a problem?
When I wrote my earlier posts on this subject, one thing I wondered was: to what extent does EVA influence these statistics? If the glyph "bench" is always written as "ch", maybe this is enough to mess things up. It turns one glyph into a predictable pair. So what I did was to fix benches and clusters involving "in", doing my utmost to squeeze as much h2 out of it as possible. This is the result, in the graph below. You can see that the "fixed" Voynichese versions outperform EVA, but they are still waaaay below any normal text.
So is EVA a problem? Well, yes and no. Yes, because maybe there are choices in EVA that should be corrected for before performing certain analyses, since EVA probably does lower h2. But also no, because the biggest problem is certainly not EVA: the biggest problem is Voynichese itself. The reason for this is simple: if I fix the "ch" situation by representing the bench with a single glyph, this new glyph is now also predictable, because all glyphs in the VM are too predictable.
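For readers who want to try this kind of "fix" themselves, the sketch below shows the general shape of the experiment, not my exact procedure: map a few EVA digraphs and trigraphs to single placeholder characters, then measure h2 before and after. The file name and the particular substitution list are placeholder choices, not a canonical set.

```python
import math
from collections import Counter

def h2(text):
    """Conditional character entropy: uncertainty about the next character,
    given the current one."""
    pairs = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    total = len(text) - 1
    return -sum((count / total) * math.log2(count / contexts[prev])
                for (prev, _), count in pairs.items())

# One possible set of "fixes": benches and i-clusters become single symbols.
FIXES = [("iin", "N"), ("in", "M"), ("ch", "C"), ("sh", "S")]

def fix_eva(text):
    for old, new in FIXES:
        text = text.replace(old, new)
    return text

with open("voynich_eva.txt", encoding="utf-8") as f:  # placeholder file name
    eva = f.read()

print("raw EVA h2:  ", round(h2(eva), 2))
print("fixed EVA h2:", round(h2(fix_eva(eva)), 2))
```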
People often ask me if I included this or that dialect in my corpus, and my answer is always the same: it does not matter! Differences between entire language families are much smaller than the difference between Voynichese and normal text. Give me any text in any language, and I can almost guarantee you that its h2 will be above 2.8, which absolutely crushes Voynichese.
What's with those Verbose Ciphers?
In the same post, I tried to push my approach further, which took me unwittingly into verbose cipher territory. A verbose cipher is basically a cipher that obfuscates by adding unnecessary stuff. In a very simple example, I could verbosely obfuscate the word "Voynich" by adding a v after every letter: "Vvovyvnvivcvhv". If Voynichese is the result of a verbose cipher, I could try to reverse this by rewriting common glyph clusters (bigrams, trigrams) as single glyphs. For example, I could replace "dy" by "&" and run the entropy test again to see what changed. After lots of trial and error, I got almost-but-not-quite-normal entropy values this way. Apparently Rene did better with some method, which I am really looking forward to learning more about. As Rene also noticed, however, the "rewriting n-grams" method has a significant drawback: it makes words really short. As you can see in the "Voynich - Vvovyvnvivcvhv" example above, verbose encoding has the effect of lengthening words, and Voynichese words aren't excessively long to begin with.
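To give an idea of what "rewriting n-grams" looks like in practice (a generic illustration, not Rene's method and not my exact setup), the sketch below repeatedly merges the most frequent bigram into a new single symbol and tracks both h2 and the average word length. The file name, the digit placeholders and the number of merge steps are all arbitrary choices.

```python
import math
from collections import Counter

def h2(text):
    pairs = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    total = len(text) - 1
    return -sum((count / total) * math.log2(count / contexts[prev])
                for (prev, _), count in pairs.items())

def merge_most_common_bigram(text, placeholder):
    """Replace the most frequent space-free bigram with a single new symbol."""
    bigrams = Counter(pair for pair in zip(text, text[1:]) if " " not in pair)
    (a, b), _ = bigrams.most_common(1)[0]
    return text.replace(a + b, placeholder)

with open("voynich_eva.txt", encoding="utf-8") as f:  # placeholder file name
    text = f.read()

for step, placeholder in enumerate("0123456789", start=1):  # ten merge steps
    text = merge_most_common_bigram(text, placeholder)
    words = text.split()
    mean_len = sum(map(len, words)) / len(words)
    print(f"after {step} merges: h2 = {h2(text):.2f}, "
          f"mean word length = {mean_len:.2f}")
```

Tracking the mean word length is the point of the exercise: every merge makes the already short Voynichese words even shorter, which is exactly the drawback mentioned above.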
What's the takeaway?
If you want to have a chance of solving Voynichese, you must take into account the entropy problem: there is no way around it. Knowing about the entropy issue is also interesting when assessing proposed solutions. They will either:

* Focus on single words. This locks in glyph correspondences, and makes it impossible to expand the system to a paragraph of text. Because Voynichese has abnormally low entropy, it will not freely convert to any reasonable text in any writing system that has been considered so far.
* Add their own step to introduce entropy. This is the infamous "interpretative step". The translator realizes that Voynichese does not provide enough options, so they find ways to increase the entropy. For example, they will say each VM glyph can stand for various plaintext glyphs. This leads to massive problems, which I will not go into right now (basically, the one-way cipher).
But most importantly, knowing about the nature of Voynichese's entropy problems will hopefully help us work towards the right type of solution: one that changes the entropy density of the text without actually inventing information. But that may still be a ways off.
Comments, questions and additions are welcome.