The Voynich Ninja
Character entropy of Voynichese - Printable Version


RE: Character entropy of Voynichese - -JKP- - 03-06-2018

(03-06-2018, 05:51 PM)Anton Wrote:

(It's unlikely that a medieval scholar would use more than one character for a null). But the results will of course depend on the transcription, so different transcriptions need to be considered.


In the diplomatic ciphers collected by Tranchedino, it was quite common for a cipher set to include several nulls. But even though multiple nulls were in use in the Middle Ages, the VMS code does not have a verbose set of symbols like the Tranchedino codes, and is thus less likely to have multiple nulls.


RE: Character entropy of Voynichese - Anton - 03-06-2018

Well, a quick screening showed that the best I could achieve with Bennett's alphabet for [link] is h1-h2 = 1.26, which is observed if we remove spaces and a, or if we remove spaces and h. Still not good. Other approaches to the alphabet, such as EVA, should be checked, but probably at some other time...
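
For readers who want to try a screening of this kind themselves, here is a minimal sketch. It is not the procedure actually used in the post above, which is not described in detail. It assumes that "best" means the smallest h1-h2, that h1 is the single-character entropy and h2 the entropy of a character given the preceding one, and that the transcription is available as a plain string. The function names (h1_minus_h2, screen_single_null) are made up for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of a table of counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def h1_minus_h2(text):
    """h1 - h2 under the assumed definitions:
    h1 = entropy of single characters,
    h2 = entropy of character pairs minus entropy of the first character."""
    h1 = entropy(Counter(text))
    h2 = entropy(Counter(zip(text, text[1:]))) - entropy(Counter(text[:-1]))
    return h1 - h2

def screen_single_null(text, space=" "):
    """Treat each character in turn as a candidate null: remove the spaces
    and that character, then record h1-h2 of what is left."""
    results = {}
    for c in sorted(set(text) - {space}):
        stripped = text.replace(space, "").replace(c, "")
        if len(stripped) >= 2:  # need at least one character pair
            results[c] = h1_minus_h2(stripped)
    return results

# Usage on a toy string (a real run would use a full transcription):
scores = screen_single_null("ab ab ab cd")
best = min(scores, key=scores.get)  # candidate null giving the smallest h1-h2
```

The outcome depends entirely on the transcription alphabet fed in, which is why different alphabets would have to be screened separately.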


RE: Character entropy of Voynichese - Helmut Winkler - 04-06-2018

(03-06-2018, 01:52 PM)Anton Wrote: To comment upon Helmut's [link], but not introduce off-topic in that thread.

Just to show what I mean: there is a good example of a highly abbreviated text in the scribal abbreviations article of the English Wikipedia. I don't pretend to know much about this statistics stuff, but I can't imagine there would not be a considerable difference.


RE: Character entropy of Voynichese - Anton - 04-06-2018

Helmut, the idea in a nutshell is that certain quantitative parameters can be calculated over a text of reasonable length, and these parameters depend more or less systematically (to varying degrees) on the language, the time period (e.g. 18th century vs. today), the topic, and the author (the author's personal style of expression). The so-called "character entropy" is one such parameter (there are actually different flavours of character entropy, such as h1 or h2, but I won't go into that here). Basically, these indicate how efficiently the script (the alphabet) is used to convey information. The problem with the Voynich manuscript is that the information entropies calculated for the Voynichese text are notably lower than those calculated for plain texts in European languages (not only European, but we are discussing the Latin script here). This strongly suggests that Voynichese is not just a plain rendering of a plain text in a fancy script. Instead, some operations (unknown to us) must have been performed on the plain text before it was rendered in the Voynichese script. One operation that could be performed on a text is abbreviation. But, as I indicated above by way of an example, abbreviating a Latin text does not bring the character entropy closer to what the Voynich exhibits; instead, it moves it further away. This makes sense, since with abbreviation you convey the same information even more efficiently (while the Voynich does quite the opposite).
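
To make these quantities a bit more concrete, here is a minimal sketch of how h0, h1 and h2 could be computed for a text. The definitions are an assumption, since the post does not spell them out: h0 is taken as log2 of the alphabet size, h1 as the entropy of single characters, and h2 as the entropy of a character given the one before it. The function name char_entropies is made up for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of a table of counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def char_entropies(text):
    """Return (h0, h1, h2) in bits per character.

    Assumed definitions (not stated in the thread):
      h0 = log2 of the number of distinct characters
      h1 = entropy of single characters
      h2 = entropy of a character given the previous one,
           computed as H(pairs) - H(first character of each pair)
    """
    singles = Counter(text)
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    h0 = math.log2(len(singles))
    h1 = entropy(singles)
    h2 = entropy(pairs) - entropy(firsts)
    return h0, h1, h2

# A perfectly predictable text: two characters, strictly alternating.
h0, h1, h2 = char_entropies("abababab")
```

In this toy case h0 and h1 are both 1 bit, while h2 is 0, because each character is fully determined by the one before it. A text whose h2 is low relative to h1 is predictable in the same sense.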

If you are further interested in the concepts, we have a tutorial on that [link], which assumes no prior knowledge. It's still unfinished (I can never find the time!), but I'd say it's about 90% ready.


RE: Character entropy of Voynichese - Helmut Winkler - 04-06-2018

(04-06-2018, 02:52 PM)Anton Wrote: Helmut, the idea in a nutshell is that certain quantitative parameters can be calculated over a text of a reasonable length [...]

Thank you for the information, I appreciate it.


RE: Character entropy of Voynichese - Anton - 13-04-2019

In [link], Nikolai claims that "after analyzing the text with the already available data, it was found that in words that begin with vowels, these vowels are omitted. Moreover, and within words vowels are used very rarely".

Nikolai does not explain how he arrived at this conclusion, nor does he supply any considerations in support of it.

Nonetheless, I decided to check what we get in terms of character entropy if we make a text vowelless. (I'm not sure whether I have done that before.)

I took the same "GDPR in Latin" source text referenced above with the following parameters:

h0 = 4.39
h1 = 3.95
h2 = 3.16

h0-h1 = 0.44
h1-h2 = 0.79

Then I excluded all vowels, and obtained the following:

h0 = 4.00
h1 = 3.47
h2 = 2.93

h0-h1 = 0.53
h1-h2 = 0.54

So despite the decrease in h2, the difference h1-h2 (which is unusually high for the VMS, and that is the main problem!) only decreases when the vowels are left out.

It may all be different if the vowels are then "reconstructed" in some other (statistically inefficient) manner. But I do not know how.
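
The vowel-removal comparison above could be repeated on any source text along the following lines. This is only a sketch under assumptions: the vowels are taken to be a, e, i, o, u, spaces are kept, and h0, h1, h2 are taken as log2 of the alphabet size, the single-character entropy, and the entropy of a character given the previous one. The function names and the sample sentence are made up for illustration. The sample is not the "GDPR in Latin" text, so it will not reproduce the figures quoted above.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of a table of counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def char_entropies(text):
    """Return (h0, h1, h2) under the assumed definitions given above."""
    h0 = math.log2(len(set(text)))
    h1 = entropy(Counter(text))
    h2 = entropy(Counter(zip(text, text[1:]))) - entropy(Counter(text[:-1]))
    return h0, h1, h2

def strip_vowels(text, vowels="aeiou"):
    """Drop every vowel (assumed set: a, e, i, o, u), keeping spaces."""
    return "".join(ch for ch in text if ch.lower() not in vowels)

def compare(text):
    """Return the (h0, h1, h2) triples before and after vowel removal."""
    return char_entropies(text), char_entropies(strip_vowels(text))

# Usage on a short illustrative sentence (far too short for meaningful values):
before, after = compare("in principio erat verbum et verbum erat apud deum")
diff_before = before[1] - before[2]  # h1-h2 with vowels
diff_after = after[1] - after[2]     # h1-h2 without vowels
```

A meaningful comparison needs a text of reasonable length, since the pair statistics behind h2 are unreliable on short samples.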