The Voynich Ninja
[Article] Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript



Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - Torsten - 02-10-2019

New conference paper about the VMS.

The paper is by Natsuki Kouyama and Mario Köppen.

The authors conclude: 
Quote: We have shown that VMS is not encrypted. We can also see that VMS mostly resembles natural language, however, comes closer to a programming language. Therefore, it appears as a unique piece of text, nevertheless.



RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - RobGea - 03-10-2019

Their use of FSG transcription is nice.
Not so sure about why they reject the Ciphertext hypothesis.


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - ReneZ - 03-10-2019

I haven't read it completely yet (not all pages are visible), but from the initial description it appears that they are using the modern concept of a cipher, i.e. they assume that a cipher has the property of equalising all character probabilities. This is what a modern cipher would do, and this is what the Voynich MS does not have.

One of my pet peeves is the 'language vs. cipher' dichotomy, which I find problematic.

If the MS text is a cipher, in particular a simple cipher, then it is also language.

There is no fundamental difference between writing a text with an invented set of symbols, or a language with an unknown alphabet.
What matters is if the language that has been represented using these symbols is known or unknown, or in fact not a language at all.
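
To illustrate the point, here is a minimal Python sketch. The sample sentence, the function names and the random symbol mapping are made up for the example and are not taken from the paper. It shows that a simple substitution cipher leaves the single-character entropy unchanged, while a distribution with all character probabilities equalised (what a modern cipher aims for) would give the maximum value for that alphabet size.

```python
import math
import random
from collections import Counter

def shannon_entropy(text):
    """Single-character Shannon entropy in bits."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def substitute(text, seed=0):
    """Simple substitution: apply a fixed permutation of the text's own alphabet."""
    alphabet = sorted(set(text))
    shuffled = alphabet[:]
    random.Random(seed).shuffle(shuffled)
    table = dict(zip(alphabet, shuffled))
    return "".join(table[ch] for ch in text)

# Hypothetical sample text, chosen only for illustration.
plain = "the quick brown fox jumps over the lazy dog and the lazy dog sleeps"
cipher = substitute(plain)

h_plain = shannon_entropy(plain)
h_cipher = shannon_entropy(cipher)
# Entropy the same alphabet would have if every character were equally likely.
h_flat = math.log2(len(set(plain)))

print(f"plain: {h_plain:.3f}  substituted: {h_cipher:.3f}  flat: {h_flat:.3f}")
```

The substituted text has exactly the same frequency distribution as the plain text, only with relabelled symbols, so a test based on single-character statistics cannot tell the two apart.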


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - ReneZ - 03-10-2019

The paper is deriving its conclusions entirely from the frequency distribution of single characters.

The first comparison seems to be a visual comparison of the shape of this distribution.
The other two are the regular single-character entropy as it has been used here already (called the Shannon entropy), and the more general Rényi entropy, also applied to single characters.

I had not heard of this before, but what the paper says, combined with a quick look at Wikipedia, clarifies what this is about. This is an entropy with a variable parameter alpha, which may run from 0 to infinity.
If alpha = 0 one gets what we have called H0. In the limit alpha = 1 one gets the Shannon entropy (H1). If alpha goes to infinity, one gets the negative logarithm of the probability of the most frequent character.

Therefore, if two character distributions are similar, the Rényi entropy as a function of alpha will be similar too.
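
For anyone who wants to try this, a minimal Python sketch of the single-character Rényi entropy follows. It uses the standard textbook definition with a base-2 logarithm; the function name and the toy string are my own choices for illustration, and I have not checked the result against the paper's implementation.

```python
import math
from collections import Counter

def renyi_entropy(text, alpha):
    """Single-character Renyi entropy of order alpha, in bits."""
    counts = Counter(text)
    total = len(text)
    probs = [c / total for c in counts.values()]
    if alpha == 1:
        # Limit alpha -> 1: the Shannon entropy (H1).
        return -sum(p * math.log2(p) for p in probs)
    if math.isinf(alpha):
        # Limit alpha -> infinity: -log2 of the most frequent character's probability.
        return -math.log2(max(probs))
    # General case; alpha = 0 gives log2 of the number of distinct characters (H0).
    return math.log2(sum(p ** alpha for p in probs)) / (1 - alpha)

# Toy string with probabilities 1/2, 1/4, 1/8, 1/8.
sample = "aaaabbcd"
for a in (0, 0.5, 1, 2, math.inf):
    print(a, round(renyi_entropy(sample, a), 4))
```

The value never increases as alpha grows, so plotting it against alpha gives one curve per text, and two texts with similar character distributions give similar curves.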

This is what happens in the comparison with plain texts, and this is what the paper shows.

What remains mysterious is that the number of different characters in the various plain texts is extremely large, e.g. 140 for an Italian text. Even counting upper and lower case, the various accented characters, and punctuation, I can't understand where that comes from.
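
If the source files were available, one way to check where such a count comes from would be to break the distinct characters down by Unicode category. The sketch below is only that, a sketch: the sample string and the function name are hypothetical, and I do not know how the authors actually counted.

```python
import unicodedata
from collections import Counter

def alphabet_breakdown(text):
    """Count distinct characters per Unicode general category (Lu, Ll, Nd, Po, ...)."""
    breakdown = Counter()
    for ch in set(text):
        breakdown[unicodedata.category(ch)] += 1
    return breakdown

# Hypothetical sample; a real check would read the full text file instead.
sample = "Perché no? È così: 3 città, 2 caffè.\n"
breakdown = alphabet_breakdown(sample)
distinct = len(set(sample))

print("distinct characters:", distinct)
for category, n in sorted(breakdown.items()):
    print(category, n)
```

Categories such as control characters (Cc), separators (Zs) or symbols (Sm, So) showing up in the breakdown would indicate that more than letters, digits and punctuation went into the count.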


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - nickpelling - 03-10-2019

Maybe their texts also encoded all the body language that goes with Italian. *shrugs*


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - Anton - 03-10-2019

When I see pronouncements like "our methods are valid and efficient", my first thought is that the authors definitely have problems with methodology. Ironically, their own example of "ciphertext" in Section 2 would immediately fail the test with the "valid and efficient method".

Also, I don't like that the authors seem to hint that they are pioneers of character-stats-analysis of the VMS.


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - Anton - 04-10-2019

(03-10-2019, 08:18 AM)ReneZ Wrote: What remains mysterious is that the number of different characters in the various plain texts is extremely large, e.g. 140 for an Italian text. Even counting upper and lower case, the various accented characters, and punctuation, I can't understand where that comes from.

Russian beats 'em all with 141! Let's see: 33 letters, capitalization doubles that to 66, then ten digits and, let's say, fifteen punctuation marks. This gives 91... still far from 141.


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - Anton - 04-10-2019

Ah, probably we should add math signs. There is room for expansion, and with vectors, gradients and circulations we possibly can meet the target.


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - ReneZ - 04-10-2019

Indeed, the authors do not seem to have read much previous work.


RE: Entropy Analysis of Questionable Text Sources by Example of the Voynich Manuscript - Davidsch - 04-10-2019

Quote: The paper is deriving its conclusions entirely from the frequency distribution of single characters.


Aha, therefore skipping it, next!