The Voynich Ninja
Character entropy of Voynichese - Printable Version



RE: Character entropy of Voynichese - Anton - 16-03-2016

Actually, what I meant here is character entropy as a general parameter, not related to any specific language, but to natural language in general. There might have been extinct languages, and they may have used fewer letters in their alphabets. Or this might have been an invented script for a language unknown to the scribe. I simply leave the validity of these assumptions out of scope.

The whole entropy discourse with respect to the VMS, to the extent that I am acquainted with it, is a bit fuzzy and, I am afraid, this confuses many researchers who are not acquainted with information theory.

To the consideration explained above, I would like to add (it seems that this fact is not well understood) that directly comparing character entropy between languages whose alphabets differ in size makes little sense, because entropy depends on the number of letters in the alphabet. In other words, the character entropy of English and the character entropy of Hawaiian are not directly comparable, so no conclusions can be drawn from comparing the raw values. I guess what should be compared instead is the degree (e.g. expressed as a percentage) to which the character entropy approaches the maximum possible entropy. The maximum possible entropy is observed when all characters are equally probable; hence it depends exclusively on the size of the alphabet.
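
The comparison proposed above can be sketched in a few lines of Python. This is only an illustration, not a method taken from the thread: the function names `char_entropy` and `entropy_ratio` are made up here, and the alphabet size is assumed to be the number of distinct characters actually observed in the sample.

```python
import math
from collections import Counter

def char_entropy(text):
    """First-order (single-character) entropy in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def entropy_ratio(text):
    """Observed entropy divided by the maximum for the observed alphabet.

    Assumption: the alphabet is taken to be the set of characters that
    occur in the sample, so the maximum is log2(number of distinct chars).
    """
    m = len(set(text))
    if m < 2:
        return 0.0  # the maximum is 0 bits; report 0 rather than divide by zero
    return char_entropy(text) / math.log2(m)

# A uniform sample reaches the maximum; a skewed one falls short of it.
print(entropy_ratio("abcd"))      # all four characters equally frequent
print(entropy_ratio("aaaaaab"))   # one character dominates
```

Multiplying the ratio by 100 gives the percentage figure; two texts with different alphabet sizes can then be compared on the same 0 to 100 scale.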

I have no access to Bennett's book (perhaps someone could provide a scan of the relevant chapter, if copyright allows) and don't know what comparisons were made there. As to the [link], I would say that it is a bit wandering and, besides, I haven't had time to examine it in detail, but he (albeit speaking of second-order entropy) makes a strange statement (emphasis is mine):

Quote:However, the H2(max) depends tremendously on m, the size of the character set chosen. For Voynich text, Currier has 36 characters and Basic Frogguy has 23 characters. Characters that are hardly ever used have little effect on h1 and h2, but could make a tremendous difference in H2(max). Therefore, this measure was not used.

From this explanation, I fail to understand why this measure was not used, and it also seems that it was not used for first-order entropy either.
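
The effect described in the quoted passage can be illustrated for the first-order case. The sketch below is an assumption-laden toy, not a reconstruction of the quoted author's calculation: it uses a made-up two-letter sample, adds a single occurrence of a third character, and compares the observed entropy with the maximum log2(m) before and after.

```python
import math
from collections import Counter

def char_entropy(text):
    """First-order entropy in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def max_entropy(text):
    """Maximum first-order entropy for the observed alphabet size m."""
    return math.log2(len(set(text)))

base = "a" * 500 + "b" * 500       # toy sample, two equally frequent characters
with_rare = base + "c"             # same sample plus one rare character

h_shift = char_entropy(with_rare) - char_entropy(base)
hmax_shift = max_entropy(with_rare) - max_entropy(base)
print(f"change in h1:     {h_shift:.4f} bits")
print(f"change in H(max): {hmax_shift:.4f} bits")
```

The observed entropy barely moves while the maximum jumps, so the ratio of the two is sensitive to how rare characters are counted in a transcription. Whether that is a reason to discard the measure or a reason to fix the character set first is the question raised above.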


RE: Character entropy of Voynichese - Davidsch - 18-03-2016

After that comment I decided to re-read what entropy actually means ([link])
and I came to the conclusion that this is not an exact science but more a relational question: what do you measure in relation to what?

Research focused on entropy is only useful in practice if you are planning to write a paper.
In solving any mystery in the VMS I think this has no practical use, does it?


RE: Character entropy of Voynichese - Anton - 18-03-2016

It is an exact science, and it is a useful measure, but like any measure it has to be used in the proper way when making comparisons. I think I will write a forum post explaining the notions of the various entropies that have been used in Voynich studies when I have time (probably next week).


RE: Character entropy of Voynichese - ReneZ - 18-03-2016

It has at least one very practical use: in case a text is encrypted using a simple substitution cipher, the entropy values (all of them) do not change in the process.
As a side note, entropy values are also not changed by writing backwards.
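
Both invariances can be checked numerically. The sketch below is illustrative only: `block_entropy` and `random_substitution` are hypothetical helper names, the sample sentence is arbitrary, and entropy is computed over overlapping n-character blocks of the whole string (spaces included and left unchanged by the substitution).

```python
import math
import random
import string
from collections import Counter

def block_entropy(text, n=1):
    """Entropy in bits of overlapping n-character blocks."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum((c / total) * math.log2(c / total) for c in grams.values())

def random_substitution(text, seed=0):
    """Apply a random one-to-one letter substitution (lowercase a-z only)."""
    letters = string.ascii_lowercase
    shuffled = list(letters)
    random.Random(seed).shuffle(shuffled)
    return text.translate(str.maketrans(letters, "".join(shuffled)))

sample = "the quick brown fox jumps over the lazy dog and the dog sleeps"
enciphered = random_substitution(sample)
backwards = sample[::-1]

for n in (1, 2):
    print(n, block_entropy(sample, n),
          block_entropy(enciphered, n),
          block_entropy(backwards, n))
```

A simple substitution only relabels the blocks and reversal only mirrors them, so the block counts, and therefore the entropies, stay the same in each row.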

Given that (some of) the entropy values in the Voynich MS text are anomalously low, either the source text has an anomalously low entropy, or the process to convert it to 'Voynichese' reduced it significantly.
This puts considerable constraints on all proposed solutions.

It is correct that the values are in a way relative, and should be interpreted as such.
The single character entropy is somewhat low, but the character pair entropy is the most significant anomaly, especially in comparison to this single character entropy.
This is not simply a problem of Eva. Eva was introduced two decades after Bennett pointed out the low entropy.
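
One way to put the pair entropy in relation to the single character entropy is to subtract one from the other, which gives the entropy of a character given its predecessor. The sketch below assumes that this conditional value is what is meant by the second-order figure; the thread does not define it, and the helper names are made up.

```python
import math
from collections import Counter

def block_entropy(text, n=1):
    """Entropy in bits of overlapping n-character blocks."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum((c / total) * math.log2(c / total) for c in grams.values())

def conditional_entropy(text):
    """Approximate entropy of a character given the previous one.

    Computed as H(pairs) - H(single characters); for short samples the
    result is only approximate because there is one pair fewer than
    there are characters.
    """
    return block_entropy(text, 2) - block_entropy(text, 1)

rigid = "ab" * 100   # toy text: each character fully determines the next
print(block_entropy(rigid, 1))      # about 1 bit per character
print(conditional_entropy(rigid))   # close to 0 bits
```

A text can thus have an unremarkable single character entropy and still be highly predictable from one character to the next, which is the kind of gap described above.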


RE: Character entropy of Voynichese - Davidsch - 18-03-2016

okidoki, I have developed a similar method; the problem is that it was developed for visual comparisons.
If I want to compare changes in the text, for example writing it backwards, I have to make a visual chart and compare it.
Although I can make such a chart and read it in a few minutes, it is still not convenient when you want to look at many possible text modifications.

In that scenario where you want to automate possible changes (for example comparing Bacon cipher possibilities, see my other thread) 
and see if any of those changes result in anything worthwhile, this method could be of invaluable use.

But hasn't anybody performed such a task before?
It seems to me that the NSA would have tried this immediately on the VMS text.


RE: Character entropy of Voynichese - Sam G - 19-03-2016

Basically the low entropy rules out the possibility that the VMS is a cipher, because very few kinds of ciphers can lower entropy, and these can all be excluded for other reasons.

Here's an interesting bit from the list archives.  Jacques Guy says:


Quote:There aren't many weapons in the arsenal. If the VMs is a cipher, then
it is a cipher which _lowers_ the entropy of the text. The only cipher
that can do that (Jim Gillogly, please correct me) is a cipher that
encodes single characters of the plain texts as sequences of characters,
or whole words and sentences. E.g. cat -> cloakarmtower (c > cloak, etc.)
Even so, the cipher has to be a bijection: c becomes cloak only and not other
words. There is another possibility: lots of nulls. It is much like
cat -> cloakarmtower, except that you are allowed any number of words
to encipher each letter _provided that_ they are built on strict, narrow
patterns. I'll take French "javanais" to illustrate this. The rule is:
insert "av" before the first vowel of every syllable. Thus: "bonjour"
-> "bavonjavour". Modify the rule to: insert "av", or "ov", or "ugl"
and you get a cipher text with a lower entropy than the plain text.
In that case, the VMs is very short, and the labels must be ignored.
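
The two mechanisms in the quote above can be written out as a short Python sketch. Only the codewords for c, a and t and the "bonjour" example come from the quote; everything else is an assumption made for illustration, in particular the `javanais` function, which inserts the infix before every run of vowels as a stand-in for "the first vowel of every syllable".

```python
import re

# Codebook for the verbose cipher; only these three entries appear in the quote.
VERBOSE_CODEBOOK = {"c": "cloak", "a": "arm", "t": "tower"}

def verbose_encipher(plain, codebook=VERBOSE_CODEBOOK):
    """Replace each plaintext character by its (fixed) codeword."""
    return "".join(codebook[ch] for ch in plain)

def javanais(word, infix="av"):
    """Insert `infix` before each run of vowels.

    Simplification: a run of vowels is treated as one syllable nucleus,
    which is not a full model of French syllabification.
    """
    return re.sub(r"[aeiouy]+", lambda m: infix + m.group(0), word)

print(verbose_encipher("cat"))   # cloakarmtower
print(javanais("bonjour"))       # bavonjavour
```

In both cases the output is longer than the input and much of it is predictable from its neighbours, which is why such schemes can push the per-character entropy down.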

Jim Gillogly's response:

Quote:Besides these cases, some ambiguous ciphers such as the Keyphrase
can lower the entropy.  As an English example, you can have:


plaintext:  abcdefghijklmnopqrstuvwxyz
ciphertext: THEPRESIDENTSPEAKSNONSENSE


In this case "my hovercraft is full of eels"
becomes      "SS IESRSESTEO DN ENTT EE RRTN"


Note the excess of S's.  The Keyphrase will frequently produce
runs of three or four of the same letter, and occasionally
adjacent ciphertext words which represent different plaintext
words.  However, the Keyphrase and other entropy-reducing
ciphers I know (except those that use lots of nulls, as Jacques
points out) also reduce the size of the alphabet.


Unless one is unlucky, the recipient will be able to get most
of the words right given the mapping, as will the cryptanalyst
given enough material.
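
The Keyphrase example in the quote above can be reproduced directly. The sketch below uses the keyphrase and the sample sentence from the quote; the function name `keyphrase_encipher` is made up for illustration.

```python
import string

PLAIN = string.ascii_lowercase
KEYPHRASE = "THEPRESIDENTSPEAKSNONSENSE"   # 26 letters, taken from the quote

def keyphrase_encipher(text):
    """Map each plaintext letter to the keyphrase letter in the same position.

    Several plaintext letters share a ciphertext letter, so the mapping
    is many-to-one and decipherment is ambiguous.
    """
    return text.lower().translate(str.maketrans(PLAIN, KEYPHRASE))

print(keyphrase_encipher("my hovercraft is full of eels"))
# SS IESRSESTEO DN ENTT EE RRTN

# The ciphertext alphabet shrinks to the distinct letters of the keyphrase.
print(len(set(KEYPHRASE)))   # 12
```

The shrinking alphabet is the point made at the end of the quote: the entropy goes down, but so does the number of distinct characters.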

In the case of the VMS, the keyphrase thing would've been cracked already as it is basically a form of simple substitution cipher (and probably wouldn't lower the entropy enough anyway), and verbose ciphering (or the addition of verbose nulls) can be ruled out because the words in the VMS simply aren't long enough, and there aren't enough repeated 2+ word phrases to consider single plaintext words enciphered into multiple ciphertext words.  The presence of single-word labels also poses a major problem for any scenario that involves treating the VMS words as anything other than words, or at least stand alone "units" capable of bearing meaning (as opposed to merely parts of verbosely enciphered words).

Really, there is no way of escaping the conclusion that the low entropy is simply an intrinsic part of the content, not the result of some "cipher mechanism".  The most likely explanation for that is that the VMS is written in a language which has low entropy, i.e. that has a relatively rigid phonotactic structure.  From that point of view, there is nothing "anomalously low" about the entropy at all, since there exist languages with lower entropy, i.e. with a more rigid phonotactic structure than we find in the VMS text.


RE: Character entropy of Voynichese - Anton - 19-03-2016

Even leaving aside the fact that the "anomalously low entropy" of Voynichese is something at least not consistently proven, from the viewpoint of  formal logic the entropy's value, be it high or low, says nothing about whether the text is enciphered or not.

Just because a message in a language which has low entropy may have been enciphered.


RE: Character entropy of Voynichese - Torsten - 19-03-2016

(19-03-2016, 05:44 PM)Sam G Wrote: Really, there is no way of escaping the conclusion that the low entropy is simply an intrinsic part of the content, not the result of some "cipher mechanism".  The most likely explanation for that is that the VMS is written in a language which has low entropy, i.e. that has a relatively rigid phonotactic structure.  From that point of view, there is nothing "anomalously low" about the entropy at all, since there exist languages with lower entropy, i.e. with a more rigid phonotactic structure than we find in the VMS text.

The weak word order can only mean that the words are not ordered by a grammar as we know for natural languages. Therefore you would have to assume a language without grammar.


RE: Character entropy of Voynichese - Sam G - 19-03-2016

(19-03-2016, 06:19 PM)Anton Wrote: Even leaving aside the fact that the "anomalously low entropy" of Voynichese is something at least not consistently proven, from the viewpoint of  formal logic the entropy's value, be it high or low, says nothing about whether the text is enciphered or not.

Just because a message in a language which has low entropy may have been enciphered.

Well, it can't be a ciphertext of any language which has a higher entropy than the VMS text (which rules out all European languages and many others), and it places very strong constraints on ciphers that could have been used on low entropy languages.

(19-03-2016, 06:37 PM)Torsten Wrote:
(19-03-2016, 05:44 PM)Sam G Wrote: Really, there is no way of escaping the conclusion that the low entropy is simply an intrinsic part of the content, not the result of some "cipher mechanism".  The most likely explanation for that is that the VMS is written in a language which has low entropy, i.e. that has a relatively rigid phonotactic structure.  From that point of view, there is nothing "anomalously low" about the entropy at all, since there exist languages with lower entropy, i.e. with a more rigid phonotactic structure than we find in the VMS text.

The weak word order can only mean that the words are not ordered by a grammar as we know for natural languages. Therefore you would have to assume a language without grammar.

I've responded to this point of yours before:


Quote:I suspect that there are rules we are not aware of governing word order in the VMS, so that (for example) in some cases we might see a pair of words A B, but in a different context we will see the same pair as B A.  German, for instance, has some rules like this that we don't have in English, but it would be incorrect to say that German has "weak word order".  So the "weak word order" in the VMS might also be only apparent.  The problem is that we don't know what the rules are.

I think the "writing style" of the VMS also probably plays a role here.  Many people tacitly assume that a prose style similar to that which we use to write English must be employed in the VMS, but this isn't the only possibility.  The writing style is probably "weird", too.  This affects a number of considerations about word order, repetitiveness, lack of repeated long phrases, etc.



RE: Character entropy of Voynichese - Torsten - 19-03-2016

Quote:I've responded to this point of yours before:


I've responded to your point that there is some word order in the VMS:


Quote:"There are only 35 word sequences which use at least three words and appear at least three times. Only for five of these sequences is the word order unchanged for the whole manuscript, whereas for 30 out of 35 phrases the word order does change." (see Timm 2014: p. 3)

Moreover "An additional observation is that in 24 out of 35 cases these repeated sequences use at least two words which are either spelled the same or very similarly." (see Timm 2014: p. 3)

Sometimes similar words are used together. But even for them the order of words doesn't matter.
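
A count of this kind can be approximated with a short script. The sketch below is not Timm's procedure, whose exact criteria are not given in this thread: it assumes fixed-length windows of three words, treats two windows as the same sequence when they contain the same words in any order, and uses a made-up word list in place of a transcription.

```python
from collections import Counter, defaultdict

def repeated_sequences(words, n=3, min_count=3):
    """Group n-word windows by their words regardless of order.

    Returns {sorted word tuple: {observed ordering: count}} for every
    group that occurs at least `min_count` times in total.
    """
    orderings = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        window = tuple(words[i:i + n])
        orderings[tuple(sorted(window))][window] += 1
    return {
        key: dict(variants)
        for key, variants in orderings.items()
        if sum(variants.values()) >= min_count
    }

# Hypothetical toy input: "a b c" recurs, but in three different orders.
words = "a b c x1 x2 b a c y1 y2 c a b z1 z2 a b c".split()
for key, variants in repeated_sequences(words).items():
    status = "fixed order" if len(variants) == 1 else "order varies"
    print(key, variants, status)
```

Run on a real transcription, the number of groups with a single ordering versus several orderings would give figures comparable in spirit to the 5 and 30 quoted above, though the exact numbers depend on the windowing choices.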