The Voynich Ninja
Character entropy of Voynichese - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Character entropy of Voynichese (/thread-148.html)



Character entropy of Voynichese - Anton - 23-01-2016

The comparatively low character entropy of Voynichese has traditionally been used as an argument against the natural language hypothesis. To be clear, it is not the only argument against natural language, but it seems to me that it is not the strongest one either.

Why? Because I wonder how we can be sure of our calculation of Voynichese character entropy when we don't know the real Voynichese alphabet. We only work with transcriptions (such as EVA), which may be (and I'm sure are) not that adequate.

Take an English text and substitute all instances of "d" with "cl" (which is visually much the same, but not linguistically). In other words, exclude the letter "d" from the English alphabet and imagine that the letters "c" and "l" now do all the work. I think the character entropy of English will then change (namely, decrease), will it not?

So if we decrease the level of decomposition in our transcriptions of Voynichese, entropy is likely to rise.
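The "d" → "cl" experiment above can be sketched in a few lines of Python. The sample text and the entropy() helper are illustrative assumptions, not anything from the thread; the point is only to show how the measurement would be set up.

```python
# Sketch of the "d" -> "cl" substitution experiment: compute first-order
# (single-character) Shannon entropy before and after rewriting "d" as "cl".
from collections import Counter
from math import log2

def entropy(text):
    """First-order Shannon entropy of a character sequence, in bits."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

# Hypothetical sample text, chosen only so that "d" occurs several times.
sample = "the quick brown fox jumps over the lazy dog and the dazed druid"
rewritten = sample.replace("d", "cl")  # "d" no longer exists as a symbol

print(entropy(sample))
print(entropy(rewritten))
```

Running this on longer real texts would let one check the direction and size of the change empirically rather than guessing.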


RE: Character entropy of Voynichese - Wladimir D - 25-01-2016

Regarding two languages "A" and "B", we cannot exclude the following course of events.
Author "A" died in the process, without having completed the manuscript. The contours of the figures had already been drawn; it remained only to write the text. Author(s) "B" knew the language very little (or did not know it at all). "B" began to generate (invent) words using commonly used words from the language of "A". The whole language "B" is a fake.

I cannot understand how the words of the herbal section apply to the names in the stars. In my opinion, what are called stars are not stars. They look like a child's drawing of a flower with petals. The same applies to the last section, where the flower with the coloured (red) middle divides the recipes into groups. (Anton wrote about this.)


RE: Character entropy of Voynichese - Emma May Smith - 25-01-2016

Anton, you make a very good point. Entropy studies show that the Voynich text is not too far from natural language texts, and your suggestion that combining characters would improve it is true. I seem to recall that at least one study did several entropy measurements with different character combinations, but not all do. I shudder to think that some might even have parsed [ch] and [sh] as two characters. Is there a list of entropy measurements and the character sets they use?


RE: Character entropy of Voynichese - ReneZ - 25-01-2016

The entropy is anomalously low, independent of the transcription alphabet. Indeed, there are different results depending on which alphabet is used, but these differences are significantly smaller than the anomaly observed.
Bennett in the 1970s (who first noted the anomalous values) used something similar to Currier's alphabet. Certainly, Eva should not be used for this type of analysis.


Indeed, the way to arrive at something similar to a 'normal' language is by compressing the Voynich text, i.e. combining characters.
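What "compressing" the text means here can be sketched concretely: treat selected clusters as single symbols before computing the conditional (second-order) entropy. The merge list and sample words below are assumptions for illustration, not Bennett's or Currier's actual alphabets.

```python
# Compare conditional entropy H(next | current) for a raw character
# stream versus a "compressed" stream where chosen clusters are one symbol.
from collections import Counter
from math import log2
import re

MERGES = ["iin", "in", "ch", "sh"]  # hypothetical merge list, longest first

def tokenize(word, merges=MERGES):
    """Greedily split a word into symbols, merging the listed clusters."""
    pattern = re.compile("|".join(map(re.escape, merges)) + "|.")
    return pattern.findall(word)

def conditional_entropy(symbols):
    """Second-order entropy H(next | current) in bits over a symbol list."""
    pairs = Counter(zip(symbols, symbols[1:]))
    firsts = Counter(symbols[:-1])
    total = sum(pairs.values())
    return -sum(n / total * log2(n / firsts[a])
                for (a, b), n in pairs.items())

# Tiny illustrative sample; a real test would use a full transcription.
words = ["daiin", "chedy", "shedy", "qokeedy", "daiin", "chol"]
raw = [c for w in words for c in w]
merged = [s for w in words for s in tokenize(w)]
print(conditional_entropy(raw), conditional_entropy(merged))
```

On a full transcription file, this same comparison would show how much of the low-entropy anomaly survives the compression.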


However, this is just part of the picture.


RE: Character entropy of Voynichese - Anton - 25-01-2016

(25-01-2016, 08:26 AM)Wladimir D Wrote: Regarding two languages "A" and "B", we cannot exclude the following course of events.
Author "A" died in the process, without having completed the manuscript. The contours of the figures had already been drawn; it remained only to write the text. Author(s) "B" knew the language very little (or did not know it at all). "B" began to generate (invent) words using commonly used words from the language of "A". The whole language "B" is a fake.

I cannot understand how the words of the herbal section apply to the names in the stars. In my opinion, what are called stars are not stars. They look like a child's drawing of a flower with petals. The same applies to the last section, where the flower with the coloured (red) middle divides the recipes into groups. (Anton wrote about this.)


Hi Wladimir, nice to see you on the forum at last. To keep the discussion organized, please don't post off-topic messages (e.g., this thread is for discussion of character entropy, not for other things). If you don't find an existing appropriate thread, please don't hesitate to open a new one (in the subforum view, use the button "post thread").

(25-01-2016, 09:28 AM)ReneZ Wrote: The entropy is anomalously low, independent of the transcription alphabet. Indeed, there are different results depending on which alphabet is used, but these differences are significantly smaller than the anomaly observed.

Yes, but the point is that we don't know the real alphabet. We simply impose our own alphabets on Voynichese, but they may all be inadequate. What if iin is not three characters but a single character, and so on?

Character entropy is alphabet-dependent (just as word entropy is dictionary-dependent), so my point is that, in the absence of knowledge of the alphabet and the dictionary, far-reaching conclusions should not be drawn from the entropy perspective alone.

The other thing is that (and here I reply to Emma) if we begin to introduce more character groupings (in order for character entropy to rise), then Voynichese would look even more (and not less!) strange in respect to the natural language hypothesis - the average word length will become shorter, and so on. And there are other points against natural language (I guess you made a blog post about that recently).


RE: Character entropy of Voynichese - Emma May Smith - 25-01-2016

(25-01-2016, 11:53 AM)Anton Wrote: The other thing is that (and here I reply to Emma) if we begin to introduce more character groupings (in order for character entropy to rise), then Voynichese would look even more (and not less!) strange in respect to the natural language hypothesis - the average word length will become shorter, and so on. And there are other points against natural language (I guess you made a blog post about that recently).

I don't know that "average word length" is a usable measurement here. As you yourself say, we don't know what we're measuring.


RE: Character entropy of Voynichese - Anton - 25-01-2016

When we decrease the level of detail of the characters, word length measured in characters becomes shorter. So with an alphabet uniting iin into one character instead of three, daiin will shrink from five characters to three, and so on.
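The shrinkage is easy to see with a one-line substitution. The choice of "iin" as a single glyph, and the placeholder symbol used for it, are illustrative assumptions only.

```python
# If "iin" is read as one glyph, the word [daiin] has length 3, not 5.
word = "daiin"
merged = word.replace("iin", "I")  # "I" stands in for a single iin-glyph

print(len(word), len(merged))  # → 5 3
```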


RE: Character entropy of Voynichese - Emma May Smith - 25-01-2016

Sure, I get that. But how do we measure that against other languages?

We can say that Voynich words get shorter but we can't say that they are shorter on average than words in other languages. The problem is that we may not be counting the same thing.

Think of it like this: Japanese written in hiragana combines a whole syllable into a single character, whereas Japanese written in romaji has one sound per character (roughly; for the sake of convenience we will say it is true). Romaji words are longer than the same words in hiragana by number of characters, but the same by number of sounds. Simply counting characters is not a linguistic fact so long as we do not know what those characters stand for.

One more example: English and Greenlandic use the same five vowel letters from the Roman script (a, e, i, o, u), yet while English has several times that number of vowels, Greenlandic actually has fewer (three vowels and a diphthong). They use the same script in different ways to display the vowels (or not display them, in the case of English), and so simply counting vowel characters would not even then be a linguistic fact.

What I ultimately wish to say is this: we must have a model for not only what we think the discrete characters are within the script, but also what they stand for, before we can make a comparison. That model may well be wrong (we can try lots of models) but at least it would make a comparison valid. We might believe that [daiin] is five sounds, three sounds, or even only two sounds, but each guess could be measured as to how linguistically natural it is. If all possible models are unlinguistic then it may well weigh against the natural language hypothesis, but if some are more linguistic than others then it may weigh in favour of that model over others.


RE: Character entropy of Voynichese - ReneZ - 26-01-2016

Some comparisons of entropy statistics can be found in the paper written by Dennis Stallings. I have a bit of an additional explanation of Bennett's work, with a reference to Dennis' paper.

The example given above, with the question whether 'iin' should count as one letter or as three, is covered by this.
In Bennett's alphabet this is one single character.  In Currier's alphabet, all possible combinations of strings of i's followed by something are a single character, and this is part of the reason why he has 36 characters, i.e. more than most alphabets. Statistics using Currier's alphabet were also computed by Dennis. One cannot compress much more than that. (Note that Currier also represents all composite gallows like ckh as a single character).
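A rough sketch of this Currier-style compression: any run of i's together with the character that follows it is read as one symbol, and composite gallows such as "ckh" collapse to one symbol. The regex below is an illustrative approximation, not Currier's actual 36-character table.

```python
# Tokenize EVA-like strings with i-runs and benched gallows as single symbols.
import re

# Try i-runs plus terminator first, then benched gallows, then any single char.
CURRIER_LIKE = re.compile(r"i+.|c[ktpf]h|.")

def currier_tokens(word):
    return CURRIER_LIKE.findall(word)

print(currier_tokens("daiin"))   # → ['d', 'a', 'iin']
print(currier_tokens("qockhy"))  # → ['q', 'o', 'ckh', 'y']
```

Because alternatives in a Python regex are tried left to right, the longer cluster patterns win before the catch-all "." fires.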

The entropy analysis does not exclude that the text is meaningful or language-like. It does exclude that the text was obtained by a simple substitution of one of the common languages written in an alphabetic script.

"Common language" is a bit vague of course.

If there is a plain text, in Latin, Greek, English, or one of the Romance or Germanic languages, then "something else" has been done to it before it became the text we see. I'm not aware of detailed entropy analyses of Hebrew, Arabic, Persian, etc., but these are not expected to have a particularly low second-order entropy either. Specifically, languages where vowels are not always written should be expected to have a somewhat higher second-order entropy (there are more consonants than vowels to make pairs with).

The example of Japanese or other Asian languages is a different story. Here, not much can be said yet. There have been some experiments with different renditions of Chinese, but I don't believe that these are conclusive.


RE: Character entropy of Voynichese - Davidsch - 16-03-2016

This kind of basic discussion is important, and I see many similar discussions going on at the same time on this site, but actually it boils down to the same question: how can we match or compare Voynichese to a language?




Quote:What if iin is not three characters but a single character, and so on


If you replace iin or dain or chedy or any such high-frequency word, or a combination of letters, you will end up with fewer unique letters than the usual 17.
 

Which is already too few for European languages. (There are only two such languages in the rest of the world, but they do NOT match the VMS profile.)