The Voynich Ninja

Full Version: Word Entropy
Pages: 1 2 3 4 5 6 7 8 9
I am curious as to why you are not using any Semitic text references for the script? I will tell you that the astrology pages show Semitic references and not Latin ones: for example, the Latin Aries is known as the Ram or Lamb in many Semitic languages, and the Latin Capricorn is known as a mountain goat or ibex in Semitic.
(15-09-2019, 11:09 PM)Monica Yokubinas Wrote: I am curious as to why you are not using any Semitic text references for the script?
Collecting all these authentic historical texts, checking them, removing editorial notes and introductions... and general pre-processing is a lot of work. I want to do this as correctly as possible.  All of this is much more challenging (not to say, impossible) in languages I don't know and scripts I can't read.

To get a good idea of a language I need 20 texts in this language, preferably of different types and genres. They need to be in medieval language and spelling. They have to be copy-pastable, so no Google Books. They need to be at least 5000 words long.

If you have a Semitic text like this I will happily include it. I will move there eventually but the task is daunting (given my lack of knowledge on the subject).
(15-09-2019, 11:19 PM)Koen G Wrote: Collecting all these authentic historical texts, checking them, removing editorial notes and introductions... and general pre-processing is a lot of work. I want to do this as correctly as possible. All of this is much more challenging (not to say, impossible) in languages I don't know and scripts I can't read.

To get a good idea of a language I need 20 texts in this language, preferably of different types and genres. They need to be in medieval language and spelling. They have to be copy-pastable, so no Google Books. They need to be at least 5000 words long.

If you have a Semitic text like this I will happily include it. I will move there eventually but the task is daunting (given my lack of knowledge on the subject).
I'll see what I can find in the next few days for a script you can use. In the meantime, here is how Celtic and Semitic languages are similar: [link]
You could add the line for the hypothetical maximum of h2, which for 5000 tokens is:

h2 = 12.3 - h1

(or  h1 = 12.3 - h2)

I would have put h1 on the horizontal axis and h2 on the vertical, but never mind that.

In any case, one can then see where h2 deviates more from this hypothetical maximum.
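
For anyone who wants to reproduce the line, here is a minimal Python sketch. It is not the code used in this thread. It assumes h1 is the word entropy and h2 the conditional word entropy (entropy of adjacent word pairs minus h1), and the function names are made up for illustration. The constant 12.3 appears to be log2 of the number of tokens, which is what h1 + h2 reaches when every word pair occurs exactly once.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a collection of frequency counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)


def word_h1_h2(tokens):
    """Return (h1, h2) for a list of word tokens.

    Assumption: h1 is measured on the first word of each adjacent pair, so
    that h2 = H(pairs) - h1 is a true conditional entropy (never negative).
    For thousands of tokens this h1 is almost identical to the plain one.
    """
    pairs = list(zip(tokens, tokens[1:]))
    h1 = entropy(Counter(first for first, _ in pairs).values())
    pair_entropy = entropy(Counter(pairs).values())
    return h1, pair_entropy - h1


def hypothetical_max_h2(n_tokens, h1):
    """Value h2 would take if every adjacent word pair occurred exactly once."""
    n_pairs = n_tokens - 1  # log2(4999) and log2(5000) both round to 12.3
    return math.log2(n_pairs) - h1


if __name__ == "__main__":
    print(round(math.log2(5000), 1))  # the 12.3 in the formula above
```

Calling `word_h1_h2` on each 5000-token sample gives one dot for the plot, and `hypothetical_max_h2(5000, h1)` gives the matching point on the line.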
Axes reversed and line added (I drew it on manually by connecting the right intersections). 
Voynichese is closer to its theoretical max, like Latin.

[attachment=3321]
The actual information that word-h2 brings is this difference between the theoretical maximum and the actual value. This tells us how the word pair distribution differs from {1, 1, 1, 1, ..., 1, 0, 0, 0, ..., 0}, that is, from the case where every word pair that occurs does so exactly once.
Right. If you were to shuffle randomly, would the dot be on the line or still below it?
This shuffling is a powerful tool, because it leaves the text (seen as 'word inventory') intact, yet it removes the meaning.

We saw from the example of nablator and the TT transcription (Takeshi, not Torsten) that it does not go to the hypothetical maximum. This is why I started using the term hypothetical.
One could see the 'theoretical' maximum as the result for the case where all words have been shuffled. It stays below the hypothetical maximum because some repeated word pairs would arise by chance.

In principle it should be possible to actually predict this, but I don't feel up to that now.
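
The shuffling test described above can be sketched as follows. This is an illustrative Python sketch under assumptions, not anyone's actual script: h2 is taken as pair entropy minus h1, `shuffled_gap` is a hypothetical helper name, and the seed is only there to make a run repeatable. It shuffles the tokens (the word inventory stays the same) and reports how far h2 of the shuffled text stays below the hypothetical maximum.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a collection of frequency counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)


def conditional_h2(tokens):
    """Return (h1, h2), with h1 measured on the first word of each pair."""
    pairs = list(zip(tokens, tokens[1:]))
    h1 = entropy(Counter(first for first, _ in pairs).values())
    return h1, entropy(Counter(pairs).values()) - h1


def shuffled_gap(tokens, seed=0):
    """Shuffle a copy of the tokens; return (h2, gap to hypothetical max)."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    h1, h2 = conditional_h2(shuffled)
    h2_max = math.log2(len(shuffled) - 1) - h1
    return h2, h2_max - h2


if __name__ == "__main__":
    # Toy text with a rigid word order: h2 is 0 before shuffling.
    toy = "a b c d".split() * 25
    print(conditional_h2(toy)[1], shuffled_gap(toy))
```

In a text where many words repeat, the shuffled h2 rises but the gap stays positive, because chance repeats of word pairs cannot be avoided.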

There's a hint of a suspicion that the green and grey dots are a little bit above the 'real text' dots.
Indeed, Rene, I think they are. I calculated this as h2/(12.3-h1), then averaged per language.

SP 0.8056087529
Eng 0.8183898864
Ger 0.8593272856
IT 0.8721744293
VM Q13 0.8997510321
Lat 0.9031773814
VM TT 0.9385874706
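
The ratio above is simple to reproduce. The Python sketch below is only an illustration: the (h1, h2) pairs in `samples` are invented placeholder numbers, not values from the texts in this thread, and `normalized_h2` and `average_per_language` are hypothetical names. It divides each text's h2 by its hypothetical maximum 12.3 - h1 and averages the result per language.

```python
from statistics import mean

LOG2_TOKENS = 12.3  # constant used in the thread for 5000-token samples


def normalized_h2(h1, h2, log2_tokens=LOG2_TOKENS):
    """Fraction of the hypothetical maximum h2 that a text actually reaches."""
    return h2 / (log2_tokens - h1)


def average_per_language(samples):
    """Map language -> mean ratio, given language -> list of (h1, h2)."""
    return {
        language: mean(normalized_h2(h1, h2) for h1, h2 in values)
        for language, values in samples.items()
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT measurements.
    samples = {
        "Language A": [(10.3, 1.6), (10.1, 1.8)],
        "Language B": [(9.8, 2.3), (9.9, 2.1)],
    }
    for language, ratio in average_per_language(samples).items():
        print(language, round(ratio, 3))
```

A ratio of 1 would mean the text sits exactly on the line, so values closer to 1 mean fewer repeated word pairs.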
(16-09-2019, 09:01 AM)Koen G Wrote: Indeed, Rene, I think they are. I calculated this as h2/(12.3-h1), then averaged per language.

SP 0.8056087529
Eng 0.8183898864
Ger 0.8593272856
IT 0.8721744293
VM Q13 0.8997510321
Lat 0.9031773814
VM TT 0.9385874706

You could also try plotting this...