Word Entropy - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Word Entropy (/thread-2928.html)
RE: Word Entropy - Monica Yokubinas - 15-09-2019

I am curious as to why you are not using any Semitic text references to the script. I will tell you that the astrology pages show Semitic references and not Latin ones: for example, Aries, the Ram in Latin, was known as the Lamb in many Semitic languages, and the Latin Capricorn is known as a mountain goat or ibex in Semitic.

RE: Word Entropy - Koen G - 15-09-2019

(15-09-2019, 11:09 PM)Monica Yokubinas Wrote: I am curious as to why you are not using any Semitic text references to the script?

Collecting all these authentic historical texts, checking them, removing editorial notes and introductions, and general pre-processing is a lot of work. I want to do this as correctly as possible, and all of it is much more challenging (not to say impossible) in languages I don't know and scripts I can't read. To get a good idea of a language I need 20 texts in it, preferably of different types and genres. They need to be in medieval language and spelling. They have to be copy-pastable, so no Google Books. They need to be at least 5000 words long. If you have a Semitic text like this I will happily include it. I will move there eventually, but the task is daunting given my lack of knowledge on the subject.

RE: Word Entropy - Monica Yokubinas - 16-09-2019

(15-09-2019, 11:19 PM)Koen G Wrote: Collecting all these authentic historical texts, checking them, removing editorial notes and introductions, and general pre-processing is a lot of work. [...]

I'll see what I can find in the next few days for a script you can use... in the meantime, here is something on how Celtic and Semitic languages are similar: [link].

RE: Word Entropy - ReneZ - 16-09-2019

You could add the line for the hypothetical maximum h2, which for 5000 tokens is: h2 = 12.3 - h1 (or h1 = 12.3 - h2). I would have put h1 on the horizontal axis and h2 on the vertical, but never mind that. In any case, one can then see where h2 deviates more from this hypothetical maximum.

RE: Word Entropy - Koen G - 16-09-2019

Axes reversed and line added (I drew it in manually by connecting the right intersections). Voynichese is closer to its theoretical max, like Latin.

RE: Word Entropy - ReneZ - 16-09-2019

The actual information that word-h2 brings is this difference between the theoretical maximum and the actual value. This tells us how the word-pair distribution differs from {1, 1, 1, ..., 1, 0, 0, ..., 0}.

RE: Word Entropy - Koen G - 16-09-2019

Right. If you were to shuffle randomly, would the dot be on the line or still below it?

RE: Word Entropy - ReneZ - 16-09-2019

This shuffling is a powerful tool, because it leaves the text (seen as a 'word inventory') intact, yet it removes the meaning. We saw from the example of nablator and the TT transcription (Takeshi, not Torsten) that it does not go to the hypothetical maximum, because some repeated word pairs still arise arbitrarily. This is why I started using the term 'hypothetical': one could see the 'theoretical' maximum as the result for a case where all words have been shuffled. In principle it should be possible to actually predict this, but I don't feel up to that now. There is a hint of a suspicion that the green and grey dots are a little bit above the 'real text' dots.
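For concreteness, here is a minimal Python sketch of the quantities under discussion; it is not code from the thread. It assumes word-h2 means the entropy of adjacent word pairs minus h1, so that for N tokens the pair entropy is at most log2(N - 1), which is about 12.3 bits when N = 5000 and appears to be where the 12.3 in Rene's formula comes from. The input file name is hypothetical.

```python
# Sketch (not the thread's actual code): word h1, conditional word-pair h2,
# the "hypothetical maximum" 12.3 - h1 for 5000 tokens, and the shuffle test.
import math
import random
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def word_h1_h2(words):
    h1 = entropy(Counter(words))                       # word (unigram) entropy
    pair = entropy(Counter(zip(words, words[1:])))     # adjacent word-pair entropy
    return h1, pair - h1                               # h2 as conditional entropy

words = open("sample.txt", encoding="utf8").read().split()[:5000]  # hypothetical file
h1, h2 = word_h1_h2(words)
hyp_max = math.log2(len(words) - 1) - h1               # ~12.3 - h1 for 5000 tokens

shuffled = words[:]
random.shuffle(shuffled)    # keeps the word inventory intact, removes the meaning
_, h2_shuffled = word_h1_h2(shuffled)

print(f"h1 = {h1:.3f}, h2 = {h2:.3f}, hypothetical max = {hyp_max:.3f}")
print(f"h2 after shuffling = {h2_shuffled:.3f}")
```

On a shuffled copy of the same tokens, h2 should climb toward the hypothetical maximum but stay slightly below it, since some word pairs still repeat by chance, which is the distinction Rene draws between the 'hypothetical' and the 'theoretical' maximum.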
RE: Word Entropy - Koen G - 16-09-2019

Indeed, Rene, I think they are. I calculated this by dividing h2 by (12.3 - h1), then averaged per language:

SP      0.8056087529
Eng     0.8183898864
Ger     0.8593272856
IT      0.8721744293
VM Q13  0.8997510321
Lat     0.9031773814
VM TT   0.9385874706

RE: Word Entropy - MarcoP - 16-09-2019

(16-09-2019, 09:01 AM)Koen G Wrote: Indeed, Rene, I think they are. I calculated this by dividing h2 by (12.3 - h1), then averaged per language.

You could also try plotting this...
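Picking up MarcoP's suggestion, a quick plotting sketch using the averaged ratios from Koen's table above; matplotlib is my assumption here, not something mentioned in the thread.

```python
# Sketch only: bar chart of the averaged h2 / (12.3 - h1) ratios per language,
# copied from the post above.
import matplotlib.pyplot as plt

ratios = {
    "SP": 0.8056087529,
    "Eng": 0.8183898864,
    "Ger": 0.8593272856,
    "IT": 0.8721744293,
    "VM Q13": 0.8997510321,
    "Lat": 0.9031773814,
    "VM TT": 0.9385874706,
}

plt.bar(list(ratios), list(ratios.values()))
plt.ylim(0.75, 1.0)   # a ratio of 1 would sit exactly on the hypothetical-maximum line
plt.ylabel("h2 / (12.3 - h1)")
plt.title("Word-h2 relative to its hypothetical maximum")
plt.show()
```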