Word Entropy - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Word Entropy (/thread-2928.html)
RE: Word Entropy - Anton - 15-09-2019

Koen, did you try to apply these calculations to Torsten's auto-generated texts? I have not followed the respective discussion in detail, but I had the impression that some kind of software generator was made available.

RE: Word Entropy - Koen G - 15-09-2019

Do you mean entropy over increasing text size? I've only got the text of some 10800 words someone shared.

h0            h1            h2            words   types
11.12153352   9.196643491   3.850987991   10832   2228

RE: Word Entropy - Anton - 15-09-2019

No, I mean just the entropy calculation over a large piece of text.

RE: Word Entropy - ReneZ - 15-09-2019

(15-09-2019, 10:54 AM)Koen G Wrote: Do you mean entropy over increasing text size? I've only got the text of some 10800 words someone shared.

For this text, the theoretical maximum h2 is 4.206, meaning that the actual h2 is at 91.55% of that.

RE: Word Entropy - Anton - 15-09-2019

Quite like Voynich TT above.

RE: Word Entropy - ReneZ - 15-09-2019

Similar to the plot linked above, it is possible to make the h2 vs. h1 plot for the several texts that were analysed at different lengths. These were:

Pliny (in blue)
Text "M" (in orange) [I suspect this could be Mattioli]
Text "B" (in grey) [I suspect this could be the German text 'Barlaam'?]

A green dot for Voynich TT and a black one for Timm's text have been added. The influence of text length clearly dominates.

RE: Word Entropy - Koen G - 15-09-2019

I see what you mean, Rene, those are huge differences. No wonder I couldn't make much sense of the h2 graph. What about h1-h2 though?

RE: Word Entropy - Koen G - 15-09-2019

Marco tweaked nablator's code so now I can limit it to the first n words; I used 5000 for this graph, comparing h1 and h2. Looks better, right? Full data for this set is in the sheet WordEntropyN.

It looks like Voynichese h1 or h2 is a little out of proportion? Edit: also, is it expected that they are inversely proportional?

RE: Word Entropy - Koen G - 15-09-2019

Next I ran it on 20,000 words. Marco's code automatically selected the files that were large enough, which was very convenient. I then isolated those same files and ran them on 2,000 words. The effect appears to be that there is more spread as word count increases: for example, three out of four Voynichese files overlap completely at 2,000 words. Still, even at 20,000 words, languages cluster well. The top-right drift of Voynichese becomes more apparent at 20,000 words.

RE: Word Entropy - RenegadeHealer - 15-09-2019

Koen, I read your blog posts about TTR and MATTR, and I agree that these are potentially very useful tools for telling signal from noise in the VMS and for narrowing our language search to languages with a similar statistical profile.
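nablator's and Marco's actual code is not quoted in the thread, but Koen's figures pin down the definitions: h0 = log2(2228) ≈ 11.12 is the log of the vocabulary size, h1 is the unigram word entropy, and h2 behaves like the conditional entropy of a word given its predecessor. Below is a minimal Python sketch under those assumptions, including the first-n-words limit Marco added; the function name and interface are illustrative, not the thread's actual code.

```python
import math
from collections import Counter

def word_entropies(words, n=None):
    """h0, h1, h2 over the first n words of a tokenised text.

    h0 = log2(number of distinct words)  -- the maximum possible h1
    h1 = unigram word entropy
    h2 = conditional entropy of a word given the preceding word,
         estimated as H(bigram) - H(unigram)
    """
    if n is not None:
        words = words[:n]            # limit to the first n words

    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))

    def entropy(counts):
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    h0 = math.log2(len(unigrams))
    h1 = entropy(unigrams)
    h2 = entropy(bigrams) - h1       # H(w2 | w1) = H(w1, w2) - H(w1)
    return h0, h1, h2, len(words), len(unigrams)
```

On a hypothetical transliteration file, `word_entropies(open("timm_generated.txt").read().split(), n=5000)` would return the five columns of Koen's table, restricted to the first 5000 words as in his h1 vs. h2 plots.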
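The thread does not spell out how Rene's theoretical maximum h2 was obtained, but one reconstruction reproduces both of his figures exactly: bound the bigram entropy by log2 of the number of observed bigrams, N - 1 = 10831, so that (assuming h2 = H(bigram) - H(unigram) as above)

\[
h_2^{\max} = \log_2(N-1) - h_1 = \log_2(10831) - 9.1966 \approx 13.4030 - 9.1966 = 4.2063,
\qquad
\frac{h_2}{h_2^{\max}} = \frac{3.8510}{4.2063} \approx 91.55\%.
\]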
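MATTR is mentioned but not defined in the thread. It is the moving-average type-token ratio of Covington & McFall (2010): slide a fixed-size window over the text, take the type-token ratio inside each window, and average the results, which removes the strong text-length dependence of plain TTR. A minimal sketch, with an illustrative window size rather than Koen's actual settings:

```python
from collections import Counter

def mattr(words, window=500):
    """Moving-Average Type-Token Ratio over a tokenised text.

    Averages (distinct words / window size) over every window-sized
    span, so the result is comparable across texts of different lengths.
    """
    if len(words) <= window:
        return len(set(words)) / len(words)   # fall back to plain TTR

    counts = Counter(words[:window])          # types in the first window
    ratios = [len(counts) / window]
    for old, new in zip(words, words[window:]):
        counts[new] += 1                      # word entering the window
        counts[old] -= 1                      # word leaving the window
        if counts[old] == 0:
            del counts[old]
        ratios.append(len(counts) / window)
    return sum(ratios) / len(ratios)
```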