The Voynich Ninja (https://www.voynich.ninja)
Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
Forum: Voynich Talk (https://www.voynich.ninja/forum-6.html)
Thread: Basic stats summary (/thread-2800.html)
Basic stats summary - RobGea - 30-05-2019

Hi all,
it would be most helpful if there was a basic stats summary somewhere, just for basic word count and such from individual transcriptions.

Something like this:

Transcription: TT_ivtff_v0a
IVTFF Eva- 1.5
# Extracted from LSI_ivtff_0d.txt
# Version v0a of 26/08/2017

Notes: words with question marks rejected. '<->' replaced with space

Stats:
Total word count: 37759
distinct words: 8026
Hapax legomenon: 5527
words occurring twice or more: 2499 (anywhere in the manuscript)
wordsoccurring2ormoretimes: of length: 11 How many: 0
wordsoccurring2ormoretimes: of length: 10 How many: 6
wordsoccurring2ormoretimes: of length: 9 How many: 57
wordsoccurring2ormoretimes: of length: 8 How many: 204
wordsoccurring2ormoretimes: of length: 7 How many: 456
wordsoccurring2ormoretimes: of length: 6 How many: 638
wordsoccurring2ormoretimes: of length: 5 How many: 599
wordsoccurring2ormoretimes: of length: 4 How many: 328
wordsoccurring2ormoretimes: of length: 3 How many: 145
wordsoccurring2ormoretimes: of length: 2 How many: 48
wordsoccurring2ormoretimes: of length: 1 How many: 18
EOF

This kind of thing would be a great boon to baseline/calibrate any code.
Yes, it's all been done before, but finding the actual data is a pain.
An easy to find reference would be good, esp. if several folks could agree on the numbers.

If anyone knows where to find such a thing, that would be great.

RE: Basic stats summary - -JKP- - 30-05-2019

"... words occurring twice or more:"

It needs to be made clear whether this is words occurring twice or more (anywhere in the manuscript) or words occurring twice or more in a row (since repetition is often discussed). A slight modification of the wording might be enough to clarify this.
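For anyone wanting to reproduce numbers of this kind, here is a minimal Python sketch of how such a summary might be computed. It is not the script behind the figures above. It assumes the transcription has already been reduced to plain text with whitespace between words (locus tags and comments stripped beforehand), and the function names `tokenize` and `basic_stats` are made up for illustration. It follows the two notes from the first post ('<->' replaced with space, words with question marks rejected) and reports "twice or more anywhere" separately from "repeated in a row", since those are the two readings raised in the reply.

```python
from collections import Counter


def tokenize(text):
    """Split plain transcription text into words.

    Assumption: words are separated by whitespace and all locus tags and
    comments were removed beforehand. Following the notes in the post,
    '<->' is replaced with a space and words containing '?' are rejected.
    """
    words = text.replace("<->", " ").split()
    return [w for w in words if "?" not in w]


def basic_stats(words):
    """Return basic word statistics for a list of words."""
    counts = Counter(words)
    hapax = [w for w, n in counts.items() if n == 1]
    # Words occurring twice or more anywhere in the text.
    repeated = [w for w, n in counts.items() if n >= 2]
    # How many of those repeated words there are at each word length.
    repeated_by_length = Counter(len(w) for w in repeated)
    # Positions where a word is immediately followed by the same word.
    # Rejected words are skipped, so their neighbours count as adjacent.
    in_a_row = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return {
        "total": len(words),
        "distinct": len(counts),
        "hapax": len(hapax),
        "repeated_anywhere": len(repeated),
        "repeated_by_length": dict(repeated_by_length),
        "repeats_in_a_row": in_a_row,
    }


if __name__ == "__main__":
    sample = "daiin ol daiin daiin chedy qo?ky ol<->shedy"
    for key, value in basic_stats(tokenize(sample)).items():
        print(f"{key}: {value}")
```

Two sanity checks fall out of the definitions and hold for the figures in the first post: distinct words = hapax + words occurring twice or more (8026 = 5527 + 2499), and the per-length counts add up to the "twice or more" total (2499).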