The Voynich Ninja
VM TTR values - Printable Version



VM TTR values - Koen G - 14-06-2019

Since the main discussion is going on in the off-topic section, I thought it might be worthwhile to make a separate thread for discussion of the VM values.
Once again I'd like to thank everyone who helped me out; it took many interventions for me to get to this point. I would be so hopeless without you guys.
I redid everything using Rene's ZL2a file (more spaces) as recommended by Nablator. From this I isolated five sections:

  1. Herbal A 
  2. Herbal B
  3. Q13
  4. Small plants (with labels removed, so only the paragraphs)
  5. Q20
I then used Nablator's code to calculate MATTR (moving-average type-token ratio) for the following window sizes: 2, 3, 4, 5, 10, 20, 50, 100, 500.
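For readers unfamiliar with the measure, here is a minimal sketch of how MATTR can be computed. This is not Nablator's code, which is not reproduced in this thread; the function names `mattr` and `mattr_curve` are placeholders chosen for illustration, and tokens are assumed to be whitespace-separated words.

```python
from collections import Counter

# Window sizes listed in the post above.
WINDOWS = (2, 3, 4, 5, 10, 20, 50, 100, 500)

def mattr(tokens, window):
    """Mean type-token ratio over every run of `window` consecutive tokens."""
    n = len(tokens)
    if window < 1 or n < window:
        raise ValueError("window must be between 1 and len(tokens)")
    counts = Counter(tokens[:window])   # token counts inside the current window
    total_types = len(counts)           # distinct types in the first window
    for i in range(window, n):
        outgoing = tokens[i - window]   # token leaving the window
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        counts[tokens[i]] += 1          # token entering the window
        total_types += len(counts)
    n_windows = n - window + 1
    return total_types / (window * n_windows)

def mattr_curve(tokens, windows=WINDOWS):
    """MATTR for each window size that fits in the text (hypothetical helper)."""
    return {w: mattr(tokens, w) for w in windows if w <= len(tokens)}
```

Usage would be along the lines of `mattr_curve(open("section.txt").read().split())`, where `section.txt` is a hypothetical file holding one section of the transliteration.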
Using my corpus of 471 texts, I calculated mean and stdev for each window. This allowed me to normalize the values.
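The normalization described here reads like a per-window z-score. A minimal sketch under that assumption follows; whether the sample or the population standard deviation was used is not stated in the post, so the choice of `statistics.stdev` (sample) is an assumption, as is the name `normalize`.

```python
from statistics import mean, stdev

def normalize(section_mattr, corpus_mattr):
    """Convert raw MATTR values to z-scores relative to a reference corpus.

    section_mattr: {window: MATTR value for one section}
    corpus_mattr:  {window: [MATTR values of every corpus text at that window]}
    """
    z = {}
    for window, value in section_mattr.items():
        reference = corpus_mattr[window]
        # Assumption: sample standard deviation; the post does not say which.
        z[window] = (value - mean(reference)) / stdev(reference)
    return z
```

A value of 0 then means "exactly average for the corpus at this window size", and negative values mean more repetition than the corpus average.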
The result is as follows:

   

As you can see, both B-sections (orange and red) behave similarly. 
Herbal A and the paragraphs from the small plants behave very similarly as well. Is the latter entirely A?

And Q13 has overall much lower values. The way I read the graph, this appears to be the result of larger distance repetitions. Below 5, it starts to approach the other sections again.


RE: VM TTR values - nablator - 14-06-2019

(14-06-2019, 12:45 PM)Koen G Wrote: Herbal A and the paragraphs from the small plants behave very similarly as well. Is the latter entirely A?
Yes.

Is there a text that comes close to any of these curves in your corpus?


RE: VM TTR values - Koen G - 14-06-2019

(14-06-2019, 01:21 PM)nablator Wrote: Yes.

Is there a text that comes close to any of these curves in your corpus?

That's pretty cool, so the measure is predicting Currier language rather than apparent subject matter.

As for the corpus, you can see it here [link not preserved]; I'm not sure if there is a really similar one. Normalized values are on the second sheet.
Just to get an idea of the scale I added Pliny, Matthioli Dioscorides and Balneis to the graph. They bundle nicely on top:

   

Edit: forgot link


RE: VM TTR values - MarcoP - 14-06-2019

(14-06-2019, 12:45 PM)Koen G Wrote: And Q13 has overall much lower values. The way I read the graph, this appears to be the result of larger distance repetitions. Below 5, it starts to approach the other sections again.

Thank you for sharing your results, Koen!

From your second graph (where you added some Latin texts) it seems that all curves tend to converge towards zero for the smallest windows. Since Q13 is consistently lower than the others, I don't think the effect can be due to long distance repetitions, since they would have no effect on smaller windows. Maybe the reverse could be true? I think that frequent immediate reduplication will affect all points of the curve....
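The argument can be illustrated on toy data. The sketch below uses synthetic tokens, not VM text, and a deliberately simple `mattr` helper written for this example: repeats inserted 50 or more tokens away leave the window-5 value untouched, while immediate reduplication lowers the value at both small and large windows.

```python
def mattr(tokens, window):
    """Mean type-token ratio over all windows of the given size (simple version)."""
    n_windows = len(tokens) - window + 1
    return sum(len(set(tokens[i:i + window])) / window
               for i in range(n_windows)) / n_windows

# Synthetic baseline: 1000 distinct tokens, so MATTR is 1.0 at every window.
base = [f"w{i}" for i in range(1000)]

# Immediate reduplication: every 20th token is repeated right after itself.
adjacent = []
for i, tok in enumerate(base):
    adjacent.append(tok)
    if i % 20 == 0:
        adjacent.append(tok)

# Long-distance repetition: every 20th token is repeated 50 positions later.
distant = []
for i, tok in enumerate(base):
    distant.append(tok)
    if i >= 50 and (i - 50) % 20 == 0:
        distant.append(base[i - 50])

for name, text in (("base", base), ("adjacent", adjacent), ("distant", distant)):
    print(name, round(mattr(text, 5), 4), round(mattr(text, 100), 4))
```

This only shows the direction of the effect under these artificial conditions; it says nothing about which kind of repetition actually dominates in Q13.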


RE: VM TTR values - VViews - 14-06-2019

Hi Koen G and thanks for these graphs: very interesting!
I am curious about Q13: would it make a difference if you separated the two sub-quires? I mean the folios with central pools versus those with marginal drawings. I wonder if one of them would be more aligned with the rest of the curves than the other.


RE: VM TTR values - ReneZ - 14-06-2019

@VViews,

if you look at the plot in [link not preserved], you can see that all Bio pages are more or less similar.
However, this is not the case for the stars/recipes section, where the outside and innermost bifolios are more like quire 13, and the others are more like the rest of the MS.

That last point was already observed on [link not preserved].


RE: VM TTR values - Koen G - 14-06-2019

VViews: indeed, Rene's graph shows this well. Q13 as a whole shows a significant drop compared to the rest of the MS.

Marco: I see what you mean, this is an interesting question. I'd like to test this. I'm thinking the following: what if I remove close duplicates from the Q13 text until its m5 value reaches that of Herbal A, and then see whether the values for the larger windows align?


RE: VM TTR values - MarcoP - 14-06-2019

(14-06-2019, 03:10 PM)Koen G Wrote: Marco: I see what you mean, this is an interesting question. I'd like to test this. I'm thinking the following: what if I remove close duplicates from the Q13 text until its m5 value reaches that of Herbal A, and then see whether the values for the larger windows align?

This seems reasonable to me. The approach is certainly very empirical, but I have no better idea at the moment.


RE: VM TTR values - Koen G - 14-06-2019

Rene, I added the label file you provided. I left Pliny for reference, and as you see the labels soar right over him - they are very varied. Is this what you were expecting?

Marco: I got some interesting results which seem to confirm my suspicion - though all of this is very speculative.

What I did was the following: in Notepad++ I selected a common word, which highlights it across the text. I then went on to remove that word wherever it appeared more than once in a 5-word window (so even removal throughout the text). I repeated this step until m5 was the same as the m5 for Herbal A.

So in short, I took shots at the m5 value (or lower) until it was high enough. This resulted in the removal of 87 out of 6941 words, or about 1.25%. So I removed 87 duplicate words that appeared within blocks of 5 words or fewer.
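The removal was done by hand in Notepad++, but the procedure could be roughly automated as below. This is a sketch under assumptions, not the steps actually performed: the function names are hypothetical, words are tried in order of frequency, and all close repeats of a word are dropped in one pass, so it may remove more than the manual approach and overshoot the target.

```python
from collections import Counter

def mattr(tokens, window):
    """Mean type-token ratio over all windows of the given size (simple version)."""
    n_windows = len(tokens) - window + 1
    return sum(len(set(tokens[i:i + window])) / window
               for i in range(n_windows)) / n_windows

def drop_close_repeats(tokens, word, window=5):
    """Drop occurrences of `word` that would share a `window`-token span
    with an earlier kept occurrence of the same word."""
    if window < 2:
        raise ValueError("window must be at least 2")
    kept = []
    for tok in tokens:
        if tok == word and word in kept[-(window - 1):]:
            continue  # a copy already sits within the same window; skip this one
        kept.append(tok)
    return kept

def raise_mattr(tokens, target, window=5):
    """Remove close repeats, most common word first, until MATTR reaches `target`."""
    tokens = list(tokens)
    for word, _count in Counter(tokens).most_common():
        if mattr(tokens, window) >= target:
            break
        tokens = drop_close_repeats(tokens, word, window)
    return tokens

# The proportion removed in the post: 87 of 6941 words.
print(f"{87 / 6941:.2%}")
```

Comparing `mattr(original, 500)` with `mattr(raise_mattr(original, target), 500)` would then show how much of the large-window deficit survives the removal of short-range repeats.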

As the bottom pink arrow I added indicates, however, this has a surprisingly small effect on the m500 value. I was not expecting the original Q13 and the altered one to be so close together. So even if I artificially kick up the m5 value, m500 remains really low.

   


RE: VM TTR values - ReneZ - 15-06-2019

Koen, the line for the labels is way above all the others, showing that the ratio of repeated words is much lower.
If they are really individual words, and not part of a running text, this line should be above essentially all texts that you have.

For the experiment of removing repeated words from the Q13 text, it appears that this does not significantly change the higher M-values, showing that long-range correlations do play a role in these statistics.