I started looking into what caused the differences between the old and new graphs. First of all, the raw TTR values are exactly the same, so this is fine.
The reference corpus did change in two ways. First, many Greek texts were removed because they were simply overrepresented. Second, at some point I shared the corpus with Marco and he helped me clean it up and sort out some issues. The bottom line is that the new reference corpus is better, more representative and contains less problematic formatting.
Let's first look at Q13. These are the values for windows 2, 3, 4, 5:
0.9943804035 0.9871259067 0.9803978092 0.9734755658
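For readers who want to reproduce numbers of this kind, here is a minimal sketch of a moving-window TTR. It assumes that "window N" means N consecutive words, that the windows overlap and slide one word at a time, and that the reported value is the mean over all windows. The function name and the toy token list are made up for illustration, and the original calculation may have differed in these details.

```python
def moving_window_ttr(tokens, window):
    """Mean type-token ratio over all overlapping windows of `window` words."""
    if window < 1:
        raise ValueError("window must be at least 1")
    n_windows = len(tokens) - window + 1
    if n_windows < 1:
        raise ValueError("text is shorter than the window")
    total = 0.0
    for start in range(n_windows):
        chunk = tokens[start:start + window]
        # distinct words (types) divided by words (tokens) in this window
        total += len(set(chunk)) / window
    return total / n_windows


# Toy input, not real transliteration data.
sample = "a b b c a d e e f a b".split()
for size in (2, 3, 4, 5):
    print(size, moving_window_ttr(sample, size))
```

With a window of 2, every window scores either 1.0 (two different words) or 0.5 (the same word twice), so the window-2 value is a direct measure of how often a word is immediately repeated.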
Normalized to the old (problematic) corpus, this gave the values:
-3.544204936 -3.06939543 -2.630757497 -2.240265703
And in the new corpus:
-8.378444309 -5.083947024 -3.252307989 -2.454021631
Here I'll plot old normalized Q13 and new normalized Q13 on a graph:
[attachment=7183]
What we see here is clearly an effect of the different normalization. The overall shape is the same, but in the small windows the new values lie many more standard deviations from the norm (about -8.4 versus -3.5 at window 2). So it looks like after cleaning the corpus of unwanted layout and punctuation artifacts, the strange behavior of Voynichese in small windows is even more obvious.
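To illustrate why an unchanged raw TTR can end up with a different normalized value, here is a small sketch. It assumes the normalization is a plain z-score against the reference texts (raw value minus reference mean, divided by the reference sample standard deviation), which may not match the original calculation exactly. The two reference lists are invented placeholders, not the real corpus statistics.

```python
from statistics import mean, stdev


def normalize(value, reference_values):
    """Z-score of `value` against a list of reference-corpus values."""
    return (value - mean(reference_values)) / stdev(reference_values)


q13_window2 = 0.9943804035  # raw Q13 value for window 2, as quoted above

# Hypothetical per-text TTR values standing in for two versions of a
# reference corpus. These are NOT the real corpus statistics.
corpus_a = [0.9990, 0.9970, 0.9950, 0.9985, 0.9965]
corpus_b = [0.9990, 0.9985, 0.9980, 0.9988, 0.9982]

print("normalized against corpus_a:", normalize(q13_window2, corpus_a))
print("normalized against corpus_b:", normalize(q13_window2, corpus_b))
```

The raw value is identical in both calls. Only the mean and spread of the reference values differ, and that alone moves the normalized result.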
There must have been a mistake in my earlier graphs though, since the values used there sometimes differ from those in the dataset I was using. I have no idea what went wrong there; it was several years ago. Either way, thank you for pointing this out.
So with the new (and hopefully correct) data, this is what I get for various Voynich sections:
[attachment=7184]
I realize this graph is impossible to read, but the point is that the logarithmic shape is universal for Voynichese samples. And this makes sense: it behaves fairly normally overall, but does weird things (reduplication patterns) in small windows. Hence the low start that gets corrected over longer windows.
It also makes sense that the shuffled versions look similar, though less extreme. As Marco said, shuffling a normal text will create more instances of reduplication of frequent words (the the, this this...). So we also expect a lower TTR in small windows.
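The shuffling effect can be checked with a quick sketch. It counts immediate word repetitions before and after shuffling, and also computes the repetition rate one would expect on average from a uniform shuffle, which for word counts c_i and N words in total is sum(c_i * (c_i - 1)) / (N * (N - 1)). The function names and the toy English text are made up for illustration, and the seed is arbitrary.

```python
import random
from collections import Counter


def adjacent_repeat_rate(tokens):
    """Fraction of adjacent word pairs in which both words are identical."""
    pairs = len(tokens) - 1
    if pairs < 1:
        return 0.0
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return repeats / pairs


def expected_rate_after_shuffle(tokens):
    """Expected adjacent-repeat rate under a uniform random shuffle."""
    n = len(tokens)
    if n < 2:
        return 0.0
    counts = Counter(tokens)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Toy text: "the" is frequent but never follows itself in the original order.
original = ("the cat saw the dog and the bird saw the cat " * 20).split()

shuffled = list(original)
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

print("original rate:", adjacent_repeat_rate(original))
print("shuffled rate:", adjacent_repeat_rate(shuffled))
print("expected rate:", expected_rate_after_shuffle(original))
```

Since the window-2 TTR equals 1 minus half the adjacent-repeat rate, any repetitions introduced by shuffling translate directly into a lower TTR in the smallest window.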