Hello again!
I've taken a break from studying the Voynich for a few months. It's such a complex subject that I think you need to mentally disconnect from it every now and then.
Lately, I've been doing a simple statistical test on the manuscript, which in principle doesn't depend on any linguistic interpretation. The idea is to measure to what extent the identity of a character at position t gives us information about the character at position t+d. In more technical terms, I'm measuring mutual information, which can be calculated for different distances d. If the text has only very local structure, this dependence, however small, should quickly disappear. If there is a deeper structure, some of the dependence should persist even at larger distances.
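As a rough illustration, the measurement described above can be sketched in Python. This is a minimal sketch and not necessarily the exact procedure behind the results in this post. It assumes a plain plug-in (frequency-count) estimate in bits, and the names `lagged_mutual_information` and `mi_curve` are made up for the example. Keep in mind that a plug-in estimate is biased upwards on finite samples, so in practice the curve should be read against a shuffled baseline rather than against zero.

```python
from collections import Counter
from math import log2


def lagged_mutual_information(text, d):
    """Plug-in estimate (in bits) of the mutual information between
    the character at position t and the character at position t + d."""
    if d < 1 or d >= len(text):
        raise ValueError("d must satisfy 1 <= d < len(text)")
    # All (character at t, character at t + d) pairs in the sequence.
    pairs = list(zip(text, text[d:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log2(p(a, b) / (p(a) * p(b))), written with raw counts.
        mi += (c / n) * log2(c * n / (left[a] * right[b]))
    return mi


def mi_curve(text, max_d):
    """Mutual information for every distance d = 1 .. max_d."""
    return [lagged_mutual_information(text, d) for d in range(1, max_d + 1)]
```

Calling `mi_curve(text, 100)` on a transliteration loaded as a single string would give one value per distance, which can then be plotted against d.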
In the case of the Voynich, mutual information is maintained at a certain level even at distances of 50 to 100 characters (very similar to natural languages). When the same text is globally shuffled, the signal collapses. This seems to confirm that the effect depends on the actual order of the characters and not just their frequencies.
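The global shuffle control is easy to reproduce. The sketch below assumes the text is held as one Python string. The function name `global_shuffle` and the `seed` parameter are illustrative choices, not part of any existing tool. It permutes all characters, so the frequencies stay exactly the same while the order is destroyed.

```python
import random
from collections import Counter


def global_shuffle(text, seed=0):
    """Return the same characters in a random order.

    Character frequencies are preserved exactly; any dependence
    between positions is destroyed.
    """
    rng = random.Random(seed)  # fixed seed so the control is repeatable
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)


sample = "daiin.chedy.qokeedy.shedy.daiin"
shuffled = global_shuffle(sample, seed=1)
# Same multiset of characters before and after shuffling.
assert Counter(shuffled) == Counter(sample)
```

Averaging the measurement over several seeds gives a baseline that also shows how much of the raw value is just finite-sample bias.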
I have also tried a control that preserves local patterns but destroys the global order by shuffling entire lines. In this case, the short-range dependence is maintained, but the behavior at longer distances is lost. This suggests that the signal is not limited to regularities within each line.
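A line-level shuffle can be sketched in the same way. This version assumes one manuscript line per text line in the input string. The name `shuffle_lines` is again only illustrative. Each line is kept intact, so everything that happens inside a line survives, while the order of the lines is randomized.

```python
import random


def shuffle_lines(text, seed=0):
    """Randomly reorder whole lines, leaving each line's content untouched."""
    rng = random.Random(seed)
    lines = text.splitlines()
    rng.shuffle(lines)
    return "\n".join(lines)


sample = "fachys.ykal.ar\nsory.ckhar.or\nsyaiir.sheky.or"
control = shuffle_lines(sample, seed=2)
# Every original line is still present exactly once.
assert sorted(control.splitlines()) == sorted(sample.splitlines())
```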
To make sure that the result is not just due to the fact that the manuscript has different parts with different letter styles or frequencies, I did a very simple test. I created artificial texts divided into blocks. In each block, the letters appear in the same proportions as in the original text of that part, but they are placed randomly, with no real order.
So the artificial text preserves the slow changes in frequencies between sections, but removes any real structure in the sequence. When I apply the same measurement to these artificial texts, the signal disappears almost completely. This means that the pattern we see in the Voynich cannot be explained simply by the fact that different parts of the manuscript have different letter frequencies. There is more than just variation between sections.
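One way to build such a block-wise control is sketched below. It rests on two assumptions that the description above does not fix: blocks are taken as fixed-length stretches of `block_size` characters (a hypothetical parameter; real section boundaries could be used instead), and the characters of each block are permuted rather than resampled, which keeps the per-block counts exact.

```python
import random
from collections import Counter


def blockwise_shuffle(text, block_size, seed=0):
    """Shuffle characters inside consecutive blocks of `block_size`.

    Each block keeps its own character counts, so slow drifts in
    frequency between sections are preserved, but the order inside
    every block is random.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    rng = random.Random(seed)
    out = []
    for start in range(0, len(text), block_size):
        block = list(text[start:start + block_size])
        rng.shuffle(block)
        out.append("".join(block))
    return "".join(out)


sample = "aaaaaaaabb" + "ccccccccdd"  # two "sections" with different letters
control = blockwise_shuffle(sample, block_size=10, seed=4)
# Each block still contains exactly the letters of its own section.
assert Counter(control[:10]) == Counter(sample[:10])
assert Counter(control[10:]) == Counter(sample[10:])
```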
I also trained simple generative models on the Voynich text itself. A first-order Markov model captures local transitions but fails to reproduce the structure over longer distances. Moderate-order character n-gram models reproduce short-range effects, but they do not match the persistence observed in the original text.
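For reference, a first-order Markov generator of the kind mentioned above can be sketched in a few lines. This is an illustrative version with invented names (`train_first_order`, `generate`), not a record of the exact models used here. It counts character-to-character transitions and then samples each new character from the counts that follow the previous one.

```python
import random
from collections import Counter, defaultdict


def train_first_order(text):
    """Count how often each character is followed by each other character."""
    transitions = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        transitions[a][b] += 1
    return transitions


def generate(transitions, start, length, seed=0):
    """Sample `length` characters, each drawn from the observed
    successors of the previous character."""
    if not transitions:
        raise ValueError("the model has no transitions")
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        successors = transitions.get(out[-1])
        if not successors:
            # Dead end (character never seen with a successor):
            # restart from a randomly chosen known character.
            out.append(rng.choice(sorted(transitions)))
            continue
        chars = list(successors.keys())
        weights = list(successors.values())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out[:length])


model = train_first_order("qokeedy.qokedy.qokain.okaiin")
fake = generate(model, start="q", length=40, seed=1)
```

Because each character depends only on the one before it, any dependence in the generated text can only be carried through chains of adjacent transitions, which is why this kind of model is a natural "purely local" baseline.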
Importantly, the pattern is robust both to removing spaces and to switching from EVA to an alternative transliteration (CUVA). The overall behavior remains qualitatively the same.
For comparison, I have applied the same analysis to several natural language corpora. The Voynich curves fall within the same general range as those of these texts: they do not behave like shuffled noise or like sequences generated by simple local models. On a purely statistical level, the Voynich character sequence shows a structured long-range dependence comparable to that of natural texts.
This does not prove that the manuscript encodes a natural language. But I do think it shows that its character sequence behaves like a structured system with persistent long-range dependencies, and not like a shuffled or purely local construct.