When I wrote:
Quote: The interesting part is not that the main text follows Zipf's law, but that the main text does while the labels do not.
This means that the main text and the labels were not generated or written by the same process.
I thought it was obvious, but a bit of explanation seems to be a good idea.
I use the term 'process' in a very general way.
The process involves a person, pen in hand, who writes one character after the other on the parchment. The resulting text is the output of the process.
One type of process could be: writing a running text in some language, with implied rules about grammar and syntax.
Another process could be: adding single words to illustrations indicating what the illustration is about.
Many of the properties of the two outputs will be different, even though the words in the second process are likely to occur in the output of the first process.
A third process could be: moving a Cardan grille over a large table and copying the resulting character sequences to the parchment.
A fourth: the auto-copying hypothesis of Torsten Timm.
In the 'optimistic' scenario that the Voynich MS contains a meaningful text that is just waiting to be retrieved, the first two processes could be the basis for the main text and the labels.
The fact that the observed differences exist does not prove that the text is meaningful, but it is at least compatible with that, and in my opinion it is a sign of planning and of non-arbitrariness.
In the case of the Cardan grille and the auto-copying hypothesis, it is not immediately clear why Zipf's law would be obeyed, but it is conceivable. However, what is not explained is why it holds in the main text but not in the labels.
It would require a dedicated effort by the author to 'do something different' for the labels.
One can safely exclude the possibility that the author understood Zipf's law and deliberately broke it for the labels.
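For anyone who wants to repeat the comparison, here is a minimal sketch of one way to do it: rank the words by frequency and fit a straight line through log(frequency) against log(rank). A slope near -1 is the classic Zipf behaviour, a slope near 0 means the frequencies are flat. The function name `zipf_slope` and the toy word lists are my own assumptions for illustration; they are not taken from any transcription file, and a real test would read the words from one.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank).

    Hypothetical helper for illustration; roughly -1 suggests Zipf-like
    behaviour, roughly 0 suggests a flat frequency distribution.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data only (assumption): a 'running text' whose word counts fall off
# roughly as 1/rank, and a 'label set' in which every word occurs once.
running_text = []
for rank in range(1, 21):
    running_text += [f"w{rank}"] * (120 // rank)
labels = [f"w{rank}" for rank in range(5, 25)]

slope_text = zipf_slope(running_text)
slope_labels = zipf_slope(labels)
print(f"running text slope: {slope_text:.2f}")
print(f"labels slope:       {slope_labels:.2f}")
```

The labels are a much smaller sample than the main text, so a fit like this is only a rough indicator for them.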
Another interesting property of the label words (and I concentrate on the zodiac and pharma labels - I have barely looked at the others) is that they do largely occur in the main text (thanks to Marco for confirming this), but do not include some of the most frequent words. No label (from memory) just says chol, daiin or chedy. This is also a 'good sign' for the meaningful text scenario. The running text is likely to include words like articles, prepositions and verbs that are less likely candidates as labels.
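The two checks described here (how many label words also occur in the main text, and whether any of the most frequent main-text words is used on its own as a label) can be sketched as follows. The function name `label_profile` and the word lists are assumptions made up for the example; the counts are invented and only chosen so that chol, daiin and chedy come out as the most frequent words.

```python
from collections import Counter


def label_profile(main_words, label_words, top_n=3):
    """Compare a set of labels with the running text (hypothetical helper)."""
    counts = Counter(main_words)
    labels = set(label_words)
    # Labels that also occur somewhere in the running text.
    shared = {w for w in labels if w in counts}
    # The top_n most frequent words of the running text.
    top = [w for w, _ in counts.most_common(top_n)]
    return {
        "share_in_main_text": len(shared) / len(labels),
        "top_words_used_as_labels": [w for w in top if w in labels],
    }


# Invented toy data (assumption), not a real transcription.
main_words = (["daiin"] * 4 + ["chol"] * 3 + ["chedy"] * 2
              + ["otaly", "okary"])
label_words = ["otaly", "okary", "otoldy"]

profile = label_profile(main_words, label_words)
print(profile)
```

With real data, a high share combined with an empty list of top words used as labels would match what I describe above.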
Just as a historical footnote, the solution proposed by John Stojko works by ignoring all spaces in the manuscript, assigning consonants to all symbols, then inserting vowels and re-introducing spaces. The resulting text is proposed to be (old) Ukrainian. This solution runs over the plain text, but also over the concatenated labels, without any distinction. This implies one and the same process for the running text and the labels, which is not compatible with what is observed. (The much bigger problem is that it does not explain the word structure of Voynichese.)