I guess that one of the problems with statistical analyses of the VMS is that, when comparing with other sources, one typically only has modern texts available.
My impression is that some of the strange features of Voynichese might be caused by the script, rather than by the language.
For instance, there are medieval European scripts in which the same character is written differently depending on the neighbouring characters. I expect this could result in lower entropy (though the phenomenon would clearly have to be very extensive to bring second-order entropy down to a level comparable with the VMS).
This is an example of a script in which 'r' has three different shapes:
* similar to uppercase R (but smaller) at the beginning of words [red]
* similar to '2' or 'z' when midword and immediately following a "round" character ('o', 'p', 'd') [green]
* 'r' in other cases [blue]
Obviously, to a hypothetical transcriber with no knowledge of Latin languages or the Latin alphabet, these three shapes would look like different characters, and each would be transcribed as such. The transcriber would then have to deal with one character that occurs only at the beginning of words and another that occurs only in a restricted left context.
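To make the idea concrete, here is a rough Python sketch of what such a "naive" transcription does to the statistics. It rewrites 'r' according to the three rules above and estimates the second-order (conditional) entropy from bigram counts. The function names, the symbols 'R' and '2', and the sample sentence are my own assumptions for illustration; I also assume lowercase text with words separated by single spaces, and I treat any non-initial 'r' after a round letter as the '2' shape.

```python
import math
from collections import Counter

ROUND = set("opd")  # the "round" letters listed above


def transcribe(text):
    """Rewrite 'r' as one of three symbols depending on its left context."""
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "
        if ch != "r":
            out.append(ch)
        elif prev == " ":
            out.append("R")  # word-initial shape (assumed symbol)
        elif prev in ROUND:
            out.append("2")  # shape used after a round letter (assumed symbol)
        else:
            out.append("r")  # default shape
    return "".join(out)


def conditional_entropy(text):
    """Estimate H(next char | previous char) in bits from bigram counts."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter()
    for (a, _), n in pairs.items():
        firsts[a] += n
    total = sum(pairs.values())
    return -sum(
        n / total * math.log2(n / firsts[a]) for (a, _), n in pairs.items()
    )


if __name__ == "__main__":
    # Hypothetical sample; a real test needs a long medieval text.
    sample = "arma virumque cano troiae qui primus ab oris"
    print(conditional_entropy(sample))
    print(conditional_entropy(transcribe(sample)))
```

With only one character affected I would expect the difference to be small, which is why the phenomenon would need to be much more widespread to matter.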
UPenn - Virgil - [Le livre des Eneydes] - France, late XV Century
Other interesting features of this manuscript are that 'v' only occurs word-initially (in all other positions the same character as 'u' is used) and that 's' has two different shapes (this is actually quite frequent): one occurs only at the end of words, the other elsewhere.
The example of 'v' is a simple case of ambiguity: a single symbol sometimes used for unrelated sounds. This same manuscript typically omits the dot over 'i', with the result that 'm' and 'ni'/'in' are often indistinguishable. Of course, something similar might be happening with VMS EVA:i and EVA:e sequences (see also the linked page by Stephen Bax).
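As a small illustration of how quickly this kind of ambiguity grows, the sketch below lists every way a run of identical vertical strokes could be read as letters. The stroke counts (one for an undotted 'i', two for 'n' and 'u', three for 'm') are an idealised assumption of mine, as is the function name; real hands are of course less regular.

```python
# Idealised stroke counts per letter (an assumption, not measured data).
MINIMS = {"i": 1, "n": 2, "u": 2, "m": 3}


def readings(strokes):
    """Return every letter string whose stroke counts add up to `strokes`."""
    if strokes == 0:
        return [""]
    result = []
    for letter, width in MINIMS.items():
        if width <= strokes:
            result.extend(letter + rest for rest in readings(strokes - width))
    return result


if __name__ == "__main__":
    print(sorted(readings(3)))
```

For three strokes this yields six candidate readings, including the 'm', 'ni' and 'in' mentioned above.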