Here are some observations after a quick first read of the paper. I wrote this before reading Stephen's comments and edited it afterwards, so there may still be some repetition of points that Stephen already made.
The paper is largely a survey of the best literature about Voynichese. It is quite extensive and can in part be seen as an update to Reddy and Knight's "What we know about the Voynich manuscript" (2011).
The authors address all possible angles of a linguistic interpretation of Voynichese. I agree with Stephen that their treatment of "meaningless hoax" ideas is more superficial.
When presenting various proposed solutions, the authors note that "When [solvers] discuss the data, they focus almost entirely on the lexicon, ignoring morphology and syntax". This is an excellent remark: the process we see over and over is VMS to word salad to "meaningful" text; but languages are not word salads, and an actual translation requires more than a dictionary.
So Bowern and Lindemann give much space to discussion of structure in the ms: this is done with ample reference to researchers like Stolfi, Guy** and Gheuens, whose work with MATTR receives great attention (bravo Koen! and bravo Nablator, who wrote the software!).
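For readers who have not played with it, MATTR (the Moving-Average Type-Token Ratio) is easy enough to compute: slide a fixed-size window over the token stream, take the type-token ratio in each window, and average. A minimal Python sketch of my own (the window size of 500 is just an illustrative choice, not necessarily the one used in the paper or in Nablator's software):

```python
def mattr(tokens, window=500):
    """Moving-Average Type-Token Ratio over a list of word tokens."""
    if len(tokens) < window:
        window = len(tokens)  # fall back to plain TTR for short texts
    ratios = [
        len(set(tokens[i:i + window])) / window  # TTR of each window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)
```

Unlike plain TTR, this is comparable across texts of different lengths, which is what makes it usable on the short Voynich sections.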
Another excellent detail of this work is that they compared abbreviated and full versions of a Latin text (the Secreta Secretorum) and found that entropy is higher in the abbreviated text.
This passage about the low entropy lists a number of interesting ideas:
Quote:The entropy of Voynichese is unlike any other language or script. Plausible manipulations of the script were investigated, including various shorthand abbreviations and devoweling the script. These do affect the character entropy, but not to the extent that would be required to bring Voynichese to the level of other languages. The only manipulation of this type that brings the conditional entropy to Voynich levels is systematic conflation of phonemic distinctions, such as conflating all vowels to a single character, recoding based on dividing characters into whether they occur in the first or second half of the alphabet, or sorting all characters in the word into alphabetical order.
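To make the idea concrete, here is my own small Python sketch (not the authors' code) of the kind of measurement involved: second-order, i.e. conditional, character entropy, plus one of the manipulations mentioned in the quote, conflating all vowels into a single character. The vowel set is an assumption for illustration:

```python
import math
from collections import Counter

def conditional_entropy(text):
    """H(next char | current char) in bits, over adjacent character pairs."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = sum(pairs.values())
    h = 0.0
    for (a, b), count in pairs.items():
        p_pair = count / n          # P(a, b)
        p_cond = count / firsts[a]  # P(b | a)
        h -= p_pair * math.log2(p_cond)
    return h

def conflate_vowels(text, vowels="aeiou"):
    """Replace every vowel with the single placeholder 'V'."""
    return "".join("V" if ch in vowels else ch for ch in text)
```

Comparing conditional_entropy(text) with conditional_entropy(conflate_vowels(text)) on a normal-language sample shows how merging phonemic distinctions pushes the value down, which is the point the authors make.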
Another small bit that happens to overlap with my (obviously non-original) opinions is that EVA:m and EVA:g are line-final variants of endings that are written differently elsewhere in the text (the authors propose '-iin' and '-y' respectively). I think of these characters as 'abbreviations', but the authors' views are slightly different.
Bowern & Lindemann appear to be convinced that Pelling's axiom is indeed valid: each Voynichese word largely corresponds to a word in the underlying language.
I am happy to also see a quantitative discussion of reduplication (with measures that are quite close to the 1% we often discussed on the forum):
Quote:Full reduplication, in which the entire word is repeated, is also common in Voynich. However, it is still within the realm of plausibility for natural language texts. In Voynich A each word has a 0.84% chance of repeating while in Voynich B that chance is 0.94%. The range among the samples in our language corpus is 0.02%-4.8%, with an average of 0.63%.
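As I understand it, the measure is simply the share of word tokens that are identical to the immediately preceding token. A tiny Python sketch of that reading (the authors' exact definition and denominator may of course differ):

```python
def repetition_rate(tokens):
    """Fraction of adjacent token pairs where the word repeats exactly."""
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return repeats / (len(tokens) - 1)
```

Running this over a transliteration (e.g. token lists per Currier language) is how I would try to reproduce the 0.84% / 0.94% figures.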
I find this result interesting and I am looking forward to looking into the corpus used, to examine the details. I had missed the footnote mentioned by Stephen; of course, repetition in human attempts at producing meaningless text is also of the greatest interest. I hope all the benchmark texts are available for download, but I have not checked yet.
I am not sure I fully understand their explanation for quasi-reduplication (which here is simply called "repetitiveness").
Quote:This repetitiveness is at least partly the result of the relatively limited set of character combinations and the predictable structure of Voynich words.
About the "not structure-preserving script" mentioned in the final Summary: I think one option that fits this description is Rene's mod2 cipher system (which assumes a nomenclator). I am not sure that a verbose cipher would be enough to produce the rigid word structure of Voynichese, but it seems that the authors are going to investigate these ideas further, so hopefully we will read more in the future.
Some passages (e.g. "4.2 Phrases") seem to suggest that an artificial language is also being seriously considered, but the authors appear to be thinking of a Latin-based language, rather than an a priori language of the kind proposed by Friedman.
Of course there also are statements that do not match my own opinions, but I no longer believe everything I think, so they don't seem to be worth mentioning at the moment.
** Apparently, there is a typo in the discussion of Guy's application of Sukhotin's algorithm: EVA:y ('g' in Guy's transliteration) was identified as a vowel but is missing from the list in the pre-print.
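For reference, Sukhotin's algorithm itself is simple enough to sketch in a few lines: the most "connected" character is repeatedly declared a vowel, and twice its adjacency counts are subtracted from the remaining characters' scores, until no positive score is left. My own rough Python implementation of the standard algorithm, not Guy's code:

```python
from collections import defaultdict

def sukhotin_vowels(text):
    """Return the characters classified as vowels by Sukhotin's algorithm."""
    # symmetric adjacency counts between distinct characters (diagonal stays 0)
    adj = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        if a != b:
            adj[a][b] += 1
            adj[b][a] += 1
    score = {c: sum(adj[c].values()) for c in set(text)}
    vowels = []
    while score:
        v = max(score, key=score.get)
        if score[v] <= 0:       # no candidate left with positive score
            break
        vowels.append(v)        # declare the best-connected character a vowel
        del score[v]
        for c in score:         # remove its contribution from the rest
            score[c] -= 2 * adj[c][v]
    return vowels
```

On a transliteration this should make it easy to check which EVA characters come out as vowels and whether EVA:y is among them.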
Another note:
Quote:[Some characters] closely resemble numbers: cf. q, d, y.
EVA:l should be added to the list (it was a frequent variant shape of the numeral 4).