16-05-2026, 10:12 PM
(16-05-2026, 05:22 PM)oshfdk Wrote: Here's side by side the computation run on an English text and on Voynichese with spaces and with all spaces removed before processing. As you can see, for the most frequent 5 token combinations there is not much difference. English shows a lot of unbalanced prefix-suffix pairs (mostly just missing from the text), Voynichese only shows a few.
The difference between Voynichese and other languages seems to be a matter of degree, not a fundamental one.
The visual comparison is misleading because the color scale is such that the only info we see is whether a cell is zero (dark red) or nonzero (some light pastel color). Thus it is enough for *one* daiin chedy to occur for it to seem like "the occurrence of chedy is independent of whether the previous word was daiin or not". But in fact the "almost white" Voynichese cells have ratios higher than 2 or as low as 0.5.
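To make the point about ratios concrete, here is a minimal sketch of one way to compute such a ratio for each adjacent word pair. It assumes the ratio means "observed count divided by the count expected if the two words were independent", which may not be exactly what the plots use. The name `pair_ratios` is made up for this illustration.

```python
from collections import Counter

def pair_ratios(words):
    """Return {(a, b): observed / expected} for every adjacent pair seen.

    Assumption: 'expected' is the count predicted if the second word
    were independent of the first, i.e. count(a as first) *
    count(b as second) / number_of_pairs.
    """
    pairs = list(zip(words, words[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    second_counts = Counter(b for _, b in pairs)

    ratios = {}
    for (a, b), observed in pair_counts.items():
        expected = first_counts[a] * second_counts[b] / n
        ratios[(a, b)] = observed / expected
    return ratios
```

A cell with a ratio of 2.5 and a cell with a ratio of 1.0 are both "nonzero", so a zero/nonzero color scale would paint them the same way.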
In my previous post I gave several possible explanations for why Voynichese would seem to be more "Cartesian" than other languages, while still being a natural language. They included the unknown incidence of spelling errors in the VMS text. You did a test with artificially misspelled English, and it did make English look more Cartesian, correct? Perhaps the number of errors in Voynichese is higher than what you assumed?
Another variant of that "spelling errors" explanation is that the Author's spelling of the language could be mapping many different words to the same string of glyphs. I am still unable to distinguish spoken English "and" from "end", "man" from "men", etc.; if I were to take a dictation of a string of unfamiliar words, I would write both vowels as "e". If the Author was an Englishman or German writing French, he might have omitted all diacritics as being "just another case of French silly nonsense". If he was an Arab merchant writing Finnish, he might have omitted all short vowels, because surely anyone who speaks Finnish can guess them from the context, no? And so on...
And if the Author was taking dictation, he probably would use some form of shorthand, not the language's standard spelling. And shorthand systems typically will use the same sign to represent multiple similar sounds, like "p" and "b", "è" and "à", etc.
Either way, if Voynichese "daiin" and "chedy" each represent several different words of the language, the pair "daiin chedy" may be common because "bright star" is common, while "daiin shedy" may be common because "water pipe" is common -- even though the document never uses "bright pipe".
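The effect can be seen on a toy text. The sketch below is only an illustration: the `spelling` table is purely hypothetical (nobody knows what "daiin" or "chedy" stand for), and `merge_spelling` and `followers` are names invented here.

```python
def merge_spelling(words, spelling):
    """Rewrite each word with its (possibly ambiguous) written form."""
    return [spelling.get(w, w) for w in words]

def followers(words, first):
    """Set of words that occur immediately after `first`."""
    return {b for a, b in zip(words, words[1:]) if a == first}

# Toy underlying text: "bright pipe" never occurs.
text = "bright star water pipe bright star water pipe".split()

# Hypothetical spelling that maps two different words to "daiin".
spelling = {"bright": "daiin", "water": "daiin",
            "star": "chedy", "pipe": "shedy"}

merged = merge_spelling(text, spelling)

# In the underlying text each first word has a single follower...
before = (followers(text, "bright"), followers(text, "water"))
# ...but the merged word "daiin" is followed by both written forms.
after = followers(merged, "daiin")
```

In the merged text "daiin" combines freely with both "chedy" and "shedy", so the pair table looks more "Cartesian" than the underlying language really is.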
And yet another possible explanation is that the VMS may be a terse style of prose which omits most function words and violates syntax for the sake of brevity. That is, instead of
- "Goblin's Carrot is a tall bush that grows on the mountains. Its bark, made into a tea, will cure baldness and increase one's chance of picking up girls at the tavern"
it may say
- "Goblin carrot: tall bush; mountain. Bark tea: bald head, girls"
except without any punctuation, of course. Then one will see many pairs like "bush mountain" and "tea bald" that would not occur in a normal grammatical text.
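As a rough check of that claim on the two sample sentences above, the sketch below lists the adjacent word pairs that appear in the terse version but not in the grammatical one. The tokenizer (lowercase letters and apostrophes only) and the name `adjacent_pairs` are assumptions made for this illustration.

```python
import re

def adjacent_pairs(text):
    """Set of adjacent word pairs, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(words, words[1:]))

full = ("Goblin's Carrot is a tall bush that grows on the mountains. "
        "Its bark, made into a tea, will cure baldness and increase "
        "one's chance of picking up girls at the tavern")
terse = "Goblin carrot: tall bush; mountain. Bark tea: bald head, girls"

# Pairs that exist only because function words were dropped.
new_pairs = adjacent_pairs(terse) - adjacent_pairs(full)
```

Pairs such as ("bush", "mountain") and ("tea", "bald") end up in `new_pairs`, while ("tall", "bush") does not, since it occurs in both versions.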
And this possibility brings up another point: when comparing Voynichese with other languages, one must use texts that are hopefully in the same style. That is, the Herbal section should be compared to herbals, the Starred Parags section should be compared to a book of recipes (or whatever one guesses that it may be), etc. And even then there is a huge range of styles...
All the best, --stolfi