14-05-2026, 02:17 PM
(14-05-2026, 10:06 AM) oshfdk Wrote: Here it is. I repeated @dashstofsk computation as described in the original post. This is what I would expect from a natural language - a lot of underrepresented combinations (and a few hugely overrepresented). Nothing like the Voynich MS chart for which the upper left corner mostly consists of numbers close to one.
@dashstofsk concluded that it was an artificial language because his table looked totally unlike what one would get from a "European" language (including Hebrew, Arabic, Turkish, etc.).
In fact, it is not even clear how one could build such a table for those languages, because there is no obvious and manageable way to split their words into prefix, core, and suffix. Thus the striking difference that @dashstofsk saw is mainly due to the peculiar structure of the VMS words, which consist of a small number of "slots", each with its own set of alternatives. This structure has been modeled in many ways by many people.
But Mandarin and other monosyllabic languages also have a broadly similar "slot" structure, and thus admit natural prefix-core-suffix decompositions. (The one I suggested is arbitrary; there are many other possibilities.)
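For concreteness, here is a minimal sketch of one such decomposition for pinyin. It assumes (as the second explanation further down suggests) that the core is the tone-marked vowel, the prefix is everything before it, and the suffix is everything after it. The function name and the choice to return None for neutral-tone syllables are my own assumptions, not part of the original segmentation.

```python
import unicodedata

# Combining tone marks used in pinyin: macron (1st tone), acute (2nd),
# caron (3rd), grave (4th). The diaeresis on ü is NOT a tone mark.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def has_tone_mark(ch: str) -> bool:
    """True if the (precomposed) character carries a pinyin tone mark."""
    return any(c in TONE_MARKS for c in unicodedata.normalize("NFD", ch))


def split_syllable(syllable: str):
    """Split a pinyin syllable into (prefix, core, suffix).

    Assumption (hypothetical helper): the core is the single tone-marked
    vowel. Returns None for neutral-tone syllables, which have no
    tone-marked vowel and so no core under this scheme.
    """
    s = unicodedata.normalize("NFC", syllable)  # one code point per letter
    for i, ch in enumerate(s):
        if has_tone_mark(ch):
            return s[:i], ch, s[i + 1:]
    return None


for syl in ["wèi", "zhǔ", "míng", "shēng", "lǜ", "de"]:
    print(syl, split_syllable(syl))
```

Syllables like "de" (neutral tone) would need some other convention, for example treating the first vowel as the core.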
And comparing your pinyin table with @dashstofsk's VMS table, the similarities are much more striking than the differences. Note that the VMS table shows significant deviations from unity even in the part where the sampling error should be small (say, the first three rows and the first seven columns): there we see 1.31 for qokedy and 0.24 for ky.
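The "deviations from unity" make sense if each table entry is an observed/expected ratio, where the expected count assumes that prefix and suffix are chosen independently. The sketch below computes such a table from a list of (prefix, suffix) pairs. This is my reading of what the tables show, not a description of the code actually used by @dashstofsk or oshfdk; the function name and the toy tokens are hypothetical.

```python
from collections import Counter


def ratio_table(pairs):
    """Observed/expected ratio for every (prefix, suffix) combination.

    pairs: list of (prefix, suffix) tuples, one per token.
    Expected count under independence: E(p, s) = n(p) * n(s) / N,
    so a ratio of 1 means "exactly as frequent as independence predicts"
    and 0 means the combination never occurs.
    """
    n_pair = Counter(pairs)
    n_pre = Counter(p for p, _ in pairs)
    n_suf = Counter(s for _, s in pairs)
    total = len(pairs)
    table = {}
    for p, np_ in n_pre.items():
        for s, ns in n_suf.items():
            expected = np_ * ns / total
            table[(p, s)] = n_pair[(p, s)] / expected  # Counter gives 0 if unseen
    return table


# Toy tokens with a made-up segmentation, for illustration only.
tokens = [("qok", "edy"), ("qok", "edy"), ("ch", "edy"), ("ch", "ol")]
for (p, s), r in sorted(ratio_table(tokens).items()):
    print(f"{p:>4} {s:<4} {r:.2f}")
```

Entries whose expected count is small (rare prefixes or suffixes) are the ones most affected by sampling error, which is why the upper-left corner of a frequency-sorted table is the most informative part.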
In fact, one could argue that those discrepancies are evidence against his thesis, because they show that the choice of suffix is not independent of the choice of prefix.
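Whether the prefix and suffix are independent can also be summarized in a single number. One standard option, which I am adding for illustration and which neither table in this thread reports, is Pearson's chi-square statistic for independence, computed here from the same kind of (prefix, suffix) pairs. The function name is hypothetical.

```python
from collections import Counter


def chi_square_independence(pairs):
    """Pearson chi-square statistic for independence of prefix and suffix.

    Returns (statistic, degrees_of_freedom). A statistic much larger than
    the degrees of freedom suggests that the suffix depends on the prefix.
    Cells with small expected counts (a common rule of thumb is < 5) make
    the statistic unreliable: the same sampling-error caveat as for the
    ratio tables.
    """
    n_pair = Counter(pairs)
    n_pre = Counter(p for p, _ in pairs)
    n_suf = Counter(s for _, s in pairs)
    total = len(pairs)
    stat = 0.0
    for p, np_ in n_pre.items():
        for s, ns in n_suf.items():
            expected = np_ * ns / total
            stat += (n_pair[(p, s)] - expected) ** 2 / expected
    dof = (len(n_pre) - 1) * (len(n_suf) - 1)
    return stat, dof
```

If SciPy is available, `scipy.stats.chi2_contingency` computes the same statistic (plus a p-value) from a 2-D array of counts; pass `correction=False` to match this uncorrected version on 2×2 tables.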
And that was to be expected, even without considering frequencies. For example, according to [link], the strings that can follow a gallows letter depend on what came before it. Specifically, a word can have at most two of the elements X = {{ch} {sh} {ee} {che} {she} {eee}} and at most three of Q ∪ D ∪ N = {{q} {d} {l} {r} {s} {n} {in} {iin} {iiin}}. Thus, for instance, if @dashstofsk's prefix has an X element, the suffix can have at most one X.
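The two counting constraints quoted above can be checked mechanically on an EVA word. The sketch below is not the full word model: it only counts X and Q/D/N elements, and it assumes a greedy longest-match split into elements (my assumption; the model may parse some strings differently, e.g. "chee" as ch + ee rather than che + e). The function names are hypothetical.

```python
# Element sets as quoted above.
X_ELEMENTS = {"ch", "sh", "ee", "che", "she", "eee"}
QDN_ELEMENTS = {"q", "d", "l", "r", "s", "n", "in", "iin", "iiin"}
ALL_ELEMENTS = X_ELEMENTS | QDN_ELEMENTS
MAX_LEN = max(len(e) for e in ALL_ELEMENTS)


def tokenize(word):
    """Greedy longest-match split of an EVA word into elements.

    Any character that does not start a known element becomes a
    single-character token (gallows, o, a, y, ...).
    """
    tokens, i = [], 0
    while i < len(word):
        for length in range(MAX_LEN, 0, -1):
            piece = word[i:i + length]
            if piece in ALL_ELEMENTS:
                break
        else:
            piece = word[i]  # no known element starts here
        tokens.append(piece)
        i += len(piece)
    return tokens


def satisfies_counts(word):
    """At most two X elements and at most three Q/D/N elements."""
    tokens = tokenize(word)
    n_x = sum(t in X_ELEMENTS for t in tokens)
    n_qdn = sum(t in QDN_ELEMENTS for t in tokens)
    return n_x <= 2 and n_qdn <= 3


for w in ["qokedy", "chedy", "daiin", "chechechy"]:
    print(w, tokenize(w), satisfies_counts(w))
```

Because the limit applies to the whole word, any split of that word into prefix and suffix inherits it: X elements used in the prefix reduce the number available to the suffix.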
Mandarin pinyin has similar constraints too. Ignoring the tones, there is a limited repertoire of vowel combinations: all single vowels, many vowel pairs, but only a few vowel triples. Thus, in any prefix-core-suffix segmentation, if the prefix and core together have two vowels, the suffix will almost always have none.
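To check this on a pinyin file, one could strip the tone marks and count the vowel letters in each syllable. A minimal sketch follows; the helper names are hypothetical. It counts written letters only, so abbreviated finals such as -ui and -iu count as two vowels, and the semivowel spellings y and w are not counted as vowels.

```python
import unicodedata
from collections import Counter

# Pinyin tone marks as combining characters (macron, acute, caron, grave).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}
VOWELS = set("aeiouü")


def strip_tones(syllable):
    """Remove pinyin tone marks, keeping the diaeresis on ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def vowel_count(syllable):
    """Number of written vowel letters in a pinyin syllable, ignoring tones."""
    return sum(ch in VOWELS for ch in strip_tones(syllable))


# Tally of vowel-cluster sizes over a (tiny) sample of syllables.
sample = ["wèi", "zhǔ", "yī", "míng", "shēng", "xiǎo", "lǜ"]
print(Counter(vowel_count(s) for s in sample))
```

Run over the whole text, such a tally would show how rare three-vowel syllables are, and therefore how strongly the vowels in the prefix and core constrain the suffix.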
But it is true that the deviations from unity in your pinyin table are more dramatic than those in @dashstofsk's VMS table. There are several possible explanations for that which are still compatible with Voynichese being a natural language:
- In written Voynichese, the prefix and suffix of each syllable happen to be more independent than they are in Mandarin.
- "The accented vowel" was a bad choice for the core of a Mandarin pinyin syllable.
- The VMS text has many more errors than the Mandarin pinyin one. We know that there are transcription errors, because many glyphs are hard to identify and transcribers often pick one possible reading at random. Transcribers also disagree on word spaces, so what is ykaral to one may be y karal or ykar al to others. And some transcribers will, consciously or unconsciously, base their decisions on what they have come to view as "valid" prefixes and suffixes. Then there is an unknown number of spelling and spacing errors made by the BEEEPers, the Scribe, and the Author himself. All these errors will make the prefixes seem more independent of the suffixes than they would be in correct Voynichese.
- IIUC, @dashstofsk's tables were computed over all language B pages, mixing text from different sections, so the lexicon of his text was quite varied. The pinyin file I provided is (as you may have guessed) the Shennong Bencaojing (SBJ), a collection of 365 "recipes" with a rather limited vocabulary (~630 distinct words) and a specific format. There is a small set of "keywords" (like "wèi" = "flavor", "zhǔ" = "mainly for", "yī míng" = "another name", "shēng" = "provenance", etc.) that occur in practically every recipe. These features will make the distribution of prefix × suffix pairs much "lumpier" than it would be in a more varied text. In other words, even though the pinyin text is fairly long, there is substantial sampling error at the lexicon level. It would be interesting to see @dashstofsk's tables computed over [link] alone (which, as you know, I am claiming to be a transcription of the SBJ in some unidentified language).
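The claim about transcription and scribal errors can be made a little more concrete with a deliberately simplified noise model. This is my assumption for illustration, not a model of the actual errors: suppose each token, with probability ε, has its suffix replaced by one drawn from the overall suffix distribution, independently of its prefix. Writing R(p,s) = P(p,s) / (P(p) P(s)) for the observed/expected ratio shown in the tables, the marginals stay the same and every ratio is pulled linearly toward 1:

```latex
% Simplified noise model (assumption): with probability \varepsilon the suffix
% is replaced by one drawn from P(s), independently of the prefix.
\begin{aligned}
P'(p,s) &= (1-\varepsilon)\,P(p,s) + \varepsilon\,P(p)\,P(s),\\
P'(p)   &= P(p), \qquad P'(s) = (1-\varepsilon)\,P(s) + \varepsilon\,P(s) = P(s),\\
R'(p,s) &= \frac{P'(p,s)}{P'(p)\,P'(s)} = (1-\varepsilon)\,R(p,s) + \varepsilon .
\end{aligned}
```

For example, with ε = 0.2 a combination that never occurs (R = 0) would show up with R' = 0.2, and one at R = 2 would drop to 1.8. Spacing errors, which change the segmentation itself, are not covered by this model.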
All the best, --stolfi
