The Voynich Ninja - The 'Chinese' Theory: For and Against


(17-06-2025, 03:03 PM)Jorge_Stolfi Wrote: After commenting out the subsection titles in both files, I counted again the number of words and paragraphs (parags), and computed basic statistics (min, max, average, and standard deviation) of the number of words per paragraph (nwp):

Code:
statistic  !  bencao !  starps
-----------+---------+--------
parags     |     354 |    330
words      |   10874 |  10457
min nwp    |       7 |     11
max nwp    |      76 |     72
avg nwp    |    30.8 |   31.7
dev nwp    |     8.5 |   11.2

Considering the missing bifolio in the SPS quire, we have 6 surprising near coincidences: number of entries, and the mode, min, max, average, and deviation of the number of words per paragraph. (The total number of words is not an extra coincidence since it is the average nwp times the number of entries.)
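For readers who want to reproduce a table like the one quoted above, here is a minimal sketch. It assumes each paragraph is already available as one string of space-separated words; the function name `nwp_stats` and the use of the population standard deviation are my assumptions, not necessarily what was used for the quoted numbers.

```python
import statistics

def nwp_stats(paragraphs):
    """Basic statistics of the number of words per paragraph (nwp).

    Assumption: words are separated by whitespace, and blank
    paragraphs are ignored.
    """
    counts = [len(p.split()) for p in paragraphs if p.strip()]
    return {
        "parags": len(counts),
        "words": sum(counts),
        "min nwp": min(counts),
        "max nwp": max(counts),
        "avg nwp": statistics.mean(counts),
        # Population standard deviation; the sample version
        # (statistics.stdev) would be slightly larger.
        "dev nwp": statistics.pstdev(counts),
    }

# Tiny made-up sample, only to show the call.
sample = ["a b c", "d e f g h", "i j k l"]
stats = nwp_stats(sample)
```

Note that `stats["words"]` always equals `stats["avg nwp"] * stats["parags"]`, which is why the word total is not an independent statistic.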

With some numbers this looks more interesting to me. However, I don't see this as 6 near coincidences. The total number of entries is the obvious optimization parameter. You wouldn't even consider comparing Voynich stars section with a manuscript of 30 entries or a tome of 3000 folios. If I understand it correctly, the similar number of entries was one of the things that attracted your attention to the Chinese MS. If VMS had 30 entries, there would be another Chinese (or Arabic or Hindi) piece of interest with 30 entries and maybe a different origin story.

Min nwp is 7 vs 11; I wouldn't say this is remarkably close. In a list of arbitrary variable-length natural language records, the shortest lengths are expected to be close to one another in absolute terms, because there is a limit on how short a record can get: there are no records shorter than 1 word. Also, you decided to remove section titles from bencao, if I understand it correctly. Were they a later addition and not part of the original work?

Max nwp is more interesting, it is close. The average, the max and the mode being close probably just tell us that both distributions are somewhat bell-shaped and symmetrical. I'm not sure this is a surprising feature, I'd say this is expected for any list of natural language records.

So, in my opinion there is one true near coincidence - the max number of words per paragraph, 76 vs 72.

I'd attribute this to pure convenience. If you are writing a long list of 300+ entries, you probably will try to keep them reasonably concise.

Also, if we look at the specific points that create the right hand side of the curve, from where these two numbers originate, we can see that there is very little similarity in the distribution. If we removed just a single record from the red set, the numbers would become 72 vs 65. So, this near coincidence can be just that: a coincidence.
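To make the fragility of the maximum concrete, here is a small sketch. The helper name `max_after_dropping` is hypothetical, and the list below is made up except for its two largest values (76 and 65), which are the ones mentioned above.

```python
def max_after_dropping(counts, k=1):
    """Largest value that remains after removing the k largest values."""
    if k >= len(counts):
        raise ValueError("cannot drop every value")
    return sorted(counts)[-(k + 1)]

# Hypothetical nwp values; only 76 and 65 come from the discussion.
red_set = [30, 41, 28, 65, 76, 33]
original_max = max(red_set)
max_without_top = max_after_dropping(red_set, k=1)
```

A statistic that moves from 76 to 65 when one record out of several hundred is removed is a weak basis for claiming a match.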

[attachment=10832]

BTW, I'm not sure what method was used to determine the number of words for the Chinese manuscript.

(17-06-2025, 08:35 AM)oshfdk Wrote:
(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote: the word structure (quite unlike that of "European" languages, but just like that of the "Chinese" languages);
What do you mean exactly by "the word structure"? In all flavours of Chinese a word /correction/ syllable consists of an initial sound, a vowel and a final, as far as I know.

Actually the initial "consonant" can be up to two phonemes, like pinyin "q" = [tɕʰ], "zh" = [ʈʂ], "z" = [ts], "c" = [tsʰ]. Then there may be up to three vowels (as in 帅 = pinyin "shuài") and a final (pinyin "n" or "ng"). There is also a weak syllable "ér" that seems to be a suffix of some sort, which I think I have seen described as part of the previous syllable. And then there is the tone.

Other languages may be more complicated, with more clusters at the beginning and consonants at the end.

The key point is that the "Generalized Chinese" words, being either single syllables or compounds of two such words, have a fixed sequence of slots, where each slot can be filled with a small and different set of phonemes.
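As an illustration of what "a fixed sequence of slots" means, here is a toy sketch for pinyin-spelled syllables. The slot inventories are a deliberately simplified assumption of mine (no tones, no "y"/"w" spellings, no "er"), so this is not a full model of Mandarin, only an example of the slot idea.

```python
import re

# Toy slot inventories (simplified assumption, not a complete pinyin grammar).
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]
MEDIALS = ["i", "u", "ü"]
NUCLEI = ["a", "e", "o", "i", "u", "ü"]
CODAS = ["ng", "n", "i", "u", "o"]

def _alt(options):
    """Build a regex alternation from a list of literal strings."""
    return "|".join(re.escape(o) for o in options)

# Slots appear in a fixed order; every slot except the nucleus is optional.
SYLLABLE = re.compile(
    f"(?P<initial>{_alt(INITIALS)})?"
    f"(?P<medial>{_alt(MEDIALS)})?"
    f"(?P<nucleus>{_alt(NUCLEI)})"
    f"(?P<coda>{_alt(CODAS)})?"
)

def parse_syllable(s):
    """Return the slot fillers as a dict, or None if s does not fit the slots."""
    m = SYLLABLE.fullmatch(s)
    return m.groupdict() if m else None

shuai = parse_syllable("shuai")    # fills all four slots
ming = parse_syllable("ming")      # no medial
strong = parse_syllable("strong")  # consonant cluster: does not fit
```

The same kind of test can be applied to any proposed slot model: a word either decomposes into the slots in order, or it does not.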

Quote:In Voynichese the structures of words /correction/ or parts of words are much more complex than that, and you certainly are aware of this, because there are very well known "Stolfi's" models of decomposing Voynichese words, that do not conform to a simple clean prefix-infix-suffix model.

I suppose I made it seem more complex than it actually is. Basically, the "normal" Voynichese word too has a fixed number of slots, and each slot can be filled with a small and specific set of glyphs.

Unfortunately, there is a large number of schemes that the Author may have chosen to map the structure of a "Chinese" language to the structure of a VMS word, even for a single language at a single stage of evolution. The Author may have decided to use a single glyph to encode the [link] of Mandarin, as pinyin does. Or, conversely, he may have used two glyphs for a single phoneme, like the English use of "sh".

And then there is the tone. In Mandarin, it is not a property of any specific vowel, but of the syllable as a whole. Therefore, if it is indicated by an explicit glyph in the VMS, this glyph may be inserted anywhere in the word, with the position changing randomly between occurrences of the same word. Or in some more complicated way. Check [link] to see how insanely complicated the scheme could be.

Pinyin (PY) and the national Vietnamese script have settled to use diacritics over a specific vowel to indicate the tone. Another scheme that was used for Mandarin, in pre-Unicode days, was a digit placed at the end. Tibetan used a silent consonant prefixed to the syllable.

Another scheme, used for languages with many tones, or to compare tones in different languages, is to use digits 1-5 to denote pitch, and a sequence of digits to denote the pitch profile of the tone. Thus, for example, the flat tone of Mandarin in PY jīng could be written as jing3, the ascending tone of míng as ming25, the dipping tone of shǒu as shou213, etc. And these digits could be inserted anywhere in the word, e.g. sh2o1u3.
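The two digit placements described above can be sketched as follows. The function names are hypothetical, and the way a syllable is split into chunks for the interleaved form is an arbitrary choice made only to reproduce the sh2o1u3 example.

```python
def append_profile(syllable, profile):
    """Pitch digits written at the end, e.g. ('shou', '213') -> 'shou213'."""
    return syllable + profile

def interleave_profile(chunks, profile):
    """One pitch digit after each chunk of the syllable.

    Assumption: the caller decides how to split the syllable into chunks.
    """
    if len(chunks) != len(profile):
        raise ValueError("need exactly one digit per chunk")
    return "".join(chunk + digit for chunk, digit in zip(chunks, profile))

def strip_digits(encoded):
    """Remove the pitch digits again, leaving the toneless syllable."""
    return "".join(ch for ch in encoded if not ch.isdigit())

at_end = append_profile("shou", "213")
inside = interleave_profile(["sh", "o", "u"], "213")
```

Both `at_end` and `inside` reduce to the same toneless syllable under `strip_digits`, which shows how one spoken word could surface as several different written strings if the tone marks were placed inconsistently.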

Quote:I'd say if Voynichese was a phonetic representation of Chinese of any kind, this would be very obvious.

Well, it now looks obvious to me...

Quote:
(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote: the lack of identifiable articles, copula verbs, inflections
There is lack of identifiable anything in the Voynich MS. Statistically it's not similar to any language, including Chinese.

Articles in Romance and Germanic languages, as well as in Greek, Arabic and (I suppose) Hebrew, are easy to spot under any encoding scheme that maps every occurrence of each lexeme (entry of the vocabulary) to the same encoded string. Because they:

- are a small set of very common and hence very short lexemes, usually the most common
- regularly occur in front of certain lexemes (nouns, adjectives) and never in front of others (verbs)
- each is paired with a specific set of nouns (gender/number agreement)

People have looked hard for article-like words in Voynichese, and found none. But the "Generalized Chinese" languages don't have articles either.

A distinctive feature of Indo-European languages is that the endings of nouns, adjectives, and verbs are changed to indicate grammatical gender and number, with agreement between the three when used in a sentence. People have also looked hard but in vain for similar features in Voynichese. Guess what: the "Generalized Chinese" languages do not have inflections, gender, or number either.

The earliest missionaries who studied Chinese would say that "Chinese has no grammar". I understand that it does, but it is quite subtle. I doubt that someone could deduce it just by analyzing a pinyin text, without a dictionary and a grammar tutorial.

Quote:
(16-06-2025, 11:17 PM)Jorge_Stolfi Wrote: the relatively large number of duplicated words, like chor chor
Duplicated words are not a feature of Classical Chinese, as far as I know. In the manuscript that you mentioned as the possible source I've only found 43 instances of duplicated character tokens. ... After removing punctuation, I've found 15 instances of repeated words in Opus Magus

That version of the SBJ has ~11000 words (a fraction of them being two-syllable compounds). How big was your Latin sample?
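Since raw counts of repeats only mean something relative to sample size, here is a minimal sketch of counting immediately repeated words and normalizing per 10,000 words. The function names are hypothetical and the word list is made up; it only shows the bookkeeping.

```python
def count_adjacent_repeats(words):
    """Number of positions where a word is identical to the one before it."""
    return sum(1 for a, b in zip(words, words[1:]) if a == b)

def repeats_per_10k(words):
    """Adjacent repeats normalized to a sample of 10,000 words."""
    if not words:
        raise ValueError("empty sample")
    return 10000 * count_adjacent_repeats(words) / len(words)

# Made-up token sequence, for illustration only.
tokens = "chor chor shedy qokeedy qokeedy qokeedy daiin".split()
n_repeats = count_adjacent_repeats(tokens)
rate = repeats_per_10k(tokens)
```

A triple like "qokeedy qokeedy qokeedy" counts as two repeats here; whether that is the right convention is a choice that should be stated whenever two corpora are compared.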

Quote:[The Zodiac diagrams] are the only pieces of evidence for the Chinese theory so far that I personally would call specific. Is there a long form explanation of why these point to the Orient?

See [link] for example.

Quote: why was there at all the need to put European Zodiac signs onto the Chinese Zodiac?

Because the Author must have been told the time of the year covered by each diagram and understood that the 24 diagrams roughly corresponded to the 12 signs of the Western Zodiac, so he indicated that information in his copy.

Quote:We don't actually know the number of entries in the [Starred Parags section] of the Voynich MS.

As I wrote before, we can estimate how many entries there were in the missing folios by the average number of entries in each page of the section. The total should be between 330 and 390, depending on whether the missing folios have no recipes or all have the average number of recipes per page. See my other post earlier today.
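The estimate described above is simple arithmetic, sketched here. The count of 330 entries is from the table earlier in the thread; the page counts (23 surviving text pages, 4 missing pages for one bifolio) are my own illustrative assumptions and are not stated in this post.

```python
def estimate_total_entries(counted_entries, present_pages, missing_pages):
    """Return (low, high) bounds on the original number of entries.

    low:  the missing pages held no entries at all.
    high: every missing page held the average number of entries per page.
    """
    avg_per_page = counted_entries / present_pages
    low = counted_entries
    high = counted_entries + missing_pages * avg_per_page
    return low, high

# 330 is from the thread; 23 and 4 are assumed page counts.
low, high = estimate_total_entries(330, present_pages=23, missing_pages=4)
```

With these assumed page counts the upper bound lands a little under 390, consistent with the range given above; different page counts would shift it accordingly.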

(17-06-2025, 11:22 AM)Aga Tentakulus Wrote: There was also Moses, who was pulled out of the Nile.

Yep! But sorry, the [link] was by Paolo Veronese. Raphael has another version of the same scene, but it is quite different.

(17-06-2025, 06:28 PM)oshfdk Wrote: BTW, I'm not sure what method was used to determine the number of words for the Chinese manuscript.

I asked DeepSeek, presumably more knowledgeable about Classical Chinese than Google Translate, if words can be reliably identified.

DeepSeek Wrote:While word identification in space-less ancient Chinese requires expertise, it is reliable when guided by grammar, context, and philological training. Modern computational tools (e.g., NLP for Classical Chinese) also assist, but human judgment remains essential for ambiguity.

Google Translate on the left, DeepSeek on the right:

[attachment=10833]

(17-06-2025, 02:27 PM)Pepper Wrote:
(17-06-2025, 08:35 AM)oshfdk Wrote: There are generally two kinds of Voynich theories: the solution kind (providing some specific plaintext for specific parts of the MS, be it labels, lines, etc) and the origin story kind, of which your Chinese theory is an example.
I think the origin story is not at all convincing but that's largely irrelevant to whether the solution is correct or not, so it's a shame to get bogged down in arguments about it.

Abstractly that may be true, but in practice any attempt to decipher the VMS must make some assumption about its origin and how it was produced. That is necessary to limit the possibilities for the language and encoding, to estimate the fraction of errors, and to exclude spurious features from analysis.

In fact, most attempts at decipherment to date have made the same assumption about the origin: the manuscript was created in Europe, and the text and diagrams (not just the script) were original creations by the Author, and either they were a nonsensical hoax, or their meaning was perfectly known to the author. In the second case, every word and every detail of the drawings was intentional; and therefore could be a clue for the decipherment, or had to be explained by it.

And I believe that those attempts failed, and were doomed to fail, because that assumption is false. The "Chinese Theory", in contrast, provides an entirely different set of candidate languages and a very different type of "encryption"; and it implies that, while the text and diagrams had meaning, the Author himself had only a limited understanding of them. Thus he must have made many errors, and (in the Herbal especially) made up a lot of stuff that he had failed to record. And therefore it demands very different approaches to decipherment.

(17-06-2025, 06:44 PM)Jorge_Stolfi Wrote: Unfortunately, there is a large number of schemes that the Author may have chosen to map the structure of a "Chinese" language to the structure of a VMS word, even for a single language at a single stage of evolution. The Author may have decided to use a single glyph to encode the [link] of Mandarin, as pinyin does. Or, conversely, he may have used two glyphs for a single phoneme, like the English use of "sh".

And then there is the tone. In Mandarin, it is not a property of any specific vowel, but of the syllable as a whole. Therefore, if it is indicated by an explicit glyph in the VMS, this glyph may be inserted anywhere in the word, with the position changing randomly between occurrences of the same word. Or in some more complicated way. Check [link] to see how insanely complicated the scheme could be.

Pinyin (PY) and the national Vietnamese script have settled to use diacritics over a specific vowel to indicate the tone. Another scheme that was used for Mandarin, in pre-Unicode days, was a digit placed at the end. Tibetan used a silent consonant prefixed to the syllable.

Another scheme, used for languages with many tones, or to compare tones in different languages, is to use digits 1-5 to denote pitch, and a sequence of digits to denote the pitch profile of the tone. Thus, for example, the flat tone of Mandarin in PY jīng could be written as jing3, the ascending tone of míng as ming25, the dipping tone of shǒu as shou213, etc. And these digits could be inserted anywhere in the word, e.g. sh2o1u3.

This sounds plausible to me. Indeed, while pinyin is quite efficient and well thought through, one can't assume that a phonetic system invented by a person without deep knowledge of Chinese, at a time when there was no existing phonetic alphabet for Chinese, would be as well structured. So, in principle, a script like Voynichese with all its strange properties could be designed to represent the sounds of a language similar to Chinese. I still don't think there is good enough evidence that Voynichese was actually designed for this purpose, though.

(17-06-2025, 07:35 PM)nablator Wrote:
(17-06-2025, 06:28 PM)oshfdk Wrote: BTW, I'm not sure what method was used to determine the number of words for the Chinese manuscript.

I asked DeepSeek, presumably more knowledgeable about Classical Chinese than Google Translate, if words can be reliably identified.

I'm not sure this applies to Classical Chinese. As far as I remember, in Classical Chinese usually one word was represented by one character. Quoting [link]: "Almost all lexemes in Classical Chinese are individual characters one spoken syllable in length. This contrasts with modern Chinese dialects where two-syllable words are extremely common." So the task is mostly trivial.

My question was not about how to do this, but what specific method was used by Prof. Stolfi.

(17-06-2025, 09:27 PM)oshfdk Wrote: My question was not about how to do this, but what specific method was used by Prof. Stolfi.

In bencaopinyin.txt:
# Transcription of Chinese characters to pinyin obtained by Google Translate.

(17-06-2025, 09:37 PM)nablator Wrote:
(17-06-2025, 09:27 PM)oshfdk Wrote: My question was not about how to do this, but what specific method was used by Prof. Stolfi.

In bencaopinyin.txt:
# Transcription of Chinese characters to pinyin obtained by Google Translate.

So, the Chinese text was transcribed to pinyin and then space separation of pinyin groups was treated as word breaks?

(17-06-2025, 04:57 PM)MarcoP Wrote: Still it's interesting that the vowels a, e, o, u in Giovanni Fontana's cipher (ca. 1420) were rotations of the same simple shape. Example from [link]: cesto da uoue = egg basket

Indeed. And the same idea was adopted by the creators of the [link].
