The Voynich Ninja

Full Version: The 'Chinese' Theory: For and Against
(12-06-2026, 04:35 PM)eggyk Wrote: Are all of these sections later additions or would they have been present in the SBJ when this dictation took place?

Those that I quoted were all part of the text marked as "original SBJ" in the Zhenghe Bencao (by large white-on-black characters).  So were also two other comments 神物 "[amber] is a divine substance" and 鸡白蠹能肥脂.   

The meaning of this last comment has Chinese scholars themselves baffled.  Literally it translates as "chicken white grub can fatten fat".  According to The Net, one interpretation is that the first three hanzi refer to white beetle larvae that grow in the manure and wood boards of chicken coops.  Another is that they refer to lumps of fat in the chicken's belly that resemble white grubs.  Either way, the other three hanzi may be saying that eating that stuff makes one put on weight.

Either way, this presumed sub-entry of the recipe is anomalous because it lacks the 主 present in the other eight sub-entries. And, if the first interpretation holds, it would not be a part or product of the "red rooster", like the other eight.

These "non-medical" comments appear to have been later additions, but so old (say, before 200 CE) that by the time the ZHB was compiled (around 1080 CE) they were considered "original" too.  There may be other "pseudo-original" bits like these, but if so they seem to have been transcribed by the Author like the rest.

So I had to remove all those parts myself, before looking for the match.  In the rooster case these exclusions are easily justified because, even after deleting them, the "red rooster" is still the longest recipe in the SBJ, and it matches the longest single-star parag in the SPS (f105v.32) quite accurately.  And it does not come even close to matching any other.  

As I noted before, the biggest discrepancies between the two are in the name of the entry, before the first 主, which is 3 hanzi in the SBJ, but only 8 letters in the SPS (instead of the expected ~15); and in the gap between the 4th and 5th 主, which is 7 hanzi in the SBJ but 46 EVA letters in the SPS (instead of the expected ~35).
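
The "expected" figures above can be reproduced with a small sketch. It assumes a flat ratio of 5 EVA letters per hanzi (the approximate figure used in this thread, not an exact constant); the names `expected_letters` and `report` are hypothetical, chosen only for this illustration.

```python
# Illustrative only: assumes a flat ratio of 5 EVA letters per hanzi.
LETTERS_PER_HANZI = 5


def expected_letters(hanzi_count, ratio=LETTERS_PER_HANZI):
    """Expected number of EVA letters for a run of hanzi at the assumed ratio."""
    return hanzi_count * ratio


# (label, hanzi in the SBJ, EVA letters in the SPS), as quoted in the paragraph above.
discrepancies = [
    ("entry name, before the first 主", 3, 8),
    ("gap between the 4th and 5th 主", 7, 46),
]


def report(rows):
    """Return (label, expected, observed, observed - expected) for each row."""
    result = []
    for label, hanzi, observed in rows:
        expected = expected_letters(hanzi)
        result.append((label, expected, observed, observed - expected))
    return result


for label, expected, observed, diff in report(discrepancies):
    print(f"{label}: expected ~{expected}, observed {observed}, difference {diff:+d}")
```

With these inputs the first span comes out 7 letters short of the assumed ratio and the second 11 letters long.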

The removal of that comment about roosters hung on the gate did not affect either of the two counts, since it is between the 2nd and 3rd 主.  The other two comments are at the end of the entry (after the last 主).

There is another bit of the entry that may or may not have been omitted by the Author: a note (女人) after the first 主, saying that the next few conditions are specific to women.  Leaving those two hanzi out slightly improves the match, since the gap between the first two daiin (110 EVA letters) changes from being ~6 letters shorter than expected to being ~3 letters longer than expected.  However, this small improvement in such a long gap is not significant.  Anyway, that tag seems to be essential because the following conditions 崩中漏下赤白沃 would have very different meanings if they were assumed to refer to both men and women.  So it is uncertain whether the hanzi 女人 were transcribed or not.

All the best, --stolfi
I have a lot to say about your most recent response to me, but the summary is not very hard to state: Morpheme readings are constructed to be cognate, but they may not represent lexical correspondence in translation. Let's unpack that.

Let's just start with "rooster" because it turns out to illustrate about as much as I could hope. The Classical Chinese term is 雄雞. If we are being very careful---and to this point I have not been consistent about this---this is not one word, but two. The way Chinese readings are assigned to the words is by reading off their cognate terms. By construction, 雄 is cognate with the Mandarin word 雄(xiong2), so the reading would be "xiong2". But this is not what we find if we look up the words for "rooster" in various languages*:
  • Classical: 雄雞
  • Standard: 公雞, 雄雞
  • Beijing: 公雞
  • Guangzhou: 雞公, 生雞
You can plainly see that readings in one will not necessarily give correct vocabulary in another. The overlap between Standard and Beijing Mandarin is down to the former being derived from the latter; the overlap between Standard and Classical is a direct borrowing read as a strict cognate, that is, a product of literacy.

Talking about syntax is a little more subtle. For starters, all the character pairs above have syntax, and it turns out in this example they all have the same word order, with the head final. Concretely, Beijing 公雞 and Guangzhou 雞公 have the same syntax here, but you can plainly see this does not guarantee that speakers of divergent languages will use the same word as the head and surface the same word order, even if they otherwise preserve the component words. If the Author spoke Cantonese, there would be two lost clues to meaning. A little bit of this is reasonable to cope with, but these are languages exactly because it is too much.

You've provided an analogy that shows the misapprehension here, I think.
(12-06-2026, 10:24 AM)Jorge_Stolfi Wrote: When you say "pronunciation" and "phonetic standard", I don't know whether you mean "accent" (like an Italian speaking French with a strong Italian accent) or "reading" (like an Italian reading aloud each printed French word as the equivalent Italian word: "il ne me plait pas de tout" --> "quello non mi piace niente di tutto").
I mean closer to the latter, but the Italian there is using the wrong kind of equivalent. If I've traced the etymologies correctly, the Italian cognate of "il" is "il", while "pas" corresponds to "passo". Even though it is not the preferred Italian, "Il non mi piace passo de tutto," is the way to do this kind of reading. There may be some details I have missed, but the fact that the "Italian" has the "ne...pas" construction despite not being grammatical is illustrative, not in error.

There are genuine syntactical changes, and you can avail yourself of the many online resources that exist to see the full scope of them, but I will give one more at the level of morphosyntax. Some words became bound in Middle Chinese and were subject to further change in the later languages. One such word is 甘, which no longer appears outside of compounds as it did in the Shennong Bencao Jing. The problem for the supposed Author is that he would, at least implicitly, be expecting 甘 to be bound, but it appears in an unbound context in the Shennong Bencao Jing and so he would likely guess at a different word with the same reading if he heard it, or, alternatively, he might try to bind it to one of the surrounding words and misread them. The logographs help prevent this, but by hypothesis he could not read them.

There's an important consideration in the previous example. I realize you have edited the SBJ to remove 甘 because it did not yield a crib, but otherwise the formulaic way that 甘 appears and its technical meaning are the kind of thing I can imagine him learning, and the Dictator teaching. Because most of the words in the descendent languages were cognates with the Classical language it is very easy to go one by one and explain why a smart person could figure out the correspondence. Indeed, the whole orthography is built to leverage this fact. The issue becomes that when there are a lot of these sorts of changes, the cognitive burden is quite high.

To that point, we can get in the ballpark of how many of these changes have accrued using translation, up to some caveats:
SBJ: 主治女子崩中漏下。
English: Mainly cures vaginal discharge in women. (My translation, consulting various translations and dictionaries.)
Standard: 主要用於治療女性陰道分泌物。(Google Translate from the English.)
I will acknowledge that this probably overstates the case to a degree as my translation is not optimized for the exercise, nor is the machine's. I can quickly spot that "女性" would be "女子" if Google Translate understood what I wanted, and there are probably some other places that make this look more unfavorable than it ought to. But if you zoom out, the effect here is well-known. Classical Chinese is comparatively terse, and it is very normal for Classical to expand like this when translated into Standard. The finer details are beyond my ken, which is why I asked the toaster to do it and all the caution that entails, but the fact that Classical Chinese is different from the modern languages is quite well known. 主 translating to 主要 and 治 expanding to 治療 to distinguish homonyms is the typical situation.

As for this:
(12-06-2026, 10:24 AM)Jorge_Stolfi Wrote: Again, the scenario we are discussing (with the Dictator reading the white-on-black text from the ZHB or similar book in the local language, while the Author writes it down with his phonetic script) is only my best guess for how the Starred Parags section (SPS) came to be a transcription or translation of SBJ, approximately one-hanzi-for-one-Voynichese-word. If you think that scenario is unlikely, what alternative do you propose instead?
I reject the premise of the question. I do not think the identification is sound and the statistical argument you've made is based on cribs you identified because there were qualitative similarities. It is no surprise to me that you were able to quantify that, but I don't think it's a valid or sound identification. Even if I'm wrong, you're well aware of my position that the difficulties transcribing Classical Chinese like this and reading it later corroborate my criticisms of how you made the identification.

My summary is the answer to this question:
(12-06-2026, 10:24 AM)Jorge_Stolfi Wrote:
Quote:It is sometimes hard to tell if they are glossing over the little hitch that Standard Chinese read this way is not mutually intelligible with Cantonese or genuinely in error
Are you sure?  Again, are you confusing "read with Cantonese words" and "read with Mandarin words in a Cantonese accent"?
I am quite sure! You cannot simply replace a passage of good Mandarin or good Classical Chinese with Cantonese cognates and call it Cantonese!



*It is quite unfortunate that my source terms these "dialectical synonyms". This is commonly seen, even in academic contexts, and I have curbed my impulse to be pedantic when other people have indulged it, but as they fail the mutual intelligibility test, they are properly called "languages". I would not be so firm about this but for the fact that calling them "dialects" has implications of Han nationalism, the invalidity of Cantonese and Hakka literature, and the correctness of efforts at "standardizing" spoken Chinese. In that vein, I would ask future "clarifications" that purport to quote me do not imply that I would have been clearer for supporting Beijing's efforts towards linguistic hegemony. As this is not mainly a political forum, I will let other people come to their own conclusions about how they want to talk about language diversity in authoritarian countries, but do not edit my quotes to support it.
(15-06-2026, 08:40 PM)rikforto Wrote: I have a lot to say about your most recent response to me, but the summary is not very hard to state: Morpheme readings are constructed to be cognate, but they may not represent lexical correspondence in translation. Let's unpack that.

Sorry, but I still don't understand your point -- and I believe that you don't understand the scenario that I am proposing.

The discussion about differences between the dialects is irrelevant, because the scenario involves only one written language -- the text of SBJ as quoted and marked out in the ZHB -- and one spoken language -- the local language, that the Author learned, to some extent, during his stay.

The Dictator knows the local  language and can read the written one.  The Author can't read or write the written one.

The Author recruited the Dictator to read aloud the text using words of the local language.  It would be futile to read it using words from any other language.  The Author is recording the dictation phonetically, instead of just buying a copy of the printed book, because he wants to understand what he wrote, as much as his knowledge of the local language will allow, after he goes back to Europe.  Obviously that goal would be frustrated if the Dictator read it using words from some other language.

So there are two possibilities. (A) The dictation could have been literal, with the Dictator reading each hanzi as a syllable of the local language, in the order that it is in the book.  Or (B) he could have dictated a translation into the local language; that is, he said what a local doctor would say to convey the same information to a local peasant, without reading from a book.  This second option is actually a spectrum, from merely replacing some compounds or reversing the order of a few words, to completely changing the word order, inserting glosses, using paraphrases, etc.

The evidence I have says that what the Dictator did was either (A) or very close to it.  The spacings between the cribs are usually very precise, at ~5 EVA letters for each hanzi of the ZHB text.  That is, the text of the rooster recipe began

丹雄鸡味甘微温主女人崩中漏下...

(I am using simplified characters because the traditional ones are much harder for me to recognize. This does not make any difference, agreed? It is just a change of font.)

Let's pretend that the local dialect was modern Cantonese.  In the proposed scenario, what the Dictator would have said aloud would have been

daan1 hung4 gai1 mei6 gam1 mei4 wan1 zyu2 neoi5 jan4 bang1 zung1 lau6 haa6 ...

And these would be the sounds that the Author would write down in his phonetic script.

According to Google AI, most of those spoken syllables are common words that would be recognized and understood by "every" Cantonese speaker -- even though some of the compounds, like bang1 zung1 = 崩中 = "crashing center", have technical meanings that only a doctor may understand correctly.  GAI says that the only syllable that few people would understand correctly is daan1 = 丹 = "vermilion". 

So presumably the Author would have understood many of those spoken syllables too.  Even if he did not know the meaning of daan1, he probably would have understood that the recipe was about some type of "male chicken", that it was "sweet" and "slightly warm", that the main indications came next, that the first few are specific to women, that the first of those was something something dripping down, etc.

The syntax and word order would not be those of colloquial Cantonese, but that is mostly due to the text being (at the time) 1700 years old and intentionally very terse -- not because it was written by someone who spoke another dialect.  

That literal one-syllable-for-one-hanzi reading would have been almost as good as the Author could possibly get.  If the Dictator had tried to give a translation into colloquial Cantonese, I don't think he would have understood much more.

So, please explain again, what are your objections to this scenario?

Quote:If I've traced the etymologies correctly, the Italian cognate of "il" is "il", while "pas" corresponds to "passo". Even though it is not the preferred Italian, "Il non mi piace passo de tutto," is the way to do this kind of reading.

No, sorry, it makes no sense to consider etymologies in a translation, in any situation.  In a free translation one expresses the meaning of the source phrase in the target language, using whatever words and syntax a native speaker would use. In a literal word-by-word translation one uses the word of the target language whose current meaning best matches the current meaning of the source word.  

Thus the Italian word whose meaning is closest to that of the French word "pas" in that phrase ("nothing") is "niente".  Not "passo", not at all.

Quote:There's an important consideration in the previous example. I realize you have edited the SBJ to remove 甘 because it did not yield a crib, but otherwise the formulaic way that 甘 appears and its technical meaning are the kind of thing I can imagine him learning, and the Dictator teaching.

Not simply "because it did not yield a crib".  Rather the length of the text before the first daiin in the SPS matches the length of the SBJ text before the first 主 at the usual 5:1 ratio only if the "flavor and warmth" field is omitted from the latter; and this has been the rule for all recipes that I have matched, not just for the "rooster".   

This omission makes sense because that field is meaningful only within Traditional Chinese medical theory.  The "flavor" is not actual flavor, but a "theoretical" flavor that is supposed to have implications for the remedy's effect.  And in this recipe there is only one "flavor" field for all sub-recipes, that include chicken poo, chicken feathers, chicken eggs, and grubs that grow in chicken coops.  Obviously that "sweet" is not really about taste... 

So, either the Author asked the Dictator to skip that field (presumably after the Dictator explained its meaning, or lack thereof); or the Author later realized that those fields were useless in Europe, and struck them from his draft before giving it to the Scribe.  Either way, he apparently did the same, systematically, with "also called" fields, "grows in" fields, and superstitious/religious comments like that one about "heads of chickens that were hung over the East Gate".

Quote:Because most of the words in the descendent languages were cognates with the Classical language.

I still think you are confused here.  Classical Chinese is not a spoken language, it is a written one.  The Chinese "dialects" do not descend from it.  You should have said "Proto-Sino-Tibetan" or "Old Chinese" or something like that.

Quote:I do not think the identification is sound and the statistical argument you've made is based on cribs you identified because there were qualitative similarities. It is no surprise to me that you were able to quantify that, but I don't think it's a valid or sound identification.

The initial guess that the SPS could be the SBJ, the identification of cribs, and my claim that this is now a proved fact, were all based on quantitative coincidences.  Nothing qualitative.  If you don't accept that evidence -- well, I cannot force you to...

Quote:It is quite unfortunate that my source terms these "dialectical synonyms". This is commonly seen, even in academic contexts, and I have curbed my impulse to be pedantic when other people have indulged it, but as they fail the mutual intelligibility test, they are properly called "languages".

The Chinese "dialects" are indeed mutually unintelligible, but that is mostly because corresponding words (syllables with the same meaning and hanzi) SOUND very different in each language, in a non-systematic way.  Not because of differences in grammar!

I don't speak Chinese, but I have been reading about the linguistic situation in China and other countries for more than 50 years.  You are the first source I have seen to claim that, if each hanzi in a Beijing newspaper article is read in Cantonese, the result would be, not only grossly ungrammatical, but incomprehensible to a Cantonese speaker.  So that a Cantonese speaker who can't speak Mandarin would be unable to read that article.

The insistence of the Mainland Chinese government on calling the various languages of China "dialects" is politically motivated, sure.  They don't want to see separatist movements (like those that keep arising in Italy, Spain, France, Belgium...).  But that government is partly right, because those languages, while mutually unintelligible when spoken, are substantially the same in writing.  Thus they are not "different languages" in the same sense that English and German are different languages (= unintelligible both in speech and in writing).

This situation is unique in the world, and is made possible by the fact that the writing system is effectively not phonetic at all.  And, in turn, this feature of the writing system kept the grammar of those languages pretty similar over the centuries -- even though the spoken languages changed and diverged like crazy.  

All the best, --stolfi
(15-06-2026, 08:40 PM)rikforto Wrote: If I've traced the etymologies correctly, the Italian cognate of "il" is "il", while "pas" corresponds to "passo". Even though it is not the preferred Italian, "Il non mi piace passo de tutto," is the way to do this kind of reading.

Haha. "Ne... pas" is the negative and "du tout" (not "de") means "at all". The etymology of "pas" is too crazy to translate word for word.

Quote:Its use as an auxiliary negative adverb comes from an accusative use (Latin nec… passum) in negative constructions – literally “not… a step”, i.e. “not at all” – originally used with certain verbs of motion. In older French other nouns could also be used in this way, such as ne… goutte (“not… a drop”) and ne… mie (“not… a crumb”), but in the modern language pas has become grammaticalized.
It's even more bizarre knowing that "ne" is often dropped in speech. I understand step the manuscript Voynich.

This made me wonder about the etymology of "ne...rien", nothing. This "rien" part apparently comes from Latin rem (nom. res), "thing". So ne...rien is basically the same as no-thing.
(16-06-2026, 10:40 AM)Koen G Wrote: It's even more bizarre knowing that "ne" is often dropped in speech. I understand step the manuscript Voynich.

This made me wonder about the etymology of "ne...rien", nothing. This "rien" part apparently comes from Latin rem (nom. res), "thing". So ne...rien is basically the same as no-thing.

The story I was once told (now confirmed by Gemini) is that the Romans had an expression for "nothing", "nullam rem natam" = "no thing [ever] born".  When the Roman Empire collapsed, their nothing was looted by the barbarians and split among the breakaway kingdoms: the Italians kept the "nulla", the French got the "rien", and the Spanish ran away with the "nada".

All the best, --stolfi
(16-06-2026, 10:05 AM)nablator Wrote: The etymology of "pas" is too crazy to translate word for word.

But that was my point: in those special situations where one wants a word-for-word translation (which of course will usually be ungrammatical and convey the wrong meaning), it makes no sense to consider the etymology of each word.  One should consider its current meaning in the given context and look for the word of the target language whose current meaning best matches it.

"Pas" in that French sentence is an (obligatory) reinforcement of the negation, vaguely conveying the idea that there is no qualification or mitigating factor that would soften it. In other contexts it would definitely map to "nothing" or "none" in a free translation.  Italian happens to have the word "niente" that would convey about the same meaning in that context.

All the best, --stolfi
(16-06-2026, 11:22 AM)Jorge_Stolfi Wrote: Italian happens to have the word "niente" that would convey about the same meaning in that context.

All the best, --stolfi

"Niente affatto" too, "ne.. pas", "nothing/not at all" (but different etymology)
Seems I could have been clearer that the point of this example was that "Il non mi piace passo de tutto" is French with Italian readings rather than Italian translated from French. Otherwise I too believe it is "crazy", "bizarre", and "makes no sense", and I'm glad it looks that way to other people, at least when they see it transposed into a Western European context. As a rule, vernacular Chinese character readings were selected based on cognation without respect for semantic or syntactic changes, and so readings do not respect such changes.

I actually softened the example a bit because in this analogy French should have Latin spelling. To my best guess, this is what French with Latin orthography would look like, followed by French and Italian readings in French based on the real-world orthographies:
Quote:
  • French, on the page: Illi non me placet passum de totus.  
  • French, from the mouth of a French speaker: Il ne me plait pas de tout
  • French, from the mouth of an Italian speaker: Il non mi piace passo de tutto
(Notes: "illi" is strictly Vulgar, but I am not researching a more faithful Classical orthography for this illustration; I may not have all the declensions right, but I'm reasonably sure "totus" is correct as "tout" is not descended from the ablative.)

I will say more about Jorge's other points in due time, but I wanted to make sure it was clear what I was trying to illustrate since it garnered responses from other people.
(16-06-2026, 07:17 PM)rikforto Wrote: Seems I could have been clearer that the point of this example was that "Il non mi piace passo de tutto" is French with Italian readings rather than Italian translated from French.

I offered that example only to illustrate the difference between a word-for-word translation and a free translation, nothing more.  It was not meant to be parallel to the Chinese situation.

Your reworking of my example is inappropriate for that simple goal, because the etymology of the words is absolutely irrelevant for any purpose except for linguistics research.  Thus the word-for-word translation of "il" (French masculine pronoun) is not "il" (Italian masculine article), but "quello" or "lui", depending on the referent.  And the word-for-word translation of "pas" is not "passo", because the French speaker does not think of a step when he uses that word; but is "niente" or "nulla" or some other similar Italian negation-reinforcing word.

Quote:I actually softened the example a bit because in this analogy French should have Latin spelling. To my best guess, this is what French with Latin orthography would look like, followed by French and Italian readings in French based on the real-world orthographies:
Quote:
  • French, on the page: Illi non me placet passum de totus.  
  • French, from the mouth of a French speaker: Il ne me plait pas de tout
  • French, from the mouth of an Italian speaker: Il non mi piace passo de tutto

This version makes no sense with respect to reading or translating French to Italian.  The first line is not "French with classical orthography"; it is just (very bad and rather nonsensical) Latin -- another language altogether.  You could have used German or Russian instead; the example would not make less sense.  So the second line is a translation of that Latin text to French (and an incorrect one, because the Latin "passum" means step, while the French "pas" does not).  And the third line would absolutely not be "French, from the mouth of an Italian speaker", but an independent translation of that Latin text to Italian.

And that example is totally inadequate as an analog of the situation in the proposed COT scenario.  Again, Classical Chinese is not a spoken language, it is a written one.  What would be on the page would be 

  Very Old Chinese writing: 丹雄鸡味甘微温主女人...
  Read word-by-word by a Mandarin speaker: dān xióng jī wèi gān wēi wēn zhǔ nǚ rén...
  Read word-by-word by a Cantonese speaker: daan1 hung4 gai1 mei6 gam1 mei4 wan1 zyu2 neoi5 jan4...

Those are of course simplified characters and modern readings.  In the proposed scenario, the text would be in traditional characters, and the readings would be in Mandarin or Cantonese as they were in the 1400s. But it makes no difference, agreed? 
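
The hanzi-by-hanzi reading described here amounts to a table lookup, which can be sketched as follows. The readings are the modern Mandarin and Cantonese ones quoted just above; the names `READINGS` and `read_aloud` are hypothetical, and the sketch assumes one reading per hanzi, which would not hold in general for characters with several readings.

```python
# Readings copied from the lines above (modern pinyin and Jyutping, not 1400s values).
TEXT = "丹雄鸡味甘微温主女人"
MANDARIN = "dān xióng jī wèi gān wēi wēn zhǔ nǚ rén".split()
CANTONESE = "daan1 hung4 gai1 mei6 gam1 mei4 wan1 zyu2 neoi5 jan4".split()

# One table per spoken language: hanzi -> syllable (assumes one reading per hanzi).
READINGS = {
    "Mandarin": dict(zip(TEXT, MANDARIN)),
    "Cantonese": dict(zip(TEXT, CANTONESE)),
}


def read_aloud(text, language):
    """Read the text hanzi by hanzi, in written order, with no reordering."""
    table = READINGS[language]
    return " ".join(table[hanzi] for hanzi in text)


print(read_aloud(TEXT, "Mandarin"))
print(read_aloud(TEXT, "Cantonese"))
```

In this sketch the word order and the syllable count are fixed by the written text; only the sounds change with the language chosen.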

The Dictator would have used whatever was the local language that the Author presumably had learned.  Most of the syllables that he uttered, even though they were literal readings of the hanzi, would be common syllables of the local language, and the Author presumably understood most of them.   Whatever language the Dictator was using (including Mandarin), the syntax of his spoken sentences would be that of the Very Old Classical Chinese which the SBJ was written in -- not colloquial Mandarin or Cantonese, not even the  literary Mandarin or literary Cantonese of the 1400s.  But, since the sentences of the SBJ are extremely short, that archaic syntax would be the least of the Author's problems. 

It is possible that the Dictator did not do a hanzi-by-hanzi reading, but attempted to tweak the word order and hanzi readings on-the-fly so as to make the resulting spoken text more compatible with the grammar and lexicon of the local language, for the benefit of the Author.  This would make no difference for the proposed scenario, or for the SPS≈SBJ claim. Except that the extent of this adaptation must have been rather limited, because any extensive changes would change the spacing of the cribs and ruin the numerical matchings. And the benefit for the Author would have been minimal anyway.

Thus a more relevant analogy using French and Italian could be

  Written: 5392
  By a French speaker, character by character: /sɛ̃k tʁwɑ nœf dø/ ["cinq trois neuf deux"].
  By an Italian speaker, character by character:  /ˈtʃiŋkwe tre ˈnɔve ˈdue/ ["cinque tre nove due"].
  By a French speaker, tweaking for grammar: /sɛ̃k mil tʁwɑ sɑ̃ katʁəvɛ̃duːz/ ["cinq mille trois cent quatre-vingt-douze"].
  By an Italian speaker, tweaking for grammar: /tʃiŋkweˈmila tretʃɛntoˌnɔvantaˈdue/ ["cinquemila trecentonovantadue"]

All the best, --stolfi