The Voynich Ninja

Full Version: The 'Chinese' Theory: For and Against
(28-02-2026, 08:48 AM)ReneZ Wrote: Something seems to have gone wrong with the Thai version. What I see is a phonetic rendering of the original Chinese.

I asked Google AI and ChatGPT to correct each other's output, and they eventually converged to the following version. I have removed the hyphens to make the later comparison to the VMS fairer. Unmarked syllables are tone 1; all the other tones should be explicitly marked 2-5.

Code:
(A) Kai2 thuek4 daeng: [gae2] ra du khao5 daeng, bam rung sang khan5, un2 that4 fai, ham3 lueat4, cham ra chit2, dap2 phit4, kan sing ap2 mong khon
(B) Hua5:              [gae2] prap2 phi5 sat2 rai4
(C) Man:               [gae2] hu5 tuek2
(D) Sai5:              [gae2] pa sa5 wa4 mai3 yut2
(E) Phang phuet2 nai:  [gae2] thong4 sia5 thong4 ruan
(F) Khi5 kai2 khao5:   [gae2] rok2 hiu5 nam4, khai2 wat4 nao5 ron4
(G) Khon5 pik2:        [gae2] gae2 lueat4 duean mai3 ok2
(H) Khai2 kai2:        []    dap2 phit4 ron4, phu phong fai, kan chak2; [pra2 sit2] pen khong5 sak2 sit2 dang am phan

Does it make more sense?

All the best, --stolfi
Literary Chinese is not the same language as Mandarin or Cantonese. Google Translate copes better with Literary texts than if you were trying to put Latin into the Portuguese translator because of the logographic nature of the text and the fact that Literary turns of phrase are much more current in China than Latin ones are in countries that speak Romance languages, but the language of the SBJ is emphatically not the language that Google Translate is trained to work with. You cannot rely on it for work with text in Literary Chinese.
(28-02-2026, 05:19 PM)rikforto Wrote: Literary Chinese is not the same language as Mandarin or Cantonese. Google Translate copes better with Literary texts than if you were trying to put Latin into the Portuguese translator because of the logographic nature of the text and the fact that Literary turns of phrase are much more current in China than Latin ones are in countries that speak Romance languages, but the language of the SBJ is emphatically not the language that Google Translate is trained to work with. You cannot rely on it for work with text in Literary Chinese.

I know that Google Translate is useless for translating the SBJ. (It translated "Minerals, Low grade" to "inferior department of Jade", and worse things).

I am using Google AI, and now also ChatGPT, to translate the SBJ to English and other languages. Those lalamos claim to base their translations on extensive scholarly literature about old Chinese medical texts, in particular the SBJ and its translations in other languages.  Thus they offer detailed explanations for their translations, including references.  But I have not checked the references myself, and they may well be hallucinating.

JoJo's suggestion of using each system to check the proposals of the other has been quite useful. It corrected several mistakes and bad choices that they had made.  In the end I got a text that was approved by both systems.  Of course that still may be very wrong (like the "consensus" of human experts) but it is the best I can do at the moment.

(The other day someone posted a clip of a VMS drawing where one nymph is pulling the towel or hair of another. It made me recall a famous Japanese block print (ukiyo-e) depicting a brawl in a women's public bath. But I didn't remember the name of the artist, so I asked Google AI. It promptly gave me two answers, with name, date, everything. But one of them was a peaceful interior scene from a women's bath house without any brawl. I struggled to find the other one. In the end I found that there were three famous paintings of brawls in bath houses, both men's and women's. Google AI had mashed together the names of the three artists into a single non-existent new name.

Anyway, one of the three was the one I remembered. Of course it is not relevant to the VMS since it is fairly modern, but here it is, for curiosity: [link].)

All the best, while it is still possible, --stolfi
(28-02-2026, 05:18 PM)Jorge_Stolfi Wrote: Does it make more sense?

Yes, this is largely recognisable. The previous version was not Thai at all.
The numbers are confusing and make it hard to guess. Thai tones are numbered 1-4 (with no number for 'neutral'), where the assignments are swapped w.r.t. Mandarin (1,3 -> 3,1 and 2,4 -> 4,2).
hua5 (=head) would be rising = 4th tone.

Thai has many inseparable polysyllabic words. Some seem to appear here but have simply been split, e.g.
bam rung = bamrung, mong khon = mongkhon . Both words fit very well in this type of text.
(01-03-2026, 12:08 AM)ReneZ Wrote: Yes, this is largely recognisable.

Thanks! 

However, it seems that line (F) of that revised version is not quite correct after all. In the Chinese version, the material for that sub-entry is 屎白, which is literally "excrement white" but must mean "the white part of the bird's droppings". It seems that the Thai translation "khi3 kai2 khao5" is a bad literal translation of 屎白 that means "white droppings", which is not the actual meaning. When prompted, Google AI and ChatGPT admitted the problem and said that, to get the proper meaning, it should have been "khi3 kai2 suan khao5".

Does that make sense?

Quote:The numbers are confusing and make it hard to guess. Thai tones are numbered 1-4 (with no number for 'neutral'), where the assignments are swapped w.r.t. Mandarin (1,3 -> 3,1 and 2,4 -> 4,2).
hua5 (=head) would be rising = 4th tone.

I can ask the lalamos to change the tone marking system. The numbers 1-5 were their initial, unprompted choice. I asked for a default tone to be left unmarked, to lighten the text, and they chose their tone 1. Is there a name for your marking system?
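In the meantime, the renumbering can be done mechanically. A minimal Python sketch, assuming the lalamos' tones 1-5 follow the usual Thai order (mid, low, falling, high, rising), which agrees with hua5 being the rising tone: every marked tone drops by one and the mid tone stays unmarked. The regex and function names are made up for illustration.

```python
import re

# Assumption: the lalamos' numbering follows the usual Thai tone order
#   1 = mid (left unmarked), 2 = low, 3 = falling, 4 = high, 5 = rising.
# The 1-4 scheme leaves the mid tone unmarked and numbers the rest 1-4,
# so every marked tone simply drops by one (hua5 -> hua4).
TOKEN = re.compile(r"\b([A-Za-z]+)([1-5])?\b")

def renumber(match: re.Match) -> str:
    """Convert one romanized syllable from the 1-5 scheme to the 1-4 scheme."""
    base, tone = match.group(1), match.group(2)
    if tone is None or tone == "1":
        return base  # mid tone: no number in either scheme
    return f"{base}{int(tone) - 1}"

def convert_line(line: str) -> str:
    """Renumber every syllable in a line, leaving brackets and punctuation alone."""
    return TOKEN.sub(renumber, line)

print(convert_line("(B) Hua5: [gae2] prap2 phi5 sat2 rai4"))
# (B) Hua4: [gae1] prap1 phi4 sat1 rai3
```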

Quote:Thai has many inseparable polysyllabic words. Some seem to appear here but have simply been split, e.g. bam rung = bamrung, mong khon = mongkhon . Both words fit very well in this type of text.

I see.  I split the syllables for the same reason that I omitted all punctuation: to hopefully make it easier to compare this "Thai" text to the VMS text, and (in case Voynichese is indeed Thai) maybe identify additional cribs.  (Of course there would have been many changes to the language, etc.)  

But it may well be that the joined words in Voynichese were actually polysyllabic words of the original language.

All the best, --stolfi
(01-03-2026, 12:29 AM)Jorge_Stolfi Wrote: It seems that the Thai translation "khi3 kai2 khao5" is...

Yes, I was wondering about the white chicken poop, (also because the version I saw had khi5, not khi3).
Should the very last word be amber? amphan is another unseparable bisyllabic word.

W.r.t. your other question, centuries-old books exist; I have seen some, but I doubt they go back to the early 1400s.
I am not sure about that though.
(01-03-2026, 12:48 AM)ReneZ Wrote: also because the version I saw had khi5, not khi3.

The last round of interactions, pitting ChatGPT against Google AI, concluded with fixing a few tones. I don't remember which, but that may have been one of them.

Quote:Should the very last word be amber? amphan is another unseparable bisyllabic word.

Yes.  Translations of that part of the Chinese recipe vary a bit, but they generally agree that it is an alchemical use of the eggs, rather than a medical use; specifically, that the eggs can be turned into an amber-like substance.  

All the best, --stolfi
For whatever it is worth, I have put on my univ's server the digital files of the Shennong Bencaojing that I am using.  

In the directory [link]:

bencao-4.uts  Chinese characters (simplified), Unicode, UTF-8 encoding.
bencao-4.pys  Mandarin reading of the above, in pinyin with diacritics for tones.

As explained in the comments at the top of the first file, it is the merge of two digital files obtained from the internet.  

They were in different orders, and both contained some errors, like entries smashed together or truncated, and mixed traditional/simplified characters. I fixed the errors I could detect, but more may remain. (The book is supposed to have 365 entries, but the input files had fewer than 360, and my merge and fixes brought the number up to 363.)

I also mapped all to simplified characters because they are generally easier on my poor eyes. 

I have also added punctuation, like [ ] around the main "keywords", to highlight the structure of the entries.
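For anyone who wants to work with the files, splitting them into entries and pulling out the bracketed keywords takes a few lines of Python. This is a sketch under a guessed layout (an entry header like <b1.1.014> on its own line, as in the excerpt quoted later in this thread, and keywords in [ ]); the actual files may be organized differently, and the function names are hypothetical.

```python
import re

# Hypothetical layout: every entry starts with a header line such as
# <b1.1.014>, and the main keywords are wrapped in [ ].
HEADER = re.compile(r"^<(b[\d.]+)>[ \t]*$", re.MULTILINE)
KEYWORD = re.compile(r"\[([^\[\]]+)\]")

def split_entries(text: str) -> dict:
    """Map each entry ID (e.g. 'b1.1.014') to the text that follows its header."""
    parts = HEADER.split(text)  # [preamble, id1, body1, id2, body2, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts), 2)}

def keywords(body: str) -> list:
    """Return the bracketed keywords of one entry, in order of appearance."""
    return KEYWORD.findall(body)

sample = """# comments at the top of the file
<b1.1.001>
[主]治 ...
<b1.1.002>
[主]治 ... [久服] ...
"""
entries = split_entries(sample)
print(len(entries))                   # 2
print(keywords(entries["b1.1.002"]))  # ['主', '久服']
```

If the layout guess is right, running split_entries on the real file should give the entry count of 363 mentioned above.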

Please ignore the other files in that directory, they are garbage or out of date.

All the best, --stolfi
A quick update on the 'SPS = SBJ' claim -- that the Starred Parags section of the VMS is a transcription of the Shennong Bencaojing, a very old (>2000 yrs) Chinese materia medica.

For the past month I have been trying to get further matches between SBJ entries and SPS parags.  

As I noted before, I was very lucky with the "red rooster" and "flying squirrel" entries because they were extreme outliers of the parag size distribution, and the former had no less than seven occurrences of the critical keyword 主 ("mainly for", "main uses", etc.). 

But all other SBJ entries are in the fat part of the histogram, so length alone is not enough information to find the matching pairs. Moreover, the rooster entry showed that the Author of the VMS omitted certain fields that made no sense outside of Chinese medical theory (like the nominal "flavor" and "warmth" of the remedy) or outside China (like the habitat of the plants), as well as uses that were not medicinal (like "the grubs that grow in chicken poop are good for fattening pigs").

So I must look instead for multiple occurrences of 主, and find parags of the SPS that have occurrences of daiin or its attested variants with the right distances between them.  Fortunately the fields that may be omitted are at the beginning or end of the entry; very very rarely in the middle.
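The search described above can be sketched roughly as follows. This is a minimal Python illustration, not the procedure actually used: it assumes an SBJ entry is a string of hanzi with punctuation removed, an SPS parag is a list of EVA words, and "right distances" means matching relative spacing between the first and last keyword, so that fields dropped at the start or end of the entry do not matter. The crib table and all function names are hypothetical; only 主 = daiin is assumed by default.

```python
DEFAULT_CRIBS = {"主": {"daiin"}}

def hanzi_marks(entry, cribs=DEFAULT_CRIBS):
    """(keyword, character offset) for each crib keyword in the entry, in order."""
    keys = sorted(cribs, key=len, reverse=True)  # try longer keywords first
    marks, i = [], 0
    while i < len(entry):
        for key in keys:
            if entry.startswith(key, i):
                marks.append((key, i))
                i += len(key)
                break
        else:
            i += 1
    return marks

def eva_marks(words, cribs=DEFAULT_CRIBS):
    """(keyword, word index) for each EVA word listed in the crib table."""
    return [(key, i) for i, w in enumerate(words)
            for key, forms in cribs.items() if w in forms]

def normalized(marks):
    """Rescale positions so the first keyword is at 0 and the last at 1."""
    first, last = marks[0][1], marks[-1][1]
    span = (last - first) or 1
    return [(p - first) / span for _, p in marks]

def spacing_score(entry, words, cribs=DEFAULT_CRIBS):
    """Mean difference of normalized keyword positions (0 = same spacing),
    or None when the keyword sequences disagree."""
    a, b = hanzi_marks(entry, cribs), eva_marks(words, cribs)
    if len(a) < 2 or [k for k, _ in a] != [k for k, _ in b]:
        return None
    na, nb = normalized(a), normalized(b)
    return sum(abs(x - y) for x, y in zip(na, nb)) / len(na)

def rank_parags(entry, parags, cribs=DEFAULT_CRIBS):
    """Candidate parags (dict id -> words), best spacing match first."""
    scored = ((spacing_score(entry, w, cribs), pid) for pid, w in parags.items())
    return sorted((s, pid) for s, pid in scored if s is not None)

# Toy data; the larger crib table uses the cribs proposed in this thread.
cribs = {"主": {"daiin"}, "气": {"chedy", "cheda"}, "久服": {"qokaiin"}}
print(spacing_score("主甲气乙乙久服丙", ["daiin", "x", "chedy", "y", "y", "qokaiin", "z"], cribs))  # 0.0
```

With only two keyword occurrences the normalized positions are always 0 and 1, so the score reduces to checking that the keyword sequences agree; a third keyword (or an assumed words-per-hanzi ratio) is what makes the spacing informative. Ambiguous cribs would also need a more forgiving comparison than exact sequence equality.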

But there are only a few SBJ entries with two or more occurrences of 主; none with more than three.  Most entries have only one, and several have none.  Moreover, as I noted before,  there are only 243 SPS parags that I can use, because the other estimated 122 either were in the missing folios or were smashed together in three "superparags" with multiple stars each.

Still, I managed to match a few more pairs with confidence.  And I think I got two more "cribs" that should make the search quicker and more reliable from now on.

气 qì = chedy

Chinese medical theory is heavily based on the concept of "qì" (the pinyin "q" is pronounced like "ts" but with the tongue farther back, maybe like the "ch" in German "ich"). IIUC, it is some impalpable fluid, a sort of "vital energy", that is supposed to flow from the "middle" 中 (the "digestive center", stomach and spleen) to other parts of the body. Many diseases and conditions are explained as the 气 not flowing, flowing too much, flowing in reverse, flowing to the wrong place, etc. For instance, severe cough is supposed to be the result of the 气 flowing up from the 中 towards the lungs, rather than down as it should.

Thus naturally that character is used all over the SBJ. I count 303 occurrences in 242 entries (out of 363 total entries).

I now believe (with almost certainty) that the "translation" of 气 in Voynichese is chedy, modulo perhaps occasional spelling errors, like cheda. The word "chedy" (without prefixes or suffixes) occurs 210 times in my file of the SPS, in 146 entries. The word cheda occurs 92 times, and chedo 3 times, for 305 times total. Extrapolating for the missing pages (by the factor 26.2/22.6), the original number of chedy should have been ~250, and chedy+cheda+chedo should have been ~360. Thus, while my matching results support 气 being mostly translated as chedy or cheda or perhaps other variations, ~360 is more than the 303 occurrences of 气 in the SBJ, so not all of the chedys and chedas are 气 (just as not all daiins are 主).
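The extrapolation is simple scaling. A short Python check of the arithmetic, using the counts and the 26.2/22.6 factor given above (the two numbers are treated only as a ratio; the names are illustrative):

```python
# Counts of the bare words in the surviving SPS, as given in the post.
SPS_COUNTS = {"chedy": 210, "cheda": 92, "chedo": 3}
QI_IN_SBJ = 303  # occurrences of 气 in the 363-entry SBJ file

def extrapolate(count: float, full: float = 26.2, surviving: float = 22.6) -> float:
    """Scale a count from the surviving SPS pages to the estimated original size."""
    return count * full / surviving

chedy_est = extrapolate(SPS_COUNTS["chedy"])
family_est = extrapolate(sum(SPS_COUNTS.values()))
print(f"chedy: {chedy_est:.1f}")                       # chedy: 243.5
print(f"chedy+cheda+chedo: {family_est:.1f}")          # chedy+cheda+chedo: 353.6
print(f"surplus over 气: {family_est - QI_IN_SBJ:.1f}")  # surplus over 气: 50.6
```

The unrounded estimates are ~243 and ~354; either way the chedy family outnumbers the 303 occurrences of 气.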

久服 jiǔ fú = qokaiin

Many "remedies" are also "tonics" that can be taken indefinitely as "health supplements" for various benefits, like extending life or keeping the skin smooth. The list of such benefits is often preceded by the keyword 久服 ("prolonged taking"); or, in the case of foodstuffs, 久食 ("prolonged consumption"). 

The keyword 久服 occurs 132 times in my SBJ file, and 久食 occurs 7 times, spanning 138 entries. The hanzi 久 occurs 145 times (since it also occurs in the name of some chronic conditions) while 服 occurs 139 times, and 食 occurs 30 times.
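Counts like these are easy to mix up, because a single hanzi such as 久 also occurs inside 久服 and in other terms, and because "occurrences" and "entries containing" are different totals. A minimal Python sketch of both counts, assuming each entry is one string; the entries below are made-up toy strings, not SBJ text, and the function name is hypothetical.

```python
def keyword_stats(entries, keywords):
    """Occurrences of each keyword over all entries (non-overlapping str.count),
    and the number of entries containing at least one of the keywords."""
    totals = {k: sum(e.count(k) for e in entries) for k in keywords}
    spanning = sum(1 for e in entries if any(k in e for k in keywords))
    return totals, spanning

toy = ["久服轻身", "久食不饥", "主久咳", "主治"]
print(keyword_stats(toy, ["久服", "久食"]))  # ({'久服': 1, '久食': 1}, 2)
print(keyword_stats(toy, ["久"]))            # ({'久': 3}, 3)
```

As in the real counts, the single character 久 comes out higher than 久服 plus 久食, since it is also counted wherever it appears on its own.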

I believe (not quite certain yet) that the keyword 久服 maps to qokaiin in the SPS. The word qokaiin occurs 117 times in my SPS file. Scaling by the factor 25.6/22.6, that gives an estimate of ~138 for qokaiin in the original SPS. Again, this "translation" seems to work in the matches that I have identified so far.

I am not clear yet on whether qokaiin means 久服, or just 久, or just 服; and whether it may also mean 久食 or 食.

It is possible that some 久服 or 久食 map to just okaiin, without the q. The word okaiin occurs 85 times in my SPS file. And there may be some fraction of spelling errors, like those seen in the 'rooster' entry; namely, substitutions of glyphs by other glyphs with similar shapes, such as ykaiin, odaiin, okair.
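One way to shortlist candidate misspellings is plain edit distance. A minimal Python sketch, assuming each EVA letter counts as one glyph (debatable for ch, sh and similar) and ignoring glyph shape entirely; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain character edit distance (insert, delete, substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute (or match)
        prev = cur
    return prev[-1]

def near_variants(target, vocabulary, max_dist=1):
    """Words of the vocabulary within max_dist edits of target, excluding target."""
    return [w for w in vocabulary if 0 < levenshtein(target, w) <= max_dist]

vocab = ["okaiin", "qokaiin", "ykaiin", "odaiin", "okair", "chedy"]
print(near_variants("okaiin", vocab))  # ['qokaiin', 'ykaiin', 'odaiin']
print(levenshtein("okaiin", "okair"))  # 2
```

Note that okair comes out at distance 2 under this plain measure, so a cost table that treats similar-looking glyph sequences (such as iin and ir) as close would be needed to rank it alongside ykaiin and odaiin.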

All the best, --stolfi
Addendum: I have some evidence that makes me suspect that the q glyph is an abbreviation that means "and". It seems to be prefixed to some or all of the words in a list, except the first one.

For example, the SBJ entry with the longest title (12 hanzi) starts with

  <b1.1.014>
  青石赤石黄石白石黑石脂等...

In modern Mandarin, that would be read as

  qīng shí chì shí huáng shí bái shí hēi shí zhī děng ...

Literally, that would be

  "blue stone red stone yellow stone white stone black stone fat together..."

But the full title, expanded and punctuated, is meant to be

  青石脂,赤石脂,黄石脂,白石脂,黑石脂,等

In Chinese (at least in the SBJ), the compound term 石脂 shí zhī = "stone fat" means "clay".  So the title of the entry actually translates to 

  blue clay, red clay, yellow clay, white clay, and black clay, together:

or 

  blue, red, yellow, white, and black clay, together:

Thus, the title as it occurs in the SBJ is halfway between these two translations: the 脂 = "fat" was factored out at the end, but the 石 = "stone" was not.
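The factoring can be undone mechanically for this particular title. A toy Python sketch: it handles only the "color + 石, ..., 脂" pattern, expects the trailing 等 to be removed first, and the function name is made up for illustration.

```python
def expand_title(title: str, unit: str = "石", suffix: str = "脂") -> list:
    """Re-attach a factored-out suffix, e.g. 青石赤石黄石白石黑石脂 -> 青石脂, 赤石脂, ...
    Only handles the 'X石 Y石 ... 脂' pattern of this one title."""
    if not title.endswith(unit + suffix):
        raise ValueError("title does not end with unit + suffix")
    body = title[: -len(suffix)]        # 青石赤石黄石白石黑石
    qualifiers = body.split(unit)[:-1]  # ['青', '赤', '黄', '白', '黑']
    return [q + unit + suffix for q in qualifiers]

print(",".join(expand_title("青石赤石黄石白石黑石脂")))
# 青石脂,赤石脂,黄石脂,白石脂,黑石脂
```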

That entry is about clay of five colors. Each color of clay is a distinct remedy with distinct medical uses, but the 等 = "together" tells the reader that all five are discussed together in this entry. (A note at the end of that entry explains that each of these five colors of clay acts on one of the Five Organs - Liver, Heart, Spleen, Lungs, Kidneys - that are nominally associated with those five colors.)

The VMS paragraph that best matches this entry, based on the occurrence and spacing of the known keywords (主治, 气, 久服, and 气 again), is f104v.22, which begins with

  ytChedy.qokChedy.qotChy.qokChedy.qokChd.lsaiin.qChor.Sheor.ytaiin...
  
So my theory is that the q is a feature of the Voynichese language, without a corresponding Chinese character, that functions sort of like the wa- prefix of Arabic (ve- of Hebrew, we- of Ge'ez).  Here is what Google Translate gives for that title in Arabic:

  altiyn al'azraqa, waltiyn al'ahmaru, waltiyn al'asfara, waltiyn al'abyad, waltiyn al'aswadu, meaan

If true, that feature may give a clue to the native language of the VMS Author...
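Under the q = "and" hypothesis, the start of f104v.22 should look like a list whose first item lacks the q. A minimal Python sketch that splits an EVA line at the dots and measures such an initial run; the function names are made up for illustration, and q is treated simply as the first letter of a word.

```python
# Line f104v.22 as quoted above (only its first nine words).
LINE = "ytChedy.qokChedy.qotChy.qokChedy.qokChd.lsaiin.qChor.Sheor.ytaiin"

def split_q(line: str):
    """Split an EVA line at the dots into (prefix, stem) pairs, where prefix is
    'q' if the word starts with q and '' otherwise."""
    return [("q", w[1:]) if w.startswith("q") else ("", w) for w in line.split(".")]

def leading_list_length(pairs) -> int:
    """Length of an initial run shaped like 'A, qB, qC, ...': a first word without
    q followed by consecutive q-prefixed words. Returns 0 if the line starts with q."""
    if not pairs or pairs[0][0] == "q":
        return 0
    n = 1
    while n < len(pairs) and pairs[n][0] == "q":
        n += 1
    return n

print(leading_list_length(split_q(LINE)))  # 5
```

On the quoted line it returns 5 (ytChedy followed by four q-words), the same as the number of clays in the title; by itself that is only suggestive.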

(By the way, in English we are taught at school that one should never start a sentence with "And".  However, IIUC, that is in fact quite common in narratives of those "Semitic" languages. There, a sentence-initial "And" should be understood as "Then".  Which seems to explain why that "error" is so common in the King James Bible.)

And, by the way, the match with f104v.22 would be even better if one accepted lsaiin as a spelling error for daiin. Note that ls and d are in fact rather close in "ink distance"...

All the best, --stolfi