The 'Chinese' Theory: For and Against

RE: The 'Chinese' Theory: For and Against - MHTamdgidi_(Behrooz) - 14-02-2026

(14-02-2026, 05:40 AM)Jorge_Stolfi Wrote:
(13-02-2026, 03:48 PM)MHTamdgidi_(Behrooz) Wrote: with all the twists and turns you are giving to make the back story plausible

Because the explanations that people have been giving for how a European language could be so well encrypted, how come the plants and cosmology are not recognizable, why the Zodiac diagrams have 30 labels each instead of 28/30/31 and why some are split into 15+15, and what are those nymphs doing in those tubs and showers between organs -- those are not "twists and turns", right?

"A small community of people who invented a secret language and script to communicate among themselves"

"A swindler who used an invented script and complicated and laborious method to produce random text, that to Europeans at the time would have looked utterly unlike language or code, with not a single reference to alchemy, in order to sell it to an Emperor who was obsessed with gold-making alchemy."

"A scholar who was afraid that the Inquisition, which he was sure would be created by the Church any time soon, would burn him at the stake for his heretical thoughts, and therefore cleverly disguised them in a book filled with bizarre attention-grabbing illustrations, in a baffling script that looks totally like an attempt by someone trying to hide heretical thoughts from the soon-to-come Inquisition".

And hundreds more...

All the best, --stolfi

I am puzzled by how you avoid answering questions directly when not suiting your needs and how logical double-standards serve your theory claims.

Zodiac diagrams are supposed to have 30 degrees, so I agree with ReneZ and many others that those marginalia in French for month names are added by others. In my view, the French month names are obvious errors, because each Zodiac month would correspond with two consecutive month days (which would explain the 28/30/31 regular calendar months spreading in them).

So, this also explains why two of the Zodiac months are rendered in 15/15 for focus, apparently, inviting reasonable explanations. The Voynich manuscript is pretty consistent and accurate in rendering Zodiac month degrees that have survived in the existing manuscript.

So, here we have evidence that you use marginalia errors (not the manuscript’s own material) to dismiss reasonable Zodiac 30 degrees (and those two month 15/15 splits for focus) to explain them away as mere decorations.

This is what the problem is globally with your statistical reductive reasoning. You dismiss anything in the VM that challenges your theory.

So, as far as narratives you listed go, the same thing is happening, you pick and choose some narratives (and “hundreds more”) as a way of not answering the question, and not even acknowledging those offered by those you are addressing.

You mention elsewhere, “Still, the violence of the reaction this time tells me that the evidence is good.”

Sorry, but this follows the same logic and is an unfair characterization.

The reason you are getting so many reactions is that you invited them in the first place with your anagram publicity announcement. You built high expectations, so people are paying closer attention and more widely. It is not fair to characterize it as “violent”. In fact, it started off as very friendly, and still is, despite the serious criticisms made.

One thing you are not appreciating, Jorge, is that attention taken for your claims is attention taken from other pursuits. You should be grateful for people answering your anagram-initiated call, and expecting them to agree with everything is not reasonable. I for one hesitated to share things I had planned, because I was trying to allow for your announcement to materialize, which came in renewed “a few days” cycles. Actually, I did not even know it was about your Chinese theory; I thought it was something about the last page marginalia finding.

In any case, this is my last post on this topic you have shared re. Chinese theory, since it is now going in circles, and I am not any more convinced by how you are answering my questions, sorry. I wish you well in developing your theory and finding ways of convincing others.

RE: The 'Chinese' Theory: For and Against - rikforto - 14-02-2026

(14-02-2026, 02:47 PM)Jorge_Stolfi Wrote: Maybe in the VMS the "main uses" keyword is often omitted, because it is superfluous. Note that the shortest SBJ recipe omits that keyword altogether. After all, the first 主治 serves only to separate the list of diseases from the "taste and flavor" field; if that field is present in an SBJ recipe but is omitted in the SPS version (as it was in the Rooster case), the translation of the following 主治 could have been omitted too.

Two issues here!

First, this is begging the question. The appearance that "keywords" have been omitted depends on seeing the correspondence in the first place. The need for an explanation here and the fact it is speculative goes to the weakness of the correspondence.

Second, and insofar as we take it to be antecedent to the claim, it is also dubious on its own merits. The source language, literary Chinese, is famously very terse; if the verb, 治, could be omitted it would not appear in the original. It does not only serve to separate the uses from the properties, but carries important grammatical information, namely establishing the new topic as "principal uses". In your example, omitting the "principal indication" pair 主治 shifts the topic to women and children 女子, and the rest of the clause becomes a comment on them, namely that they collapse from discharge 崩中漏下, or something like that. (I will defer to someone more practiced with Classical Chinese if they say I've missed some nuance, but what matters here is that the sentence is no longer about *treating* them, but about the fact that the disease happens to women and children.) Broadly, this is going to be an issue in any isolating language in the area. The terseness of the original and the fact that topic-comment works this way in most of the potential target languages as well means that it would be surprising if the topic were omittable.

RE: The 'Chinese' Theory: For and Against - Jorge_Stolfi - 14-02-2026

(14-02-2026, 10:15 AM)oshfdk Wrote: I think you did [hunt for books that fitted the statistics of the SPS]. ... Probably you were still dissatisfied with the match and you had to design custom paragraph breaks in SPS to make the data fit better.

So now you are accusing me of lying and cheating? That desperate?

Why would I do that? The claim "SPS≈SBJ" is quite specific: "The SPS is a close almost word-for-word version of the SBJ, and the meaning of daiin is something like 'uses'". Unlike other theories, including the general Chinese Origin theory, that claim is easily falsifiable. If it turns out to be false, I will look like a fool. If I did not feel like I had solid evidence, lying and cheating would only make this outcome more likely, and worse.

Quote:you were looking for a text that would fit the profile of SPS - a large number (preferably ~300) of relatively short paragraphs. You didn't consider huge 1000 page long tomes, you didn't consider short treatises either, because these wouldn't match quite obviously. So you ran a pre filter that removed most books with wildly different statistics from consideration.

Again, even today, the only Chinese book for which I have any statistics at all is the SBJ; and the only statistic I knew, before I got hold of the digital version, was the number of paragraphs -- because that was always stated in the articles that mentioned it. I did not know the min, max, or average size of the parags. Much less the parag size histogram, or that 主 was its most common character, or how many times 主 occurred in its longest recipe and how many times daiin occurred in the longest SPS parag, and how many words/characters there were between those occurrences.

So, no, I did not hunt for a Chinese text that matched the statistics of SPS. Again, I first suspected that the SPS may be a version of the SBJ because of the parag count and because the SBJ was the most likely materia medica that the Author would have chosen to translate. AFAIK, it had the same place in Chinese medicine as Dioscorides' herbal had in European herbalism.

And even if I had hunted among Chinese books for a more detailed statistical match, that would have been a perfectly valid thing to do. Like hunting for European manuscripts with images that match the VMS images, or with letter styles that match the [link] scribbles, or with month names that match those of the Zodiac. If one suspects that some section of the VMS may be an encoded version of some German book, the obvious first thing one must do is look for a German book with a similar word and/or parag count.

Quote:you had to design custom paragraph breaks in SPS to make the data fit better. Otherwise I think you would just be happy with the obvious paragraph breaks.

Indeed, I decided to revise the parag breaks because I thought that wrong breaks may be the reason why the histogram of the SPS was broader than that of the SBJ. But while I was revising those breaks, I did not try to "customize" them to fit the SBJ histogram -- which would have been impossible to do "on the fly" anyway. Again, the justification for every single break in my revised transcription is recorded [link].

And in the end, as I said before, that did not help: the histogram of the SPS was still broader than that of the SBJ. Because my revision changed very few breaks. Because most of the breaks were obvious (after a short line and near a star), and for the others I mostly agreed with the guesses of previous transcribers.

And, by the way, all that was before I found that 主 ≈ daiin. Thus my revised SPS still has several of those guessed breaks before lines that start with daiin. Now I see that those guesses were almost certainly wrong.

Excluding the blocks of lines with dubious parag breaks improved the histogram a little, but the improvement was not enough to justify publishing this "cleaned SPS". I had to conclude that the cause for the broadening was something else.

I thought that maybe the cause was inconsistent transcription of word spaces. So I tried deleting all commas, treating all commas as word spaces, and deleting all word spaces and counting EVA characters instead. But the histogram was still too broad.

But now, after analyzing the Rooster pair, I think I have a plausible explanation, at least for part of that extra variation. While the SPS is generally an almost word-for-word match to the SBJ, I now see that the VMS Author omitted certain fields of certain entries -- like the "taste and warmth" at the beginning of the Rooster recipe, and the veterinarian uses and the "grows in..." field at the end. (In fact, it is quite possible that those parts of that entry were added to the SBJ by the authors who reconstructed it after 1500. And, luckily for me, the shortened version was still the longest parag in the SPS.) But such omissions must have affected different entries by different amounts; and that could explain some of the extra spread in the SPS.

All the best, --stolfi

RE: The 'Chinese' Theory: For and Against - tavie - 14-02-2026

OK, this is starting to get a little heated.

Jorge, I don't believe oshfdk is arguing you are acting in bad faith by lying or cheating. I assumed he was suggesting that pre-filtering is the kind of thing you can easily miss through confirmation bias, like solvers who subconsciously look through a dictionary for words associated with plants when doing their translations and are genuinely surprised and delighted to find one.

The violence of the reaction you mention is not unique to this thread. There has been intense criticism in the Irish thread, the Turkish thread, and in the many Latin threads. When you express certainty about something, as opposed to purely suggesting an idea, you're going to face stronger criticism. But if you're uncomfortable with some posts or the tone, please let Koen and me know.

As a general reminder to everyone: please keep things civil, assume good faith in the person you are disagreeing with, and try to steer clear from anything that could be seen as an accusation of bad faith or bad motivations.

RE: The 'Chinese' Theory: For and Against - Jorge_Stolfi - 14-02-2026

(14-02-2026, 10:05 AM)kckluge Wrote: 1) I suspect that [entropy of 10 bits per word] is a consequence of (and mathematically equivalent to) the ranked word frequency distribution being Zipfian given that entropy is a measure of how not-flat the distribution is

Indeed, given the information that the word frequency distribution is Zipfian, and the size of the lexicon, one can compute the value of the zero-order word entropy. The first property alone is not sufficient.
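As a rough illustration of that computation, here is a minimal sketch. It assumes an idealized Zipf law, where the word of rank r has probability proportional to 1/r^s, and it takes the lexicon size as an input. The function name, the exponent default, and the lexicon sizes in the loop are all assumptions made for this example; none of them are measured from the VMS.

```python
import math

def zipf_entropy_bits(lexicon_size: int, exponent: float = 1.0) -> float:
    """Zero-order entropy (bits per word) of an ideal Zipf distribution.

    Assumption: p(rank r) is proportional to 1 / r**exponent for
    r = 1..lexicon_size. Real word-frequency tables only follow this roughly.
    """
    weights = [1.0 / (r ** exponent) for r in range(1, lexicon_size + 1)]
    total = sum(weights)
    # Shannon entropy: -sum(p * log2(p)) over all ranks.
    return -sum((w / total) * math.log2(w / total) for w in weights)

# Hypothetical lexicon sizes, chosen only to show how the value depends on size.
for size in (1000, 5000, 10000):
    print(size, round(zipf_entropy_bits(size), 2))
```

The same Zipfian shape gives different entropies for different lexicon sizes, which is why the shape alone does not fix the value.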

Quote:2) as I observed earlier, your argument regarding 'daiin' requires assuming that roughly (at best) only 42% of instances of 'daiin' are written as 'daiin' (as opposed to as a prefix or suffix of another "word"), which does not qualify as an "encoding where each word is (almost) always spelled/encoded in the same way."

Word spaces are rarely produced in the spoken language, unless one deliberately makes a pause before each word. (An old famous article on speech recognition is titled "How to wreck a nice beach"). Thus it will not be surprising if it turns out that the Author often missed word breaks when taking dictation. And then the Scribe, not knowing the language, may have got more spaces wrong when reading the Author's draft. (Did you see the "busillis" anecdote I posted a while ago?)

And the Author probably made many spelling mistakes when taking dictation. And the Scribe added some more when trying to guess whether a glyph on the draft was r or s, ch or ee ...

If the SPS≈SBJ claim is true, all those puzzles will soon be solved.

All the best, stolfi.

RE: The 'Chinese' Theory: For and Against - oshfdk - 14-02-2026

(14-02-2026, 05:20 PM)Jorge_Stolfi Wrote:
(14-02-2026, 10:15 AM)oshfdk Wrote: I think you did [hunt for books that fitted the statistics of the SPS]. ... Probably you were still dissatisfied with the match and you had to design custom paragraph breaks in SPS to make the data fit better.

So now you are accusing me of lying and cheating? That desperate?

I never mentioned lying or cheating. I don't know your exact process, but it appears that revising paragraph breaks was for some reason important for you. If not to be able to match paragraphs with another source, then why would one do this?

I'm not sure why you think I'm desperate. I just thought you wanted some feedback. I have no particular interest in your theories other than MRT, which has some implications for decoding attempts, but given that MRT no longer appears plausible to me, I don't have much interest for it either.

RE: The 'Chinese' Theory: For and Against - Jorge_Stolfi - 14-02-2026

(14-02-2026, 06:34 PM)oshfdk Wrote: I never mentioned lying or cheating.

Apologies if I misunderstood you. I thought you said that I did search for a book that fit the statistics of the SPS, after I said that I did not.

And you said that I tweaked the parag breaks to support the SPS≈SBJ claim; and that, I agree, would have been cheating. But, again, no -- when I was revising the parag breaks, I had no idea of whether I was improving the histogram fit, or making it worse. Again, in the end the change was not worth mentioning.

Quote:I have no particular interest in your theories other than MRT, which has some implications for decoding attempts, but given that MRT no longer appears plausible to me, I don't have much interest for it either.

Too bad, because I keep finding things that solve many outstanding puzzles. But that is another thread...

All the best, --stolfi

RE: The 'Chinese' Theory: For and Against - oshfdk - 14-02-2026

(14-02-2026, 09:21 PM)Jorge_Stolfi Wrote: Apologies if I misunderstood you. I thought you said that I did search for a book that fit the statistics of the SPS, after I said that I did not.

I think that you searched for a book that might be a source for SPS, a book that is the right size and structured in about the same way. Which effectively means that you were looking for a large collection of short paragraphs of about the right size (not page long, not just a few words long), which already puts a constraint on the statistics. The paper literally says "we noticed the SPS seemed very similar in size and structure to the SBJ", which is a qualitative version of a statistical match.

(14-02-2026, 09:21 PM)Jorge_Stolfi Wrote: And you said that I tweaked the parag breaks to support the SPS≈SBJ claim; and that, I agree, would have been cheating. But, again, no -- when I was revising the parag breaks, I had no idea of whether I was improving the histogram fit, or making it worse. Again, in the end the change was not worth mentioning.

I said "Probably" because I assumed that the original statistics were not a good match. I'm not sure how this is cheating, if you have reasons to believe that the paragraph breaks needed to be revised and there is a consistent method for revising them. I assumed you would start with computing SPS statistics using paragraphs from the existing transliterations, but if you never did this before changing paragraph breaks, this couldn't affect the fit, so my "probably" was wrong.

I'm sorry if any of my replies sounded rude or insulting, I didn't mean them this way. I think I will excuse myself from this thread, because the discussion really gets heated and tedious at the same time.

RE: The 'Chinese' Theory: For and Against - kckluge - 16-02-2026

(14-02-2026, 05:52 PM)Jorge_Stolfi Wrote:
(14-02-2026, 10:05 AM)kckluge Wrote: 2) as I observed earlier, your argument regarding 'daiin' requires assuming that roughly (at best) only 42% of instances of 'daiin' are written as 'daiin' (as opposed to as a prefix or suffix of another "word"), which does not qualify as an "encoding where each word is (almost) always spelled/encoded in the same way."

Word spaces are rarely produced in the spoken language, unless one deliberately makes a pause before each word. (An old famous article on speech recognition is titled "How to wreck a nice beach"). Thus it will not be surprising if it turns out that the Author often missed word breaks when taking dictation. And then the Scribe, not knowing the language, may have got more spaces wrong when reading the Author's draft. (Did you see the "busillis" anecdote I posted a while ago?)

And the Author probably made many spelling mistakes when taking dictation. And the Scribe added some more when trying to guess whether a glyph on the draft was r or s, ch or ee ...

If the SPS≈SBJ claim is true, all those puzzles will soon be solved.

All the best, stolfi.

The problem isn't that what you say above is very hand-wavey -- although it is. The problem is that your draft paper appears to be asking me to accept two fundamentally quantitatively inconsistent claims regarding spaces in the SPS:

* In Section 3.4, treating uncertain spaces as spaces, you conclude that, "The average size of an SBJ parag is 36.97 Chinese characters, while the average size of an SPS parag is 33.95. These numbers are surprisingly close, and imply that, if the SPS is a version of the SBJ, then each Voynichese word in the former corresponds roughly to one Chinese character in the latter. More precisely, to 1.089 words." That implies a bound on how undersegmented the Voynichese can be. If "...the Author often missed word breaks when taking dictation....", then according to this claim he/she didn't miss more than ~8% of them.

* As I pointed out in my earlier post, treating uncertain spaces as spaces results in only 129 of the occurrences of 'daiin' being as a word, with all but 8 of the remaining 177 occurrences being as a word suffix or prefix. Which implies a far higher rate of undersegmentation if we assume 'daiin' is typical. And requires an explanation if we are asked to believe it is not.
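To put the two bullets side by side, here is a small sketch using only the figures quoted above (36.97, 33.95, 129, 177). The function names are made up for this illustration. The first function assumes that one true Voynichese word corresponds to one Chinese character, so that any shortfall in word count comes from missed breaks; that assumption is a simplification for the sake of comparison, not something the draft paper states in this form.

```python
def implied_missed_break_fraction(mean_chars: float, mean_words: float) -> float:
    """Fraction of 'expected' words that are missing from the word count.

    Assumption: each Chinese character should map to exactly one word, so
    the shortfall (mean_chars - mean_words) is due to merged words.
    """
    return 1.0 - mean_words / mean_chars

def attached_fraction(standalone: int, attached: int) -> float:
    """Fraction of occurrences that are NOT written as a separate word."""
    return attached / (standalone + attached)

chars_per_word = 36.97 / 33.95                          # about 1.089
overall = implied_missed_break_fraction(36.97, 33.95)   # roughly 8%
daiin = attached_fraction(129, 177)                     # roughly 58%

print(f"characters per word: {chars_per_word:.3f}")
print(f"implied missed breaks overall: {overall:.1%}")
print(f"'daiin' not written as its own word: {daiin:.1%}")
```

The two percentages are not measuring exactly the same thing, but the gap between them is the inconsistency the bullets describe.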

With regard to error rates involving spaces on the part of an ignorant scribal copyist, while I need to read Alan Farne's thesis ([link]) in detail (particularly Chapter 4 -- "I discuss in Chapter Four the need to better understand the scribal habits of manuscripts written by scribes who wrote in their non-native language."), it's worth quoting from his summary of his work at [link]:

"....I have transcribed, collated, and analyzed all three of these codices by test passages in order to determine the scribal habits of the scribes of 0319 and 0320. All three of these manuscripts are Greek-Latin diglots. There were two surprising conclusions from the analysis of these manuscripts: first, the scribes of both direct copies neither added nor omitted any words. They broke even completely on word count. Not only did they break even but they had no variants of adding or omitting any words. They copied their exemplar almost exactly. They did make many substitutions, spelling errors, and nonsense errors, but they did not add or omit a single word. Which leads to the second surprising conclusion: these scribes were this accurate because of their ignorance of Greek. These manuscripts are diglots with Greek and Latin and an analysis of the scribal habits shows that these scribes were more proficient in Latin but had very little knowledge of the Greek language. They therefore copied the text extremely accurately but when they made a mistake it was an egregious mistake which usually resulted in such a nonsensical error that the result was not even a real Greek word. I have therefore tentatively concluded that scribes who do not know the language they are copying may copy extremely well for the most part but when they reach a difficulty they may produce an extremely obvious error."

So it sounds like (language-)ignorant scribes do very well with regard to their accuracy copying spaces. (Farne's work came to my attention because at one point I was searching for data on scribal error rates in order to try to quantify at what point appealing to scribal error as an explanation for exceptions to a proposed rule of Voynichese behavior becomes unreasonable).

RE: The 'Chinese' Theory: For and Against - kckluge - 16-02-2026

(14-02-2026, 02:47 PM)Jorge_Stolfi Wrote:
(12-02-2026, 04:48 PM)oshfdk Wrote: 主 appears at the very beginning of each recipe, after the name and the category, with only 2-3 exceptions. If 主 corresponds to daiin, where is the same pattern in the Voynich MS? Is there a guaranteed daiin near the beginning of each paragraph? This is visually the most obvious pattern in SBJ.

Good question. Actually, the honest question would be, "Considering that 83% of SBJ recipes have a 主治 = "main use" within the first 12 characters or so, which word or pattern occurs in a similar percentage of the SPS parags within the first 60 EVA characters or so". Since it seems that on average each Chinese character in the SBJ corresponds to about 5 EVA characters in the SPS.

If we exclude the problematic blocks of lines of the SPS where the parag breaks are not obvious, we are left with 243 "probably true" parags.

Of these, 51 (21% of the 242) have a daiin (as a single word or part thereof) within the first 60 EVA characters.

Of the remainder, 20 (8% of the 242) have a dair or a laiin (or both) in that range. (Those are the two other words that match the positions of 主 in the Rooster recipe).

OK, so ignoring spaces altogether now along with all the problems associated with assuming 1 EVA character = 1 glyph in Voynichese, looking at your transcription I'm getting (306 / 55881) = 0.5476% as the probability of a random 5-gram being 'daiin'. The probability of the 1st 60 glyphs containing at least one 'daiin' as a substring is therefore

1.0 - ((1.0 - 0.005476)^56) = 1.0 - (0.9945 ^ 56) = 1.0 - 0.7353 = 0.2647 (0.26471295... doing the calculation in my phone's calculator app without any rounding beyond the app's precision)

So the probability of 'daiin' occurring randomly within the first 60 EVA characters is ~26.5%, and it occurs there in the SPS 21% of the time.

If you want to take the time and run the numbers for all of your candidate misspellings to make the case that the numbers are above random chance, go ahead. I feel I've done my due diligence here.
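The back-of-the-envelope estimate above can be reproduced with the short sketch below. It uses the same figures (306 occurrences, 55881 glyphs, a 60-glyph window, a 5-glyph pattern) and the same simplification: every starting position in the window is treated as an independent trial, which is an approximation. The exponent 56 is the number of places a 5-glyph string can start inside 60 glyphs (60 - 5 + 1). The function name is an assumption made for this example.

```python
def chance_of_pattern_in_window(occurrences: int, total_glyphs: int,
                                window: int, pattern_len: int) -> float:
    """Probability that a pattern shows up at least once in a window.

    Assumption: each starting position independently matches the pattern
    with probability occurrences / total_glyphs.
    """
    p_single = occurrences / total_glyphs
    start_positions = window - pattern_len + 1   # 60 - 5 + 1 = 56
    return 1.0 - (1.0 - p_single) ** start_positions

p_random = chance_of_pattern_in_window(306, 55881, window=60, pattern_len=5)
print(f"chance of 'daiin' in the first 60 glyphs: {p_random:.4f}")
```

Swapping in the counts for other candidate spellings gives the matching baseline for each of them.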