The 'Chinese' Theory: For and Against


RE: It is not Chinese - tavie - 15-06-2025

(15-06-2025, 09:56 AM)Jorge_Stolfi Wrote:
The decipherment of Egyptian hieroglyphics could have happened 100 years earlier if Kircher had not declared them to be ideographic.

The last two examples show how decipherment can be delayed by centuries because of "obvious" but wrong assumptions about the language or the nature of the script...

This seems unlikely. Materials like the Rosetta Stone and the Bankes obelisk were crucial for the decipherment.

Quote:Some of the "bumpiness" features listed by Nick seem to affect a relatively small fraction of the text, and (like transcription errors, or missing fragments on clay tablets) should not be a big obstacle to decipherment.

I don't see what on Nick's list affects a small fraction of the text, except maybe the right justified titles on a minority of folios. The rest of the issues - repetitions and word types varying across different positions of the text - are chronic problems, so chronic it seems something has been done to the plaintext, if there is one. And we can also add the [link not shown] that I've been looking at to the list.

Whether it is encryption, shorthand, or a combination of the two with changes for cosmetic reasons chucked into the mix, the result is a cacophony of noise. I do not see how natural reasons (e.g. subject matter or linguistic features) can be responsible for all this. It is just too bumpy and noisy throughout.

Quote:It is almost certain now that one-leg gallows are variants of the two-leg gallows, possibly combined with e or other letters, that are used mostly on the first line of a paragraph -- a convention that the Author picked up from the typical European manuscript style. I don't see reason to ascribe any other meaning to single-leg gallows, just as I don't think that split two-leg gallows or fancy decoration on gallows have any linguistic or semantic value.

My feverish ramble in that other thread was not terribly coherent but my point was that no simple explanation works with the ornamental gallows, since there is a fundamental mismatch between the Top Rows and the lines below them in a paragraph. Something far more complex is going on. We start with:

Problem 1: Top Rows* have too many /p/ and too few /k/ in comparison to the lower lines. We don't see this in natural language.

--> Solution: The simple explanation is that p = k. It is just an ornamental flourish used when there is more space to expand.

Problem 2: But in the lower lines, k is frequently followed by e. If p = k, where is pe in the Top Row?

--> Solution: The simple explanation is that p = ke. It operates as both an ornamental flourish and a ligature. A little more complex but nothing unusual.

Problem 3: But in Top Row, p is frequently followed by ch. Over 50% of times, usually! If p = ke, then pch = kech, but where are all the many kech we should see in the lower lines?

--> Solution: ?? Do we start suggesting /ch/ becomes e? Or something else?

We are now looking at having to mutate most if not all of the word type in order to find an equivalence. There is no neat exchange that allows us to match a Top Row word type with a word type in the below paragraph. It's more than just exchanging a few letters. The initials are often different, the middles are often different, and the finals are often different. Why do we see more /sh/ at Top Row? Is this an ornamental form as well? Why are there so many missing initial ch at Top Row and so many word-middle ch? My idea is that initial ch has been shunted into middle position by adding /op/ or /qop/ to the ch words, but to make it work, I also have to mutate the glyphs after /ch/ because they don't match as expected. This is becoming highly complex and does not seem natural.

Quote:The peculiar features at the "margins" of the pages can have banal explanations too. For one thing, in many languages the final letters or words of a sentence may be strongly affected by the topic. In a narrative of past events, sentences are more likely to end with "-ed" in English, with "-ta" in Japanese; whereas in a herbal the sentences should be mostly in the present tense, hence more likely to end with "-desu" or "-masu" in Japanese. Others have pointed out the increased use of abbreviations at the ends of lines in European manuscripts.

I'm Team Abbreviation too, but if it is going on, it is in a highly complex way. We have the same issue here as with top rows: the mismatch between expected words and actual words at line start and line end is more complex than a change of ending by itself at the line end, or a change of initials at line start. If at line end, it was only daiin becoming dam, or kaiin becoming kam, this would work. But it isn't. Initials and word-middles are often different as well. The same is true for line start.

Line patterns at different positions of the text are serious problems for any idea that we are only seeing a natural language.

Quote:When single-leg gallows occur inside a parag, I would guess that the Scribe failed to see a parag break in the Author's draft and thus started the first line of the second parag as continuation of the last line of the previous one.

I'm not convinced by this but I find it interesting in that it could imply that if we have copying scribes following a layout by an "Author", they have agency in terms of how they lay out the text: they are not obliged to match their line starts with the Author's line starts, nor their line ends with the Author's line ends by cramming in text or widening spaces between words to make the ends match.

So a word that is line start for a scribe may be a mid-line word for the author in their original text, and a word that is line end for a scribe may also be a mid-line word for the author. Given how word types seem to undergo complex mutations at these positions, this implies to me that the scribe has an understanding of the system and how to mutate the word: they are not blind copyists.

* by Top Row, I mean the first line of each paragraph but with its first word and last word omitted so as to isolate a top row effect from separate paragraph/line start effects and line end effects.
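
For anyone who wants to reproduce the Top Row comparison in Problem 1, here is a minimal sketch in Python. It assumes each paragraph is already available as a list of lines, each line a list of EVA words. The function names and the toy paragraph are made up for illustration, and trimming the first and last word of the lower lines as well is my own assumption, not part of the definition above.

```python
def split_top_row(paragraph_lines):
    """Return (top_row_words, lower_words) for one paragraph.

    paragraph_lines: list of lines, each a list of EVA word strings.
    Top Row = first line minus its first and last word.
    Lower words = words of the remaining lines, each line also trimmed of
    its first and last word (an assumption, to drop line-edge effects).
    """
    def inner(line):
        return line[1:-1]

    top = inner(paragraph_lines[0]) if paragraph_lines else []
    lower = [w for line in paragraph_lines[1:] for w in inner(line)]
    return top, lower


def glyph_rate(words, glyph):
    """Occurrences of one EVA character per character of text."""
    total = sum(len(w) for w in words)
    hits = sum(w.count(glyph) for w in words)
    return hits / total if total else 0.0


# Toy paragraph, invented for illustration only (not real transcription data).
toy_parag = [
    ["pchedy", "opchey", "qokeedy", "shedy", "dam"],
    ["daiin", "okeey", "qokain", "chedy", "dal"],
]
top, lower = split_top_row(toy_parag)
print("p rate, top row:", glyph_rate(top, "p"))
print("p rate, lower  :", glyph_rate(lower, "p"))
print("k rate, top row:", glyph_rate(top, "k"))
print("k rate, lower  :", glyph_rate(lower, "k"))
```

Note that counting the single character k also counts the k inside bench gallows such as ckh; a real comparison would need to decide how to treat those.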

RE: It is not Chinese - Jorge_Stolfi - 16-06-2025

(14-06-2025, 09:36 PM)oshfdk Wrote: It's not just about recurring. It's about creating numerous (on the order of thousands in the text of the size of the Voynich Manuscript) repeating patterns, where certain substrings change while surrounding text remains the same. "Take this medicine for fever or bloating, one spoon two times daily", "take this medicine for cough or migraine, one spoon three times daily", etc.

I am looking again at this conjecture (Starred Parags section = some version of the SBJ).

Data files:

[link not shown] The Starred Paragraphs section (SPS) from Takeshi's transcription in the 1.6e6 interlinear file, from page f102r to line 30 of f116r. One parag per line, in the EVA encoding, with all alignment fillers and comments removed, all weirdos and missing chars mapped to '*', and one "=" at the start and end of each line (=parag).
bencao.pin (https://www.ic.unicamp.br/~stolfi/EXPORT/voynich/Notes/076/bencao.pin): The SBJ from the webpage posted by @oshfdk, minus the introduction and section headers, converted to pinyin by Google Translate, mapped to lowercase.

Both files are in UTF-8 encoding. Again, if you just click on those links you will see gibberish, because the server at my Univ expects plain text files to be in ISO-Latin-1 and thus messes up the formatted HTML that it sends to your browser. You will have to download the files and look at them with any text editor or viewer that understands UTF-8.

The paragraph breaks in sstarps.eva are a bit different from those in the 1.6e6 interlinear, because the stars in the margin are not consistent with the hints in the text itself. A parag is assumed to have the following properties:

Any line that is less than full width must be the last line of a parag.
Any line that has decorated one-leg gallows must be the first line of a parag.
Any line that has one-leg gallows is somewhat likely to be the first line of a parag.
Any line that ends with m or g is somewhat likely to be the last line of a parag.
Any line that begins near a star is somewhat likely to be the first line of a parag.

In the file sparags.eva, the first two criteria were always used. The other three were used to guess a few additional breaks.
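
The two criteria that were always applied can be sketched as follows. This is only an illustration: the `Line` record and its two boolean flags are hypothetical, and how "less than full width" or "decorated" would actually be detected from a transcription is not specified here. The three "somewhat likely" criteria were applied by hand and are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    is_short: bool               # hypothetical flag: less than full width
    has_decorated_gallows: bool  # hypothetical flag: decorated one-leg gallows


def split_parags(lines):
    """Group lines into parags using only the two hard criteria."""
    parags, current = [], []
    for line in lines:
        # A decorated one-leg gallows means this line starts a new parag.
        if line.has_decorated_gallows and current:
            parags.append(current)
            current = []
        current.append(line)
        # A short line must be the last line of its parag.
        if line.is_short:
            parags.append(current)
            current = []
    if current:
        parags.append(current)
    return parags


# Invented example: six lines that should fall into three parags.
demo = [
    Line("a", False, True), Line("b", True, False),
    Line("c", False, True), Line("d", False, False),
    Line("e", False, True), Line("f", True, False),
]
print([[ln.text for ln in p] for p in split_parags(demo)])
```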

Here are the parag and word counts:

Code:
  bencao: 370 parags 10930 words

  starps: 333 parags 10474 words

Considering the missing bifolio in the SPS quire, the coincidence of both counts seems quite remarkable.

However, we must take into account that:

The SBJ file from that webpage must be the "expanded" version created ~1400 CE. The version that would be available to the Author is some "original" version, which was claimed to have 365 paragraphs. So the parag count of that "original" version is still consistent with that of the SPS, but the word count may not match. That "original" Chinese version is now lost, but may survive in other languages.
In the SBJ file, each word (as produced by Google Translate) may be one or two Chinese characters. (I didn't see any three-character words, but may have missed some.) On a visual check there seem to be ~6 two-character words per line; so the number of Chinese characters should be about 10930+6*370 = ~13100.
On the other hand, when creating the SPS file, both the word gap marker '.' and the possible word gap marker ',' were mapped to '.', so the number of words (as intended by the Author) may be somewhat less than 10474.
On the third hand, it is possible that the Reader pronounced each Chinese word, not each character, as a single word. If that is the case, the 10474 word count of the SPS would be consistent with that of the SBJ above.
On the fourth hand, some of the candidate languages have been described as being "sesquisyllabic" in that some or most words consist of a "weak" unstressed syllable and a "strong" stressed syllable, the former having a simpler structure than the latter. If the SPS is a transcription of such a language, it is possible that the Author wrote each of those two-syllable words as one Voynichese word.
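
The arithmetic behind these counts is easy to check. The snippet below uses only the numbers quoted above (the ~6 two-character words per line is the visual estimate, not a measured value); the exact sum comes to 13150, which is rounded to ~13100 in the text.

```python
bencao_parags, bencao_words = 370, 10930
starps_parags, starps_words = 333, 10474
two_char_words_per_parag = 6  # visual estimate quoted above, not measured

# Each two-character word contributes one extra Chinese character.
est_chinese_chars = bencao_words + two_char_words_per_parag * bencao_parags
print(est_chinese_chars)                        # 13150

# Average words per parag in each file.
print(round(bencao_words / bencao_parags, 1))   # 29.5
print(round(starps_words / starps_parags, 1))   # 31.5
```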

More to follow...

RE: It is not Chinese - Jorge_Stolfi - 16-06-2025

(15-06-2025, 01:18 PM)tavie Wrote:
(15-06-2025, 09:56 AM)Jorge_Stolfi Wrote: Some of the "bumpiness" features listed by Nick seem to affect a relatively small fraction of the text, and (like transcription errors, or missing fragments on clay tablets) should not be a big obstacle to decipherment.
I don't see what on Nick's list affects a small fraction of the text, except maybe the right justified titles on a minority of folios. The rest of the issues - repetitions and word types varying across different positions of the text - are chronic problems, so chronic it seems something has been done to the plaintext, if there is one.

What I mean is this. The computer transcriptions we have all contain some number of explicit 'unreadable' glyph codes ('*') and must have an unknown number of unmarked transcription errors. The VMS itself surely has an unknown number of errors that the Author committed while writing the draft (where he meant to write one glyph but skipped, doubled, or wrote the wrong glyph) and more when the draft was copied to vellum (by him or by a distinct Scribe).

But as long as those gaps and errors are less than 90% of all glyphs, they will only make the decipherment harder, not impossible.

It seems pretty certain that the one-leg gallows are just alternatives to the two-leg ones, possibly in combination with other letters, that are used almost only on the first line of each paragraph. Until we figure out exactly what the rules are, we can replace all one-leg gallows, anywhere, by '*'.

Ditto for the m and g glyphs that seem to be extra common at the ends of lines and/or paragraphs. Maybe they are features of the language, like "-す" and "-た" in Japanese. Maybe they are abbreviations that the Scribe was allowed to use when he got close to the right margin and there was only one or two words remaining in the paragraph. Whatever. Just replace every 'm' and 'g' by '*'.

Then replace all weirdos and very rare characters by '*'.

Then discard all labels, and isolated glyph tables and sequences.

Those changes will add '*' to a relatively small percentage of the text, say 10% of all words. What remains still has hundreds of stretches of several dozen consecutive words without any '*'. Even if the text uses some fancy encryption and/or is in some "exotic" language, those fragments should be enough to decipher it. I cannot imagine an encryption scheme that would be viable for a text of that size and could have been conceived in that epoch, but cannot be deciphered without knowing every character of the text.
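
A minimal sketch of that masking step, assuming the text is a flat list of EVA words. In EVA the one-leg gallows are p and f; masking of weirdos and very rare characters is left out here, and the function names and sample words are invented for illustration.

```python
MASKED = set("pfmg")  # EVA one-leg gallows (p, f) plus m and g


def mask_words(words):
    """Replace every masked glyph by '*', leaving other glyphs untouched."""
    return ["".join("*" if c in MASKED else c for c in w) for w in words]


def clean_runs(words, min_len=1):
    """Return the maximal runs of consecutive words that contain no '*'."""
    runs, current = [], []
    for w in words:
        if "*" in w:
            if len(current) >= min_len:
                runs.append(current)
            current = []
        else:
            current.append(w)
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Invented sample, not real transcription data.
sample = "daiin opchedy qokeedy chedy dam okaiin shedy".split()
masked = mask_words(sample)
print(masked)
print(clean_runs(masked))
print("fraction of words masked:", sum("*" in w for w in masked) / len(masked))
```

Raising `min_len` to a few dozen would list only the long clean stretches mentioned above.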

And in fact we know that the Zipf plot and word entropy of the text are similar to those of natural languages. Even if the text is encrypted, these results indicate that the encryption is one-to-one on the lexicon (the set of all words). That would exclude Vigenère-type ciphers, unless the key is synchronized at each word boundary.

So, are those fragments still "bumpy"?

Quote:Line patterns at different positions of the text are serious problems for any idea that we are only seeing a natural language.

Considering the above conjectures for one-leg gallows and m/g, I don't think they are a significant problem.

Quote:I find it interesting in that it could imply that if we have copying scribes following a layout by an "Author", they have agency in terms of how they lay out the text: they are not obliged to match their line starts with the Author's line starts, nor their line ends with the Author's line ends by cramming in text or widening spaces between words to make the ends match.

This is almost certainly the case, even if the text was encrypted. The parag breaks would (usually) be clear in the draft, and the Scribe would have to respect them. However, within each parag, the line breaks in the draft would be determined by the width of the paper and the size of the Author's handwriting, and would not be significant. The line breaks in the vellum version would be expected to be different, and neither would have any significance (except that they would count as word spaces).

Said another way: if line breaks were significant (as in a poem, song, list, etc.) we would expect the lines to have variable lengths. The fact that they are all of about the same length, except at the ends of paragraphs, is strong evidence that the line breaks were chosen "on the fly" wherever the writing reached the right margin.

Quote: Given how word types seem to undergo complex mutations at these positions

I question whether there are "bumps" at line breaks. If we look only at line breaks within a parag, excluding the parag breaks, do we still see such anomalies? Are they statistically significant?

Quote:By Top Row, I mean the first line of each paragraph but with its first word and last word omitted so as to isolate a top row effect from separate paragraph/line start effects and line end effects.

This is not sufficient, since the second word on the first line of a parag may have one-leg gallows, or may be special in some other way. To look for line break anomalies, one should exclude the first line of the parag entirely, and look only at internal breaks.
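
One possible way to run that check is sketched below. It skips the first line of every parag, takes line-final words only from lines followed by another line of the same parag, and compares them with mid-line words using a pooled two-proportion z statistic. The choice of statistic, the function names, and the toy parag are my assumptions for illustration.

```python
import math


def internal_positions(parags):
    """Collect words by position, skipping the first line of every parag.

    parags: list of parags, each a list of lines, each a list of words.
    Returns (line_final, mid_line). Line-final words come only from lines
    that are followed by another line of the same parag (internal breaks).
    """
    line_final, mid_line = [], []
    for lines in parags:
        for i, line in enumerate(lines[1:], start=1):
            if len(line) < 3:
                continue  # too short to separate start, middle, and end
            mid_line.extend(line[1:-1])
            if i < len(lines) - 1:
                line_final.append(line[-1])
    return line_final, mid_line


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic; a large |z| suggests a real difference."""
    p = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (hits_a / n_a - hits_b / n_b) / se if se else 0.0


# Toy parag, invented for illustration only.
toy = [[
    ["pol", "a", "b", "c"],
    ["d", "e", "f", "dam"],
    ["g", "h", "i", "j"],
    ["k", "l", "n"],
]]
final_words, mid_words = internal_positions(toy)
print(final_words, mid_words)
```

With real data one would count, for example, the words ending in m in each group and pass the four counts to `two_proportion_z`.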

RE: It is not Chinese - Aga Tentakulus - 16-06-2025

What is your opinion on the ratio of characters between Chinese and VM?
According to Wikipedia, there are currently around 100,000 Chinese characters.
I assume that around 1400 there were around 20,000 (pure estimate), based on research. In 1200 BC, there were around 5000.
How do you reconcile the difference in quantity on your own? A simple translation cannot work. European script alone would be more likely.

RE: It is not Chinese - oshfdk - 16-06-2025

(16-06-2025, 08:25 AM)Aga Tentakulus Wrote: What is your opinion on the ratio of characters between Chinese and VM?
According to Wikipedia, there are currently around 100,000 Chinese characters.
I assume that around 1400 there were around 20,000 (pure estimate), based on research. In 1200 BC, there were around 5000.
How do you reconcile the difference in quantity on your own? A simple translation cannot work. European script alone would be more likely.

If you use a variable length encoding (for example, one Chinese character is a Voynichese word, or one slot of a Voynichese slot grammar is a Chinese character), then I see no problem.

There are only 1182 unique characters in the web version of that Chinese manuscript. There are definitely more than enough word types in the Voynich MS to cover this.

If I'm computing this correctly, the single character Shannon entropy of the Chinese manuscript is 8.15 bits per character, while for Voynichese (if we count spaces as tokens, but merge comma spaces and dot spaces together) it's 4.06 bpc.

If Voynichese was invented to specifically record that Chinese manuscript, then 2 Voynichese characters could be enough to encode one Chinese character, assuming spaces can be used to assign different meanings to, say, "olor" and "ol or".
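
For reference, the kind of calculation meant here can be sketched like this. It is a plain first-order (single-character) Shannon entropy; the helper names are made up, and merging ',' into '.' follows the convention described above for EVA transcriptions. The two entropy figures in the last line are the ones quoted above, not recomputed.

```python
import math
from collections import Counter


def char_entropy(text):
    """First-order Shannon entropy of the text, in bits per character."""
    counts = Counter(text)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def normalize_eva(text):
    """Merge ',' (possible word gap) into '.' so both count as one space token."""
    return text.replace(",", ".")


# Sanity checks on trivial strings.
print(char_entropy("abab"))  # 1.0
print(char_entropy("abcd"))  # 2.0

# Ratio of the two figures quoted above: about 2 Voynichese characters
# would carry the information of one Chinese character.
print(round(8.15 / 4.06, 2))  # 2.01
```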

I think this won't work not because it's impossible to efficiently encode Chinese using Voynichese symbols, but because the mid-level structure on the scale of 5-10 Chinese characters is very visible in the Chinese MS (as it normally is in any normal plaintext in any language), and there is nothing similar in the Voynich MS. For me this speaks clearly against all possible interpretations of the Voynich MS as an ordinary plaintext (not the Catalogue of Ships type) expressed in an invented script, or via one-to-one substitution. There is something that destroys this structure. A one-to-many cipher would explain this easily.

RE: It is not Chinese - Aga Tentakulus - 16-06-2025

[Attached image: Warum xxx.png]

Now imagine that the VM is a combination system like in my example. (From We Learn Voynich 2014)
What does it look like now?

RE: It is not Chinese - oshfdk - 16-06-2025

(16-06-2025, 09:11 AM)Aga Tentakulus Wrote: Now imagine that the VM is a combination system like in my example. (From We Learn Voynich 2014)
What does it look like now?

This is an interesting system, but in order to verify that it decodes Voynichese, a fragment of ~100 consecutive characters decoding to a meaningful phrase would be necessary. It's statistically easy to invent a mapping of glyphs that would produce a snippet of 2-3 words in some language from some random place in the manuscript. I think there was some relevant information in this thread: [link not shown]

RE: It is not Chinese - Mauro - 16-06-2025

The prior probability of the VMS being 'Chinese' (or some other Far East tonal language) is very low. Yes, it's possible, but not probable at all, so strong evidence is needed to overcome the low prior probability of 'Chinese'.

But the clearest evidence we have (as pointed out by @oshfdk) is the illustrations, where nothing resembles anything oriental and everything fits well with the European Middle Ages. This is fully expected under the hypothesis that the VMS is in a European language, but weird and improbable if the VMS is 'Chinese', which further decreases the (posterior) probability of 'Chinese'.

Yes, the two red weirdos of [link not shown] could be upside-down miswritten 'Chinese' characters, but they could also be anything else. Adding them to the Chinese theory further decreases its probability, because now the odds must be multiplied by the probability that the two weirdos are actually Chinese signs, which is surely less than 100% (they have to compete with similar-looking European signs and rely on two more hypotheses: that they are upside-down, and that they are miswritten).

Also, I have some doubts about the details of the Chinese theory. I re-read post #15 and (if I understood correctly) it implies the VMS was dictated by a local 'Chinese' speaker (who could read the source books aloud) to a European who had travelled to 'China', mastered the spoken but not the written language, and desired a copy of the original book(s). But this means the VMS was written in 'China', and as far as I know (I can be wrong, of course, not my field!) vellum was never used in the Far East (no clue about gall ink and the pigments).

With all the respect due to Stolfi, I don't find the 'Chinese' hypothesis probable. At a minimum it needs considerably stronger evidence (having to overcome both the prior and the evidence currently available).

RE: It is not Chinese - Jorge_Stolfi - 16-06-2025

(16-06-2025, 09:11 AM)Aga Tentakulus Wrote: Now imagine that the VM is a combination system like in my example.

Indeed the Author clearly designed the script by combining a small set of simple strokes in (almost) all possible ways:

[Attached image: page14.png]

My guess is that the motivation was to optimize the speed of writing. This seems to be a common feature of stenography (not "ga"!) schemes.

All the best, --jorge

RE: It is not Chinese - oshfdk - 16-06-2025

(16-06-2025, 11:18 AM)Jorge_Stolfi Wrote: Indeed the Author clearly designed the script by combining a small set of simple strokes in (almost) all possible ways:

My guess is that the motivation was to optimize the speed of writing. This seems to be a common feature of stenography (not "ga"!) schemes.

There have been anecdotal reports that Voynichese is not very convenient to write, at least not with a quill. I tend to agree, though my personal experience with quills is limited. The most obvious examples are the gallows. If these are just 4 characters, then using a simple v shape rotated in 4 cardinal directions would create a much simpler script, and in the extremely rare cases where proper v is used, it could be replaced with a more elaborate symbol.

I think someone in the past proposed that the shapes of the glyphs reflect the phonology in a way similar to Korean [link not shown], where "the letters for the five basic consonants reflect the shape of the speech organs used to pronounce them". Hangul is roughly contemporary with the Voynich Manuscript, so the idea certainly wouldn't be much ahead of its time.