The Voynich Ninja
It is not Chinese - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Voynich Talk (https://www.voynich.ninja/forum-6.html)
+--- Thread: It is not Chinese (/thread-4746.html)



RE: It is not Chinese - ReneZ - 11-06-2025

There is a certain amount of déjà vu in this :)

(11-06-2025, 03:29 AM)Jorge_Stolfi Wrote: And there is no reason to assume, even within the "Chinese" theory, that Voynichese was Mandarin.


Absolutely. Nowadays Mandarin and Cantonese are mutually unintelligible but I understand that they originate from a common ancestor. I have tried to find out more about that, in particular how it was around the 15th century, but did not find very much. The main point remains that there are indeed very many monosyllabic languages from that part of the world.

In the example I gave using daiin and chol, I used Mandarin because I know bits and pieces of it, and it is easy to verify for people who are interested.

(11-06-2025, 03:08 AM)Jorge_Stolfi Wrote: Transcriptions of tonal languages that omit tones are usually limited to person and place names, or isolated words like "feng shui" and "kung fu".  It would be totally pointless to transcribe whole sentences that way.

While that sounds completely logical, reality can be quite different. The official romanisation of Thai does not write out any tone information. People also sometimes refer to 'Karaoke language' which is more freely romanised Thai but still without tones. While that is OK for Thai, it might not work with Mandarin. 

When I wrote before...

(11-06-2025, 01:30 AM)ReneZ Wrote: An alphabetic writing system for Thai arose in the 15th century, according to tradition.
This only indicated tones in specific cases.

... I was referring to the official Thai writing like this: ภาษาไทย

For this language, the tones can be inferred from the spelling, using quite intricate rules, so it is certainly a special case.

On this page: [link]

I have one example from a Romanised Chinese language or dialect (Minjiang) about which I know nothing specific. It has a low entropy, but not a comparable bigram distribution to Voynichese.


RE: It is not Chinese - Jorge_Stolfi - 11-06-2025

(11-06-2025, 09:17 AM)ReneZ Wrote: There is a certain amount of déjà vu in this :)

Indeed! :D

Back in the SSG times, when I proposed this theory on the mailing list, friends like Rene politely informed me that it was nonsense; while the others reacted as if I had claimed that the government put fluoride in the vaccines to make us think that the Earth is round.   

But that is not why I stopped working on the VMS.  On the contrary, I stopped because I became convinced of that theory -- but it meant that I did not have the knowledge needed to make further progress, and had no hope of acquiring it.

Quote: The official romanisation of Thai does not write out any tone information.

But what is that romanization used for?   Anything more substantial than person and place names?

Quote: I have one example from a Romanised Chinese language or dialect (Minjiang) about which I know nothing specific.

AFAIK Taiwan still uses [link], a romanization that encodes the tones by varying the spelling of vowels and consonants, without diacritics.

Quote:It has a low entropy, but not a comparable bigram distribution to Voynichese.

Again, character entropy is problematic.  The word entropy is more interesting: it should be the same for any spelling -- as long as word breaks occur in the same places, and distinct words remain distinct.
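
The invariance claimed here can be checked with a minimal sketch. Everything in it is an assumption made for illustration: the toy text, the made-up one-to-one respelling table, and the function name `word_entropy`. It only shows that the entropy of the word-frequency distribution does not change when each distinct word is consistently replaced by a different spelling.

```python
from collections import Counter
from math import log2

def word_entropy(text):
    """Shannon entropy (bits per word) of the word-frequency distribution."""
    words = text.split()
    counts = Counter(words)
    total = len(words)
    return -sum(c / total * log2(c / total) for c in counts.values())

# Hypothetical toy text, plus an invented respelling that keeps word breaks
# in the same places and keeps distinct words distinct.
original = "ma shi ma ta shi ma"
respell = {"ma": "mha", "shi": "shyh", "ta": "tar"}
respelled = " ".join(respell[w] for w in original.split())

h_orig = word_entropy(original)
h_resp = word_entropy(respelled)
```

Both values come out the same because only the word counts enter the formula, not the letters. A respelling that merged two words, or that moved the word breaks, would change the counts and so change the result.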


RE: It is not Chinese - Jorge_Stolfi - 11-06-2025

(11-06-2025, 06:22 PM)Jorge_Stolfi Wrote: AFAIK Taiwan still uses [link], a romanization that encodes the tones by varying the spelling of vowels and consonants, without diacritics.

And the common Tibetan script (based on an ancient Indian syllabary) uses a prefixed letter to encode the tone.  The scheme is retained in some romanizations; that is why one sees names like "rLobsang": the "r" is a tone marker.


RE: It is not Chinese - Koen G - 11-06-2025

I agree that word entropy isn't a huge issue, but there is a reason we keep hammering on character entropy. This is the level where the true challenge lies. So far, nobody has been able to make a good case for any language. 

I don't think a point can be made in favor of a European romanizing an Eastern language, or any language for that matter. And that for the simple fact that Voynichese isn't a romanization. Either you learn the original script, or you use the script you know to represent the sounds of the foreign language. Voynichese is the result of neither. It is loosely based on medieval Latin script and some numerals, abbreviations... But there is no way this would have been the spontaneous result of Latins representing a foreign language in a more accessible way.


RE: It is not Chinese - Jorge_Stolfi - 11-06-2025

OK, it seems I must flesh out my version of the "Chinese" theory.

The Author and his project

The "Chinese" theory is not mine, really.  Jacques Guy entertained it briefly before dismissing it. But it is actually the oldest theory of them all, older than Raphael Mnishowski's "Bacon through Rudolph" guess that got Voynich reeling.  It was proposed by Georg Baresch/Jiří Bareš in his 1639 letter to Athanasius Kircher:
  • "From the pictures of herbs, of which there are a great many in the codex, and of varied images, stars and other things bearing the appearance of chemical symbolism, it is my guess that the whole thing is medical, the most beneficial branch of learning for the human race apart from the salvation of souls. This task is not beneath the dignity of a powerful intellect. [...1...] In fact it is easily conceivable that some man of quality went to oriental parts [...2...]. He would have acquired the treasures of Egyptian medicine partly from the written literature and also from associating with experts in the art, brought them back with him and buried them in this book in the same script. This is all the more plausible because the volume contains pictures of exotic plants which have escaped observation here in Germany." [tr. by Philip Neal]
The "[...1...]" and "[...2...]" are two details of Baresch's theory that I do not subscribe to:
  • [...1...] = "After all, this thing cannot be for the masses as may be judged from the precautions the author took in order to keep the uneducated ignorant of it."
 I don't think that the Author tried to make the books unreadable.  (However, he may have disguised the organs in the Biological section as baths in a lame attempt to escape the unwanted kind of attention.)
  • [...2...] = "[went to oriental parts] in quest of true medicine (he would have grasped that popular medicine here in Europe is of little value)"
I don't think the VMs was the goal of his trip.  In my view, the Author was fairly educated but not necessarily any kind of scholar (professor, physician, alchemist, astrologer etc.) My guess is that he spent a few years in some "oriental parts" as a merchant or some supporting role thereof, like interpreter, pilot, guide, etc. 

During that stay he would have acquired a working knowledge of the local spoken language.  I suppose he also met local scholars who showed him their books, which he could see had intriguing illustrations and were said to be full of knowledge that was unknown in Europe.  So he felt that he had to take those books back home. (Or maybe he had even been asked by some scholar back home to seek and bring back any such books.)

However, the Author could not read the local script, and did not have the time, patience, or confidence to try to learn it.  Therefore, it would be pointless to just acquire copies of those books, or copy them in the local script: he would never be able to read them. 

He may have thought of getting some local person (the "Reader") to read them aloud, while he wrote down a translation in Latin or some other language he knew.  But that would not have worked either, because the texts were full of specialized terms and contrived grammatical constructions which he did not understand, or whose Latin equivalents he did not know.  The Reader would have had to spend a lot of time explaining the meaning of those terms and sentences to him, while he figured out some suitable paraphrase in Latin.  And the Reader would have had to be a scholar who understood the book, not just anyone who could read the local script.

So his only viable option was to write a phonetic transcription of what the Reader was saying, in the hope that later -- there, or back at home -- he could somehow learn or deduce the meaning of the incomprehensible parts.

However, he found that the alphabets which he knew, Roman or other, were not adequate for this task -- perhaps because the language had tones, or too many distinct phonemes; or perhaps because he felt that it took too long to write.  Thus he devised a totally new script that was faster to write and/or was better fitted to the phonetics of the language. 

Note that most Voynichese glyphs are combinations of two or three simple strokes from a small repertoire, and the main strokes can be written by pulling the pen, rather than pushing it (which could cause it to snag in the fibers of the paper).

Besides the transcription of the text, he roughly copied some of the illustrations from those books.

So that, in my view, is what the main contents of the VMS are: a phonetic transcription of several separate "Chinese" books, or parts thereof, in a script invented for the purpose and designed to be quick to write, almost like a [link] scheme (not to be confused with steganography!).

That task probably was broken into several separate seances, over the course of a few years.  Possibly dictated by different Readers, and/or written down with somewhat different versions of the script.  This may perhaps explain the different "languages" (actually, word frequency distributions) seen in different parts of the VMs.

The Scribe

The Author surely did not write those dictations and copy the figures directly on vellum: that would have been a very stupid idea.  Instead he must have written those notes on paper, and presumably corrected, edited, and rearranged them still in that medium.

At some point, the final draft of the notes was copied to vellum by one or more Scribes.  The Author himself may have been the (single) Scribe.  However, I see several bits of evidence, and a few logical arguments, indicating that the Scribe was a separate person.

One argument is that the cost of the vellum demanded a Scribe who could write and draw with very fine traces; but not everyone who could write would have such skill.  Moreover, this task was slow, tedious, and mindless work, of the sort that anyone with the means would rather hire out than do himself.

If the Author was not the Scribe, then the Author had to train him/her first, by teaching the letters of the script and having him/her practice by copying them on paper, until the Author was satisfied with his/her accuracy.  However, it is my belief that the Scribe did not know the language at all, maybe not even the phonetic values of the letters.  I could go over the evidence for this claim in other posts.

On the other hand, the Author gave the Scribe substantial freedom about the figures.  The draft may have had only crude sketches of the "nymphs", maybe only squiggles meaning "a human figure goes here".  I believe it was the Scribe who chose to draw each human figure as a nymph, put battlements on castle walls, and chose the hairdos, dresses, hats, pharma jars, etc. (Therefore, by the research of Koen and others, it would follow that the Scribe was from Northern Italy; probably from Genoa or Venice, which were big maritime and mercantile powers in that epoch.)

On the other hand, I see evidence that the draft itself had the bridging gallows at the top of some pages, and the one-leg gallows on the leading line of each parag.  The Scribe may have contributed only some extra flourishes, like those on f42v.

Again, the Scribe was obviously skilled with the sharpening and handling of the pen, so that his Voynichese writing was fairly clean and firm from the start, in spite of its small scale.  I would guess that he/she had substantial previous practice in writing documents in Latin and/or some other language.  On the other hand, he/she clearly had practically no artistic skills, and learned to draw nymphs, plants, etc. "on the fly".  (However, he/she had drawn goats, bulls, and fishes before, whereas he/she did not know what a lion or a scorpion looked like at all.)

As I wrote before, I don't believe the "many Scribes" claims. I could expand my reasons in other threads.  But, for one thing, a single Scribe could be an outsider hired for the task, but he/she could also be a personal secretary, coworker, friend, or relative (e.g. the Author's horny teenage son or little brother); and the clean-copying could have happened while the Author was still in the "distant lands", or after he got back home.  Whereas, if there were many Scribes, the possibilities are much more limited: they would have to be hired outsiders, and then the clean-copying is unlikely to have happened locally.  And those Scribes would all have to be trained in the script ...

The sections

I think that the Herbal section of the VMs is anomalous, and will discuss it separately.  As for the others, I believe that each of them is the transcription of a distinct "Chinese" book, or part thereof.  The Author did not compose any of the text himself, and did not invent any of the diagrams.

Based on the clear evolution in the style and skill of the nymph drawings, I would guess that the Cosmo and Zodiac sections were put to vellum at about the same time, and before the Biological section.  The Pharma section seems to have been "vellified" earlier than the Biological one, too.  I have no idea about the sequence of the Starred Parags section.

As I wrote elsewhere, I believe that the Herbal section was created last, by copying the drawings of plant parts from the Pharma section and completing each item into a full plant, by adding totally fake details -- possibly made up by the Scribe him/herself.  In that process, some details of the Pharma illustration (like the "platform" roots) may have been mis-interpreted by both Author and Scribe.  The motivation for this exercise may have been to create a derivative work that could be sold for a higher price, for being formatted like a standard herbal.

However, I see evidence that this "Medieval AI" effort was supervised by the Author himself.  I believe that the text of that section is meaningful, although the properties of the plants, presumably listed therein, may have been made up by the Author.

It is unfortunate that the Herbal section has attracted the lion's share of VMs studies -- presumably because it is bigger, it is the first one, and gives the impression that it can be the "Rosetta Stone" through the identification of the plants.  If my hunch above is correct, that impression is false and much of the effort that went into it has been wasted.

Whereas no one seems to care about the Starred Parags section -- which I suspect will turn out to be the true Rosetta Stone.  Because I think I know which "Chinese" book it was copied from...

All the best, --jorge


RE: It is not Chinese - Jorge_Stolfi - 11-06-2025

(11-06-2025, 08:49 PM)Koen G Wrote: I agree that word entropy isn't a huge issue, but there is a reason we keep hammering on character entropy. This is the level where the true challenge lies. So far, nobody has been able to make a good case for any language.

The VMs character entropy is anomalous only if one compares it to that of languages with many consonant clusters, diphthongs, spelling quirks, etc. which can be freely mixed along the word, like all European and Native American languages, Turkish, etc.

However, the character  entropy of languages/scripts with monosyllabic words, like Vietnamese and Mandarin in Pinyin, is anomalously low too.  The syllables typically have up to a dozen slots, and each slot can be empty or filled with only one phoneme out of a small number of choices -- either a small set of  consonants, a small set of glides, a small set of vowels,  a couple of nasals, etc.  And, in a phonetic script, there would be another slot for the tone index, which is usually a property of the whole word, rather than of any specific vowel.   

And that is also the normal structure of Voynichese words.

Therefore, in those languages, the entropy in each slot is very low.  But the character entropy does not drop as one moves along a word, like it does in "European" languages.  That is why the word entropy (which is the sum of the character entropies) turns out to be similar to that of the latter.
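
The relation between per-slot entropy and word entropy can be made concrete with a small sketch. It reads "character entropy" as the entropy of each character given the characters before it in the same word, which is an interpretation rather than something stated in the post; under that reading the chain rule makes the per-position values add up to the word entropy. The token list, the `$` end-of-word marker, and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a Counter of outcomes."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def slot_entropies(words):
    """Entropy contributed by each character position, given the prefix.

    Each word is terminated with '$' (assumed not to occur in the words),
    so that ending the word counts as one more choice. Words that have
    already ended contribute nothing to later positions.
    """
    padded = [w + "$" for w in words]
    n = len(padded)
    result = []
    for i in range(max(len(w) for w in padded)):
        by_prefix = defaultdict(Counter)
        for w in padded:
            if len(w) > i:
                by_prefix[w[:i]][w[i]] += 1
        # Weight each prefix by how often it occurs among all tokens.
        result.append(sum(sum(c.values()) / n * entropy(c)
                          for c in by_prefix.values()))
    return result

# Made-up tokens with an onset slot, a vowel slot and an optional final.
words = ["da", "da", "do", "ka", "ko", "ko", "dan", "kon"]
h_word = entropy(Counter(words))
h_slots = slot_entropies(words)
```

For these toy tokens `sum(h_slots)` equals `h_word` (2.5 bits). The sketch says nothing about real Vietnamese, Pinyin, or Voynichese text; running it on an actual transcription would need a word list and a decision about how to split the text into characters.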


RE: It is not Chinese - ReneZ - 12-06-2025

Just to repeat what I said a couple of posts before:

(09-06-2025, 10:06 AM)ReneZ Wrote: So: statistically this is super interesting. Historically it is very challenging.

This was not meant to refer to the simple example I gave, but to the whole concept of a monosyllabic tonal language such as exist in East Asia.

There are many pros and cons across the board.

Here, not all of these languages are mono-syllabic and not all of them are tonal. But there are far more than people are often aware. There are dozens to hundreds of so-called 'hill tribes' in the area of S. China and most countries south of that. Most of these have their own language and these are mostly mutually unintelligible. 
However, these are not the most interesting candidates. For the theory to work, the area would have to be well reachable by land or by sea, and represent an 'area of interest' in some way or another.

Anyway, the tones remain a problem, not for the native speakers of course, but for the foreigners.

People who have not learned to use them from infancy need considerable time to 'restore' the capability of properly using them. 

I am reminded of a great anecdote in Formula 1 reporting some 5-10 years ago when two Austrian (?) reporters tried their best to properly pronounce the name of a Chinese driver, by asking him directly. They just got confused, and the problem was easy to understand: they concentrated on how to pronounce the consonant in his name, but the driver must have responded 'right' or 'wrong' depending on whether they got the tone right. Complete disconnect.

Anyway, again, the statistical comparisons are super interesting, but the story of what would have happened has major issues, in my humble opinion. And there are still some statistics with issues. I suspect that vowel-consonant separation is one of them, but I have never done a good test. The Asian word structure tends to have relatively clear alternation of vowels and consonants and very limited consonant clusters.
This alternation is not at all clear in Voynichese. But that's a bit more of an arm-waving argument than one based on good evidence.
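
One well-known heuristic for this kind of vowel-consonant separation test is Sukhotin's algorithm, which assumes that vowels and consonants tend to alternate and picks out the symbols that most often sit next to other symbols. The sketch below is a minimal version of that heuristic, not a test anyone in this thread has reported running. The function name and the three toy words are made up.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Return the set of symbols that Sukhotin's heuristic labels as vowels."""
    # Symmetric counts of adjacent symbol pairs; identical neighbours are
    # skipped, which keeps the diagonal of the matrix at zero.
    adj = defaultdict(lambda: defaultdict(int))
    for w in words:
        for a, b in zip(w, w[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1
    symbols = {ch for w in words for ch in w}
    sums = {s: sum(adj[s].values()) for s in symbols}
    vowels = set()
    while True:
        candidates = {s: v for s, v in sums.items() if s not in vowels}
        if not candidates:
            break
        # Highest remaining row sum; ties are broken by symbol order.
        best = max(candidates, key=lambda s: (candidates[s], s))
        if candidates[best] <= 0:
            break
        vowels.add(best)
        # Neighbours of a new vowel become less likely to be vowels.
        for s in candidates:
            if s != best:
                sums[s] -= 2 * adj[s][best]
    return vowels

toy_vowels = sukhotin_vowels(["baba", "dada", "bada"])
```

On the toy words the only symbol labelled a vowel is `a`. On a real transcription the outcome depends heavily on how glyphs are split into symbols, so any result on Voynichese would need to be read with that in mind.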


RE: It is not Chinese - Jorge_Stolfi - 12-06-2025

(12-06-2025, 12:45 AM)ReneZ Wrote: However, these are not the most interesting candidates. For the theory to work, the area would have to be well reachable by land or by sea, and represent an 'area of interest' in some way or another.

Agreed.  But those criteria are broader than one may think.   "Reachable by land or by sea" only excludes jungle areas.  In the 1600s or thereabouts, a small obscure island in Indonesia became "an area of very high interest" because it was the only source of nutmeg. 

Quote:Anyway, the tones remain a problem, not for the native speakers of course, but for the foreigners. People who have not learned to use them from infancy need considerable time to 'restore' the capability of properly using them.

Indeed!  I know in theory what the four tones of Mandarin should be like, and I can tell the difference between "shí" and "shì" if said next to each other; but cannot tell which is which if I hear them in isolation.  

And I lived 13 years in the US, yet I still cannot tell "man" from "men" unless they are said next to each other.  The phonetic pre-processor in my brain still maps both vowels to the Portuguese "é" (Italian/French "è") before sending the word to the lexical lookup engine...

But someone who lived in "China" long enough to find the VMs source books and transcribe them must have gained a minimal mastery of the tones.  Unlike the distinction of "man" x "men", distinguishing the tones would be essential to any understanding.


RE: It is not Chinese - Yavernoxia - 12-06-2025

The whole Chinese story post was great to read, thank you very much Stolfi. It’s always a pleasure to read the theories of an old timer in the Voynich studies.

(11-06-2025, 11:17 PM)Jorge_Stolfi Wrote: Whereas no one seems to care about the Starred Parags section -- which I suspect will turn out to be the true Rosetta Stone.  Because I think I know which "Chinese" book it was copied from...

All the best, --jorge

And which book would that be? :D


RE: It is not Chinese - Koen G - 12-06-2025

Stolfi, have you actually measured the entropy of monosyllabic languages? How do they need to be written down to achieve a conditional character entropy that's well under 3, while at the same time working with a very limited alphabet?
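
For anyone who wants to try the measurement on a text of their own, conditional character entropy can be estimated from bigram counts. This is a minimal sketch under stated assumptions: the function name is made up, every character in the string (spaces included) is treated as a symbol, and no correction is made for small samples. It is not the procedure behind any figure quoted in this thread.

```python
from collections import Counter
from math import log2

def conditional_char_entropy(text):
    """Estimate H(next char | previous char) in bits from bigram counts."""
    pairs = Counter(zip(text, text[1:]))   # counts of (previous, next)
    firsts = Counter(text[:-1])            # counts of each previous char
    total = sum(pairs.values())
    return -sum(n / total * log2(n / firsts[a])
                for (a, _b), n in pairs.items())

# A strictly alternating string is fully predictable from the previous char.
h_alternating = conditional_char_entropy("abababab")
# Here 'a' is followed by 'b' or 'c' equally often; 'b' and 'c' by 'a'.
h_mixed = conditional_char_entropy("abacabac")
```

The result depends on the alphabet chosen for the transcription, so two romanizations of the same spoken text can give quite different values.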