(17-08-2025, 02:26 AM)magnesium Wrote: Jorge Stolfi and others have proposed that the unusual properties of Voynichese stem from a one-off attempt to develop a phonetic writing system for an East Asian language. Can we define a representative list of possible phoneticization schemes and then randomly iterate phoneme-glyph mappings for a variety of East Asian languages, to see under what conditions we get more VMS-like text?
It sounds easy, but...
The candidate East Asian languages -- those where basic words are single syllables -- would include Tibetan, Thai, Burmese, Vietnamese, Lao, Khmer, Hmong, Mandarin, Cantonese, and a couple dozen other languages, mostly in China. At least half a dozen of these have documented cases of European travelers or merchants, like Marco Polo and Willem Van Ruysbroeck, spending years in their domains before 1400; and there must have been hundreds of others whom we don't know of, and hundreds of travelers from Arabia, Turkey, Persia, etc. Even if we restrict the list to languages that are likely to have been learned by a European, we still have at least half a dozen candidates.
All those languages differ from each other more than Swedish differs from Spanish. Each has its own syllable structure, with its own set of tones. The tone of a syllable is a pattern of variation of pitch (or other sound quality) as the syllable is spoken; the same consonants and vowels said with different tones are completely different words. Mandarin has only four tones (or five, depending on how one counts); Cantonese and Vietnamese have six; some of those languages may have 8 or more. The tone of a syllable is not a property of a particular vowel, but of the syllable as a whole; thus different phonetic renderings may place the tone marker in different places, like hỏi or hoi4 or 4hoi, or use two or more symbols to indicate pitch, e.g. 3h1oi3 to mean a mid-then-low-then-mid pitch pattern.
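To make the point about marker placement concrete, here is a minimal Python sketch that spells the same syllable under a few placement schemes. The scheme names ("suffix", "prefix", "infix") and both function names are invented for illustration; they are not taken from any actual romanization standard, and the Author's scheme (if any) could be something else entirely.

```python
def render_tone(onset: str, rime: str, tone: int, scheme: str) -> str:
    """Spell one syllable with a single tone digit placed per `scheme`.

    The scheme names are hypothetical labels for this sketch only.
    """
    if scheme == "suffix":   # digit after the whole syllable, e.g. hoi4
        return f"{onset}{rime}{tone}"
    if scheme == "prefix":   # digit before the whole syllable, e.g. 4hoi
        return f"{tone}{onset}{rime}"
    if scheme == "infix":    # digit between onset and rime, e.g. h4oi
        return f"{onset}{tone}{rime}"
    raise ValueError(f"unknown scheme: {scheme!r}")


def render_contour(onset: str, rime: str, pitches: tuple) -> str:
    """Spell one syllable with three pitch levels (start, middle, end)
    interleaved with the onset and rime, e.g. 3h1oi3."""
    start, middle, end = pitches
    return f"{start}{onset}{middle}{rime}{end}"


if __name__ == "__main__":
    for scheme in ("suffix", "prefix", "infix"):
        print(scheme, render_tone("h", "oi", 4, scheme))
    print("contour", render_contour("h", "oi", (3, 1, 3)))
```

Even with this toy, one spoken syllable yields four quite different letter strings, so the word-internal statistics of the resulting text depend heavily on a choice we cannot observe.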
And then there are complex rules that change the "normal" tone of a syllable depending on the adjacent syllables. For example:
- In Taiwanese, within a phonological phrase, all its non-neutral-tone syllables save for the last undergo tone [modification]. Among the unchecked syllables, tone 1 becomes 7, 7 becomes 3, 3 becomes 2, and 2 becomes 1. Tone 5 becomes 7 or 3, depending on dialect. Stopped syllables ending in ⟨-p⟩, ⟨-t⟩, or ⟨-k⟩ take the opposite tone (phonetically, a high tone becomes low, and a low tone becomes high) whereas syllables ending in a glottal stop (written as ⟨-h⟩ in Pe̍h-ōe-jī) drop their final consonant to become tone 2 or 3.
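The rules quoted above can be sketched in a few lines of Python. This is only an illustration under several assumptions that are not in the quoted text: syllables are given as (spelling, tone number) pairs in plain ASCII, tone 0 stands for the neutral tone, the stopped tones are numbered 4 and 8, and for ⟨-h⟩ syllables tone 4 is taken to go to 2 and tone 8 to 3. All names here are made up for the example.

```python
# Tone changes for non-final syllables, as listed in the quoted rule.
UNCHECKED = {1: 7, 7: 3, 3: 2, 2: 1}
# Assumption: stopped tones are numbered 4 and 8 and swap with each other.
STOPPED = {4: 8, 8: 4}
# Assumption: with final -h, tone 4 goes to 2 and tone 8 goes to 3.
GLOTTAL = {4: 2, 8: 3}


def sandhi_syllable(spelling, tone, tone5_target=7):
    """Return the (spelling, tone) a non-final syllable changes into."""
    if tone5_target not in (7, 3):
        raise ValueError("tone 5 becomes 7 or 3, depending on dialect")
    if tone == 0:                      # neutral tone: left unchanged
        return spelling, tone
    if spelling.endswith("h"):         # glottal stop: drop it, retone
        return spelling[:-1], GLOTTAL[tone]
    if spelling.endswith(("p", "t", "k")):
        return spelling, STOPPED[tone]
    if tone == 5:
        return spelling, tone5_target
    return spelling, UNCHECKED[tone]   # KeyError on an unlisted tone


def sandhi_phrase(phrase, tone5_target=7):
    """Apply the change to every syllable of a phrase except the last."""
    if not phrase:
        return []
    changed = [sandhi_syllable(s, t, tone5_target) for s, t in phrase[:-1]]
    return changed + [phrase[-1]]


if __name__ == "__main__":
    phrase = [("a", 1), ("bah", 4), ("kok", 4), ("si", 5)]
    print(sandhi_phrase(phrase))
```

Note that the same word gets a different tone, and sometimes a different final consonant, depending on whether it ends the phrase. Someone taking dictation would write it down differently each time unless they knew to undo the rule.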
A phonetic script can stabilize the pronunciation of a language for many centuries. That's the case with Italian, for instance: most native speakers can still understand Dante's 13th-century poems. But several of those East Asian languages used to be written with Chinese characters, which are not phonetic and therefore allow the pronunciation to change radically, geographically and over time. And we do know that those languages have changed a lot in the past 600 years. There are old Chinese poetry manuals that give examples of syllables that were supposed to rhyme -- but they don't rhyme anymore.
And now suppose that you are a European merchant who has lived in Myanmar or Shanghai for a few years and you have learned the local language well enough to order food and haggle about the exchange rate of muskets vs rubies, but not much beyond that. And, before going back home, you remember the promise you made to your physician uncle to bring him some medical and herbal books from the place. Just copying them would be pointless since neither you nor your uncle can read the native script. For the same reason you cannot translate the books into Latin or your vernacular. Thus the best you can do is devise a phonetic script for the local language and pay a local to read the books aloud while you take dictation. You only understand some of the text, and have never heard all those medical terms and plant names; but you hope that, back home, you can use what you know of the language to figure out the rest.
Then how many mistakes would you make (like encoding only 3 tones instead of 5, or conflating -ng with -n) when devising the script, and while taking dictation? (I lived in the US for 13 years, and still cannot hear the difference between the vowels in "man" and "men", unless they are spoken next to each other...)
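The effect of such mistakes can be illustrated with a small Python sketch that merges tones and finals the way a foreign ear might, then lists which distinct syllables end up spelled the same. The syllable inventory, the merge tables, and the function names are all invented for the example; they do not describe any real language or any actual transcriber.

```python
from collections import defaultdict


def lossy_transcribe(syllable, tone_merge, coda_merge):
    """Map a true syllable (onset, nucleus, coda, tone) to what the
    transcriber writes, merging tones and codas he cannot tell apart."""
    onset, nucleus, coda, tone = syllable
    return (onset, nucleus,
            coda_merge.get(coda, coda),
            tone_merge.get(tone, tone))


def collisions(inventory, tone_merge, coda_merge):
    """Return written forms that stand for more than one true syllable."""
    groups = defaultdict(set)
    for syllable in inventory:
        written = lossy_transcribe(syllable, tone_merge, coda_merge)
        groups[written].add(syllable)
    return {w: s for w, s in groups.items() if len(s) > 1}


if __name__ == "__main__":
    # Hypothetical five-syllable inventory, for illustration only.
    inventory = [
        ("m", "a", "n", 1), ("m", "a", "ng", 1),
        ("m", "a", "n", 4), ("m", "a", "n", 5),
        ("t", "o", "", 2),
    ]
    tone_merge = {4: 3, 5: 3}    # hears only 3 tones instead of 5
    coda_merge = {"ng": "n"}     # conflates -ng with -n
    for written, sources in collisions(inventory, tone_merge,
                                       coda_merge).items():
        print(written, "<-", sorted(sources))
```

In this toy case four of the five syllables collapse into two written forms, so any statistics computed on the transcription would differ from those of the spoken language in ways that depend on the transcriber's ear.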
And now suppose that the local you hired was not as literate as he pretended to be, so that whenever he got to a Chinese character he did not know, rather than say so he would make up a reading at random...
Considering all those complications, I don't see how I could implement your program. I don't know any of those languages, so I would not know how to choose a phonetic encoding that the Author might have used, nor which statistics could tell whether the guess is right...
So you see why I lost enthusiasm 20 years ago. I still think that one can make some progress with that theory or extract some useful properties from the text, even without knowing the language and the encoding. But as for really "cracking the code", I believe it would have to be someone who happens to know the correct language as it was spoken 600 years ago, and can be motivated to spend a couple of years deciphering the phonetic encoding...
All the best, --jorge