The Voynich Ninja
It is not Chinese - Printable Version




RE: It is not Chinese - Pepper - 12-06-2025

(12-06-2025, 10:19 AM)ReneZ Wrote:
(12-06-2025, 10:04 AM)Pepper Wrote: I think you mixed our names up but if I read this right yes, that's what I meant.

And that is also what I meant. 'Forced by circumstances'.

What the Voynich authors did has all the looks of something unusual, no matter what it was based on.
(My opinion of course).

For sure, no argument with that.


RE: It is not Chinese - MarcoP - 12-06-2025

The VIQR files that Stolfi mentioned on his page can be recovered from the Wayback Machine:

You are not allowed to view links. Register or Login to view.


For this file (Genesis, if I understand correctly) You are not allowed to view links. Register or Login to view.

I get a conditional entropy of 2.49 (I removed all uppercase headers and html tags, and converted to lowercase).


RE: It is not Chinese - kckluge - 12-06-2025

(12-06-2025, 10:04 AM)Pepper Wrote:
(12-06-2025, 09:43 AM)kckluge Wrote: I think you're misreading Rene's use of "forced" there. He's not using it in the sense of "someone put a gun (or crossbow :->) to their head", he's using it (I think) in the sense of "none of the Jesuits (say) who worked up Romanization schemes for tonal Asian languages found the Latin alphabet so inadequate for the purpose that they were motivated to invent a new script."

By the way, welcome back.

I think you mixed our names up but if I read this right yes, that's what I meant.

Oops...mea culpa.


RE: It is not Chinese - Koen G - 12-06-2025

Marco: Those are interesting entropy values. Do you know how the diacritics are handled? Does entropy increase or decrease when they are removed? What happens when we plot h2 against h1?

Regarding the other discussion: I obviously agree with the sentiment that the Voynich manuscript did happen and it is without parallel, so whatever caused it to happen must also be an unparalleled combination of factors. However, this does not mean that antecedents should be disregarded altogether. Languages like French, English, Dutch, German... were all forced into the constraints of the Latin alphabet, even though their phoneme inventories are vastly different. We made do. And throughout history, the rule has been that whatever language we encounter, we attempt to write it in our familiar script. Missionaries typically did not go and invent an entirely different writing system for each indigenous language they encountered.


RE: It is not Chinese - nablator - 12-06-2025

(12-06-2025, 11:03 AM)MarcoP Wrote: For this file (Genesis, if I understand correctly) You are not allowed to view links. Register or Login to view.

I get a conditional entropy of 2.49 (I removed all uppercase headers and html tags, and converted to lowercase).

I get 2.72 or more, even when I remove all non-alphabetic characters:

ban ddau dduc chua troi dung nen troi ddat
va ddat la vohinh va trongkhong su motoi o tren ma t vuc than dduc chua troi vanhanh tren ma t nuoc
dduc chua troi phan ra ng phai co su sang thi co su sang
dduc chua troi thay su sang la totlanh ben phan sang ra cung toi
dduc chua troi dda t ten su sang la ngay su toi la ddem vay co buoi chieu va buoi mai ay la ngay thu nhut
dduc chua troi lai phan ra ng phai co mot khoangkhong o giua nuoc dda ng phanre nuoc cach voi nuoc
...

You are not allowed to view links. Register or Login to view.


RE: It is not Chinese - MarcoP - 12-06-2025

I guess the difference could be that I am considering spaces as separators and bigrams with spaces are not included in the computation. For "in the sun", only 5 bigrams are counted: in th he su un
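
For anyone who wants to compare the two counting conventions, here is a minimal Python sketch. It is an assumption about how the numbers could be computed, not the actual script used for the figures in this thread: conditional entropy is taken as H(pair) minus H(first character of the pair), and all function names are made up for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def word_bigrams(text):
    """Bigrams that stay inside a word; spaces act only as separators."""
    return [w[i:i + 2] for w in text.split() for i in range(len(w) - 1)]

def spaced_bigrams(text):
    """Bigrams over the whole text, with a single space counted as a character."""
    s = " ".join(text.split())
    return [s[i:i + 2] for i in range(len(s) - 1)]

def conditional_entropy(bigrams):
    """H(second char | first char) = H(pair) - H(first char), in bits."""
    pairs = Counter(bigrams)
    firsts = Counter(b[0] for b in bigrams)
    return entropy(pairs) - entropy(firsts)

def h1(text, space_as_char=False):
    """Single-character entropy, with or without the space as a symbol."""
    sep = " " if space_as_char else ""
    return entropy(Counter(sep.join(text.split())))

sample = "in the sun"
print(word_bigrams(sample))   # ['in', 'th', 'he', 'su', 'un']
print(conditional_entropy(word_bigrams(sample)))
print(conditional_entropy(spaced_bigrams(sample)))
```

Other definitions of conditional entropy (for example, subtracting h1 of the full text instead of the entropy of first characters) give slightly different values on short texts, which may be one more source of disagreement between tools.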

For King James Genesis, I get conditional entropy 2.9

I attach 20K from the VIQR cleaned-up file. h1 and conditional entropy:

viqr______________ 4.64427 2.49107
kj_genesis________ 4.10505 2.91232
viqr_space_as_char 3.88488 2.71326
kj_space_as_char__ 4.00070 3.10427


I'll leave it here, for more skilled people to consider in case they are interested.


RE: It is not Chinese - ReneZ - 13-06-2025

(12-06-2025, 01:21 PM)Koen G Wrote: What happens when we plot h2 against h1?

For that, see here: You are not allowed to view links. Register or Login to view.
and scroll down to the second colourful scatter plot.
The blue open circles near the bottom are Voynichese.
The red triangles nearest these are East-Asian, as explained immediately below the figure.
Conditional h2 is close to 2.4

Unfortunately, I did not save the link to the source texts (UDHR), as it appears to have changed.
This is the link given in Hauer and Kondrak, which I used then:   You are not allowed to view links. Register or Login to view.

I still have the texts, and it should be possible to find a working link.


RE: It is not Chinese - Jorge_Stolfi - 13-06-2025

(12-06-2025, 06:51 AM)Koen G Wrote: Stolfi, have you actually measured the entropy of monosyllabic languages? How do they need to be written down to achieve a conditional character entropy that's well under 3, while at the same time working with a very limited alphabet?

Rene and others have re-posted the analyses.  As far as I can see, the character entropies of Mandarin in Pinyin with numeric tones and Vietnamese in VIQR do indeed come close to those of Voynichese.

But let me insist that the character entropy is not an interesting measure, because it is highly dependent on the encoding of sounds into characters, and in particular on what one considers a character.   The h_k entropies of German would be higher if one used "x", "k", "ö" etc instead of "sch", "ck", "oe", etc.  Conversely the h_k entropies of Vietnamese and Mandarin would be higher if one encoded the diacritics with the main vowel as a single character, instead of 2-3 characters like VIQR and Pinyin do.  In the VMS, are qo, ee, iin, cTh single characters, or clusters of separate characters?  The choice will affect the h_k entropies.
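
To make the "what counts as a character" point concrete, here is a small Python sketch that splits words into glyph units under a chosen cluster list before measuring entropy. The greedy longest-match rule, the function names, and the sample words are all illustrative assumptions, not an established transliteration convention.

```python
import math
from collections import Counter

def tokenize(word, clusters=()):
    """Greedy longest-match split of a word into glyph units."""
    ordered = sorted((c for c in clusters if c), key=len, reverse=True)
    units, i = [], 0
    while i < len(word):
        for c in ordered:
            if word.startswith(c, i):
                units.append(c)      # treat the whole cluster as one unit
                i += len(c)
                break
        else:
            units.append(word[i])    # no cluster matches: single character
            i += 1
    return units

def unit_entropy(words, clusters=()):
    """First-order entropy (bits) over glyph units rather than raw characters."""
    counts = Counter(u for w in words for u in tokenize(w, clusters))
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

# Illustrative words only; a real test needs a full transliteration file.
words = ["qokeedy", "daiin", "qokaiin", "chedy"]
print(tokenize("qokeedy", ["qo", "ee", "iin"]))   # ['qo', 'k', 'ee', 'd', 'y']
print(unit_entropy(words))                         # every character separate
print(unit_entropy(words, ["qo", "ee", "iin"]))    # clusters as single units
```

Running the same text through both settings shows how much of any h_k figure is a property of the chosen alphabet rather than of the underlying text.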

The word entropy is a more useful measure, because it does not depend on the encoding or encryption, as long as it is one-to-one (such as a simple substitution cypher, Pig Latin, or writing words backwards).

An encoding that is many-words-to-one-word (such as writing Mandarin or Vietnamese without tone information) would decrease the word entropy.  An encoding that is one-word-to-many-words (such as a Vigenère cipher) would increase the word entropy, and would also mess up the Zipf plots.  Since the word entropy and Zipf plots of Voynichese are quite "natural", it is still quite possible that it is a natural language with an encoding that is somewhat verbose but still mostly one-word-to-one-word.
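
A toy Python check of the first two cases, under the assumption that word entropy means the Shannon entropy of the word-frequency distribution. The syllable list is invented for illustration and is not real data.

```python
import math
from collections import Counter

def word_entropy(words):
    """Zero-context word entropy in bits."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

# Made-up Pinyin-like tokens with numeric tones.
words = "ma1 ma3 ma1 shi4 shi2 ma3 shi4 ma1".split()

# One-to-one encoding: write every word backwards.
reversed_words = [w[::-1] for w in words]

# Many-to-one encoding: drop the tone digits, so distinct words merge.
toneless = [w.rstrip("0123456789") for w in words]

print(word_entropy(words))
print(word_entropy(reversed_words))  # same value as the original
print(word_entropy(toneless))        # lower, because words have merged
```

The one-to-one case leaves the frequency table unchanged apart from relabelling, which is why the entropy is identical; merging words can only keep the entropy equal or lower it.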

By the way, the h_k entropies (uncertainty of the next character given the previous k characters) are not the best ones.  More useful would be e_n = the entropy at position [n] in the word, given all the previous n characters of the word.  The encoding will affect this metric too, but since the sum of all e_n is the (zero-context) word entropy, it can only affect how this total is spread along the positions n.
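
One possible reading of e_n, sketched in Python. The end-of-word marker `$` is an added assumption: it lets a word that is a prefix of another word count as finished, and with it the e_n values sum to the word entropy. Words shorter than n contribute zero at position n. The sample words are illustrative only.

```python
import math
from collections import Counter

END = "$"  # hypothetical end-of-word marker, assumed absent from the text

def entropy(items):
    """Shannon entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def positional_entropies(words):
    """e_n = H(character n | characters 0..n-1), averaged over word tokens."""
    marked = [w + END for w in words]
    longest = max(len(m) for m in marked)
    # H(first n+1 chars) - H(first n chars) equals the conditional
    # entropy of character n given the n characters before it.
    prefix_h = [entropy(m[:n] for m in marked) for n in range(longest + 1)]
    return [prefix_h[n + 1] - prefix_h[n] for n in range(longest)]

words = "daiin dain dar daiin ol or".split()
e = positional_entropies(words)
print(e)
print(sum(e), entropy(words))  # the two totals agree up to rounding
```

Because the sum telescopes, any change of encoding that keeps words distinct only moves entropy between positions, as described above.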


RE: It is not Chinese - nablator - 13-06-2025

The conditional character entropy is not the only puzzling statistic. What about the general drift and many known weird jumps in paragraphs/pages/sections? These are more consistent with a poorly done imitation/parody/fake than an honest transliteration of a language.


RE: It is not Chinese - Letieum - 13-06-2025

(13-06-2025, 03:28 PM)nablator Wrote: These are more consistent with a poorly done imitation/parody/fake than an honest transliteration of a language.

"consistent with a fake" maybe, but not "poorly done" ...
If the VM is the result of some sort of "gibberish-generating process", it is a surprisingly complex process for its time, with a clever mix of rules and randomness!