The Voynich Ninja
[split] The Zipf law and the Voynich Manuscript - Printable Version



RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 14-08-2025

I'm having a hard time interpreting the latest replies. The point of this thread is simple: it's about knowing whether the VM script reproduces a natural language and whether or not it therefore complies with Zipf's law.

The problem, as always, is that everything remains unresolved, and people will continue to say that the Voynich script follows Zipf's law when it is clear that it does not.


RE: [split] The Zipf law and the Voynich Manuscript - Jorge_Stolfi - 14-08-2025

(14-08-2025, 01:27 PM)Stefan Wirtz_2 Wrote: Russian, Czech, Croatian, ..., Finnish, Estonian, Latvian, Turkish, Bulgarian, Hebrew and, with limits, Arabic are or were present in wide areas of Europe during the 15th century and before.
Add Mongolian, and quite possibly all the Caucasian languages (a world of languages in itself) can be counted among the options as well. The VMS alphabet is a bit reminiscent of some Georgian letters, for example.

Yep. If we extend the sense of "Europe" to "all countries with a significant Christian presence in the 1400s", then we must include also Albanian (Indo-European, but with articles attached to the end of words), Armenian (ditto), Georgian (actually 4 languages - Georgian proper, Svan, Laz, and Mingrelian -- not Indo-European and without articles).  I also forgot to mention Irish Gaelic (eight different definite articles, written separately now but could be considered part of the next word).  And there may have been other minority languages inside Russia...

All the best, --jorge


RE: [split] The Zipf law and the Voynich Manuscript - Jorge_Stolfi - 14-08-2025

(14-08-2025, 09:47 AM)Antonio García Jiménez Wrote: I congratulate you on your varied linguistic knowledge, but we're talking about a document from medieval Europe, and it doesn't seem to make much sense to talk about other geographical environments.

Why has there been ZERO progress in deciphering this manuscript in 600 years?

I believe that it is because everybody who could be qualified to decipher it (meaning, people who know something about medieval history, manuscripts, cryptography, etc -- excluding found-about-it-yesterday cranks and ChatGPT zombies) made the same logical mistake.  

They reasoned: "The vellum and ink are European, the general style of the script resembles European scripts, the direction of writing and paragraph layout is European, the ornate paragraph headlines are a European thing, the architecture of castles is European, the hairdos, hats, dresses are European, the Zodiac signs are European, the month names are European -- therefore the Author must be European, and the language must be European (or one that was fairly well known in Europe at the time, like Hebrew, Arabic, maybe Turkish)."

But that is a gross logical mistake, a non sequitur.  There is no "therefore" there.

All those facts are obviously true, but they do not imply the conclusion at all.

All serious attempts to decipher the "encoding" have taken that wrong turn in the logic, and assumed that the underlying language is "European" in that wider sense.  And since a simple substitution cipher is immediately ruled out, they conclude that it must be some very complicated cipher. (Or gibberish generated by some inexplicably complicated and unnatural algorithm.) 

Well, as you know, I believe that it is in fact plain text, in a straightforward phonetic spelling; but the language is very different from all those "European" languages on almost every grammatical and phonetic axis.  To me, it is no wonder that all attempts to find "European" grammatical patterns have failed.  Like the search for articles and prepositions, or for case/number/gender/tense inflections. 

That, for me, is the reason why this "mystery" has resisted so long.

All the best, --jorge.


RE: [split] The Zipf law and the Voynich Manuscript - ReneZ - 15-08-2025

Antonio, according to your interpretation of Zipf's law, which you expect to hold rather strictly for the most frequent three or four words in a text, essentially NO text in any language would qualify as language.

What Gabriel and Stolfi have long understood, and what Stolfi's graphs show, is that normal texts in real language tend to deviate for the most frequent words. Their frequencies are a bit below the 1/n prediction. 

All of this assumes that the texts are sufficiently long.

In that sense, the Voynich MS text behaves just like real language.
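
A minimal sketch of that comparison, assuming the text has already been split into word tokens (for the VMS that means choosing a transliteration first). The function names are illustrative and do not come from any existing tool. It ranks the word counts and divides each one by the ideal value f(1)/n:

```python
from collections import Counter


def rank_frequency(tokens):
    """Return the word counts sorted from most to least frequent."""
    return [count for _, count in Counter(tokens).most_common()]


def ratio_to_ideal(counts):
    """Divide each observed count by the ideal Zipf value counts[0] / rank.

    A ratio of 1.0 means the word sits exactly on a 1/n curve anchored
    at the most frequent word.
    """
    top = counts[0]
    return [count / (top / rank) for rank, count in enumerate(counts, start=1)]


# Toy input only; a real check needs a long text.
counts = rank_frequency("a a a b b c".split())
print(counts)
print(ratio_to_ideal(counts))
```

Ratios above 1 at ranks 2, 3, ... are what you get when the top word itself sits a bit below the 1/n line drawn through the bulk of the ranks.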


RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 15-08-2025

Rene, I want to be honest because of the high esteem I have for you.

As a humanities scholar, I would never dare challenge someone who knows how to use mathematical formulas. That's why I asked ChatGPT. Artificial Intelligence will never be able to solve the Voynich, but I do trust it to solve mathematical problems.

  What I did was give ChatGPT not the four most frequent words, but the 20 words that appear most frequently according to EVA. The result I got is not a little lower than 1/n, but 0.5, 0.6/n, which results in a semi-flat curve. I asked if the result was consistent with Zipf's law and it replied that the Voynich script is a structured system but not comparable to a natural language.

I invite you to check it out for yourself, and I also invite anyone who wants to resolve this issue in a reasoned manner without preconditions.


RE: [split] The Zipf law and the Voynich Manuscript - Stefan Wirtz_2 - 15-08-2025

(15-08-2025, 07:53 AM)Antonio García Jiménez Wrote: [..]The result I got is not a little lower than 1/n, but 0.5, 0.6/n, which results in a semi-flat curve. I asked if the result was consistent with Zipf's law and it replied that the Voynich script is a structured system but not comparable to a natural language.
[..]

If you apply Zipf to your local telephone book, which surely contains the whole alphabet, numerals and lots of words, it will probably return "Garcia" and "Lopez" as the most frequent words in the whole text.
But this neither excludes nor confirms that Spanish is a natural language.

This example is rather extreme, as phonebooks are very specialized.
But many books are specialized somehow.

So Zipf (or entropy) cannot give the final judgement about an underlying language, just some answers about the text itself.
And these may not help with anything.


RE: [split] The Zipf law and the Voynich Manuscript - Gabriel L - 15-08-2025

Hello Antonio,
The issue (already mentioned in this thread) is that the trend needs to be considered over a large range of ranks and frequencies, not the first x words.
The Cryptologia article used 8213 word types. You are basing your conclusion on the 20 most frequent word types. That is only about 0.002 (roughly 0.2%) of the available data.
I also invite you to check Zipf's book (The psychobiology of language, available through Google Books). Of special interest is plate IV (English & Latin), which shows for Latin exactly the same effect that you now consider to be a reason to reject the rank-frequency law.
Regards,

G
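
One way to look at the trend over the whole range of ranks, rather than the first few, is to fit a straight line to log(count) against log(rank). The sketch below is a plain least-squares fit and is not claimed to be the method used in the Cryptologia article; `loglog_slope` and `max_rank` are names made up for this example. It expects positive counts sorted from most to least frequent.

```python
import math


def loglog_slope(counts, max_rank=None):
    """Least-squares slope of log(count) against log(rank).

    counts: positive word counts, most frequent first.
    max_rank: if given, only the first max_rank counts are used.
    """
    if max_rank is not None:
        counts = counts[:max_rank]
    if len(counts) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, perfectly Zipfian counts, used only to show the call.
ideal = [1000.0 / rank for rank in range(1, 501)]
print(loglog_slope(ideal))               # close to -1 for an ideal 1/n curve
print(loglog_slope(ideal, max_rank=20))  # same curve, first 20 ranks only
```

On real counts, comparing `loglog_slope(counts)` with `loglog_slope(counts, max_rank=20)` shows how much the first 20 ranks alone can differ from the overall trend.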


RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 15-08-2025

Okay, Gabriel. I'll always trust someone with solid training in a subject more than what Artificial Intelligence tells me.

But you will understand that one is suspicious when Zipf's law says that the third most frequent word appears only a third as often as the first, and yet you see that instead of 288 the result is 501.
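
The arithmetic behind those two numbers can be written out as a short sketch. The top count of 864 is not stated in the post; it is inferred here from 288 being one third of it, so treat it as an assumption. `implied_exponent` is a hypothetical helper that asks which exponent s in count = C / rank^s would connect two ranks.

```python
import math


def implied_exponent(count_a, rank_a, count_b, rank_b):
    """Exponent s for which count = C / rank**s passes through both points."""
    return math.log(count_a / count_b) / math.log(rank_b / rank_a)


first = 3 * 288  # assumption: inferred from "288" being one third of the top count
third = 501      # observed count of the third most frequent word, from the post

print(round(third / first, 2))                         # share of the top count
print(round(implied_exponent(first, 1, third, 3), 2))  # exponent through ranks 1 and 3
```

With these inputs the third word has roughly 0.58 of the top count, and the exponent through ranks 1 and 3 comes out at about 0.5 instead of 1. Two points alone say little about the trend over the remaining ranks.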

I guess you're right and in some way Zipf's law does apply, but I've read somewhere that this law is a special case of the more general power law that occurs in many real-world phenomena.

In any case, I think there are enough reasons to suspect that the Voynich script is not linguistic.

Regards


RE: [split] The Zipf law and the Voynich Manuscript - Jorge_Stolfi - 15-08-2025

(15-08-2025, 07:53 AM)Antonio García Jiménez Wrote: As a humanities scholar, I would never dare challenge someone who knows how to use mathematical formulas. That's why I asked ChatGPT. Artificial Intelligence will never be able to solve the Voynich, but I do trust it to solve mathematical problems.

As a computer science scholar, I must warn you that you should trust ChatGPT (or any other "AI" software) on math even less than on humanities questions.

Those Large Language Models (LLMs) have no intelligence to speak of.  They can evaluate simple algebraic formulas with your numbers; but to decide which formulas they should use they only search their collection of billions (literally) of texts scraped from the internet for words that match your query, and use whatever formula comes up the most in the accompanying solutions.  

To see how badly that approach may fail, check [link hidden in this view].

Quote:The result I got is not a little lower than 1/n, but 0.5, 0.6/n, which results in a semi-flat curve.

Not sure what you mean by this, but Zipf's law says that the frequency is proportional to 1/n, not equal to 1/n.  In fact the constant depends on the size of the sample, even for the same language and text.  So, if the numbers you got are 0.6/n, they still satisfy Zipf's law.
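
To illustrate why the constant moves with the sample size, take the idealized law f(n) = C/n over V word types and require the counts to add up to N tokens; then C = N / (1 + 1/2 + ... + 1/V). A small sketch under that assumption (real texts follow the law only approximately, and `zipf_constant` is a name invented for this example):

```python
def zipf_constant(total_tokens, vocabulary_size):
    """Constant C such that the counts C/n for n = 1..V add up to N tokens."""
    harmonic = sum(1.0 / n for n in range(1, vocabulary_size + 1))
    return total_tokens / harmonic


# Two hypothetical sample sizes; neither is a measurement of any real text.
small = zipf_constant(total_tokens=1000, vocabulary_size=50)
large = zipf_constant(total_tokens=10000, vocabulary_size=500)
print(small, large)
```

The two constants differ, yet in both cases the n-th count is still C/n, so the ratio between rank 1 and rank n stays equal to n.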

Quote:I asked if the result was consistent with Zipf's law and it replied that the Voynich script is a structured system but not comparable to a natural language.

Again, for this kind of question the LLMs will search their database of scraped texts for similar questions, and summarize what seems to be the majority answer in them, possibly weighted by some source reputation index -- without any check of whether it makes sense or not.  So if enough articles out there say "Voynichese is not natural language", that is what ChatGPT will say.

I just watched a video on YouTube where a doctor reported the first person murdered by ChatGPT.  The poor guy had seen several "health influencers" on YouTube claim that it was not the sodium that made salt bad, but the chloride. So he asked ChatGPT what could be a replacement for chloride, and that autogenius explained that bromide is similar to chloride and can replace it for most purposes.  And so the guy bought sodium bromide from some chemical supplier and used it instead of salt on his food, until the sad end.

All the best, --jorge


RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 15-08-2025

I'm not going to argue with Gabriel Landini or you because this isn't my field of knowledge. I just wanted to express my surprise that when I did simple calculations with a pocket calculator of the most frequent Voynich words, I didn't get the numbers predicted by Zipf's law. But I can accept what you both say without any problem.

What I will never be convinced of is that the Voynich script is a linguistic system. Neither a European language, nor an Asian language, nor a language from any other part of the globe.

As you know, I have a thread arguing that the script is an iconic code. This isn't the place to go into that, but I just wanted to point out that back in the 1970s, Professor William Bennett compared the entropy of the Voynich to that of the Hawaiian language, which for me is quite decisive, along with other features that demonstrate the impossibility of a language in the Voynich.