[split] The Zipf law and the Voynich Manuscript - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: [split] The Zipf law and the Voynich Manuscript (/thread-1555.html)
RE: [split] The Zipf law and the Voynich Manuscript - Juan_Sali - 13-08-2025

To apply Zipf's law to the VMS vords, a starting hypothesis is needed that is not usually mentioned: that vords of Voynichese are equivalent to words in a natural language. If they are not, Zipf's law cannot be applied.
If the empirical data do not support Zipf's law, it does not mean that there is no language under Voynichese.
If the empirical data support Zipf's law, it does not mean that there is a language under Voynichese: Zipf's law is a necessary condition for a language, but not a sufficient one.

RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 13-08-2025

This assertion that the Voynich script follows Zipf's law is starting to seem like dogma to me. It's like the idea that there's a biological or balneological section in the Voynich, something that must be accepted as a matter of faith.

The curious thing is that if the data do not support the theory that the script follows Zipf's law, then the data are discarded and the problem ends. Anyone familiar with Zipf's law for natural languages can see that it doesn't fit the frequency of glyph groups we see in the Voynich. Following the most frequently used group, daiin, with 864 occurrences, are five groups in descending order, ranging from 538 to 396 occurrences. This doesn't fit the theory at all.

The problem with Voynich research is that it continues to claim things that have no real backing. Although I fear that dissenting won't help much, since there will always be those who say that the truth about the VM script and Zipf's law was already told in Cryptology magazine many years ago.

RE: [split] The Zipf law and the Voynich Manuscript - Jorge_Stolfi - 13-08-2025

(13-08-2025, 08:42 PM)Antonio García Jiménez Wrote: This assertion that the Voynich script follows Zipf's law is starting to seem like dogma to me.
No, it is an empirical observation (that is in fact a problem for many theories).

Many natural languages deviate from Zipf's law in the first few most common words, for various reasons. For instance, the most common words in Romance languages include the definite articles. In most of these languages there are four articles (masculine and feminine, singular and plural). Their frequencies are affected by the frequencies of genders and numbers in the language's lexicon. But different Romance languages, even closely related ones, often assign different genders to the same noun. Moreover, in Italian there are more than four articles, because one has different forms depending on the initial of the next word. This alone means that Italian and Spanish cannot both follow Zipf's law.

Here are some actual Zipf plots that I prepared years ago for my 2012 presentation at Mondragone (Roadside Picnic is a sci-fi novel in Russian):

[plots not preserved]

English prose texts in fact are exceptional because they do follow Zipf's law with remarkable accuracy, even at the high-frequency end. This is due to English having lost the notion of noun gender and the need to inflect articles and adjectives for number, so there is only one definite article. Medieval English however already deviates from the ideal law; I did not check why exactly.

But a complicated cypher that maps the same word type to different word types, like the Vigenère ciphers, can completely unzipfy the plot:

[plot not preserved]

On the other hand, the East Asian languages with monosyllabic "words" deviate significantly from the ideal at both ends, high and low frequency:

[plot not preserved]

And Voynichese deviates from the ideal only as much as some other natural languages:

[plot not preserved]

RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 13-08-2025

This paragraph is from Torsten in this same thread:

"The usage of [daiin] is not consistent with a common word like "the", "and", "for" or "to".
There are pages full of text without any [daiin] like f75r, f79v, [link unavailable] and [link unavailable] (see [link unavailable]). There is also a page like [link unavailable] with only 57 words but with 11 instances of [daiin]. Moreover on most pages [daiin] do co-occur with similar words like [aiin], [dain] or [ain] (see [link unavailable])".

If it's supposedly a language, how is it possible that the most frequently used word in the Voynich (daiin) is missing from several pages? Presumably, as in English or any other language, the most frequently used word in a text will be an article, a preposition, or a conjunction. Aren't there any of those grammatical elements in the pages cited by Torsten?

RE: [split] The Zipf law and the Voynich Manuscript - Jorge_Stolfi - 14-08-2025

(13-08-2025, 11:01 PM)Antonio García Jiménez Wrote: If it's supposedly a language, how is it possible that the most frequently used word in the Voynich (daiin) is missing from several pages?

Statistics like word and character frequencies are not properties of a language, but of a text. In some texts the most common word may indeed be some word that is rare in others.

In English romances and treatises, like the Culpeper Herbal, the most common word is "the". But in the Medieval Towneley Plays, "the" is only in third place, after "I" and "that". The frequency of "I" in Culpeper is only 0.003, so there must be many pages without a single "I".

In an English novel or treatise the word "is" must be more common than "was". In a chronicle it may be the opposite.

Quote: Presumably, as in English or any other language, the most frequently used word in a text will be an article, a preposition, or a conjunction.
Aren't there any of those grammatical elements in the pages cited by Torsten?

Articles are a weird feature of only some European languages. Latin, most Slavic languages (Russian, Czech, Croatian, ...), Finnish, Estonian, Latvian, and Turkish have no articles. Bulgarian and Hebrew have definite articles but they are attached to the end of the word. Arabic has a definite article but it is attached to the front of the word. And East Asian languages (Tibetan, Burmese, Thai, Vietnamese, Chinese...) do not have articles either.

As for prepositions, Latin and many other languages use them sparingly, and inflect the words into "cases" instead. In some (if not all) East Asian languages the words that would be translated into English prepositions are also used in other functions, like verbs or adjectives. In Turkish, Japanese, and Korean the prepositions are not separate words but suffixes attached to the end of words.

Here are the 7 most common words in some texts I happen to have:

Code:
Russian novel Roadside Picnic:

(My Chinese texts use uncommon encodings; the Chinese characters and pinyin above were obtained via ChatGPT, so readers beware.)

All the best, --jorge

RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 14-08-2025

I congratulate you on your varied linguistic knowledge, but we're talking about a document from medieval Europe, and it doesn't seem to make much sense to talk about other geographical environments.

You have not answered why in the SAME text the most frequent word is absent from some pages, and what kind of grammatical element 'daiin' is.

I've already said that the data don't support the script following Zipf's law. The third most common word, 'chedy', appears 501 times, which is more than double the theoretical prediction. This isn't a small deviation, and the same goes for the fourth word.
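For readers who want to check figures like these themselves, here is a minimal Python sketch of what the textbook form of Zipf's law predicts. It assumes the idealized law f(r) = f(1) / r^s with exponent s = 1, anchored on the count of the most frequent word; the function name `zipf_expected` and that choice of anchor are illustrative assumptions, not anything taken from the thread. The counts 864 (daiin, rank 1) and 501 (chedy, rank 3) are the ones quoted in the posts above.

```python
def zipf_expected(top_count: float, rank: int, s: float = 1.0) -> float:
    """Count predicted at `rank` by the idealized law f(r) = f(1) / r**s."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return top_count / rank ** s


# Counts quoted in the thread: daiin (rank 1) = 864, chedy (rank 3) = 501.
top_count = 864
observed_rank3 = 501

# Assumption: exponent s = 1 and the curve passes exactly through rank 1.
predicted_rank3 = zipf_expected(top_count, 3)
ratio = observed_rank3 / predicted_rank3

print(f"predicted count at rank 3: {predicted_rank3:.0f}")
print(f"observed / predicted:      {ratio:.2f}")
```

Under these assumptions the rank-3 prediction is 288, and the observed count is about 1.74 times that. A different exponent, or a fit over the whole rank range instead of an anchor on rank 1, would give different numbers.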
RE: [split] The Zipf law and the Voynich Manuscript - Stefan Wirtz_2 - 14-08-2025

(14-08-2025, 09:47 AM)Antonio García Jiménez Wrote: I congratulate you on your varied linguistic knowledge, but we're talking about a document from medieval Europe, and it doesn't seem to make much sense to talk about other geographical environments.

Russian, Czech, Croatian, ..., Finnish, Estonian, Latvian, Turkish, Bulgarian, Hebrew and, with limits, Arabic are or were present in wide areas of Europe during the 15th century and before. Add Mongolian, and quite possibly you can count all the Caucasian languages (a world of languages in its own right) among the options as well. The VMS alphabet is a bit reminiscent of some Georgian letters, for example.

It is still on thin ice to say the VMS is "a document from Europe", even though there are a few hints relating it to European customs. You cannot even exclude a small group of exotic traders, coming from somewhere else, who produced and used the VMS.

Personally I would exclude a Japanese, Vietnamese, Chinese or Tibetan kernel here, but only because these languages already had a fully developed writing system in the 15th century (which also holds for any Latin and Greek relatives), so there was no urge to develop a new alphabet for the very economical, low-cost VMS.

RE: [split] The Zipf law and the Voynich Manuscript - Antonio García Jiménez - 14-08-2025

Let's not digress and get back to the topic of this thread. Anyone with word frequency data for the EVA transliteration can verify that the Voynich script does not follow the classic Zipf law pattern. The frequency falls very slowly and the resulting curve is quite flat, not at all what the theory predicts. The Voynich script, therefore, does not behave like a natural language.
It could be said that the script follows a flat Zipf, but that only demonstrates the prejudice of wanting to assimilate the Voynich script to a natural language. Of course the script has a structure, but not the one that corresponds to a natural language.

RE: [split] The Zipf law and the Voynich Manuscript - Stefan Wirtz_2 - 14-08-2025

(14-08-2025, 03:43 PM)Antonio García Jiménez Wrote: Let's not digress and get back to the topic of this thread.

That's right where we are, at the topic: you are mixing "Zipf law and the VMS" with Zipf & Voynichese, which is a language nobody knows yet.

Apart from the need for fully written-out numbers in the script, a substantial amount of the text might also consist of
- verses / poems
- prayers
- song texts
- tales
with conceivably high counts of rhymes, repetitions and similarities.

What would be the Zipf result or "entropy" for a "Pater Noster", the "rosary" or several more simple songs? Could you judge the whole nature of a language under such conditions, even from a book-length text?

And one more: the VMS is visibly not made in a quality good enough for royals, senior scholars or clerics. At a time when most people had no schooling or just a few years of it, and most inhabitants of Europe could not read or could hardly read, what consequences would that have for a book targeted at adult readers of the second or third rank at best, who maybe had no more than ~4 years of schooling? Easy language, simple wording, often repeated?

If you want to compare the VMS, in this case it may be best compared to today's 5th grade school books and children's books. What are Zipf and entropy for modern 5th grade books...? And what would that say about the underlying language?

Or in short: I cannot put much weight on such results...

RE: [split] The Zipf law and the Voynich Manuscript - R. Sale - 14-08-2025

Comparison with 'normal' text may not be valid if the VMs is not a "NORMAL" text.
Suppose the VMs author had strong religious beliefs, and s/he would go off on these rants, where everything is either holy / blessed or it is damned / cursed. A few dozen pages like that would shift the statistics.

PS: Better yet, perhaps these are the standard descriptive terms: hot, cold, wet and dry. They would have been used a lot in botanical descriptions.