The Voynich Ninja

Full Version: How could you prove that VM is gibberish?
(16-11-2025, 09:32 PM) Mark Knowles wrote: the mental energy and time required to produce the optimal set of filler words might be too costly


If you look at the word frequencies in the Bio B2 pages, you will see that the top 8 words make up 20% of the text in that section.

ol is the most frequent word (4.4%). If you believe that only 20% of words are real, then you will have to believe that ol must be a filler word; otherwise it would form 22% of real words, an unlikely occurrence. But if it is a filler word, it would form 5.5% of filler words, 1 in every 18.

By the same logic, chedy (3.6%) must also be a filler word, forming 4.5% of filler words, 1 in every 22.
Then Shedy is also a filler: 1 in every 29.
Then qokedy, qokain and qokeedy are most probably fillers too: each 1 in ~35.
Together these top 6 words, if they are fillers, would form 22% of all filler words: 1 in ~4.6.
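The percentages above come from dividing a word's share of the whole text by the fraction of the text assumed to be filler (0.8) or real (0.2). A minimal sketch of that arithmetic, using only the ol and chedy figures and the "1 in N" rates given in this post; the constant and function names are illustrative, not from any existing tool.

```python
FILLER_FRACTION = 0.8  # the hypothesis under discussion: 80% of tokens are filler


def share_within(text_share: float, fraction: float) -> float:
    """Share a word would have within a subset that makes up `fraction` of the text."""
    return text_share / fraction


# Shares of all Bio B2 tokens, as quoted in the post.
word_shares = {"ol": 0.044, "chedy": 0.036}

for word, share in word_shares.items():
    as_real = share_within(share, 1 - FILLER_FRACTION)
    as_filler = share_within(share, FILLER_FRACTION)
    print(f"{word}: {as_real:.1%} of real words, "
          f"or {as_filler:.1%} of fillers (1 in {1 / as_filler:.0f})")

# Combined share of the top six words, from the rounded "1 in N" rates above.
one_in = {"ol": 18, "chedy": 22, "Shedy": 29,
          "qokedy": 35, "qokain": 35, "qokeedy": 35}
combined = sum(1 / n for n in one_in.values())
print(f"top six: {combined:.1%} of fillers (1 in {1 / combined:.1f})")
```

Run as-is, it reproduces the 22%, 5.5% and 1-in-18 figures for ol, and summing the six rounded rates gives roughly 22%.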

I hope you can now see a bit of a problem. Why is there so much ubiquity of these filler words? Anyone really intent on obfuscating the text would have added more variability. Reusing these words so often couldn't have taken the authors much 'mental energy'.

But also there is a further problem. Something more serious for your hypothesis. If the top words are all filler words then once you have eliminated them the frequencies of the remaining words start to flatten. Among them there isn't any dominant common word, none that occurs significantly more than the next common word. In short, the non-filler words will not follow Zipf's law.

So then, since Zipf's law holds for most natural languages, and it won't under your hypothesis, where is your meaningful text?
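One way to check the flattening claim, sketched under simple assumptions: fit the slope of log frequency against log rank, then refit after dropping the top word types. A least-squares fit on a log-log plot is only a rough diagnostic (maximum-likelihood fits are more reliable), and `zipf_slope` and `slope_without_top` are hypothetical helpers, not an existing tool.

```python
import math
from collections import Counter


def zipf_slope(counts: list[float]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipf pattern; a slope closer to 0 means a
    flatter distribution. Needs at least two word types.
    """
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_without_top(tokens: list[str], drop: int) -> float:
    """Drop the `drop` most frequent word types (the assumed fillers) and refit."""
    ranked = Counter(tokens).most_common()
    return zipf_slope([count for _, count in ranked[drop:]])
```

Comparing `slope_without_top(tokens, 0)` with `slope_without_top(tokens, 6)` on a list of Bio B2 words would show how much the distribution flattens once the six words above are treated as fillers. Removing the top ranks flattens the head of any Zipf-like list somewhat, so running the same comparison on a natural-language text gives a useful baseline.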


(17-11-2025, 01:25 AM) dashstofsk wrote: [...]

This isn't accurate if the "filler" words you've identified are not fillers but words with multiple meanings, such as shedy. My translation has bréidín, which has 5 regularly used meanings. The math also doesn't work when you consider that filler words may be contained within other words; in my translation, for example, the suffix al means "from" (an actual filler word).
(17-11-2025, 01:25 AM) dashstofsk wrote: Why is there so much ubiquity of these filler words? Anyone really intent on obfuscating the text would have added more variability. Reusing these words so often couldn't have taken the authors much 'mental energy'.
You are making an assumption that "Anyone really intent on obfuscating the text would have added more variability". I don't think you can make that assumption. Clearly, if the Voynich is written in such a way, the author's approach has succeeded in obfuscating the text without taking much mental energy.
(17-11-2025, 01:25 AM) dashstofsk wrote: But also there is a further problem. Something more serious for your hypothesis. If the top words are all filler words then once you have eliminated them the frequencies of the remaining words start to flatten. Among them there isn't any dominant common word, none that occurs significantly more than the next common word. In short, the non-filler words will not follow Zipf's law.

So then, since Zipf's law holds for most natural languages, and it won't under your hypothesis, where is your meaningful text?

I didn't say that "all the top words are filler words"; that is your assumption. Clearly, amongst the real words there will be some common words, like pronouns or conjunctions, so it wouldn't be as simple as saying that the most frequent words, up to 80% of the text, are the filler words.

I suspect that the longer of your "top words" are more likely to be filler words than the shorter ones, as the most common words in a language tend to be short, so the most frequent real words would be short words. Of course, this does not mean that all short words are real words.
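The word-length suggestion could be checked directly, as a rough sketch: rank the word types by frequency and compare the mean length of the top ones with that of the rest. Length here is counted in EVA characters, which is itself a transcription choice, and `length_by_frequency` is a hypothetical helper.

```python
from collections import Counter


def mean_length(words: list[str]) -> float:
    """Average length of a list of word types, in transcription characters."""
    return sum(len(w) for w in words) / len(words)


def length_by_frequency(tokens: list[str], top_k: int) -> tuple[float, float]:
    """Mean length of the top_k most frequent word types vs. all other types."""
    ranked = [word for word, _ in Counter(tokens).most_common()]
    top, rest = ranked[:top_k], ranked[top_k:]
    return mean_length(top), mean_length(rest)
```

Calling `length_by_frequency(tokens, 8)` on a list of Bio B2 words would compare the eight most frequent words with all remaining types; a shorter mean for the top group would be consistent with the suggestion, not proof of it.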
Quote: In short, the non-filler words will not follow Zipf's law.

If most words are fillers, then the remaining text would be really short.
For example, a whole plant page could contain just the text:

Yarrow is good for heart and wounds. Drink hot with wine

or even

yarrow good heart wounds drink hot wine

Such short texts may not obey statistical laws that hold for longer texts.
If 80% of the text is filler and only 20% is real, then the statistical and other properties of the real text may be quite different from those of the text as a whole.
(17-11-2025, 01:25 AM) dashstofsk wrote: In short, the non-filler words will not follow Zipf's law.

But the whole text does follow Zipf's law...
(17-11-2025, 12:10 AM) Jorge_Stolfi wrote: In the case of Voynichese, the entropy per character depends on whether one counts each EVA character as a separate letter, or if one considers each of Ch, Sh, ee, iin, etc to be a single letter.

And one must look at higher-order entropy, at least order 3 or 4 for characters. 

High orders need huge samples to converge, but comparisons can be made between samples of the same size. I can use large text samples for order 3; I'm not sure whether this has been done recently. The last time was probably ~30 years ago, and the only available program (running on DOS) had a severe size limitation, IIRC.
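The two points above, that tokenization changes the per-character entropy and that higher orders should be compared on samples of equal size, can be sketched as follows. This is a simple plug-in estimator (H of n-grams minus H of (n-1)-grams), not the DOS program mentioned; it assumes a lowercase EVA string with spaces kept as a symbol, the glyph groups are only the examples named in the quote, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed glyph groups, taken from the examples in the quote. Alternatives are
# tried left to right, so "iin" is matched before a single "i".
GROUPS = ["iin", "ch", "sh", "ee"]
_TOKEN = re.compile("|".join(GROUPS) + "|.", re.S)


def tokenize(text: str, merge: bool) -> list[str]:
    """Split an EVA string into symbols: single characters, or merged groups."""
    return _TOKEN.findall(text) if merge else list(text)


def block_entropy(symbols: list[str], n: int) -> float:
    """Shannon entropy (bits) of the overlapping n-gram distribution."""
    grams = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    total = sum(grams.values())
    return -sum(c / total * math.log2(c / total) for c in grams.values())


def conditional_entropy(symbols: list[str], n: int) -> float:
    """Entropy of a symbol given the n-1 symbols before it, as H_n - H_(n-1).

    Here n is the block length, so n=3 conditions on two preceding symbols.
    """
    if n == 1:
        return block_entropy(symbols, 1)
    return block_entropy(symbols, n) - block_entropy(symbols, n - 1)


def compare_at_equal_size(a: list[str], b: list[str], n: int) -> tuple[float, float]:
    """Truncate both symbol sequences to the same length before estimating."""
    size = min(len(a), len(b))
    return conditional_entropy(a[:size], n), conditional_entropy(b[:size], n)
```

For example, `compare_at_equal_size(tokenize(text, merge=False), tokenize(other, merge=False), 3)` would compare two samples at order 3 on equal footing; running it again with `merge=True` shows how much the glyph grouping alone shifts the result.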

EVA is not ideal for measuring how wasteful Voynichese is; the situation is actually worse than it suggests. Froggy may be the best rendition in terms of space, since gallows use about as much horizontal space as ee.

ch and Sh are usually larger than ee, so using 2 characters for them is not perfectly representative of their footprint.
I suggest a slightly different type of steganographic model. Instead of vords hidden on a page, segments of text have been placed in circular bands specifically marked with a set of patterned markers: in the VMs cosmos, in the zodiac, in the rosettes and in other places. There are only three such segments in the whole VMs zodiac sequence, two on White Aries and one in Cancer, so it's not as if they're everywhere. There is some reason to think they were intentionally constructed. White Aries is the key. It is clear that a functional mechanism for designating text segments is in place. The question is, was it used?