The Voynich Ninja

Full Version: Testable signatures on VMS structure
Pages: 1 2
(23-04-2026, 03:57 PM)eggyk Wrote: "the frequency of 'e' in English" is effectively saying "the frequency of 'e' in English texts on average". Just because a certain text can be deliberately changed from the norm does not mean the norm doesn't exist. There is still a distribution of frequencies from all existing texts, and those meaningfully differ from language to language.

I think I agree with Stolfi here. While colloquially we can talk about the frequencies of letters in a language, this is not precise and can be very misleading. I don't think there is such a thing as "the frequency of 'e' in English" as a specific number. If we took three different non-overlapping volumes of English text and computed the frequency of 'e' in each, I don't think the numbers would converge to a single specific value. They would likely end up in a close range, but there can be substantial differences depending on the nature of the texts. There is simply no such thing as "average English".
(23-04-2026, 03:57 PM)eggyk Wrote: Of course there are statistics for a language, and those statistics can be useful for comparing languages.

That is not true, and can be a disastrous assumption to make when comparing languages.

What one sees tabulated as "the frequencies of letters in English" or "the most common words in English" are actually statistics for specific types of English texts --  mainly novels and newspaper articles.   But the numbers for other texts may deviate considerably from those.  

Because, again, the frequency of a character or digraph in a text is determined by the frequency of the words that have that character or digraph; and the frequencies of words depend on the topic and style of the text.  In a succinct herbal, the word "herb" may occur much more often than "the", and the digraph "rb" may be more frequent than "th".
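
A minimal sketch of that dependence, assuming a toy word-count table: the counts below are invented for illustration and do not come from any real herbal. Each digraph's total is simply the sum, over the words that contain it, of how often that word occurs.

```python
from collections import Counter

# Hypothetical word counts for a terse herbal (made-up numbers, not measured).
WORD_COUNTS = {"herb": 40, "take": 25, "cures": 20, "the": 10}


def digraph_counts(word_counts):
    """Sum each word's count into every adjacent letter pair it contains."""
    totals = Counter()
    for word, n in word_counts.items():
        for a, b in zip(word, word[1:]):
            totals[a + b] += n
    return totals


digraphs = digraph_counts(WORD_COUNTS)
print("rb:", digraphs["rb"], "th:", digraphs["th"])
```

With these made-up counts "rb" outnumbers "th"; feeding in word counts from a different kind of text would change the digraph ranking without any change of language.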

Quote:"the frequency of 'e' in English" is effectively saying "the frequency of 'e' in English texts on average" ... There is still a distribution of frequencies from all existing texts

It is not clear at all what "average" and "all existing texts" mean there.  Should one include all emails, stock reports, advertisements, tweets, online catalogs, cash register slips?  All texts created in the last 5 years, or the last 500 years? Should one count a book printed in a million copies as one text, or a million?

But anyway, that question is irrelevant.  When comparing statistics of the VMS, one should use texts of other languages that hopefully are of the same nature. Which is totally not going to be like the "average" text, in any language.

The Herbal section should at least be compared to other Medieval herbals -- not to novels or theological treatises. 

Even among herbals, even over the same herbs, there may be huge differences in word frequencies, due to differences in styles.  Compare

   "Herba Comica:  This herb is good for mumps and ingrown toenails, if it is 
   taken as infusion for two weeks, twice daily.  If it is taken in excess, it 
   will cause the belly button to turn green."

    "Laughwort. Cures: mumps, ingrown toenails. Preparation: Tea. Dosage: 
   one cup, 2 times per day, for 14 days.  Effects of abuse: green belly button."

Note that the words "is" and "it" appear 3 times each in the first entry, but not once in the second one.
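
The count can be checked mechanically. The sketch below tokenizes the two entries above into lowercase alphabetic words (a simplifying assumption: digits and punctuation are dropped) and tallies them; the variable names are only illustrative.

```python
import re
from collections import Counter

ENTRY_VERBOSE = (
    "Herba Comica: This herb is good for mumps and ingrown toenails, if it is "
    "taken as infusion for two weeks, twice daily. If it is taken in excess, it "
    "will cause the belly button to turn green."
)
ENTRY_TERSE = (
    "Laughwort. Cures: mumps, ingrown toenails. Preparation: Tea. Dosage: "
    "one cup, 2 times per day, for 14 days. Effects of abuse: green belly button."
)


def word_counts(text):
    """Count lowercase alphabetic tokens; numbers and punctuation are ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


verbose = word_counts(ENTRY_VERBOSE)
terse = word_counts(ENTRY_TERSE)
for w in ("is", "it"):
    print(w, verbose[w], terse[w])
```

Both entries describe the same herb with the same facts, yet the function-word counts differ completely.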

Quote:The frequency of the word written "t h e" is significantly higher in an average english text compared with that of an average french text.

Well, "thé" means "tea" in French.  So that word may actually be more common in the average French coffeehouse menu than "the" in the average English coffeehouse menu...   :P

All the best, --stolfi
In fact, regardless of the meaning of a Voynich grapheme (a letter or something more complex), the point is that a generator will have to match the 4 weird signatures: if they happen to also match those of a known language like Chinese, or of any cipher structure not probed in the paper, this would be an outstanding discovery.

(23-04-2026, 03:29 PM)Jorge_Stolfi Wrote:
(23-04-2026, 02:30 PM)Labyrinthinesecurity Wrote: Concretely, we propose 4 criteria that any Voynich script "generator" (like Naibbe) should meet to really look like Voynich. These are necessary criteria, not sufficient ones.

But your criteria are based on features that distinguish Voynichese from four (4) languages - two Western Indo-European ones and two Semitic ones. 

Natural languages are MUCH more varied than that.  There are the East Asian monosyllabic languages, agglutinative languages, languages with vowel harmony like Turkish and Hungarian, languages with definite articles written as postfixes, languages with and without noun inflections for gender and number and noun-adjective agreements, languages with postpositions instead of prepositions...  Then there is sandhi, which may be realized in writing (like "a" changing to "an" in English). 

But a bigger problem is that character statistics depend on the spelling system much more than on the language.  For instance, tones in romanized Chinese may be encoded as diacritics on a vowel, or as a numeric suffix 1-4.  The second choice would radically change the statistics of suffixes...
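
To make the spelling-system effect concrete, here is a small sketch. It assumes a hand-picked list of seven common Pinyin syllables, each written both ways, and counts the final character of each spelling; the list and the function name are illustrative only, not a corpus.

```python
import unicodedata
from collections import Counter

# The same syllables spelled with tone diacritics and with numeric tone suffixes.
SYLLABLES = [
    ("nǐ", "ni3"), ("hǎo", "hao3"), ("zhōng", "zhong1"),
    ("guó", "guo2"), ("wǒ", "wo3"), ("shì", "shi4"), ("rén", "ren2"),
]


def final_chars(words):
    """Count the last character of each word (NFC keeps accented vowels whole)."""
    return Counter(unicodedata.normalize("NFC", w)[-1] for w in words)


diacritic_finals = final_chars(d for d, _ in SYLLABLES)
numeric_finals = final_chars(n for _, n in SYLLABLES)
print("diacritic spelling:", dict(diacritic_finals))
print("numeric spelling:  ", dict(numeric_finals))
```

Under the numeric convention every word ends in one of four digits, so the word-final statistics collapse onto a tiny "suffix" alphabet even though the underlying language is identical.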

And, finally, statistics are a property of a text, not of a language.  There is no such thing as "the frequency of 'e' in English" or 'the most common English word'.   Someone wrote a whole novel in English without using 'e' even once -- and readers don't notice unless they are told.  In a materia medica the most common word may well be "take" or "cures", and the word "the" may hardly be used... 

All the best, --stolfi
(23-04-2026, 09:28 PM)Labyrinthinesecurity Wrote: In fact, regardless of the meaning of a Voynich grapheme (a letter or something more complex), the point is that a generator will have to match the 4 weird signatures...

This is assuming the patterns are the result of a generator and not just a feature of the source material. For example, while Naibbe fails some of the criteria, can you say that it's impossible for it to conform to all 4 given the right type of plaintext?
My personal opinion is that it could be possible for a Naibbe variant to match the 4 signatures simultaneously, but I don't know how one could achieve that as of today.

(23-04-2026, 10:33 PM)Labyrinthinesecurity Wrote: My personal opinion is that it could be possible for a Naibbe variant to match the 4 signatures simultaneously, but I don't know how one could achieve that as of today.

Which of the criteria will be the hardest to fine-tune Naibbe for? Suppose Naibbe is optimized so that the transitions between Naibbe tokens match transitions between words in the Voynich MS, will this solve most of the problems?

M. Greshko took care to make Naibbe of plausible algorithmic complexity for late Medieval scholars. Yet, it is at the upper level of reasonable complexity for that time, I believe. The most difficult challenge when "forcing" Naibbe into matching the signatures will be to prevent a complexity explosion.

We can always imagine "ad hoc" generators that will match the signatures, but we must always keep in mind that they must be reasonably simple and realistic.
(24-04-2026, 09:07 AM)Labyrinthinesecurity Wrote: M. Greshko took care to make Naibbe of plausible algorithmic complexity for late Medieval scholars. Yet, it is at the upper level of reasonable complexity for that time, I believe. The most difficult challenge when "forcing" Naibbe into matching the signatures will be to prevent a complexity explosion.

We can always imagine "ad hoc" generators that will match the signatures, but we must always keep in mind that they must be reasonably simple and realistic.

Edit: As I see, my question was not well worded. Without changing any of the mechanics of the Naibbe cipher, is it possible to change its token-to-character assignments to make it compatible with the other criteria?

Why would changing token-to-character assignments increase complexity? I was asking whether fine-tuning the assignment can make Naibbe compatible with the other criteria.

I don't think so.  To give you an idea: Naibbe monograms and bigrams, for example, are drawn from tables at random, which is a core feature of a slot grammar. But this alone is enough to break the expected signatures.
(24-04-2026, 03:25 PM)Labyrinthinesecurity Wrote: I don't think so.  To give you an idea: Naibbe monograms and bigrams, for example, are drawn from tables at random, which is a core feature of a slot grammar. But this alone is enough to break the expected signatures.

Ok, thanks! So, no combination of Naibbe mappings and a meaningful plaintext can realistically produce ciphertext that would conform to the four criteria?