The Voynich Ninja - The Naibbe cipher


Hi everyone, and thank you for all your questions on the Naibbe cipher. A huge thank-you to the community and to Koen for letting me present at this year's Voynich Manuscript Day.

You'll find a full preprint version of my paper, as well as 20 reference Naibbe ciphertexts, in the following Dropbox folder: [link]

In addition, you'll find more resources, including Microsoft Excel implementations of Voynichesque and the Naibbe cipher, at the following Zenodo data repository: [link]

For reference, the preprint contains everything except my replication of Bowern and Gaskell (2022) and Gaskell and Bowern (2022), which I just got working a few days ago.

The first thing that interested me was whether the binomial distribution could be represented using your cipher. In short, I have never seen such a good match. Congratulations.

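bi3mw's post doesn't say exactly how the binomial comparison was made. As a rough sketch of one common way to do it, assuming word length is counted in EVA characters and that the number of trials `n_trials` is picked by the analyst, one can estimate p by matching the mean word length and then compare expected with observed counts. The function name `binomial_fit` and the toy tokens are illustrative only, not real corpus statistics.

```python
from collections import Counter
from math import comb

def binomial_fit(word_lengths, n_trials):
    """Compare observed word-length counts with Binomial(n_trials, p).

    p is estimated by matching the mean (method of moments). Returns a dict
    {length: (observed_count, expected_count)} for lengths 0..n_trials;
    lengths above n_trials are not covered by this model.
    """
    total = len(word_lengths)
    mean = sum(word_lengths) / total
    p = mean / n_trials  # method-of-moments estimate of p
    if not 0 <= p <= 1:
        raise ValueError("mean word length exceeds n_trials")
    observed = Counter(word_lengths)
    return {
        k: (observed[k], total * comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k))
        for k in range(n_trials + 1)
    }

# Toy usage with a handful of EVA-style tokens.
tokens = ["daiin", "chol", "or", "chedy"]
fit = binomial_fit([len(t) for t in tokens], n_trials=8)
for length, (obs, exp) in fit.items():
    print(length, obs, round(exp, 2))
```

On a real transcription, the same loop would be run over every token (or every distinct word type) in the section being tested.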

(03-08-2025, 04:46 PM) bi3mw Wrote: The first thing that interested me was whether the binomial distribution could be represented using your cipher. In short, I have never seen such a good match. Congratulations.

Thank you! The whole journey of deriving the cipher began with a desire to reliably replicate the VMS's observed token and type length distributions while also obeying the text's word grammar and entropy. The structure of the cipher emerged from there.

(03-08-2025, 06:45 PM) magnesium Wrote: The whole journey of deriving the cipher began with a desire to reliably replicate the VMS's observed token and type length distributions while also obeying the text's word grammar and entropy. The structure of the cipher emerged from there.

Can this approach explain line-as-a-functional-unit properties, such as the tendency of certain characters and combinations to appear near/at the beginning or end of lines?

(03-08-2025, 10:24 PM) oshfdk Wrote: Can this approach explain line-as-a-functional-unit properties, such as the tendency of certain characters and combinations to appear near/at the beginning or end of lines?

In addition to the question, here is the link to Elmar Vogt's paper: [link]

(03-08-2025, 10:38 PM) bi3mw Wrote: In addition to the question, here is the link to Elmar Vogt's paper: [link]

I vaguely remember that the Voynich Manuscript shows a tendency for short and long words to cluster, unlike natural languages. That is, in the Voynich MS a short word is more likely to follow another short word, and long words are more likely to be followed by more long words. Was there a paper that described this behavior?

@magnesium
Is this behavior compatible with the Naibbe cipher?

(03-08-2025, 10:24 PM) oshfdk Wrote:
(03-08-2025, 06:45 PM) magnesium Wrote: The whole journey of deriving the cipher began with a desire to reliably replicate the VMS's observed token and type length distributions while also obeying the text's word grammar and entropy. The structure of the cipher emerged from there.

Can this approach explain line-as-a-functional-unit properties, such as the tendency of certain characters and combinations to appear near/at the beginning or end of lines?

The short answer: Big-picture, the cipher can theoretically accommodate the VMS's line, paragraph, and page properties, but it currently lacks mechanisms that reliably produce them. This is a known limitation of the current version of the Naibbe cipher (see Section 4.3 of the paper), and I see it as an important area for future investigation.

The long answer: the Naibbe cipher can generate ~5/6 of the tokens in Voynich B, so it can generate stretches of text that exactly replicate what's seen in the VMS. But the Naibbe cipher is, first and foremost, meant to replicate the word-level properties of the VMS. Within the Naibbe cipher, a given ciphertext token stands for 1 or 2 plaintext letters, achieved by re-spacing a Latin or Italian plaintext roughly 50-50 into unigrams and bigrams and then encrypting those n-grams by selecting substitution options from 6 different tables on a letter-by-letter basis.
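The re-spacing step described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's procedure: non-letters are dropped, original word boundaries are ignored, and at each position a bigram is taken with probability `p_bigram` (a hypothetical parameter), which yields roughly half bigrams by n-gram count when it is 0.5.

```python
import random

def respace(plaintext, p_bigram=0.5, rng=None):
    """Re-space a plaintext into a random mix of unigrams and bigrams.

    Assumptions (not from the paper): non-letters are dropped, original word
    boundaries are ignored, and a bigram is taken with probability p_bigram
    whenever at least two letters remain.
    """
    rng = rng or random.Random()
    letters = [c for c in plaintext.lower() if c.isalpha()]
    ngrams, i = [], 0
    while i < len(letters):
        if i + 1 < len(letters) and rng.random() < p_bigram:
            ngrams.append(letters[i] + letters[i + 1])
            i += 2
        else:
            ngrams.append(letters[i])
            i += 1
    return ngrams

ngrams = respace("in principio erat verbum", rng=random.Random(42))
print(ngrams)
```

Seeding the generator makes a given re-spacing reproducible, which is convenient when comparing ciphertexts generated from the same plaintext.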

As a result, the structure of a given line of Naibbe ciphertext depends on three things simultaneously:

1. The exact content of the plaintext
2. How exactly the plaintext is re-spaced into unigrams and bigrams
3. The exact sequence of tables used to encrypt the text on a letter-by-letter basis
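To make the three dependencies concrete, here is a toy encryption step that takes the re-spaced n-grams and an explicit table sequence as inputs. The six tables below are hypothetical placeholders that simply tag each letter with its table number; they are NOT the actual Naibbe tables, whose contents are given in the paper. The sketch only shows that changing the spacing or the table sequence changes the ciphertext even when the plaintext letters are identical.

```python
import string

# Hypothetical toy tables, NOT the actual Naibbe tables: table t maps each
# letter to the letter followed by the digit t, so every choice is visible.
TOY_TABLES = [{c: f"{c}{t}" for c in string.ascii_lowercase} for t in range(6)]

def encrypt(ngrams, table_sequence, tables=TOY_TABLES):
    """Encrypt a re-spaced plaintext into one token per n-gram.

    table_sequence supplies one table index per plaintext letter, so a
    bigram token draws on two (possibly different) tables.
    """
    if len(table_sequence) != sum(len(g) for g in ngrams):
        raise ValueError("need exactly one table index per plaintext letter")
    choices = iter(table_sequence)
    return ["".join(tables[next(choices)][letter] for letter in gram) for gram in ngrams]

# Same letters and tables, different re-spacing:
print(encrypt(["in", "p"], [0, 1, 2]))      # ['i0n1', 'p2']
print(encrypt(["i", "n", "p"], [0, 1, 2]))  # ['i0', 'n1', 'p2']
# Same re-spacing, different table sequence:
print(encrypt(["in", "p"], [3, 3, 5]))      # ['i3n3', 'p5']
```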

As to some of the specific line-as-a-functional-entity features of the VMS, I think some judiciously placed nulls could go a long way. The Naibbe cipher as I described it in the presentation and paper does not use any nulls, but we could certainly extend the cipher to include some. For example, we could arbitrarily designate that a gallows glyph or prefix (e.g., pch) beginning a "paragraph" is a null that's just meant to set off an apparent paragraph. Similarly, we could designate that line-ending tokens that end with -m are treated as nulls that simply serve to pad out the line length.
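On the decryption side, the two null rules proposed here could be applied as a simple pre-processing filter. This is a hypothetical sketch: the prefix list `NULL_PREFIXES` is an illustrative guess built from the EVA gallows and the "pch" example, and a paragraph is assumed to be a list of lines, each a list of EVA tokens.

```python
# Hypothetical null prefixes for a paragraph's first token, longest first
# so that "pch" is tried before "p".
NULL_PREFIXES = ("pch", "tch", "kch", "fch", "p", "t", "k", "f")

def strip_nulls(paragraph):
    """Return a copy of the paragraph with two hypothetical null rules applied:
    1. a gallows glyph or prefix opening the paragraph is a null;
    2. a line-final token ending in -m is a null that pads the line.
    """
    cleaned = [list(line) for line in paragraph]
    if cleaned and cleaned[0]:
        first = cleaned[0][0]
        for prefix in NULL_PREFIXES:
            if first.startswith(prefix):
                remainder = first[len(prefix):]
                if remainder:
                    cleaned[0][0] = remainder
                else:
                    cleaned[0].pop(0)  # the whole token was the null
                break
    for line in cleaned:
        if line and line[-1].endswith("m"):
            line.pop()
    return cleaned

para = [["tshedy", "qokeey", "dam"], ["chol", "daiin"]]
print(strip_nulls(para))  # [['shedy', 'qokeey'], ['chol', 'daiin']]
```

Running the filter before decryption leaves the underlying Naibbe tables untouched, which is the appeal of nulls as an extension.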

There are also other modifications that could be made. In the current version of the Naibbe cipher, plaintext re-spacing is completely random (i.e., it does not follow a systematic rule such as putting a space before/after every vowel), and so is table selection. But in principle, neither has to be fully random, which could potentially accommodate some VMS properties, such as the word length correlation. In addition, one could imagine that the first line of a paragraph is encrypted slightly differently from the rest of the paragraph, for example by favoring a table with a higher incidence of p than the other tables have. I should also re-emphasize that the structure of the Naibbe cipher is based on average stats across all of Voynich B, which definitely smooths over some section-by-section differences and mushes together what seem to be the distinct preferences of Scribes 2 and 3.
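In the current cipher, table selection is uniform and independent. The following hypothetical sketch shows two ways it could be made non-random: a "sticky" rule that reuses the previous table with probability `stay_prob`, and `weights` that bias fresh draws toward one table (for instance, a table richer in p on a paragraph's first line). The function and parameter names are illustrative, and whether either rule reproduces the VMS's word length correlation is an open question to be tested, not a claim.

```python
import random

def select_tables(n_letters, n_tables=6, stay_prob=0.0, weights=None, rng=None):
    """Choose one table index per plaintext letter.

    stay_prob=0 and weights=None give uniform, independent choices.
    stay_prob > 0 reuses the previous table with that probability.
    weights (one per table) bias every fresh draw.
    """
    rng = rng or random.Random()
    population = range(n_tables)
    sequence = []
    for _ in range(n_letters):
        if sequence and rng.random() < stay_prob:
            sequence.append(sequence[-1])  # sticky: keep the same table
        else:
            sequence.append(rng.choices(population, weights=weights)[0])
    return sequence

print(select_tables(12, stay_prob=0.7, rng=random.Random(0)))
first_line = select_tables(12, weights=[1, 1, 4, 1, 1, 1], rng=random.Random(0))
print(first_line)
```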

The Naibbe cipher isn't perfect, but it's a place to start. I'd love to collaborate with folks and further investigate whether and how the Naibbe cipher can be extended/modified to accommodate the VMS's line-level properties. Part of this work, I suspect, will involve screening for plaintext properties that make those line-level statistics more or less likely.

I should add here that line-by-line reuse, similar to Timm and Schinner's self-citation algorithm, will be an important piece of the puzzle. I wasn't able to discuss it in my presentation, but the Stars section of Voynich B exhibits markedly more token autocorrelation than Naibbe ciphertexts do at small sequence offsets (1-300 tokens), while at large offsets the Naibbe cipher and the VMS match. I see this as a pretty clear signpost that there's some amount of line-by-line reuse, and more word repetition than the current version of the Naibbe cipher can randomly generate.
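The post doesn't define the token autocorrelation measure used. One simple proxy, assumed here for illustration, is the fraction of positions where the token at position i is identical to the token at position i + k, computed for each offset k from 1 to 300. The function names are hypothetical.

```python
def token_match_rate(tokens, offset):
    """Fraction of positions i where tokens[i] == tokens[i + offset]."""
    pairs = len(tokens) - offset
    if offset < 1 or pairs < 1:
        raise ValueError("offset must be between 1 and len(tokens) - 1")
    matches = sum(tokens[i] == tokens[i + offset] for i in range(pairs))
    return matches / pairs

def match_profile(tokens, max_offset=300):
    """Match rate at each offset from 1 to max_offset (capped by text length)."""
    top = min(max_offset, len(tokens) - 1)
    return {k: token_match_rate(tokens, k) for k in range(1, top + 1)}

# Toy usage; in practice one would compare a Voynich B section against
# many Naibbe ciphertexts of the same length.
profile = match_profile(["daiin", "chol", "daiin", "chol", "shedy"], max_offset=3)
print(profile)
```

Line-by-line reuse would show up as elevated match rates at short offsets, roughly within a line or a few lines of text.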

(04-08-2025, 02:24 AM) magnesium Wrote: The Naibbe cipher isn't perfect, but it's a place to start. I'd love to collaborate with folks and further investigate whether and how the Naibbe cipher can be extended/modified to accommodate the VMS's line-level properties. Part of this work, I suspect, will involve screening for plaintext properties that make those line-level statistics more or less likely.

Thank you for sharing your work! The Naibbe cipher is somewhat at odds with what I would consider a good candidate for Voynichese (for the labels to make sense, I would expect the verbosity not to exceed roughly 1.5-2.5 glyphs per plaintext character on average), but overall I think this is the most carefully thought-through attempt at replicating the statistics of Voynichese I've seen so far.
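Verbosity here is ciphertext glyphs per plaintext letter. Measuring it requires deciding what counts as a glyph; the sketch below assumes EVA transcription with ch, sh, and the benched gallows (ckh, cth, cph, cfh) each counted as one glyph and every other EVA character as its own glyph. That segmentation, `MULTI_GLYPHS`, and the function names are hypothetical choices for illustration.

```python
# Hypothetical glyph segmentation: these EVA sequences count as one glyph
# each; every other EVA character counts as its own glyph.
MULTI_GLYPHS = ("ckh", "cth", "cph", "cfh", "ch", "sh")

def count_glyphs(token):
    """Count glyphs in an EVA token using greedy multi-character matching."""
    i = n = 0
    while i < len(token):
        for glyph in MULTI_GLYPHS:
            if token.startswith(glyph, i):
                i += len(glyph)
                break
        else:
            i += 1  # ordinary single-character glyph
        n += 1
    return n

def verbosity(plaintext, ciphertext_tokens):
    """Ciphertext glyphs per plaintext letter (spaces and punctuation ignored)."""
    letters = sum(c.isalpha() for c in plaintext)
    glyphs = sum(count_glyphs(t) for t in ciphertext_tokens)
    return glyphs / letters

print(verbosity("ab c", ["chedy", "ol"]))  # 6 glyphs / 3 letters = 2.0
```

Because each Naibbe token encodes one or two letters, the result is sensitive to the glyph segmentation chosen; counting raw EVA characters would give a higher figure.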

(03-08-2025, 10:44 PM) oshfdk Wrote: I vaguely remember that the Voynich Manuscript shows a tendency for short and long words to cluster, unlike natural languages. That is, in the Voynich MS a short word is more likely to follow another short word, and long words are more likely to be followed by more long words. Was there a paper that described this behavior?

It's discussed in Gaskell and Bowern, "Gibberish after all?", presented at the Malta conference.

Quote: A notable feature of the VMS that has to our knowledge only been discussed by one other publication [20] is positive autocorrelation of word lengths. Word lengths in most meaningful texts are negatively autocorrelated: that is, long words tend to be interspersed with short words (long-short-long-short). By contrast, the VMS exhibits positive autocorrelation (long-long-short-short). Positive autocorrelation is only observed in a limited number of natural languages, but is common in gibberish (Figure 3).

...

[20] V. Matlach, B. A. Janečková, and D. Dostál, “The Voynich manuscript: Symbol roles revisited,” PLOS ONE, vol. 17, no. 1, p. e0260948, Jan. 2022, doi: 10.1371/journal.pone.0260948.
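The quoted sign convention can be checked with a standard lag-1 sample autocorrelation of word lengths. This is a minimal sketch; the exact estimator used by Gaskell and Bowern or by Matlach et al. is not stated in the quote, so the choice below (covariance at the given lag divided by total variance) is an assumption, and the function names are illustrative.

```python
def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation of a numeric sequence at the given lag."""
    n = len(values)
    if not 1 <= lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        raise ValueError("values are constant; autocorrelation is undefined")
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var

def word_length_autocorrelation(text, lag=1):
    """Autocorrelation of word lengths in a whitespace-separated text."""
    return lag_autocorrelation([len(w) for w in text.split()], lag)

print(lag_autocorrelation([1, 5, 1, 5, 1, 5]))  # long-short alternation: negative
print(lag_autocorrelation([1, 1, 1, 5, 5, 5]))  # long-long, short-short: positive
```

Applied to a transcription, word_length_autocorrelation on a VMS section versus a Naibbe ciphertext would show whether the cipher reproduces the positive sign.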
