The Voynich Ninja

Pages: 1 2 3 4 5 6 7 8

Quote: According to Reddy and Knight (2011), the VMS is ~38,000 tokens long and contains ~8,100 word types.

I have also thought a bit about the unique word count and ratio in the VM, and my observation is that in this respect the VM is actually very similar to natural languages. The diagram above shows it too.

It is also more similar to languages with high inflection (like Latin) than to languages with low inflection (like English).

I ran a few tests myself and may share them at some point. Generally, the same text will have more unique words in Latin than in English. You may argue that these aren't "true" unique words, because many of them are grammatical forms of the same word (like flos, floris, flore, florem...), whereas in English it will always be "flower". Yet most computer algorithms will treat these grammatical forms as distinct word types.

So the ratio of unique words in the VM is more similar to Latin than to English, though we must remember that a lot depends on the topic of the text.
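The "ratio of unique words" is usually called the type-token ratio (TTR): distinct word types divided by total tokens. With the Reddy and Knight figures that is roughly 8,100 / 38,000 ≈ 0.21. Below is a minimal Python sketch of the calculation. The function name is hypothetical, and the tokenizer (lowercased runs of word characters) is an assumption. Dot-separated EVA words split correctly because "." is not a word character, but other transcription markup is not handled.

```python
import re

def type_token_stats(text):
    """Return (tokens, types, type-token ratio) for a plain-text sample."""
    # Assumption: a "word" is any run of word characters, case-folded.
    tokens = re.findall(r"\w+", text.lower())
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return len(tokens), len(types), ratio

# Four inflected Latin forms vs. one repeated English word.
print(type_token_stats("flos floris flore florem"))     # (4, 4, 1.0)
print(type_token_stats("flower flower flower flower"))  # (4, 1, 0.25)
```

TTR drops as a text gets longer, so comparisons between the VM and Latin or English samples are only fair at roughly equal token counts.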

An excellent and thought-provoking approach! I am pleased to see that my own work on the manuscript’s text appears to have contributed, in part, to the inspiration behind it – my sincere thanks for the kind acknowledgement.

Although I have so far only been able to engage with the paper on a preliminary level, it strikes me as a promising contribution to the field.
At first glance, the chronological placement of the proposed method seems more plausible than that of my own PIII hypothesis. At the same time, however, the encryption mechanism appears significantly more complex.
I would expect that any attempt at decipherment using the complex codebooks will be extremely demanding. I nevertheless hope to explore the idea further during the upcoming days.

(03-08-2025, 04:18 PM) magnesium Wrote: Hi everyone, thank you for your questions on the Naibbe cipher. A huge thank-you to the community and to Koen for letting me present at this year's Voynich Manuscript Day.

You'll find a full preprint version of my paper, as well as 20 reference Naibbe ciphertexts, in a Dropbox folder.

In addition, you'll find more resources, including Microsoft Excel implementations of Voynichesque and the Naibbe cipher, in a Zenodo data repository.

For reference, the preprint contains everything except my replication of Bowern and Gaskell (2022) and Gaskell and Bowern (2022), which I just got working a few days ago.

(07-08-2025, 09:45 PM) hermesj Wrote: An excellent and thought-provoking approach! I am pleased to see that my own work on the manuscript’s text appears to have contributed, in part, to the inspiration behind it – my sincere thanks for the kind acknowledgement.

Thank you for your work! Your 2022 conference paper was a major source of inspiration.

Quote: Although I have so far only been able to engage with the paper on a preliminary level, it strikes me as a promising contribution to the field.
At first glance, the chronological placement of the proposed method seems more plausible than that of my own PIII hypothesis. At the same time, however, the encryption mechanism appears significantly more complex.

Part of the motivation for the cipher's design was to find a more efficient way of generating many word types while remaining broadly consistent with 15th-century homophonic substitution ciphers. In Voynich B, for example, ~23,000 tokens contain >4,500 unique word types. Generating and/or decrypting such a wide variety of word types with a PIII-style codebook requires a very large codebook. If unigrams (as whole word types) and longer n-grams (using various lists of affixes) are encrypted in separate ways, we preserve many of the appealing statistical properties of having individual tokens map to individual plaintext letters, while also dramatically shrinking the required size of the lookup tables.
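The size argument can be made concrete with some idealized arithmetic. P prefix entries and S suffix entries can in principle combine into up to P × S distinct word types from only P + S table entries, whereas a whole-word codebook needs one entry per word type. The sketch below (the function name is hypothetical) finds the smallest P + S whose product reaches a target number of types. It ignores homophones, the unigram table, and the fact that real ciphertext never uses every combination, so it is a rough lower bound under those assumptions, not a description of the actual Naibbe tables.

```python
def min_affix_split(n_types):
    """Smallest (P, S) with P * S >= n_types, minimizing P + S (idealized)."""
    best = None
    for p in range(1, n_types + 1):
        s = -(-n_types // p)  # ceiling of n_types / p
        if best is None or p + s < sum(best):
            best = (p, s)
    return best

p, s = min_affix_split(4500)  # Voynich B: >4,500 word types
print(p, s, p + s)            # 60 75 135
```

Under these idealized assumptions, about 135 affix entries could span as many combinations as a 4,500-entry whole-word codebook.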

Quote: I would expect that any attempt at decipherment using the complex codebooks will be extremely demanding. I nevertheless hope to explore the idea further during the upcoming days.

I encourage you, and everyone, to try your hand at decrypting Naibbe ciphertext using the preprint's Tables 10-12. The end of the main paper includes a one-line decryption exercise, and Figure 4 consists of the first several hundred letters of Julius Caesar's De bello gallico. As a reminder, decryption proceeds as follows:

Check a given token against the unigram word types (Table 10). If there's a match, decrypt the token as a unigram. This will go much more smoothly than you might think because 70-80 unigram word types account for >40% of all tokens, so you'll see a lot of repeat appearances, and only a small number of possible prefixes are used to make unigram word types.
If the token doesn't match the unigram word types in Table 10, the token encrypts a bigram, where the token's prefix encrypts one letter and the token's suffix encrypts the other. Most prefixes are made using Table 8; most suffixes are made using Table 9. As you get a feel for the cipher, the prefix-suffix breakpoint becomes much easier to spot, in no small part because the prefixes and suffixes are meant to mimic intuitive Voynichese "prefixes" and "suffixes." For example, if a word type contains an e glyph, that e and everything to the right of it is a suffix. Similarly, if you spot a bench-and-gallows glyph (e.g., cth), that glyph and everything to the left of it is the prefix.
Once you have the bigram's prefix and suffix, look up the prefix in Table 11 and the suffix in Table 12.
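The three-step lookup above can be sketched in Python. The tables below are tiny hypothetical placeholders, not the contents of the preprint's Tables 10-12. Two further assumptions are built in. First, the prefix letter comes first in the decrypted bigram. Second, the code does not use the visual e and bench-gallows rules to find the breakpoint; it simply tries every split point until both halves appear in the affix tables.

```python
# Hypothetical toy tables; the real ones are Tables 10-12 of the preprint.
UNIGRAMS = {"daiin": "a", "chol": "e"}           # whole token -> letter
PREFIXES = {"qok": "t", "ch": "i", "cth": "n"}   # prefix -> letter
SUFFIXES = {"eedy": "o", "aiin": "s", "y": "r"}  # suffix -> letter

def decrypt_token(token):
    # Step 1: a whole-token match means the token encrypts a single letter.
    if token in UNIGRAMS:
        return UNIGRAMS[token]
    # Steps 2-3: otherwise it encrypts a bigram; find a prefix/suffix split.
    for i in range(1, len(token)):
        prefix, suffix = token[:i], token[i:]
        if prefix in PREFIXES and suffix in SUFFIXES:
            # Assumption: the prefix letter precedes the suffix letter.
            return PREFIXES[prefix] + SUFFIXES[suffix]
    raise ValueError(f"no unigram or prefix/suffix match for {token!r}")

def decrypt(tokens):
    return "".join(decrypt_token(t) for t in tokens)

print(decrypt(["daiin", "qokeedy", "chaiin", "cthy"]))  # atoisnr
```

Trying splits from left to right returns the first valid segmentation. With full-size tables more than one split might match, and that is where the e and bench-gallows rules of thumb would help.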

Managed to get a prototype Python version up and running ... barely.

... need sleepz now.

Good morning, a quick comment on the regularity of Voynichese: all words that have a gallows sign integrated into the bench letter (cth, cph, ckh) also exist without the gallows sign in the middle.
All words with an e, ee, eee or eeee (always and only after the first letter of a word) also exist in the manuscript without the e. Could this theory of prefixes and suffixes therefore be applied here?

(10-08-2025, 06:15 AM) Petrasti Wrote: Good morning, a quick comment on the regularity of Voynichese: all words that have a gallows sign integrated into the bench letter (cth, cph, ckh) also exist without the gallows sign in the middle

This is not strictly true. I just did some spot checks: for instance (transcription RF1a-n) there are 3 ackhy and 6 acthy, but no achy. There are 3 chckh and 3 chcth, but no chch.

(10-08-2025, 06:15 AM) Petrasti Wrote:
All words with an e, ee, eee or eeee (always and only after the first letter of a word) also exist in the manuscript without the e. Could this theory of prefixes and suffixes therefore be applied here?

e is rather rare as the 2nd letter of a word, so I'm not sure how well this observation is supported statistically. And there are exceptions too (spot check): there is one seeedy but no sdy, and there are 2 qear and 1 qeear but no qar.
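Spot checks like these are easy to automate. The sketch below takes a vocabulary of word types, for example a dictionary extracted from a transcription, and lists every word whose reduced form is missing. The two reductions are one reading of the claim being tested: replace a bench-gallows sequence (ckh, cth, cph, cfh) with ch, or delete runs of e. The function names are hypothetical, and the sample vocabulary contains only words quoted in this thread, not a real dictionary.

```python
import re

def strip_gallows(word):
    # ckh, cth, cph, cfh -> ch (remove the gallows from the bench)
    return re.sub(r"c[ktpf]h", "ch", word)

def strip_e(word):
    # Delete every run of e glyphs (e, ee, eee, ...).
    return re.sub(r"e+", "", word)

def missing_counterparts(vocab, transform):
    """Return (word, reduced) pairs whose reduced form is not in vocab."""
    vocab = set(vocab)
    missing = []
    for word in sorted(vocab):
        reduced = transform(word)
        if reduced != word and reduced not in vocab:
            missing.append((word, reduced))
    return missing

vocab = ["ackhy", "acthy", "chckh", "chcth", "seeedy",
         "qear", "qeear", "chedy", "chdy"]
print(missing_counterparts(vocab, strip_gallows))
# [('ackhy', 'achy'), ('acthy', 'achy'), ('chckh', 'chch'), ('chcth', 'chch')]
print(missing_counterparts(vocab, strip_e))
# [('qear', 'qar'), ('qeear', 'qar'), ('seeedy', 'sdy')]
```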

Hi Mauro, can you tell me on which pages you checked the words?
The words qear and qeear are difficult to compare, since each form occurs only once in the manuscript.
Regards, Petra

(10-08-2025, 02:37 PM) Petrasti Wrote: Hi Mauro, can you tell me on which pages you checked the words?
The words qear and qeear are difficult to compare, since each form occurs only once in the manuscript.
Regards, Petra

I checked the dictionary extracted from RF1a-n; you can look up the page references with Ctrl-F in the original RF1a-n file with metadata. qear is on f24r and f111v, and qeear is on f111r:

<f24r.10,+P0> qear.cfhar.chor.s.am.chotaiin.dy
<f111v.12,+P0> qear.ain.shey.okeeey.qokaiin.checkhy.sho.lchal.sheey.shckhey.kshartar
<f111r.47,+P0> sheedy.qokeey.sheey.qoteedy.qeear.al.chedy.oteey.chedy

(Sorry, but I never remember where I put the link to RF1a-n; you should find the transcription on René Zandbergen's website.)
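To find where a word occurs without relying on the browser's Ctrl-F, a small script can scan transcription lines in the layout quoted above: a locus tag such as <f24r.10,+P0>, then words separated by dots. This sketch assumes exactly that simplified layout, and the function name is hypothetical. The full transcription file contains further markup (comments, uncertain readings and so on) that it does not handle.

```python
import re

# Locus tag like <f24r.10,+P0>; group 1 = page.line, group 2 = the text.
LINE = re.compile(r"^<([^,>]+)[^>]*>\s*(.*)$")

def find_word(lines, word):
    """Return the loci of lines whose dot-separated words include `word`."""
    hits = []
    for line in lines:
        m = LINE.match(line.strip())
        if m and word in m.group(2).split("."):
            hits.append(m.group(1))
    return hits

lines = [
    "<f24r.10,+P0> qear.cfhar.chor.s.am.chotaiin.dy",
    "<f111v.12,+P0> qear.ain.shey.okeeey.qokaiin.checkhy.sho.lchal.sheey.shckhey.kshartar",
    "<f111r.47,+P0> sheedy.qokeey.sheey.qoteedy.qeear.al.chedy.oteey.chedy",
]
print(find_word(lines, "qear"))   # ['f24r.10', 'f111v.12']
print(find_word(lines, "qeear"))  # ['f111r.47']
```

Matching whole dot-separated words, rather than substrings, avoids false hits such as qear inside qeear.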

Hi everyone, I need to correct my statement a bit. I was too caught up in my theory, in which I look for the base words and the e always comes second. Generally speaking, the e can also appear further back in the word. Still, look at the frequency of words with and without the e: there are simply too many "similar" words for it to be a coincidence.
The same applies to words with the gallows sign in the middle (c*h). These also appear more often than average in the manuscript without the gallows sign.
The same happens with words that have a tilde above the ch (comparing c+h with ch). These also occur disproportionately often without the tilde.

Of course, you'll find exceptions, especially among rare words. But the frequency of these similarities is still remarkably high.
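The frequency argument can be tabulated directly from a token list. For each word type containing e, count its tokens alongside the tokens of its e-less form. In this minimal sketch, the e-deletion rule, the function name and the toy token list are all assumptions. It only produces the counts; deciding whether the overlap is more than coincidence would still need a comparison against some baseline.

```python
import re
from collections import Counter

def e_pair_counts(tokens):
    """Map each e-containing word type to (its count, count without e)."""
    counts = Counter(tokens)
    pairs = {}
    for word, n in counts.items():
        base = re.sub(r"e+", "", word)  # delete every run of e
        if base != word:
            pairs[word] = (n, counts.get(base, 0))
    return pairs

tokens = ["chedy", "chedy", "chdy", "qeear", "shey", "shy", "shy"]
print(e_pair_counts(tokens))
# {'chedy': (2, 1), 'qeear': (1, 0), 'shey': (1, 2)}
```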

Hi Mauro,
I use voynichese.com; on page f24r the word is, in my opinion, "qeor".
On page 111 you are right (but the word only exists twice in this form in the manuscript).
Both of the other words you mentioned don't exist on voynichese.com.
chch certainly doesn't exist; can you please check too? I haven't found the same letter written twice in a row in the manuscript, like aa, kk or ll.
I would also like to use the site you mentioned to check the possibilities.
I'd like to understand briefly how exactly you activate the search function on the site. Unfortunately, my PC doesn't search in the manuscript when I press Ctrl-F. Is there anything I should be aware of?
