[split] What can the structural peculiarities of the VMS tell us about the nature of the underlying text


[split] What can the structural peculiarities of the VMS tell us about the nature of the underlying text - JoJo_Jost - 06-05-2026

I mean, based on the research conducted so far, we can narrow down the possibilities to four points.

1. A rare ancient language, phonetic or real.
Stolfi’s theory that it is an Asian language. Or it could be a very rare dialect that no one speaks or knows anymore (I’m not allowed to say here which one that might be :D).

2. It is a made-up language, see Hildegard von Bingen.

3. It is a complex, section-dependent generator that would be completely anachronistic for its time (whether it’s a hoax or not doesn’t even matter here).

4. It is a nomenclature system with lists switched by, e.g., line-start markers, and the lists no longer exist.

Except for Stolfi’s approach and possibly the simpler generators of option 3, none of these possibilities can be solved even with AI or a quantum computer. That is because this is not a problem of computing power or artificial intelligence.

RE: How should we deal with LLMs on the forum? - Jorge_Stolfi - 06-05-2026

(06-05-2026, 06:19 AM)JoJo_Jost Wrote: 1. A rare ancient language, phonetic or real. Stolfi’s theory that it is an Asian language. Or it could be a very rare dialect that no one speaks or knows anymore (I’m not allowed to say here which one that might be).

Talking to another VMS fan here, it occurred to us that it could also be a very literal character-by-character "translation" of Chinese texts into some non-Asian language, like Arabic or Hebrew. He believes that the word structure and character frequencies are compatible with that.

Or into that other language that you did not mention here. :D

All the best, --stolfi

RE: How should we deal with LLMs on the forum? - Jorge_Stolfi - 06-05-2026

(06-05-2026, 06:19 AM)JoJo_Jost Wrote: 4. It is a nomenclature system with lists switched by, e.g., line-start markers, and the lists no longer exist.

That "code switching by line" is the LAAFU hypothesis. I don't think there is real evidence for it. There are all sorts of statistical anomalies around line breaks, sure; but I still believe that they can be explained as side effects of the Scribe's line-breaking "algorithm", that do not affect the "encoding" and do not depend on the meaning of the text.
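To illustrate how a line-breaking habit alone could create anomalies at line starts, here is a toy simulation. It assumes a greedy rule (start a new line whenever the next word does not fit), which is an assumption made for illustration and not a documented description of the scribe's practice. The words are random-length placeholders with no meaning, and the name `greedy_wrap` is hypothetical.

```python
import random
from statistics import mean


def greedy_wrap(words, width):
    """Break words into lines, starting a new line when the next word does not fit."""
    lines, current, used = [], [], 0
    for w in words:
        needed = len(w) + (1 if current else 0)  # one space before non-initial words
        if current and used + needed > width:
            lines.append(current)
            current, used = [], 0
            needed = len(w)
        current.append(w)
        used += needed
    if current:
        lines.append(current)
    return lines


# Meaningless placeholder "words" with lengths drawn uniformly from 1..9.
rng = random.Random(42)
words = ["x" * rng.randint(1, 9) for _ in range(5000)]
lines = greedy_wrap(words, width=40)

# Compare the length of line-initial words with the length of all words.
first_mean = mean(len(line[0]) for line in lines)
overall_mean = mean(len(w) for w in words)
print(f"line-initial mean length: {first_mean:.2f}, overall: {overall_mean:.2f}")
```

Under this rule the word that opens a line is, by construction, one that did not fit in the space left on the previous line, so line-initial words tend to be longer than average even though the word sequence itself carries no line structure.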

As for the theory that the "code" is a codebook-based cipher, the structure of the words indeed suggests that they are numbers in a Roman-like notation. (IIUC, that is the basis of the "Naibbe cipher" proposal, correct?) But the main argument against it is that, until the 1800s, such a cipher would be quite laborious to write and read. AFAIK codebook ciphers were used only for short documents with very sensitive contents. Is there any example of a VMS-size book that is written entirely in such a cipher?

All the best, --stolfi

RE: How should we deal with LLMs on the forum? - JoJo_Jost - 06-05-2026

(06-05-2026, 06:48 AM)Jorge_Stolfi Wrote: Is there any example of a VMS-size book that is written entirely in such a cipher?

All the best, --stolfi

No, not to my knowledge. But based on my research so far (!!!), there is also no indication of a generator that could even begin to produce such a complex system (especially given the very clear differences between the sections).

The underlying structure of the texts is also very difficult to reconcile with a generator. In fact, it is precisely this structure that points to an actual language that just happens to have this structure by chance.

The differences between the sections would be easiest to explain with different lists.

If it were a hoax, why would anyone go to the trouble of arranging it differently in every section? That is completely unnecessary for a hoax.

The LAAFU features are stronger than previously described: the initial letters produce significantly longer and significantly shorter lines for p vs. o. They favor certain subsequent letters/bigrams and tend to reject others. So something is going on here, and it’s pretty crazy, even if it’s only weakly pronounced in some cases. Let’s assume that only a few things are being triggered: bigrams have a different relationship under certain line markers, or only the gallows have a different meaning. It doesn’t have to be much. But how else can one explain that the lengths AND the glyphs/bigrams used in the lines vary depending on the marker?
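A minimal sketch of how such a line-start effect could be measured, assuming lines are given as EVA-style strings with words separated by dots or spaces. The function name `line_start_profile` and the sample lines are invented for illustration and are not taken from any transcription; whether any difference is significant would still need a proper statistical test on real data.

```python
from collections import Counter, defaultdict
from statistics import mean


def line_start_profile(lines):
    """Group lines by their first character; report mean length and bigram counts.

    Note: this counts transcription characters, not glyphs, so EVA pairs
    such as "ch" or "sh" are counted as two characters.
    """
    lengths = defaultdict(list)
    bigrams = defaultdict(Counter)
    for line in lines:
        chars = line.replace(".", "").replace(" ", "")
        if len(chars) < 2:
            continue  # too short to have a marker plus content
        marker, rest = chars[0], chars[1:]
        lengths[marker].append(len(chars))
        # Bigrams are taken from the text after the line-start marker.
        bigrams[marker].update(rest[i:i + 2] for i in range(len(rest) - 1))
    return {
        m: {"n_lines": len(v), "mean_length": mean(v), "bigrams": bigrams[m]}
        for m, v in lengths.items()
    }


# Invented sample lines (NOT real transcription data).
sample = [
    "pchedy.qokeedy.qokain",
    "okaiin.shedy",
    "pol.chedy.qokal.daiin",
    "otedy.dal",
]
profile = line_start_profile(sample)
for marker, stats in profile.items():
    print(marker, stats["n_lines"], stats["mean_length"], stats["bigrams"].most_common(3))
```

Comparing the `mean_length` values and the bigram counters between markers (for example p vs. o) gives the raw numbers behind the question of whether line length and bigram usage really depend on the first character of the line.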

RE: How should we deal with LLMs on the forum? - oshfdk - 06-05-2026

(06-05-2026, 06:19 AM)JoJo_Jost Wrote: I mean, based on the research conducted so far, we can narrow down the possibilities to four points.

1. A rare ancient language, phonetic or real.
Stolfi’s theory that it is an Asian language. Or it could be a very rare dialect that no one speaks or knows anymore (I’m not allowed to say here which one that might be).

2. It is a made-up language, see Hildegard von Bingen.

3. It is a complex, section-dependent generator that would be completely anachronistic for its time (whether it’s a hoax or not doesn’t even matter here).

4. It is a nomenclature system with lists switched by, e.g., line-start markers, and the lists no longer exist.

Except for Stolfi’s approach and possibly the simpler generators of option 3, none of these possibilities can be solved even with AI or a quantum computer. That is because this is not a problem of computing power or artificial intelligence.

To me, these 4 options look much less likely than a combination of a specific plaintext and a relatively simple character-level cipher (homophonic and not a nomenclature) that can still produce a result like this. All character level statistics can be explained by using a cipher, all larger level statistics can be explained by the structure of the plaintext, in principle. Options 1 and 2 would be solvable by mapping labeled images to text. Option 4 seems highly impractical. And I don't even understand what option 3 is in practice. An elaborate game of cards and dice? A divination tool?
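For readers unfamiliar with the term, a character-level homophonic cipher can be sketched in a few lines. The key below is a toy invented for illustration (numeric symbols, five letters only) and makes no claim about how any Voynich cipher would work; it only shows that each plaintext letter may be written with several interchangeable symbols, while decryption stays a simple table lookup.

```python
import random

# Hypothetical toy key: each plaintext letter has one or more cipher symbols.
# Frequent letters get more homophones, which flattens symbol frequencies.
HOMOPHONES = {
    "a": ["11", "12", "13"],
    "e": ["21", "22", "23", "24"],
    "n": ["31", "32"],
    "t": ["41", "42"],
    "r": ["51"],
}

# Reverse table: every cipher symbol maps back to exactly one letter.
REVERSE = {sym: letter for letter, syms in HOMOPHONES.items() for sym in syms}


def encrypt(plaintext, rng):
    """Replace each letter by one of its homophones, chosen at random."""
    return [rng.choice(HOMOPHONES[ch]) for ch in plaintext]


def decrypt(symbols):
    """Map each cipher symbol back to its unique plaintext letter."""
    return "".join(REVERSE[s] for s in symbols)


rng = random.Random(0)
ciphertext = encrypt("entrate", rng)
print(ciphertext)
print(decrypt(ciphertext))
```

Because the choice among homophones is free, the same plaintext word can appear in many spellings, which changes character-level statistics while the word order and larger structure of the plaintext pass through unchanged.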

RE: How should we deal with LLMs on the forum? - JoJo_Jost - 06-05-2026

(06-05-2026, 11:00 AM)oshfdk Wrote: All character level statistics can be explained by using a cipher, all larger level statistics can be explained by the structure of the plaintext, in principle.

Yes, it's all a matter of belief. But I'm sure that if it were a well-known medieval language - even one with a more complex cipher and a very specific plaintext - it would have been deciphered by now... But fine, prove me wrong; I'd be thrilled if someone finally managed to decipher the VMS. ;)

RE: How should we deal with LLMs on the forum? - oshfdk - 06-05-2026

(06-05-2026, 01:25 PM)JoJo_Jost Wrote: Yes, it's all a matter of belief. But I'm sure that if it were a well-known medieval language - even one with a more complex cipher and a very specific plaintext - it would have been deciphered by now... But fine, prove me wrong; I'd be thrilled if someone finally managed to decipher the VMS.

I was under the impression that you are working exactly on this - attempting to decode the manuscript as a specific type of plaintext written in a somewhat encoded version of Bavarian?

Note that Trithemius’s Steganographia Book III was only decoded in ~1998 by [link], despite being published and available for 500 years and written in Latin. [link]

RE: How should we deal with LLMs on the forum? - JoJo_Jost - 06-05-2026

@[link] The problem is, if it were a “normal” language from the 15th century and a cipher that corresponded to the ciphers of that time - based on what we know today - it would have been deciphered by now...

Edit: Oh, and the AI doesn't have a database for many of the rural dialects of that time, and possibly even some forgotten languages.

RE: How should we deal with LLMs on the forum? - oshfdk - 06-05-2026

(06-05-2026, 05:15 PM)JoJo_Jost Wrote: @[link] The problem is, if it were a “normal” language from the 15th century and a cipher that corresponded to the ciphers of that time - based on what we know today - it would have been deciphered by now...

I just gave an example of a cipher in Latin from the 15th century (technically, from ~1499) that was known for 500 years, but only deciphered in 1998. I see absolutely no reason to believe that any 15th century cipher would be easy to crack.

(06-05-2026, 05:15 PM)JoJo_Jost Wrote: Edit: Oh, and the AI doesn't have a database for many of the rural dialects of that time, and possibly even some forgotten languages.

I think AI might not need it that much. While modern LLM-based AI struggles with logical thinking and planning, it's good at interpolation/extrapolation, so identifying a dialect or a forgotten language by cognates may be relatively easy for an AI with its access to hundreds of languages. AI is also tireless: it is quite possible to task it with something like analyzing a list of 100000 word candidates, matching each against possible languages, performing Google Books lookups, etc. This is costly and doesn't look like much fun to me, but I think there are a lot of opportunities for AI-assisted research into the Voynich MS.

RE: How should we deal with LLMs on the forum? - RadioFM - 06-05-2026

Bespoke ciphers too convoluted for practical use back in the day would not have seen widespread adoption, and would have died out in obscurity.