The Voynich Ninja

Full Version: Linguistic Patterns Before Decipherment: A Key to Understanding Unknown Texts
One of the significant challenges in working with unknown or undeciphered manuscripts, such as the VMS, is the temptation to immediately “decode” them by matching words, tokens, or glyphs to a known language through direct translation attempts.

However, I would like to reflect on the history of linguistic and philological studies. One lesson stands out clearly: understanding the internal structure of a language often precedes and enables successful translation, rather than being a consequence of it.

This principle is not only historical but also literary.
I like the example of J.R.R. Tolkien: when he created his Elvish languages (Quenya, Sindarin, etc.), he built them from the inside out. Tolkien first designed phonological patterns, rhythms, internal grammars, and aesthetic structures before assigning stable translations to words. He understood that a credible language must have internal consistency and recognizable patterns, even if its dictionary is initially unknown. Without internal structure, a language feels artificial and cannot sustain meaning over time.

Similarly, even modern English, which seems globally dominant, will eventually fragment if left unchecked for a long time. Its internal rhythms, syntax, and patterns of use (formal, informal, poetic, administrative) will be essential clues for future scholars attempting to reconstruct it, should direct transmission of meaning be lost. Globalization has slowed geographic divergence, but internal linguistic evolution is unstoppable, just as Latin diversified into the Romance languages over centuries.

Why is this important for the VMS?

Because even without a direct decipherment, identifying structural patterns (whether in vocabulary frequency, thematic clustering, or phase-based rhythms) can offer us a way to understand how the text was composed. It can reveal whether it behaves like natural language, liturgical formula, mnemonic code, or something entirely different.

Just as Tolkien’s Elvish languages were immediately recognizable as coherent systems (even without knowing what every word meant), the Voynich Manuscript might reveal its internal organization independently of translation. Recognizing these structures is not just speculative: it follows the path historically used to decipher other ancient texts.

In short:
If we can detect rhythm, structure, and phases in the Voynich Manuscript, even without a complete translation, we are engaging with it in the most historically grounded and linguistically rigorous way possible.

What are your thoughts?
Yes, this looks like a good approach, but this also has been tried hundreds of times with the Voynich Manuscript. I'm extrapolating from the frequency of some recent attempts that I know of. As far as I know, for some reason not a single one of these attempts produced any specific universally recognized breakthrough into the underlying language.

The most popular explanations I've seen for this are (in no particular order): 
- nonsensical text: hoax, glossolalia;
- not a text: visual code, etc;
- agrammatical text: inventory, list of charms;
- specific kind of cipher: one that would hide repeating patterns.

What is your take?
(Yesterday, 07:06 AM)oshfdk Wrote: Yes, this looks like a good approach, but this also has been tried hundreds of times with the Voynich Manuscript.

...

What is your take?

I agree! Many attempts have indeed failed to find a clear pattern.
I’m not trying to decipher the content; I would like to see whether the manuscript has a measurable internal structure.
Using techniques like entropy analysis, clustering, and permutation tests, I would like to check whether the three visual blocks also differ statistically, which would indicate internal coherence rather than randomness.
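
To make the entropy part concrete, here is a minimal Python sketch of a per-section word entropy. It is only an illustration under my own assumptions, not a finished pipeline: the function name `shannon_entropy`, the section names, and the tokens are made-up placeholders, and a real run would use a tokenized transliteration of the manuscript.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy, in bits per token, of the word-frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct word.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical placeholder sections (not real manuscript data).
sections = {
    "block_1": "aa ab aa ac ab aa".split(),
    "block_2": "ba bb bc bd be bf".split(),
}

for name, tokens in sections.items():
    print(name, round(shannon_entropy(tokens), 3))
```

A difference in entropy between blocks is not evidence of structure by itself: it has to be compared against a control, such as the same measure on shuffled text.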

If the text were truly a random chunk (glossolalia or noise), we shouldn’t see any consistent or repeating word-level patterns. While it’s tempting to believe that “everything eventually forms a pattern” with enough data, that’s not how true randomness behaves: random sequences may show local clusters, but they lack global structural consistency.
If we detect a stable, statistically significant pattern, especially one that aligns with visual or thematic blocks, then it was likely introduced intentionally, not by chance.

In this case, I believe that clustering and topic modeling (LDA) should be able to separate the three thematic sections of the manuscript based on the word distributions associated with each part. Entropy measures and supervised classifiers (like Random Forest) may then show whether the vocabulary shifts in statistically meaningful ways across sections.
I agree with @oshfdk: the approach is obviously valuable and well known, but it's just a generic framework which needs to be fleshed out. This has been attempted many times in different ways with the VMS without, up to now, definite results. Let's hope for the future.
(Yesterday, 08:56 AM)Mauro Wrote: I agree with @oshfdk: the approach is obviously valuable and well known, but it's just a generic framework which needs to be fleshed out. This has been attempted many times in different ways with the VMS without, up to now, definite results. Let's hope for the future.

You’re right! The framework itself has been explored many times. But the fact that it’s been attempted before shouldn’t discourage us. In my opinion, what matters is how it’s implemented.

I believe the approach adds value precisely because it can be replicable, statistically validated, and grounded in a measurable structure, not just interpretive guesswork. If past attempts failed to yield precise results, it might be because they lacked rigorous controls or robust metrics.

As far as I know, what I am proposing hasn’t been done before: it’s not just another pattern search. If we identify three macroblocks, validate them statistically (using permutation tests, ablation, or synthetic controls), and show that the structure breaks down when the phases are shuffled, then the approach can work. The method combines entropy, LDA, and supervised classification with a control comparison.
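
The shuffling control can be sketched as a permutation test. The Python below is a rough illustration under my own assumptions rather than the exact method: I picked the mean pairwise Jensen-Shannon divergence between block vocabularies as the test statistic, I shuffle the labels at page level, and all function names, labels, and tokens are hypothetical placeholders.

```python
import math
import random
from collections import Counter

def js_divergence(tokens_a, tokens_b):
    """Jensen-Shannon divergence (bits) between two word-frequency distributions."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    na, nb = sum(ca.values()), sum(cb.values())
    div = 0.0
    for word in set(ca) | set(cb):
        p, q = ca[word] / na, cb[word] / nb
        m = (p + q) / 2
        if p > 0:
            div += 0.5 * p * math.log2(p / m)
        if q > 0:
            div += 0.5 * q * math.log2(q / m)
    return div

def section_statistic(pages, labels):
    """Mean pairwise JS divergence between the pooled vocabulary of each label."""
    pooled = {}
    for tokens, label in zip(pages, labels):
        pooled.setdefault(label, []).extend(tokens)
    names = sorted(pooled)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return sum(js_divergence(pooled[a], pooled[b]) for a, b in pairs) / len(pairs)

def permutation_test(pages, labels, n_perm=1000, seed=0):
    """Return (observed statistic, p-value) under random relabeling of pages."""
    rng = random.Random(seed)
    observed = section_statistic(pages, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point rounding does not hide exact ties.
        if section_statistic(pages, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical placeholder pages: three blocks with disjoint vocabularies.
pages = [
    ["aa", "ab", "aa"], ["ab", "ac", "aa"], ["ac", "aa", "ab"],
    ["ba", "bb", "ba"], ["bb", "bc", "ba"], ["bc", "ba", "bb"],
    ["ca", "cb", "ca"], ["cb", "cc", "ca"], ["cc", "ca", "cb"],
]
labels = ["I"] * 3 + ["II"] * 3 + ["III"] * 3

observed, p_value = permutation_test(pages, labels, n_perm=500)
print("observed statistic:", observed, "p-value:", p_value)
```

If the three blocks are real, the observed statistic should sit in the far tail of the shuffled distribution. If the p-value is large, the block labels carry no more vocabulary information than a random assignment of pages.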

I think it can be interesting, and to my knowledge, no previous study has applied this level of multivariate, statistically controlled analysis to the VMS.
(Yesterday, 09:28 AM)Urtx13 Wrote: If past attempts failed to yield precise results, it might be because they lacked rigorous controls or robust metrics.

...

I think it can be interesting, and to my knowledge, no previous study has applied this level of multivariate, statistically controlled analysis to the VMS.

Just a friendly reminder: a lot of people from all kinds of backgrounds attempted this work before. For example, Jorge Stolfi did a lot of research on the structure of the Voynich Manuscript a couple of decades ago. Quoting from Wikipedia:

Quote: Jorge Stolfi is a full professor of computer science at the State University of Campinas, working in computer vision, image processing, splines and other function approximation methods, graph theory, computational geometry and several other fields. According to the ISI Web Of Science, as of 2010 he was the most highly cited computer scientist in Brazil.

This, of course, doesn't mean that we should give up and avoid further study, but I think underestimating the scope and depth of work already done on the manuscript would be counterproductive. I suppose the number of highly professional, world-class statistical studies performed on the manuscript is well beyond single digits.
(Yesterday, 09:50 AM)oshfdk Wrote: ...

This, of course, doesn't mean that we should give up and avoid further study, but I think underestimating the scope and depth of work already done on the manuscript would be counterproductive.

I fully respect the work of Jorge Stolfi and other early researchers. Who said I underestimated it? Haha.

What I explained is fundamentally different: rather than exploring local repetition patterns or proposing generation algorithms, I focus on identifying and statistically validating large-scale internal structures.
Through permutation tests, ablation studies, synthetic controls, and supervised classification, I aim to rigorously test whether the Voynich Manuscript exhibits non-random, cyclical organization. As far as I know, this specific combination of methods has not been applied before. Why not now? It seems interesting to me!
Sounds fine. Are you looking for some specific feedback?
(Yesterday, 10:34 AM)oshfdk Wrote: Sounds fine. Are you looking for some specific feedback?

I think he already asked for feedback in his other thread. I took this thread here to be a generic one about ways of translating (or deciphering?) an unknown text, not as a second one about his specific method.
(Yesterday, 10:46 AM)Mauro Wrote: I took this thread here to be a generic one about ways of translating (or deciphering?) an unknown text, not as a second one about his specific method.

Actually, I opened a new thread because, although the aim is the same, the approaches are different!