The Voynich Ninja

Pages: 1 2 3

After quite a long time (months) trying to adapt my knowledge in data science, and the new things I have learned along the way, to the analysis of the Voynich text, I am reaching a rather frustrating point where I ask myself whether we might actually be at the end of the road regarding the analysis of the manuscript’s text.

What I have found lately, in the studies I have carried out and posted here, is that the results on text structure, entropy, etc. all end up reaffirming conclusions that “human” experts had already reached years or even decades ago. It does not seem that we can squeeze much more information out of the Voynich text, assuming it is a text at all.

I am convinced that if no human has been able to solve the Voynich, we will not be able to do it with the help of artificial intelligence either, since in the end it only repeats what humans have already done before.

Do you think that, from the perspective of text analysis, it is still possible to go further?
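For readers less familiar with the entropy measures mentioned above, here is a minimal sketch of first-order character entropy (h1) and conditional character entropy (h2, the entropy of a character given the one before it), the kind of statistic these studies rely on. It assumes the text is a plain string of EVA characters and drops word spaces for simplicity; the helper names are my own, and loading a real transliteration file is left out.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, of a table of counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def char_entropies(text):
    """Return (h1, h2) for a string of characters.

    h1: entropy of single characters.
    h2: conditional entropy of a character given the previous one,
        computed as H(pair) - H(first character of the pair).
    Whitespace is dropped, so word boundaries are ignored (an assumption).
    """
    chars = [c for c in text if not c.isspace()]
    h1 = entropy(Counter(chars))
    pairs = Counter(zip(chars, chars[1:]))
    firsts = Counter(chars[:-1])
    h2 = entropy(pairs) - entropy(firsts)
    return h1, h2

# Usage on a made-up EVA-like string (not real manuscript text):
h1, h2 = char_entropies("qokeedy qokedy chedy daiin qokaiin")
print(f"h1 = {h1:.3f} bits, h2 = {h2:.3f} bits")
```

Voynichese is usually reported as having an unusually low h2 compared with European languages; running a computation like this over a full transliteration is how such figures are obtained.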

(12-03-2026, 04:49 PM) quimqu wrote: I am convinced that if no human has been able to solve the Voynich, we will not be able to do it with the help of artificial intelligence either, since in the end it only repeats what humans have already done before.

Do you think that, from the perspective of text analysis, it is still possible to go further?

I think the ground you are standing on is extremely well trodden. Some of the greatest cryptologists not only tried and tried, but eventually decided it was a dead end and simply moved on.

I'm a little bit more optimistic though.

I think the current generation is burned out, but I am still optimistic that the next generation will be able to make progress. It may be young people who are only now in high school or just starting college, who may be thinking of careers in computational linguistics, or in historical linguistics and language reconstruction.

There is a lot of interest all around the world in preserving fading languages and resurrecting lost ones. It's not just in the Americas, among indigenous peoples; there is also interest in Europe.

Solving it all comes down to one question: what is the thing? Is it an encryption? Is it a lost language?

(12-03-2026, 04:49 PM) quimqu wrote: still possible to go further?

Somehow I doubt it. But then again I believe I already know the meaning of the manuscript - it has no meaning. I am firmly in the hoax / meaningless text / artificial fabrication camp and I have tried my very best on this forum to give out hints that people might be wasting their time searching for a meaningful narrative / natural language / shorthand / cypher solution.

It is a very hard problem, one that may turn out to have meaninglessness as its answer.

I'm personally of the opinion that we need to tackle the problem from the other direction more often: try to build generative models that both preserve meaning and replicate the properties of Voynichese. If these models consistently fail, in consistent ways, that tells us something important.

The Naibbe cipher can be seen as one such generative model. Its failures are instructive, particularly when it comes to the VMS's long-range correlations and line-position effects. If the VMS preserves meaning, the underlying text generation method needs to explain why these features of the manuscript naturally arise.
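As a rough illustration of what a "line-position effect" means in practice, here is a minimal sketch that compares how often each word-initial glyph appears at the start of a line with how often it appears inside a line. It assumes lines are plain strings of space-separated EVA words; the function name and the sample lines are invented for illustration, not taken from a transliteration, and this does not address long-range correlations at all.

```python
from collections import Counter

def line_position_profile(lines):
    """For each word-initial glyph, return
    P(glyph | word starts a line) / P(glyph | word is inside a line).
    Values far from 1.0 suggest a line-position effect; inf means the
    glyph was only seen line-initially."""
    at_start, inside = Counter(), Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue
        at_start[words[0][0]] += 1
        for word in words[1:]:
            inside[word[0]] += 1
    n_start, n_inside = sum(at_start.values()), sum(inside.values())
    ratios = {}
    for glyph in set(at_start) | set(inside):
        p_start = at_start[glyph] / n_start if n_start else 0.0
        p_inside = inside[glyph] / n_inside if n_inside else 0.0
        ratios[glyph] = p_start / p_inside if p_inside else float("inf")
    return ratios

# Invented sample lines, for illustration only:
sample = ["pchedy qokeedy daiin", "tshol qokain chedy", "daiin shol qokedy"]
for glyph, ratio in sorted(line_position_profile(sample).items()):
    print(glyph, ratio)
```

Single glyphs are a crude unit; the same comparison could be run on whole words, on line-final glyphs, or on paragraph-initial lines. A meaning-preserving generative model would need to reproduce ratios like these, not just overall glyph frequencies.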

(12-03-2026, 04:49 PM) quimqu wrote: It does not seem that we can squeeze much more information out of the Voynich text, assuming it is a text at all.

I am convinced that if no human has been able to solve the Voynich, we will not be able to do it with the help of artificial intelligence either, since in the end it only repeats what humans have already done before.

Do you think that, from the perspective of text analysis, it is still possible to go further?

After years of looking at Voynich text and probabilities, I also have no idea whether the text is meaningful or not.

But if it DOES have meaning, my guess is that it comes from a cipher system that will be very difficult to disentangle, because multiple bigrams or trigrams may share the same value. For example:

Maybe "che" means the same thing as "ch" - but not "chee"
Maybe "dar" means the same thing as "dain" - but not "ar"

Or maybe the text is just meaningless. I don't know.
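To make the many-to-one idea above concrete, here is a toy sketch in which several Voynichese chunks decode to the same value. The table and its values ("A", "B", "C", "D") are purely hypothetical and only encode the two examples given above; greedy longest-match segmentation is likewise just one possible choice.

```python
# Hypothetical table: several chunks share one value (homophones).
# "chee" and "ar" get values of their own, as in the examples above.
CHUNK_VALUES = {
    "ch": "A", "che": "A", "chee": "C",
    "dar": "B", "dain": "B", "ar": "D",
}

def decode_word(word, table=CHUNK_VALUES):
    """Greedy longest-match decoding of one word into chunk values.
    Returns None if some part of the word matches no chunk."""
    max_len = max(len(k) for k in table)
    out, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in table:
                out.append(table[piece])
                i += size
                break
        else:
            return None  # no chunk matched at position i
    return out

print(decode_word("chedar"))
```

Different segmentations of the same string can give different values, and different strings can give the same value, which is exactly what would make such a system hard to disentangle from statistics alone.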

The manuscript has been very resilient against analysis. If it weren't, we wouldn't be talking about it. Although consumer-level LLMs churn out slop, that doesn't mean dedicated systems couldn't improve and produce surprisingly creative analyses in the future.

I personally hope we can learn more in the future, for this reason: progress often comes suddenly when obscure texts are found tucked away in old libraries or elsewhere. For all we know, there could be more Voynichese manuscripts out there somewhere. Perhaps more time needs to be spent exploring the libraries most likely to contain more examples.

(12-03-2026, 04:49 PM) quimqu wrote: I am reaching a rather frustrating point where I ask myself whether we might actually be at the end of the road regarding the analysis of the manuscript’s text.

I understand the frustration. I have been there too.

But yes, I think the riddle will be solved, and I suspect it will happen rather suddenly and unexpectedly (as solutions to crypto and language puzzles often do).

But I can tell you a few ways to reduce your chances of making any progress:

Focus on character and digraph statistics, instead of words.
Assume that certain properties of European languages hold for all natural languages.
Assume that statistics are properties of a language, and not of a text.
Assume that there are practically no errors.
Assume that word spaces in the transcription are mostly correct.
Try to analyze all sections and all text types (labels, parags, etc.) at the same time.
Assume that the text of a parag is homogeneous, like a parag of a novel.
Focus on the variation of statistics along a line (as opposed to parag).
Focus on the statistics and possible equivalences of the puff gallows.
Keep all EVA characters distinct instead of mapping them to similarity classes and '?'.
Assume that the script is a sophisticated cipher.

And I can give you a tip: look for the use of daiin (word or suffix) as a function of position within a parag (not line) of the Herbal sections. But beware that it may show up as dair or laiin or possibly other "ink-similar" words. (I haven't done that myself, and I don't know whether it will lead to something. Apologies in advance if it turns out to be a waste of time.)

All the best, --stolfi
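One way to start on the daiin tip above, as a minimal sketch: treat a paragraph as a list of transcription lines, flatten it into words, and record where daiin-like words fall relative to the paragraph rather than the line. Which variants count as "daiin-like", and whether the variants also count as suffixes, are assumptions; the function names are hypothetical, and extracting Herbal-section paragraphs from a transliteration file is left out.

```python
import re

# Hypothetical "daiin family": daiin as a whole word or as a suffix, plus
# the ink-similar variants dair and laiin (also allowed as suffixes here,
# which is an assumption). Extend the alternatives as needed.
DAIIN_LIKE = re.compile(r"(daiin|dair|laiin)$")

def daiin_positions(paragraph_lines):
    """Relative positions (0.0 = first word, 1.0 = last word) of
    daiin-like words in one paragraph, given as a list of line strings.
    Paragraphs of fewer than two words are skipped."""
    words = [w for line in paragraph_lines for w in line.split()]
    if len(words) < 2:
        return []
    last = len(words) - 1
    return [i / last for i, w in enumerate(words) if DAIIN_LIKE.search(w)]

def position_histogram(paragraphs, bins=5):
    """Pool positions over many paragraphs into equal-width bins."""
    counts = [0] * bins
    for paragraph in paragraphs:
        for pos in daiin_positions(paragraph):
            counts[min(int(pos * bins), bins - 1)] += 1
    return counts

# Invented two-line paragraph, for illustration only:
print(position_histogram([["daiin okeey chol", "qokaiin dair"]]))
```

To judge whether a skewed histogram means anything, it should be compared with the same histogram computed for all words, since paragraph lengths and the binning shape the baseline.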

Quote: Do you think that, from the perspective of text analysis, it is still possible to go further?

I liked your threads a lot, all this data clustering and looking for patterns.

It showed me, however, that a lot depends on the language used and on the content of the text.
The same Book of Genesis in different languages gave different patterns, right?

You also checked whether texts have a "schematic" or "loose" structure, assuming that real text can be neither too schematic nor too loose.
But it turned out that there was always some real text with "unreal" statistical properties.

For now I see all these "Voynich is not random" results as somewhat limited. They don't really tell us whether the text is meaningful or not.

My opinion is that we don't have enough long gibberish texts for comparison.
Someone should really do such a study: hire 50 or so first-year students, lock them in a basement so they can't cheat by using a computer to generate words, and make them write 40,000-word texts of gibberish :)

Then we could compare them with real texts, with the Voynich and with each other, and see how they differ.
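If such a gibberish corpus existed, the comparison could start with very simple per-text statistics. A minimal sketch, assuming each text is a plain string of whitespace-separated words; the function name and the particular statistics are illustrative choices, not a standard set.

```python
from collections import Counter

def text_profile(text):
    """A few simple word-level statistics for comparing texts.
    Assumes words are separated by whitespace."""
    words = text.split()
    freq = Counter(words)
    n_tokens, n_types = len(words), len(freq)
    hapaxes = sum(1 for count in freq.values() if count == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "hapax_share_of_types": hapaxes / n_types if n_types else 0.0,
        "mean_word_length": sum(map(len, words)) / n_tokens if n_tokens else 0.0,
    }

# Usage: profile each text (real, gibberish, Voynich) and compare side by side.
print(text_profile("daiin chedy qokeedy daiin shol"))
```

The type-token ratio falls as a text gets longer, so comparisons like this are only fair between texts of equal length, which is one more reason for a fixed target such as 40,000 words.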

(12-03-2026, 04:49 PM) quimqu wrote: After quite a long time (months) trying to adapt my knowledge in data science, and the new things I have learned along the way, to the analysis of the Voynich text, I am reaching a rather frustrating point where I ask myself whether we might actually be at the end of the road regarding the analysis of the manuscript’s text.
[..]
Do you think that, from the perspective of text analysis, it is still possible to go further?

Your postings relied on counts, graphs and, generally, automation; later also with some extra care for single pages, labels or even characters.

I don't think languages work that way; the VMS language surely does not.
AI is a data-surfing jerk: to my great surprise, it is able to hallucinate "translations" in whatever direction somebody wants (AI is not that intelligent at all, I am afraid). It is not a major help or option.

But since there has never been a systematic investigation even of all known languages, let alone unknown or extinct ones, there will be a light somewhere, somehow.

Just close the book, open it again at the first page, and consider clearly what you see. That may open a new way into it.


Participants: quimqu, asteckley, pjburkshire, dashstofsk, magnesium, ThomasCoon, Fontanellean, Jorge_Stolfi, Rafal, Stefan Wirtz_2