The Voynich Ninja

Full Version: How could you prove that VM is gibberish?
I wonder about one thing...

I can roughly imagine what a theoretical solution of the Voynich Manuscript would look like. Someone gives a reading that:

- makes sense and is grammatical
- is relevant to the source text
- is consistent and generated using some general rules
- anyone can get it using these rules (is replicable)
and so on

But if someone wanted to prove that the VM is nonsensical, what would the definitive proof look like? Is it possible at all?

I suppose the "It is nonsensical because we cannot read it" argument isn't enough ;)

Recently several people have been analysing the VM statistically. They find some patterns in the text. Some are similar to real texts and some aren't.

But my feeling is that these statistical methods won't give a 100% answer as to whether the text is meaningful or not. They can say whether the text is grammatical/rule-based, but that is not the same thing.

Imagine someone writing down the list of Chinese villages. I checked and some sources say that there are 600 000 of them. So we would have a text of 600 000 words without any grammar. Yet it would be meaningful.

On the other hand, imagine some abstract poem like the one quoted below, but maybe even more "hardcore". It would have grammar but still no sense:

Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.


So is it possible at all to say with 100% certainty that the text is meaningless, and not in some language or encryption that we don't know?
(15-11-2025, 02:54 PM)Rafal Wrote: So is it possible at all to say with 100% certainty that the text is meaningless, and not in some language or encryption that we don't know?
If it were completely meaningless, it would be impossible to prove it so.
My own hypothesis is that the text is a combination of meaningless filler text and enciphered real text. I guess that it is very approximately 20% real text and 80% filler text. However, suppose for example that it is 99% filler text and 1% real text, or even that just one word in the whole manuscript is real and all the other words are filler: it would be near impossible to prove that this is the case.
(15-11-2025, 02:54 PM)Rafal Wrote: So is it possible at all to say with 100% certainty that the text is meaningless, and not in some language or encryption that we don't know?

I think there is no practical way of doing this. This is why it's so easy to blanket dismiss all meaningless hoax theories as "mere conjectures". Unless you find an undeniably original signed affidavit from the author of the manuscript describing its specific characteristics in great detail and providing a reasonable account of why the meaningless manuscript was created, I don't think there ever will be a sufficiently strong argument for any meaningless MS theory. Even if the text was generated using some stochastic approach (and we reverse engineer this approach), all it shows is that a manuscript like this can be created using this approach, not that this manuscript was definitely created using this approach.
No, it is not possible to say it is nonsensical. There are too many noted patterns. Even if it wasn't a language it would be communicating a pattern. 

I think those coming from a strictly mathematical, statistical or computer science background may often have a flawed approach to viewing language that misses some of the important music, metric and nuance. (Coming from a software engineer.)
(15-11-2025, 03:34 PM)oshfdk Wrote: I think there is no practical way of doing this. This is why it's so easy to blanket dismiss all meaningless hoax theories as "mere conjectures". Unless you find an undeniably original signed affidavit from the author of the manuscript describing its specific characteristics in great detail and providing a reasonable account of why the meaningless manuscript was created, I don't think there ever will be a sufficiently strong argument for any meaningless MS theory. Even if the text was generated using some stochastic approach (and we reverse engineer this approach), all it shows is that a manuscript like this can be created using this approach, not that this manuscript was definitely created using this approach.

I agree, and there's a good chance we'll be stuck in this limbo forever. There are many scenarios where the MS is meaningful but we are no longer able to recover that meaning; basically any scenario where additional, now lost information is required (like a codebook) or in the case of lossy/one-way ciphers. And I don't think we can ever distinguish the output of such scenarios from a hypothetical (stochastic) way of generating nonsense text.

Basically, the only case when we might ever be able to tell whether the MS is gibberish or not, is when it is meaningful and that meaning can still be recovered.

I also think what Mark proposes is attractive in some way: even if the MS is meaningless, there may still be actual words or sentences hidden in there. But this would only complicate matters, because it would mean that the creators of this system not only came up with a completely unique way of generating large amounts of cohesive gibberish governed by all kinds of rules, but they also snuck in a way to use the same system for enciphering actual text. So even though I like this solution on an intuitive level, I think it is not the most likely in practice.
What you people say mostly agrees with my vague intuition. It is close to impossible to prove for sure that something is meaningless.
Even if you write some simple pattern like 1,2,3,1,2,3,1,2,3,1,2,3... it could still mean the metre for a waltz or something like that, and thus have meaning.

Some specific cases could be possible like reading the ciphered text and identifying nulls. That would be Mark's case.

Generally, if I remember my lectures well, some categories of negative statements are harder to prove in philosophy than positive ones.
Try to prove that dogs exist. And then try to prove that dragons do not exist :)
There is a family of scenarios where someone provides a straightforward and convincing algorithm that generates large sections of the text in a straightforward way without imbuing it with meaning, and I can imagine computer methods that could recover it.

Two things about this, though. First, it would likely leave some ambiguity, especially if the seeds of the algorithm still held a tantalizing possibility for meaning so I don't know if the question would ever be universally viewed as settled. Second, I think this solution space is small so I mostly agree with the point that we're unlikely to prove the text is meaningless, but I did want to register that I don't think meaninglessness strictly implies insolvability.
(15-11-2025, 02:54 PM)Rafal Wrote: But if someone wanted to prove that the VM is nonsensical, what would the definitive proof look like? Is it possible at all?

It would be possible if we could find a simple deterministic algorithm that generates the whole text, either without any other input, or with a known input (like the Bible) that cannot carry any new message. Then the only "meaning" of the text would be that algorithm and the choice of that auxiliary input.

If the algorithm is not deterministic, the sequence of bits that specifies the "random" choices made when generating the VMS would constitute non-trivial contents -- and those bits then could have any meaningful message in some unknown encoding.   Ditto if the algorithm is deterministic but requires more than a few bits of input data.  Like, if for each line of the VMS one must specify a verse of the Bible to be used as input to the algorithm.  Then the sequence of verse numbers could be a meaningful message in some code.

The T&T "self-copy and mutate" algorithm, for example, requires an input stream of bits to decide when+how to reset the copy-from pointer and whether+how to mutate each copied word.  Thus, even if we could prove that the VMS was generated by that algorithm, it would still have non-trivial contents -- namely, that bit sequence, which could encode a meaningful message.
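
To make that point concrete, here is a minimal sketch in Python. It is NOT the T&T algorithm: the one-bit-per-word rule, the `mutate` function and the seed word are all hypothetical simplifications chosen for illustration. Each input bit decides whether the previous word is copied unchanged or copied with a mutation, so anyone who knows the rule can read the bits back out of the generated text.

```python
def mutate(word):
    """Hypothetical mutation: always returns a word different from its source."""
    cycle = "ynlr"  # arbitrary set of final letters to rotate through
    last = word[-1]
    if last in cycle:
        return word[:-1] + cycle[(cycle.index(last) + 1) % len(cycle)]
    return word + "y"


def generate(seed, bits):
    """Copy the previous word for a 0 bit, copy-and-mutate it for a 1 bit."""
    words = [seed]
    for bit in bits:
        source = words[-1]
        words.append(mutate(source) if bit else source)
    return words


def recover_bits(words):
    """Invert generate(): an unchanged word means 0, a changed word means 1."""
    return [0 if cur == prev else 1 for prev, cur in zip(words, words[1:])]


def text_to_bits(message):
    """Spell an ASCII message as a flat list of bits, 8 per character."""
    return [int(b) for byte in message.encode("ascii") for b in format(byte, "08b")]


def bits_to_text(bits):
    """Inverse of text_to_bits()."""
    groups = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return "".join(chr(int("".join(map(str, g)), 2)) for g in groups)


hidden = text_to_bits("hi")
words = generate("daiin", hidden)  # seed word is an arbitrary choice
print(" ".join(words))
print(bits_to_text(recover_bits(words)))
```

The generated words look like mechanical repetition, yet the sequence of copy/mutate decisions carries the message. Showing that a text fits such a generator therefore does not by itself show that the choices were empty.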

All the best, --stolfi
(15-11-2025, 04:15 PM)Koen G Wrote: I also think what Mark proposes is attractive in some way: even if the MS is meaningless, there may still be actual words or sentences hidden in there. But this would only complicate matters, because it would mean that the creators of this system not only came up with a completely unique way of generating large amounts of cohesive gibberish governed by all kinds of rules, but they also snuck in a way to use the same system for enciphering actual text. So even though I like this solution on an intuitive level, I think it is not the most likely in practice.
It seems to me that the most likely scenario is that the author generated the real text for a page using an advanced substitution cipher key of the time, such as the 1424 Milanese cipher key that I have discussed elsewhere. I suspect that the author placed the words on the page with big gaps for later filler text to be added. I think the filler text words were generated partly by copying and modifying real words from that page to make new fake words, although I suppose they could also be copied from text from other pages, and also by using a stock of variants of standard Voynichese filler words the likes of which we are all familiar.

A reader with a copy of the cipher key could generate the underlying text and could easily spot the filler words by virtue of the fact that they don't spell genuine words in the language of the text, whilst the "real" words do. So it should not present much of a problem for the reader or the writer of the text who possesses the cipher key, but a considerable problem for someone without the cipher key.
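
The reading procedure described here can be sketched in a few lines of Python. Everything in the sketch is hypothetical: `KEY` is a made-up one-to-one substitution standing in for a real key (it is not the Milanese key), `LEXICON` is a three-word stand-in for the vocabulary of the plaintext language, and the "page" is invented.

```python
# Hypothetical cipher key: cipher symbol -> plaintext letter.
KEY = {"q": "a", "o": "r", "k": "o", "e": "s", "d": "e", "y": "t", "a": "m", "i": "l"}

# Hypothetical stand-in for the vocabulary of the underlying language.
LEXICON = {"rosa", "sal", "mel"}


def decipher(word):
    """Apply the key symbol by symbol; unknown symbols become '?'."""
    return "".join(KEY.get(ch, "?") for ch in word)


def split_real_and_filler(page):
    """Keep deciphered words found in the lexicon, set the rest aside as filler."""
    real, filler = [], []
    for word in page:
        plain = decipher(word)
        if plain in LEXICON:
            real.append(plain)
        else:
            filler.append(word)
    return real, filler


page = ["okeq", "okey", "eqi", "qokd", "adi"]  # invented cipher text
real, filler = split_real_and_filler(page)
print(real)
print(filler)
```

With the key in hand the separation is trivial. Without it, every candidate key yields a different split between "real" and "filler" words, which is why a small share of real text would be so hard to demonstrate.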
(15-11-2025, 02:54 PM)Rafal Wrote: But my feeling is that these statistical methods won't give a 100% answer as to whether the text is meaningful or not.


Statistics never gives 100% proof. Hypothesis testing is about obtaining confidence levels, e.g. a confidence level of 95% that some observed effect is not down to randomness.
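
As an illustration of what such a test looks like in practice, here is a minimal permutation-test sketch in Python. The statistic (how often neighbouring words share an initial letter), the sample and the function names are all assumptions made for the example, not an established Voynich method.

```python
import random


def same_initial_pairs(words):
    """Count adjacent word pairs that start with the same letter."""
    return sum(a[0] == b[0] for a, b in zip(words, words[1:]))


def permutation_p_value(words, statistic, n_perm=2000, seed=0):
    """Estimate how often a random word order scores at least as high as the real one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = statistic(words)
    shuffled = list(words)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction: the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented sample in which similar words are strongly clustered.
sample = ["qokedy"] * 10 + ["daiin"] * 10 + ["chedy"] * 10
p = permutation_p_value(sample, same_initial_pairs)
print(f"p = {p:.4f}")
```

A small p-value here only says that the word order is unlikely to be random with respect to this one statistic. It says nothing about whether the structure comes from grammar, from a cipher, or from a mechanical way of producing filler.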

In the case of the VMS it can be difficult to use statistical hypothesis testing to give a definite answer to whether the text is meaningful or not. The text is a product of human endeavour, and so most likely the writers (the general belief being that there was more than one) would have written in their personal and distinct styles, choosing to write with more care on some days, in more of a hurry or with abandon on other days, and in a jocular mood on others. Each writer had a free hand to write what he liked. All this makes it difficult to know how best to apply statistics.

The role of statistics is not to prove but to provide evidence to tilt opinion towards a hypothesis. Instead of aiming for 100% proof, search for what is logical and plausible.

If you are asking for ways that may suggest that the text is meaningless, then one way would be to look for evidence of artificial construction. People have noted within the text long-distance effects, vertical pair effects, and other anomalies that cannot be explained if the text were a continuous narrative in a natural language. These effects cannot easily be dismissed.

May I also trouble you to read my post on splitting gallows words, which shows that the initial and final word parts of gallows words are independent and likewise unlikely to be in a natural language. Hopefully this piece of evidence might tilt your opinion towards the hypothesis that the text is meaningless, a hoax.
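
For readers who want to check a claim of this kind themselves, one standard tool is a chi-square test of independence on a table of (initial part, final part) counts. The sketch below is a generic illustration in Python, not the analysis from the post referred to above; the word parts and counts are invented.

```python
from collections import Counter


def chi_square_independence(pairs):
    """Return (chi2, dof) for a list of (initial_part, final_part) pairs."""
    joint = Counter(pairs)
    rows = Counter(initial for initial, _ in pairs)
    cols = Counter(final for _, final in pairs)
    n = len(pairs)
    chi2 = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return chi2, dof


# Invented counts where the final part does not depend on the initial part.
independent = (
    [("qo", "edy")] * 20 + [("qo", "aiin")] * 30
    + [("o", "edy")] * 40 + [("o", "aiin")] * 60
)
# Invented counts where each initial part always takes the same final part.
dependent = [("qo", "edy")] * 50 + [("o", "aiin")] * 50

print(chi_square_independence(independent))
print(chi_square_independence(dependent))
```

A statistic near zero is what independence predicts, while a large value relative to the degrees of freedom points to dependence. Which of the two is more typical of natural language is a separate question that the test itself does not answer.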