02-07-2025, 04:10 PM
A GPT is a neural network trained to predict the next token (e.g., a word or character) in a sequence from the context of the preceding ones. During training, the model gradually learns patterns and structures that help it guess what might come next; applied to natural language, this usually means picking up the language's grammar and syntax. The goal of this experiment is to see whether a GPT trained on Voynich text can reproduce even short valid word sequences, a basic sign of underlying grammatical structure.
Using a minimal GPT architecture trained on 11,000-token corpora from natural languages and from the Voynich manuscript (EVA and CUVA transcriptions, restricted to paragraphs of apparently running text and excluding, for example, the cosmological sections), I evaluated how well each model could reproduce sequences of two or three consecutive words (bigrams and trigrams) from its training corpus. The results reveal stark differences between Voynichese and natural languages.
I trained several nanoGPT models (roughly 1.1M parameters each) on corpora limited to 11,000 words each; a sketch of a comparable model configuration follows the list. The corpora included:
- Latin (e.g. De Docta Ignorantia)
- Latin religious text (In Psalmum David CXVIII)
- Early Modern English (Romeo and Juliet)
- Esperanto (Alice in Wonderland, in Esperanto translation)
- Voynich EVA transcription
- Voynich CUVA transcription
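For reference, a configuration in nanoGPT's config-file style that lands near 1.1M parameters might look like the sketch below. The post does not give the actual hyperparameters, so every value here (layer count, embedding width, context length, dataset and output names) is an assumption, chosen only so that four 128-dimensional layers plus a word-level vocabulary of a few thousand types add up to roughly 1.1M parameters.

```python
# Hypothetical nanoGPT config for a ~1.1M-parameter word-level model.
# All values are illustrative assumptions; the post does not state them.
out_dir = 'out-voynich-eva'   # assumed output directory
dataset = 'voynich_eva'       # assumed dataset folder under data/

# model size: 12 * n_layer * n_embd^2 ~= 0.79M transformer parameters,
# plus a few-thousand-word embedding table brings the total near 1.1M
n_layer = 4
n_head = 4
n_embd = 128
dropout = 0.2
block_size = 64               # context length in words

# training
batch_size = 32
learning_rate = 1e-3
max_iters = 5000
eval_interval = 250
eval_iters = 100
```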
Each model was trained on tokenized text split by the dot (".") separator, treating each token as a "word". Then, I prompted each model to generate 1000 words, starting from a random token from the original corpus.
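A minimal sketch of that preparation step, in the style of nanoGPT's prepare.py scripts, is shown below. The file names, the treatment of line breaks as additional word separators, and the 90/10 train/val split are assumptions, not details taken from the post.

```python
# Word-level data preparation in the style of nanoGPT's prepare.py scripts.
# Assumes a transcription file where words are separated by "."; file names
# and the train/val split are assumptions.
import pickle
import numpy as np

with open('voynich_eva.txt', 'r', encoding='utf-8') as f:
    raw = f.read()

# Split on the dot separator (treating line breaks as separators too)
# and drop empty tokens.
words = [w for w in raw.replace('\n', '.').split('.') if w]

# Build the word-level vocabulary and its lookup tables.
vocab = sorted(set(words))
stoi = {w: i for i, w in enumerate(vocab)}
itos = {i: w for w, i in stoi.items()}

ids = np.array([stoi[w] for w in words], dtype=np.uint16)

# 90/10 train/val split, as in the nanoGPT examples.
n = len(ids)
ids[:int(n * 0.9)].tofile('train.bin')
ids[int(n * 0.9):].tofile('val.bin')

with open('meta.pkl', 'wb') as f:
    pickle.dump({'vocab_size': len(vocab), 'stoi': stoi, 'itos': itos}, f)
```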
For each generated sequence, I extracted all bigrams and trigrams and checked how many were present in the original corpus text (used as training data).
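The check itself can be done with plain set lookups over n-gram tuples. The sketch below assumes `train_words` and `generated_words` are lists of word tokens obtained by the same dot-splitting as above; it is one straightforward way to compute the match rates reported here, not necessarily the author's exact script.

```python
# Extract all n-grams from the generated sequence and count how many
# also occur anywhere in the training corpus.

def ngrams(words, n):
    """Return the list of n-word tuples in order of appearance."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def match_rate(generated_words, train_words, n):
    """Fraction of generated n-grams that also appear in the training text."""
    train_set = set(ngrams(train_words, n))
    gen = ngrams(generated_words, n)
    hits = sum(1 for g in gen if g in train_set)
    return hits / len(gen) if gen else 0.0

# Example usage (hypothetical variable names):
# bigram_pct  = 100 * match_rate(generated_words, train_words, 2)
# trigram_pct = 100 * match_rate(generated_words, train_words, 3)
```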
Results (Bigrams and Trigrams Found in Training Text):
![Results: bigram and trigram match rates in training text](https://i.imgur.com/s5N9a0a.png)
The Latin religious text In Psalmum David CXVIII had fairly low bigram and trigram scores, not far above the Voynich numbers. This could be due to its complex sentence structure or to how rarely some word combinations repeat. Even so, it still produced some consistent word sequences, which the GPT picked up.
That never happened with the Voynich: not a single three-word sequence from the original text was ever regenerated. This makes Voynichese stand out as fundamentally different.
In addition, the entropy of word distributions was comparable across corpora (~8.5 to 9.6 bits), meaning the GPT learned the relative frequencies of words quite well. However, only in natural language corpora did it also learn statistically consistent co-occurrence patterns.
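One plausible reading of those entropy figures is the Shannon entropy of the unigram (word-frequency) distribution, in bits. Whether the author computed it over the training corpus, the generated text, or both is not stated, so the helper below is only a sketch of that calculation.

```python
# Shannon entropy of the word-frequency (unigram) distribution, in bits.
# A sketch of one way the quoted ~8.5-9.6 bit figures could be obtained;
# which text it was computed over is an assumption.
from collections import Counter
import math

def word_entropy(words):
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# e.g. compare word_entropy(train_words) with word_entropy(generated_words)
```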
Conclusion:
If the Voynich manuscript encoded a natural language, we would expect a GPT trained on it to be able to reproduce at least a small proportion of common bigrams and trigrams from the training corpus. This is exactly what we observe in natural language corpora (e.g. Esperanto 25.9% bigram match). In contrast, the bigram match rate for Voynichese is nearly zero, and trigrams are entirely absent.
This strongly supports the hypothesis that the Voynich manuscript is not a natural language encoding. While it has an internally consistent lexicon (i.e., words), it lacks the sequential dependencies and word-to-word transitions that characterize even simple or constructed languages.
Implication:
If a small GPT can learn bigrams and trigrams from natural languages in just 11,000 words, but completely fails to do so with Voynichese, this suggests that the manuscript does not reflect natural language structure.
This casts serious doubt on claims of direct decryption or translation into real languages: such efforts are likely searching for linguistic structure that the text simply does not exhibit.
Instead, the Voynich may reflect a pseudo-linguistic system: a generative algorithm, constructed gibberish, or even a cipher whose output was never meant to carry true semantic depth. The surface form may resemble language, but its internal statistical behavior tells a different story.
In short: be skeptical of anyone claiming to have “translated” the Voynich into English, Latin, or any other language — unless they can show that their version has the statistical fingerprints of a true linguistic system.