The Voynich Ninja

Full Version: What Lies Beneath: Statistical Structure in Voynichese Revealed by Transformers
(08-06-2025, 05:31 PM)quimqu Wrote: Hi,

Thanks a lot for your message. I find your suggestion about grammar-based decomposition really interesting — that’s exactly the kind of direction I’d like to explore next.

To clarify what I’ve done so far:

I trained several small GPT models on Voynichese using a character-level tokenizer, based on the Currier transliteration. This worked quite well: the model was able to predict character sequences with low perplexity (~3.3), which suggests a high degree of internal structure.

If I understood correctly, the GPT model learns and builds an internal representation of the Voynich word 'grammar'; this is then used to generate new words, which are compared to the originals, and from this comparison a 'perplexity' score is calculated, which measures how good the newly generated words are with respect to actual Voynichese (and I guess the same score is used while training the model?). Did I get it about right?
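
For readers following along, here is a minimal sketch of what "character-level tokenizer" means in this setup. The corpus file name is an assumption for illustration, not quimqu's actual pipeline:

[code]
# Character-level tokenization: every transliteration symbol is one token.
# "voynich_currier.txt" is a hypothetical file holding the transliterated text.
text = open("voynich_currier.txt", encoding="utf-8").read()

chars = sorted(set(text))                      # vocabulary = distinct characters
stoi = {ch: i for i, ch in enumerate(chars)}   # symbol -> integer ID
itos = {i: ch for ch, i in stoi.items()}       # integer ID -> symbol

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

print("vocab size:", len(chars))  # typically a few dozen symbols
[/code]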


(08-06-2025, 05:31 PM)quimqu Wrote: In contrast, using word-level tokenization (based on dot-separated EVA words) gave very poor results — mainly because of the large vocabulary size and lack of training data per token.
At this point, I’m considering two directions:

  1. Trying more modern transliterations (like Takahashi or updated EVA versions). But I’m a bit concerned that these are too detailed — they distinguish rare glyphs very precisely, which might make it harder for the model to generalize.

This looks a bit problematic to me, because Currier made many choices in his transliteration (while EVA is much more 'agnostic') which are bound to influence your results. I'm not saying that grouping EVA characters together is necessarily a bad thing (I myself always grouped together "ch", "sh", "cph", "cfh", "cth", "ckh") and I'm not saying Currier made bad choices, but personally I'd rather use a more 'raw' transliteration in EVA. 
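
To make the vocabulary-size point from the quoted post concrete, here is a rough sketch comparing character-level and word-level vocabularies on a dot-separated EVA file (the file name is assumed; the hapax count is what starves a word-level model of training data per token):

[code]
# Compare character-level vs word-level vocabulary sizes for an EVA
# transliteration where '.' separates words. "voynich_eva.txt" is assumed.
from collections import Counter

text = open("voynich_eva.txt", encoding="utf-8").read()

char_vocab = set(text) - {".", "\n"}
words = [w for w in text.replace("\n", ".").split(".") if w]
word_counts = Counter(words)
hapax = sum(1 for n in word_counts.values() if n == 1)

print("character vocab:", len(char_vocab))   # on the order of a few dozen
print("word vocab:     ", len(word_counts))  # thousands of distinct word types
print("hapax legomena: ", hapax)             # many types occur exactly once
[/code]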

(08-06-2025, 05:31 PM)quimqu Wrote:
  2. Switching to syllable-like units instead of characters — which is exactly what you suggested.
    I’d love to hear your opinion on this:
  • What kind of syllables (e.g. "qo", "dy", "aiin", etc.) do you think would make sense?

Eh... that is the question that a grammar is meant to solve! :)

(08-06-2025, 05:31 PM)quimqu Wrote:
  • Which transliteration would be the best basis for such tokenization?

I don't think it matters much, but should you decide to try EVA, I suggest the Reference transliteration by René Zandbergen (sorry but I can never find the link...).

(08-06-2025, 05:31 PM)quimqu Wrote: If you’ve already explored this type of decomposition, I’d be really interested in hearing more or comparing approaches.

Thanks again for your input!

I experimented with a decomposition based on slot grammars. If you're interested you can check here: [link]. I don't want to hijack your thread; if you have questions, send me a PM.
(09-06-2025, 10:52 AM)Mauro Wrote: This looks a bit problematic to me, because Currier made many choices in his transliteration (while EVA is much more 'agnostic') which are bound to influence your results. I'm not saying that grouping EVA characters together is necessarily a bad thing (I myself always grouped together "ch", "sh", "cph", "cfh", "cth", "ckh") and I'm not saying Currier made bad choices, but personally I'd rather use a more 'raw' transliteration in EVA.

This feels to me like one of the most pervasive misconceptions in modern Voynich research, up to the academic level. EVA makes just as many choices as Currier did, and one is not necessarily closer to the truth than the other. It is up to the person running the test to be aware of the fact that, no matter which system they use, some changes may be required.
(09-06-2025, 11:11 AM)Koen G Wrote: This feels to me like one of the most pervasive misconceptions in modern Voynich research, up to the academic level. EVA makes just as many choices as Currier did, and one is not necessarily closer to the truth than the other.

I surely agree with you! Would it then be okay to say EVA is more 'analytic' in its choices while Currier is more 'synthetic'?
(09-06-2025, 12:14 PM)Mauro Wrote:
(09-06-2025, 11:11 AM)Koen G Wrote: This feels to me like one of the most pervasive misconceptions in modern Voynich research, up to the academic level. EVA makes just as many choices as Currier did, and one is not necessarily closer to the truth than the other.

I surely agree with you! Would it then be okay to say EVA is more 'analytic' in its choices while Currier is more 'synthetic'?

Thanks for the comments.

To add a practical perspective: I’ve now trained the same GPT model using the Zandbergen EVA transliteration, and the results are remarkably close to what I obtained with Currier:

- Average validation loss: 1.2651

- Perplexity: 3.54

So from a modeling perspective, both systems encode a similarly learnable structure — neither seems clearly “closer to the truth,” at least in terms of predictive consistency.
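
(The two numbers are consistent with each other: perplexity is just the exponential of the average cross-entropy loss in nats, so:)

[code]
import math
print(math.exp(1.2651))  # ~3.543, i.e. the reported perplexity of 3.54
[/code]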


That said, something interesting appeared when comparing the two heatmaps of average word loss per folio: both show a fairly similar overall pattern. However, the Z-L EVA model includes data for folio f57r — the famous “wheel with text in the spokes” page often described as astronomical, cosmological, or magical — while the Currier model is unable to analyze this folio. Despite this difference, both models produce folio-by-folio perplexity values that are quite similar in absolute terms. Some folios show higher perplexities than average, others lower, but the distribution between the two heatmaps is comparable.


[Image: Rca54Xf.png — heatmaps of average word loss per folio for the Currier and Z-L EVA models]
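
For anyone who wants to reproduce this kind of heatmap, the aggregation behind it is straightforward. A sketch, under the assumption that per-token losses can be grouped by folio (the input data structure is illustrative):

[code]
# Average the model's per-token cross-entropy within each folio and
# convert to perplexity. The (folio_id, loss) pairs are an assumed input.
import math
from collections import defaultdict

def per_folio_perplexity(token_losses):
    sums, counts = defaultdict(float), defaultdict(int)
    for folio, loss in token_losses:
        sums[folio] += loss
        counts[folio] += 1
    return {f: math.exp(sums[f] / counts[f]) for f in sums}

demo = [("f57r", 1.9), ("f57r", 2.1), ("f1r", 1.1), ("f1r", 1.3)]  # made-up numbers
print(per_folio_perplexity(demo))  # {'f57r': ~7.39, 'f1r': ~3.32}
[/code]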



So even though EVA and Currier make different choices (and yes — I like the idea that EVA is more analytic while Currier is more synthetic), the internal structure captured by the model remains stable across systems.

The differences may lie more in how each system segments glyphs and sequences — which could affect how we interpret specific outliers like f57r.

More to explore here!


Yes, you got it mostly right! GPT models learn patterns and rules (like a kind of “grammar”) from the Voynich text. They use this knowledge to predict or generate new words. Then, by comparing these generated words to the actual Voynich words, the model calculates a “perplexity” score, which shows how well it understands the text. Lower perplexity means better understanding. This score is also used during training to improve the model step by step.
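
In code terms, the evaluation loop looks roughly like this; model.prob() is a hypothetical interface standing in for whatever the real model exposes:

[code]
# Perplexity = exp(mean negative log-likelihood of each next token given
# its context). model.prob(context, tok) is hypothetical, not a real API.
import math

def perplexity(model, tokens):
    nll = 0.0
    for i in range(1, len(tokens)):
        p = model.prob(tokens[:i], tokens[i])  # P(token_i | tokens before it)
        nll += -math.log(p)
    return math.exp(nll / (len(tokens) - 1))
[/code]

The same average negative log-likelihood is the cross-entropy loss minimized during training, which is why the validation loss and perplexity reported above are two views of one number.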
(09-06-2025, 01:02 PM)quimqu Wrote: Yes, you got it mostly right! GPT models learn patterns and rules (like a kind of “grammar”) from the Voynich text. They use this knowledge to predict or generate new words. Then, by comparing these generated words to the actual Voynich words, the model calculates a “perplexity” score, which shows how well it understands the text. Lower perplexity means better understanding. This score is also used during training to improve the model step by step.

I'm not sure if it's nitpicking or not, but I don't think the low perplexity here can be called "understanding" in any sense. The perplexity is just the (anti-)confidence score when predicting new tokens. I can train a model on a trivially repetitive sequence and the perplexity will be extremely low, because it's very easy to predict next tokens from past tokens. This doesn't mean the model would have any idea about the semantics of that sequence or could correctly predict anything in "Chicago cats Denver dogs bully bully Minnesota mice".
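
The point is easy to demonstrate without any neural network at all: a bigram model fitted by pure counting reaches near-minimal perplexity on a repetitive sequence, and nobody would call that understanding:

[code]
# A counting-based bigram model on a trivially repetitive sequence
# reaches perplexity ~1.0 with zero "understanding" of anything.
import math
from collections import Counter, defaultdict

text = "abcabcabc" * 100
bigrams = Counter(zip(text, text[1:]))
context_totals = defaultdict(int)
for (a, _), n in bigrams.items():
    context_totals[a] += n

nll = sum(-n * math.log(n / context_totals[a]) for (a, _), n in bigrams.items())
print(math.exp(nll / sum(bigrams.values())))  # 1.0: every next char is certain
[/code]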
You're absolutely right to point that out — thanks for the clarification. You're correct that low perplexity doesn’t imply “understanding” in a human or semantic sense. It just means the model finds the sequence predictable based on what it has seen during training. In that sense, it reflects how well the model has captured the surface patterns or structure — not necessarily the meaning — of the Voynich text. I should have said something like “internal consistency” rather than “understanding”.