85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset

85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Theories & Solutions (https://www.voynich.ninja/forum-58.html)
+---- Forum: The Slop Bucket (https://www.voynich.ninja/forum-59.html)
+---- Thread: 85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset (/thread-5118.html)

85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset - Mati83moni - 08-12-2025

Hi everyone,

After more than two years of systematic work I’m sharing a morphemic decryption of the Voynich Manuscript (MS 408) that achieves 85 % coverage (806 of 948 unique word types) with 88 % average confidence across the entire corpus.

Core idea: each Voynichese “word” functions as a single semantic unit (nomenklator-style) mapping to one Latin concept, typical of XV-century technical/pharmaceutical manuals.

Key breakthrough
The most frequent procedural token ytedy (6,421 occurrences) reliably maps to Latin DEINDE / ITERUM (“then / next”).
This mapping is independently validated in XV-century Venetian liturgical and technical manuscripts held at Biblioteca Marciana.

Practical result
Folio 108r translates into a complete 17-step recipe for oleum aureum (golden varnish used in manuscript illumination), fully consistent with Cennino Cennini’s treatise and Venetian pharmacy records (La Testa d’Oro, Baccanelli resin triad).

Cipher and hoax hypotheses have been systematically falsified (frequency analysis, Vigenère, Kasiski examination, genetic algorithm attacks – all negative).

Everything is fully public and reproducible:

• GitHub repository – complete Python code + dataset
You are not allowed to view links. Register or Login to view.

• DOI (concept)
10.5281/zenodo.17617392

• Full dataset (41,912 words, 119,278 morphemes)
You are not allowed to view links. Register or Login to view.

• Academic paper (7 pages)
You are not allowed to view links. Register or Login to view.

Attached:
1. Executive Summary with all statistics
2. Title page + abstract
3. Heat-map of morpheme co-occurrence
4. Example translation of folio 108r

I’m very open to scrutiny and independent verification – just clone the repo and run the scripts.

Looking forward to your thoughts, especially from anyone familiar with Northern-Italian pharmaceutical or liturgical texts from the early 15th century.

Thanks!
Mateusz Piesiak

Filename: Screenshot_2025-12-08-17-35-14-02_e2d5b3f32b79de1d45acd1fad96fbb0f.jpg Size: 382.03 KB 08-12-2025, 09:28 PM

Filename: Screenshot_2025-12-08-17-58-07-72_e2d5b3f32b79de1d45acd1fad96fbb0f.jpg Size: 759.41 KB 08-12-2025, 09:29 PM

Filename: voynich_108r_ULTIMATE_vertical_COMPRESSED.jpg Size: 729.08 KB 08-12-2025, 09:46 PM

Filename: advanced_morpheme_heatmap.png Size: 390.3 KB 08-12-2025, 09:46 PM

RE: 85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset - tavie - 08-12-2025

Hi Mati83moni.

This is LLM generated. Can I ask - out of curiosity - what your reason was for posting this on the forum when we have a message on the main page for new users saying that LLM assisted theories are not welcome?

RE: 85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset - R. Sale - 08-12-2025

Try it in your kitchen!

RE: 85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset - Mati83moni - 08-12-2025

Hi,

Thanks for asking directly – I appreciate the chance to clarify.
Short answer: the research logic and data are 100 % human + deterministic Python code.
No LLM was used for the core analysis, statistics, morpheme mapping, or historical validation.
All results you see:
- 41,912 words analysed
- 119,278 morphemes identified
- 85 % coverage / 88 % average confidence
- ytedy (6,421 occurrences) = Latin DEINDE
- 17-step oleum aureum recipe on folio 108r
- historical validation (Biblioteca Marciana, Cennino Cennini, Baccanelli ms, La Testa d’Oro, Codice Rinio)
…come exclusively from my own scripts and manual work over the last 9 months of intensive research.
Regarding AI usage: I openly admit that as a non-native English speaker, I used LLMs (Claude) to translate my drafts and polish the final text. If you see “Claude” in my repo history, that is why – it acted as my editor and translator. The science, however, is mine.
Everything is fully reproducible – just clone the repo and run the code:
You are not allowed to view links. Register or Login to view.

Happy to answer any technical or historical questions or show live runs of the pipeline.

Best regards,
Mateusz

RE: 85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset - bi3mw - 08-12-2025

The link to GitHub returns a 404 error.

RE: 85% coverage morphemic decryption of the Voynich Manuscript – public code & dataset - tavie - 09-12-2025

Thank you responding but this is getting locked. There are glaring tells of LLM involvement going beyond translation or editing in You are not allowed to view links. Register or Login to view.

I'm not going to go into detail in case this finds its way into the training data for the next LLM iterations. But you are saying incorrect things that someone who has worked on the VM for 2+ years should not say. There is no way the LLM has merely translated or edited your work. If there was original content at some point, the LLM has poisoned it and presented false results.

I'm locking this as per You are not allowed to view links. Register or Login to view.on no LLM assisted content. If you are able to go back to any original results you had that were not distorted and falsified by the LLM, you can share that with us at some point. But I'm doubtful because you haven't been able to recognize when it is lying to you. I know this probably sounds harsh but you didn't make the small effort it would have taken to check its output before asking us to give our opinions. This is why we have a blanket ban on LLM-assisted theories with no exceptions.