The Voynich Ninja - Tested K&A at scale, found something, need your help


Hello everyone,

I'm a CTO at a tech company. I build AI agents for businesses. Not a medievalist, not a cryptographer, not a Latinist.

Like many of you, the Voynich became an obsession. I spent weeks building a pipeline that tests the King-Andrisani transliteration hypothesis by checking every decoded word against the Perseus Latin Dictionary (265,419 attested forms). Not interpretation, code. Four versions thrown away before anything worked.

The key breakthrough: the scribe appears to glue prepositions to the following word, like Arabic proclitics. When I coded that rule, validation jumped from 74% to 89% in one pass, and four-word matches against pharmaceutical corpora went from 1 to 19.
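A minimal sketch of what such a proclitic rule might look like, for readers who want to reproduce the effect. The preposition list, the toy lexicon and the function name `validate` are illustrative assumptions, not the pipeline's actual code.

```python
# ASSUMPTION: hypothetical preposition list and toy lexicon, for illustration only.
PREPOSITIONS = ("in", "ad", "cum", "de", "ex")

def validate(token, lexicon, prepositions=PREPOSITIONS):
    """Return the attested words a token resolves to, or [] if none."""
    if token in lexicon:
        return [token]
    # Try longer prefixes first so that "cum" is tested before any 2-letter prefix.
    for prep in sorted(prepositions, key=len, reverse=True):
        if token.startswith(prep):
            rest = token[len(prep):]
            if rest in lexicon:
                return [prep, rest]
    return []

lexicon = {"aqua", "aquam", "vino", "coque", "in", "cum"}
print(validate("inaquam", lexicon))   # ['in', 'aquam']
print(validate("cumvino", lexicon))   # ['cum', 'vino']
print(validate("xyzzy", lexicon))     # []
```

Each extra splitting rule gives a token more ways to match, so the same rule would need to be applied to any random or shuffled control before the jump in validation is compared against it.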

What I find hardest to dismiss as artifact:

On f103r, the word "coque" (cook) appears 17 times in 5 conjugated forms: coque, coquas, coquere, coquendo, coquant. A random mapping does not produce a Latin morphological paradigm.

On f33r, the pipeline decodes INELIODE. The illustration on the same page shows an Asteraceae. The pipeline cannot see the illustration. Two independent channels pointing to Inula helenium.

The astronomical pages (f67r) decode to pharmaceutical vocabulary: spikenard, cinnamon, celery, wine. Nobody expected recipes hidden in star diagrams.

What doesn't work: 3,421 words are opaque. Zodiac labels are uncracked. Never found 5 consecutive words in a known text. 4 Aurea Alexandrina ingredients are missing. Short Latin words can match Perseus by chance, and I honestly cannot separate signal from noise in the 89%.

I don't have the medieval Latin expertise to evaluate grammar coherence. A Latinist would see in minutes what I can't see in weeks.

I've pushed this as far as I can. Everything is open source. My goal is to transmit this to someone with the right skills.

Pipeline + all 226 folios decoded: [link]
Visual summary (22 pages): in the docs/conference folder
Paper: [link]

Guillaume

Hi and welcome!

I'm not sure I agree that the findings as listed are hard to dismiss as artifacts. Please correct me if I'm wrong, but:

(09-04-2026, 02:25 PM)CorwinFr Wrote: On f103r, the word "coque" (cook) appears 17 times in 5 conjugated forms: coque, coquas, coquere, coquendo, coquant. A random mapping does not produce a Latin morphological paradigm.

This is not a random mapping. As far as I understand, it is a mapping optimized against a Latin corpus or dictionary, which would contain different word forms.

(09-04-2026, 02:25 PM)CorwinFr Wrote: On f33r, the pipeline decodes INELIODE. The illustration on the same page shows an Asteraceae. The pipeline cannot see the illustration. Two independent channels pointing to Inula helenium.

This can easily be a coincidence. Among some 50000 words one matches one of hundreds of images.

(09-04-2026, 02:25 PM)CorwinFr Wrote: The astronomical pages (f67r) decode to pharmaceutical vocabulary: spikenard, cinnamon, celery, wine. Nobody expected recipes hidden in star diagrams.

In other words, some vocabulary not related to the images. Again, likely a coincidence.

(09-04-2026, 02:25 PM)CorwinFr Wrote: Never found 5 consecutive words in a known text.

This is important, because this is a clear indicator that the mapping is likely spurious.

Thank you, these are exactly the critiques I need. You're right on several points.

The conjugation paradigm: the pipeline is optimized against a Latin dictionary, so it favors Latin forms by construction. I overstated this. The real test is whether the EVA source tokens share morphological structure, not just whether the output looks Latin.

INELIODE: fair point. With ~50,000 words and hundreds of illustrations, a single plant match could be coincidence. I need to calculate the false match probability. Haven't done it yet.
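For what it's worth, the shape of that calculation can be sketched as below. It assumes every decoded word is an independent trial with the same small chance of resembling a plant name that fits some illustration. The per-word probability is a made-up placeholder, not a measured value, and `prob_at_least_one` is just an illustrative name.

```python
def prob_at_least_one(n_words, p_single):
    """P(at least one chance hit) under independence: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_single) ** n_words

N_WORDS = 50_000   # rough corpus size quoted in this thread
P_SINGLE = 1e-4    # ASSUMPTION: placeholder, has to be estimated from controls

print(f"{prob_at_least_one(N_WORDS, P_SINGLE):.3f}")
```

With these placeholder numbers at least one coincidental match is nearly certain. The real work is estimating the per-word probability, for example by counting how often shuffled text produces a plausible plant name.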

Pharmaceutical vocabulary on astronomical pages: you're right, if the decoder produces pharmaceutical words everywhere, finding them on astronomical pages is less meaningful, not more.

No 5-word match: your strongest point, my weakest result. Clear red flag.

I'm running a V2 validation suite right now, including word length analysis (our decoded words average 3.6 chars vs 5.6 for real Latin medical text, which is a problem), shuffled controls, and false positive baselines. Early results confirm some of your concerns. I'll share them here when ready.
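To make "false positive baseline" concrete, here is one possible control, sketched under assumptions: scramble the letters inside every decoded word and see how often the result still passes the dictionary check. The toy lexicon and both function names are hypothetical. The real suite may do this differently.

```python
import random

def validation_rate(words, lexicon):
    """Fraction of words found in the lexicon."""
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)

def scrambled_baseline(words, lexicon, runs=200, seed=0):
    """Mean validation rate after shuffling the letters inside each word."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        scrambled = ["".join(rng.sample(list(w), len(w))) for w in words]
        total += validation_rate(scrambled, lexicon)
    return total / runs

# ASSUMPTION: toy data, not pipeline output.
lexicon = {"in", "ni", "aquam", "coque", "et", "te"}
decoded = ["in", "et", "aquam", "coque"]
print(validation_rate(decoded, lexicon))
print(scrambled_baseline(decoded, lexicon))
```

In this toy case the two-letter words survive scrambling every time, because both orderings are attested, while the five-letter words almost never do. That is the short-word effect in miniature.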

This is exactly why I posted here!

(09-04-2026, 02:58 PM)oshfdk Wrote: Hi and welcome! [...]

The pipeline produces a real signal above random (+67 percentage points). Words of 4-5 letters like aquam, coque, hiera, aloes, recipe validate strongly and cannot be explained by chance.

But the "89%" is misleading. 75% of our corpus is short words (<5 chars) that match Latin trivially. At 6+ characters, validation drops to 28%. At 7+, to 17%.
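A sketch of how the breakdown by length can be computed, in case anyone wants to rerun it on the decoded folios. The function name and the toy data are assumptions. Only the idea of splitting validation by word length comes from the numbers above.

```python
from collections import defaultdict

def rate_by_length(words, lexicon):
    """Validation rate per word length, as {length: fraction attested}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for w in words:
        totals[len(w)] += 1
        hits[len(w)] += w in lexicon  # bool counts as 0 or 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}

# ASSUMPTION: toy data, not pipeline output.
lexicon = {"in", "et", "aquam"}
decoded = ["in", "et", "xq", "aquam", "zzzzz", "coquxre"]
print(rate_by_length(decoded, lexicon))
```

Reporting the rate per length bucket, next to the same figure for a scrambled control, says more than one pooled percentage.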

Zero 5-word sequences found in any medieval corpus. Shuffling word order changes almost nothing. We have a pharmaceutical vocabulary, not Latin sentences.
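The sequence test is simple to reproduce. A minimal sketch, assuming both texts are already tokenized into lowercase word lists (`shared_ngrams` is an illustrative name, and the sample sentences are invented):

```python
def shared_ngrams(decoded, reference, n):
    """Return the n-word sequences of `decoded` that also occur in `reference`."""
    ref = {tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)}
    found = []
    for i in range(len(decoded) - n + 1):
        gram = tuple(decoded[i:i + n])
        if gram in ref:
            found.append(gram)
    return found

# ASSUMPTION: invented sample text, not a real recipe or real pipeline output.
decoded = "recipe aquam et coque in vino".split()
reference = "item recipe aquam et misce in vino bono".split()
print(shared_ngrams(decoded, reference, 3))  # [('recipe', 'aquam', 'et')]
print(shared_ngrams(decoded, reference, 5))  # []
```

Running the same function on a word-shuffled copy of the decoded text gives the chance level for each n.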

The honest picture: ~20 recurring pharmaceutical terms show a genuine signal. Everything else is noise or trivially short matches. Whether those 20 terms reflect real encoded Latin or a bias in the K&A mapping is the open question.

Hi Corwin!
If I understand correctly, d=y, aiin = aquam, and kaiin = curam?
I admit it will be difficult to surpass Claude.

(09-04-2026, 03:30 PM)Ruby Novacna Wrote: Hi Corwin! [...]

Hi! Yes exactly, that's the v12 output: d→in, y→in, aiin→aquam, kaiin→curam.

Since posting I've been running harder validation tests and the glyph-by-glyph decoding is actually the weakest part.

So the individual letter values are shaky but the architecture is real. Errors are part of the process, that's what validation tests are for!

I'm tired and deeply sorry TBH

Hi Guillaume. Welcome to the forum.

I'm afraid we have a rule that prohibits theories and papers assisted by AI LLMs like Claude. This is because we've found that their tendency towards hallucination is especially pronounced when it comes to the Voynich. Our warning appears on the main page before you register for an account.

I appreciate you are an expert in AI but I have had at least two such people argue with me that their patently AI slop paper could not possibly be AI slop because they are experts.

This is why we have it as a blanket rule.

(09-04-2026, 03:42 PM)tavie Wrote: Hi Guillaume. Welcome to the forum. [...]

Hi Tavie,
Message received. I’m kicking myself, I read the warning, but I let the AI "accelerate my own stupidity" instead of staying sharp. It was a lapse in judgment and I sincerely apologize.
I’m not here to argue; it’s my fault, not the tool's. I’m going to stay quiet now, listen, and learn from the community. You have my word that nothing will be posted again without rigorous human vetting.
Please don't ban me for this: lesson learned.
Best...

Hi Guillaume, welcome.

I've read a few files on your GitHub repository.

Your mapping is well optimized to generate Latinish words, but not yet at the level of a "Lorem Ipsum" generator (fake Latin, but not entirely fake). Too many nonsensical words of 2 to 5 letters, and far too many "eius".

I haven't used the Perseus Latin Dictionary, but 265,000 forms is quite small and (very probably) only covers classical Latin, not medieval Latin (words and variant spellings). For example, late medieval Latin words rarely had any "ae" in them, with few exceptions, like "aer". Instead of "ae", the e cedilla "ȩ" was being reintroduced during the Quattrocento in Italy by the Humanists, but everyone else wrote only "e". So words like "cerae", "cassiae", "aeque" and "aequi" are not plausible if the text was written in the first half of the 15th century.
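As a rough illustration of the spelling point, a classical word list could be given medieval-style variants before matching. This is only a sketch under assumptions: the exception set holds just the one example mentioned here, `medievalize` is a made-up name, and real medieval orthography varies in many more ways.

```python
# ASSUMPTION: exception list contains only the example given in this post.
KEEP_AE = {"aer"}

def medievalize(form):
    """Rewrite classical 'ae' as plain 'e', except for listed exceptions."""
    if form in KEEP_AE:
        return form
    return form.replace("ae", "e")

for word in ["cerae", "cassiae", "aeque", "aequi", "aer"]:
    print(word, "->", medievalize(word))
```

Decoded words that only validate in their classical "ae" spelling could then be flagged as suspect for a 15th-century text.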

BTW your data/dictionaries/latin.txt is actually English.

(10-04-2026, 05:09 PM)nablator Wrote: Hi Guillaume, welcome. [...]

Hi nablator, and thank you!

You are totally right. There is an idea behind that, but for now I'm taking stock of how dumb I was and double-checking every hypothesis.

See you soon, if I'm not going crazy... How long have you been working on this? What is your best guess?
