The Voynich Ninja

Full Version: Please help :-) Ablation-tested phonological decoder (EVA → Elu-Sinhala)
hi yeah sorry about the last post, i've taken advice and changed this one. let me try again properly
basically i think it's a medical book, shorthand notes taken by a student from spoken instructions. like a pharmaceutical guide passed down, possibly mother to daughter. i visited Sri Lanka during rehabilitation and saw a lot of similarities between old Sinhala (Elu) letterforms and the Voynich glyphs. temples with big communal baths outside, temple mandalas, the whole feel of it clicked.
i think it's focused on women's health, which would explain the bathing sections and the female figures throughout. the text was probably originally written on palm leaves, which is why you see very few straight lines in the script: straight lines crack the leaf. at some point later someone in Europe copied it onto vellum.
a lot of the plant drawings look like they were done from pressed specimens or memory rather than live plants, which would explain the flat leaves and weird proportions. that's what you'd expect from someone drawing plants they trained with rather than ones sitting in front of them.
there's a full set of statistical tests on my github supporting this: section clustering, directionality analysis, ablation studies etc. Claude Opus was used to help design the tests and write the python, but the outputs come from code, not from ai making stuff up. my background is in deeptech and llm architecture so i know the difference.
please have a look at the paper and github if you get a chance, i am completely open to criticism. i am not saying this is definitely right, i want the community to prove it right or wrong.
cheers

Resources:
  • Code + data: on my GitHub
  • Full mapping table and ablation scripts included in repo


Break it if you can PLEASE.
This reads like AI wrote the post. How did you use an LLM in this project?
(8 hours ago) tavie wrote: This reads like AI wrote the post. How did you use an LLM in this project?

hey, i did use ai to write this post, i am not used to groups like this. i used Claude Code to build the testing tools. the theory is mine, from when i went to Sri Lanka. i am looking for people to help me prove or disprove it. all the files and results are on my github for anyone to test and view. thanks for your reply
Hello Kameldip!
Try presenting your idea without Claude; it would probably be easier for readers to understand.
(8 hours ago) Ruby Novacna wrote: Hello Kameldip!
Try presenting your idea without Claude; it would probably be easier for readers to understand.

hi yeah sorry. the main points are: it's a medical book, shorthand notes taken by a student from spoken instructions, like a pharmaceutical guide from a mother to a daughter perhaps. i visited Sri Lanka and saw a lot of similarities between old Sinhala (Elu) letterforms and the manuscript, temples with big baths outside, temple mandalas, etc.

That led me to pursue that route. it could be a medical book focused on women's health, hence the link between mother and daughter. i believe it was written on palm leaves, hence the style of writing with few straight lines, as those would break the leaf. at a later time, in Europe, the author converted it. i believe a lot of the images are drawn from specimens the author would have seen during his/her training, hence their look: flat leaves, inaccurate plant scales, etc.

there are many tests that have been done to support this, fully available on my github. Claude Opus 4.6 was used to design these tests. the outputs were not determined by AI, they came from python code. AI assisted in methodology investigation and design alongside me. my background is deeptech and LLM architecture.

Please read the paper and the github if possible. i am open to any and all comments. i am not saying this is the final result or that there's no chance i am wrong. i want the community to prove it right or wrong.

thank you
Welcome to the forum.

Dictionary match, grammar match, phonotactics match: impressive if true.

I see the word list is in sinhala_dictionary.txt, where does it come from? Does it contain 15th century or modern vocabulary?

How good is the coverage of the most common Sinhalese words that should almost certainly exist in the VMS?

Quote: The recipe section (folios 75-116, Quire 20) comprises 81 folios containing 22,783 word tokens.
At the current coverage level:
Tier 1: English gloss available 20,936 = 91.9%
Tier 2: Sinhala dictionary match 1,026 = 4.5%
Tier 3: Edit-distance-1 match 639 = 2.8%
Tier 4: Unknown 182 = 0.8%

What is the "English gloss"? Why such a low percentage in the Sinhala dictionary?

Note: the "Recipes" section, Q20, actually starts at folio 103 (12 folios).
(7 hours ago) KamCode wrote: Please read the paper and the github i

The GitHub is 99% AI slop.

Can you just summarize your points and research here?

The GitHub is extremely verbose with little substance.
thanks for the welcome and good questions
the dictionary is modern Sinhala, not medieval. that's a fair weakness to point out. medieval Elu vocabulary would be different and i haven't been able to source a proper historical Elu dictionary yet. that's one reason i want a Sinhala linguist involved, someone who knows the old forms.
the tier breakdown is a bit confusing the way it's written, let me clarify. the english gloss is the decoder's own internal lookup: when it maps EVA to phonemes it has a built-in table that assigns english meanings to common output strings. so that 91.9% means the decoder recognised the output and could label it. the 4.5% Sinhala dictionary match is an independent check against an external dictionary, which is much stricter. so there's a gap there that needs honest acknowledgment: the decoder labels its own outputs confidently but external validation is lower.
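for reference, the four tier counts in the quoted breakdown do add up to the stated 22,783 tokens, and the percentages follow from plain division. a quick check, using only the numbers from the quote (the variable names are just for this example):

```python
# Tier counts exactly as quoted from the paper's recipe-section breakdown.
tiers = {
    "Tier 1: English gloss available": 20936,
    "Tier 2: Sinhala dictionary match": 1026,
    "Tier 3: Edit-distance-1 match": 639,
    "Tier 4: Unknown": 182,
}

# The tiers should partition the quoted total of 22,783 word tokens.
total = sum(tiers.values())

# Share of each tier as a percentage, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in tiers.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

this only confirms the arithmetic is internally consistent. it says nothing about whether the gloss table or the dictionary matches are meaningful.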
the coverage of common words is something i'd like to test properly. if anyone has suggestions on what a baseline Sinhala word frequency list should look like for a pharmaceutical text, i'd be interested.
and thanks for the correction on Q20 starting at folio 103, i'll fix that
(5 hours ago) Typpi wrote:
The GitHub is 99% AI slop.

Can you just summarize your points and research here?

The GitHub is extremely verbose with little substance.

yeah the github readme is ai generated, guilty. a lot of this project, as well as being about the Voynich manuscript, is about how to use ai effectively as a tool and not a chatbot. i'll clean it up.
short version:
i think Voynich is medieval Sinhala (Elu) written phonetically. i visited Sri Lanka and noticed old Sinhala letterforms look like Voynich glyphs. built a decoder that maps EVA characters to 14 Sinhala phonemes. about 200 lines of python.
the evidence that it's not random:
  • decoded keywords cluster by manuscript section. plant words on plant pages, preparation terms on balneo pages. the decoder doesn't know which section it's on. Z=31.81 against random shuffles, none of 1000 shuffles came close.
  • EVA reads better right to left, decoded text reads better left to right. Sinhala is left to right. the encoding flips directionality, which is what you'd expect from an abugida.
  • cross-modal: decoder outputs medical terms that match what the illustrations show, spatially. it can't see the pictures.
  • ablation: remove rules from the decoder and results degrade. random decoders don't reproduce the clustering.
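the clustering claim in the first bullet is a permutation (shuffle) test. a minimal sketch of that kind of test on toy data follows. everything in it is an assumption for illustration: the section labels, the keyword flags, the `clustering_score` statistic and the function names are hypothetical stand-ins, not what the repo's scripts actually compute.

```python
import random
import statistics


def clustering_score(sections, is_plant_word):
    """Toy statistic: how many plant keywords land on 'herbal' pages."""
    return sum(1 for sec, hit in zip(sections, is_plant_word) if hit and sec == "herbal")


def permutation_z(sections, is_plant_word, n_shuffles=1000, seed=0):
    """Compare the observed score with scores under shuffled section labels."""
    rng = random.Random(seed)
    observed = clustering_score(sections, is_plant_word)
    shuffled = list(sections)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any link between label and keyword
        null_scores.append(clustering_score(shuffled, is_plant_word))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        raise ValueError("null distribution has zero variance; Z is undefined")
    z = (observed - mean) / sd
    # How many shuffles scored at least as high as the real labelling.
    n_as_extreme = sum(1 for s in null_scores if s >= observed)
    return observed, z, n_as_extreme


# Toy data: 100 tokens, half on herbal pages, plant keywords mostly on herbal pages.
sections = ["herbal"] * 50 + ["balneo"] * 50
is_plant_word = [True] * 40 + [False] * 10 + [True] * 5 + [False] * 45

observed, z, n_as_extreme = permutation_z(sections, is_plant_word)
print(observed, round(z, 2), n_as_extreme)
```

a large Z here only says the keyword labels are not independent of the section labels. it does not say why, so how the keyword list was chosen matters as much as the number itself.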
what it doesn't do: produce fluent readable Sinhala. dictionary hit rate is around 50%. entropy doesn't reach natural language levels. i need a medieval Sinhala scholar to evaluate the output properly.
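for anyone wanting to check the two numbers mentioned here on their own output, this is a minimal sketch of one common way to compute a dictionary hit rate and a character entropy. the function names and toy inputs are assumptions for the example, and the repo may define both measures differently.

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def hit_rate(tokens, dictionary):
    """Fraction of tokens that appear verbatim in the dictionary set."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok in dictionary) / len(tokens)


# Toy inputs only: placeholder strings, not real decoder output or Sinhala words.
toy_tokens = ["aaa", "bbb", "ccc", "ddd"]
toy_dictionary = {"aaa", "bbb"}

print(hit_rate(toy_tokens, toy_dictionary))
print(char_entropy("abab"))
```

a hit rate is only interpretable next to a baseline, for example the rate a random mapping gets against the same dictionary.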
the decoder itself is in h12_decoder.py if anyone wants to skip the readme and just look at the code.
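for readers who haven't opened the file, a table-driven decoder of this general kind can be sketched in a few lines. this is only an illustration of the shape, not the contents of h12_decoder.py. `TOY_MAP`, `decode_word` and the placeholder outputs `P1` to `P7` are all made up for the example, and the real table is described as mapping to 14 phonemes.

```python
# Hypothetical EVA-to-phoneme table. Placeholder values only,
# NOT the mapping used in the repo.
TOY_MAP = {
    "ch": "P1",
    "sh": "P2",
    "o": "P3",
    "k": "P4",
    "e": "P5",
    "d": "P6",
    "y": "P7",
}


def decode_word(eva_word, table=TOY_MAP):
    """Greedy longest-match transliteration of one EVA word."""
    keys = sorted(table, key=len, reverse=True)  # try digraphs before single letters
    out = []
    i = 0
    while i < len(eva_word):
        for key in keys:
            if eva_word.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append("?")  # character not covered by the table
            i += 1
    return "-".join(out)


print(decode_word("chedy"))
print(decode_word("qokedy"))
```

the unmapped `q` comes out as `?` rather than being silently dropped, which makes gaps in the table visible in the raw output.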

pick a couple of folios and i can put them through the decoder and give you the raw output

That's much easier to follow.

Can you decode this?

[attachment=14055]