KamCode > 7 hours ago
(8 hours ago) Typpi Wrote: (8 hours ago) KamCode Wrote: (8 hours ago) Typpi Wrote: (11 hours ago) KamCode Wrote: Please read the paper and the github i
The GitHub is 99% AI slop.
Can you just summarize your points and research here?
The GitHub is extremely verbose with little substance.
yeah the github readme is ai generated. a lot of this project, as well as the Voynich manuscript, is about how to use ai effectively as a tool and not a chatbot, guilty. ill clean it up.
short version:
i think voynich is medieval sinhala (elu) written phonetically. i visited sri lanka, noticed old sinhala letterforms look like voynich glyphs. built a decoder that maps EVA characters to 14 sinhala phonemes. about 200 lines of python.
the evidence that its not random:
- decoded keywords cluster by manuscript section. plant words on plant pages, preparation terms on balneo pages. the decoder doesnt know which section its on. Z=31.81 against random shuffles, none of 1000 shuffles came close.
- EVA reads better right to left, decoded text reads better left to right. sinhala is left to right. the encoding flips directionality which is what you'd expect from an abugida.
- cross-modal: decoder outputs medical terms that match what the illustrations show, spatially. it cant see the pictures.
- ablation: remove rules from the decoder and results degrade. random decoders dont reproduce the clustering.
what it doesnt do: produce fluent readable sinhala. dictionary hit rate is around 50%. entropy doesnt reach natural language levels. i need a medieval sinhala scholar to evaluate the output properly.
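For readers unfamiliar with the "Z against random shuffles" figure in the list above, here is a minimal Python sketch of a permutation test of that general kind. It is not taken from the project's code: the function names, the toy pages, the placeholder keywords, and the choice to shuffle page-to-section labels are all assumptions made for illustration.

```python
import random
import statistics


def clustering_score(keywords_by_page, section_by_page, expected_section):
    """Count decoded keywords that fall on a page of their expected section."""
    score = 0
    for page, words in keywords_by_page.items():
        section = section_by_page[page]
        score += sum(1 for w in words if expected_section.get(w) == section)
    return score


def permutation_test(keywords_by_page, section_by_page, expected_section,
                     n_shuffles=1000, seed=0):
    """Compare the observed score with scores under shuffled section labels."""
    rng = random.Random(seed)
    observed = clustering_score(keywords_by_page, section_by_page, expected_section)
    pages = list(section_by_page)
    labels = [section_by_page[p] for p in pages]
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # assumption: the null model reassigns sections to pages
        shuffled = dict(zip(pages, labels))
        null_scores.append(
            clustering_score(keywords_by_page, shuffled, expected_section))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    z = (observed - mean) / sd if sd > 0 else 0.0
    n_as_extreme = sum(1 for s in null_scores if s >= observed)
    return observed, z, n_as_extreme


# Toy data only: hypothetical pages and placeholder keywords, not decoder output.
section_by_page = {f"p{i}": ("herbal" if i <= 4 else "balneo") for i in range(1, 9)}
keywords_by_page = {p: (["PLANT"] if s == "herbal" else ["PREP"])
                    for p, s in section_by_page.items()}
expected_section = {"PLANT": "herbal", "PREP": "balneo"}

observed, z, n_as_extreme = permutation_test(
    keywords_by_page, section_by_page, expected_section)
print(observed, round(z, 2), n_as_extreme)
```

The Z value is the observed score minus the mean shuffled score, divided by the standard deviation of the shuffled scores. The count of shuffles that reach the observed score corresponds to the "none of 1000 shuffles came close" statement. What gets shuffled defines the null hypothesis, so that choice matters when reading a reported Z.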
the decoder itself is in h12_decoder.py if anyone wants to skip the readme and just look at the code.
pick a couple of folios i can put them through the decoder and give you the raw output
That's much easier to follow.
Can you decode this?
Typpi > 7 hours ago
KamCode > 7 hours ago
(7 hours ago) Typpi Wrote: (7 hours ago) KamCode Wrote: the decoder doesnt know which section its on. Z=31.81 against random shuffles, none of 1000 shuffles came close.
I thought the decoder didn't need to know what page it was on?
eggyk > 7 hours ago
(Yesterday, 04:01 PM) KamCode Wrote: the text was probably originally written on palm leaves, which is why you see very few straight lines in the script: straight lines crack the leaf.
(8 hours ago) KamCode Wrote: the dictionary is modern sinhala, not medieval. thats a fair weakness to point out. medieval elu vocabulary would be different and i havent been able to source a proper historical elu dictionary yet. thats one reason i want a sinhala linguist involved, someone who knows the old forms.
KamCode > 7 hours ago
KamCode > 7 hours ago
(7 hours ago) eggyk Wrote: (Yesterday, 04:01 PM) KamCode Wrote: the text was probably originally written on palm leaves, which is why you see very few straight lines in the script: straight lines crack the leaf.
q , k , t , p ch
The text is full of straight lines, with the most notable characters like gallows explicitly being made from long, straight vertical lines.
(8 hours ago) KamCode Wrote: the dictionary is modern sinhala, not medieval. thats a fair weakness to point out. medieval elu vocabulary would be different and i havent been able to source a proper historical elu dictionary yet. thats one reason i want a sinhala linguist involved, someone who knows the old forms.
The dictionary "sinhala_dictionary.txt", how confident are you that it's correct? The first words listed are:
"a, aa, aaa, aaaa, aaaaa, aaaaaa, aaaaaaaa, aaaah"
It doesn't give a lot of confidence. I don't mind using AI for certain tasks, but are you accepting AI output/input as fact and running with it?
eggyk > 7 hours ago
(7 hours ago) KamCode Wrote: hi sorry for not being clear. i am saying the author originally would have made their notes from spoken word onto palm leaves. i believe it was copied from those to vellum some time later (months/years) to preserve them, so originally it would have been no straight lines, mainly loops; the text now is a hybrid. also ill explain the dictionary below, its easier to take the explanation from ai as its a lot to retype in my own words, please dont mind
The dictionary: Fair challenge. The entries starting with "a, aa, aaa, aaaa, aaaah" look like junk but they're mostly not: "aaabharana" is ābharaṇa (ornament), "aaachaarya" is ācārya (teacher), "aaadharana" is ādhāraṇa (support). The triple-a is a romanization artifact: "aa" represents long ā, so a word starting with ā + another a-initial syllable becomes "aaa." There are about 269 entries starting with "aaa" out of 1,470,278 total, less than 0.02%. The first handful ("a, aa, aaa, aaaa, aaaaaaaa, aaaah") are genuine noise: 32 entries total of repeated single characters in the entire file.
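The counts quoted above can be checked directly against a wordlist. The following is a minimal Python sketch, assuming the file holds one romanized word per line (the thread does not state the file format). `audit_wordlist` and the sample entries are hypothetical names for illustration, and "noise" is taken here to mean an entry of two or more identical characters, so a single-letter entry such as "a" is not flagged.

```python
import re

# One character repeated two or more times, e.g. "aa" or "aaaa" (assumed definition).
REPEATED_CHAR = re.compile(r"^(.)\1+$")


def audit_wordlist(words, prefix="aaa"):
    """Summarize repeated-character entries and entries starting with `prefix`."""
    cleaned = [w.strip() for w in words if w.strip()]
    total = len(cleaned)
    repeated = [w for w in cleaned if REPEATED_CHAR.match(w)]
    prefix_count = sum(1 for w in cleaned if w.startswith(prefix))
    share_pct = 100.0 * prefix_count / total if total else 0.0
    return {
        "total": total,
        "repeated_char": repeated,
        "prefix_count": prefix_count,
        "prefix_share_pct": share_pct,
    }


# Small sample built from entries quoted in the thread, not the real file.
sample = ["a", "aa", "aaa", "aaaa", "aaaah",
          "aaabharana", "aaachaarya", "beheth", "roga"]
report = audit_wordlist(sample)
print(report)

# Arithmetic check of the quoted figure: 269 entries out of 1,470,278.
quoted_share_pct = 100.0 * 269 / 1_470_278
print(round(quoted_share_pct, 4))  # 0.0183, i.e. under 0.02%
```

To run it on the real list, something like `audit_wordlist(open("sinhala_dictionary.txt", encoding="utf-8"))` should work, again assuming one entry per line.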
The dictionary is a computationally-derived modern Sinhala romanized wordlist with full morphological inflections (beheth, behetha, behethak, behethaka, behethakata... = medicine + case/number suffixes). It contains real Sinhala vocabulary including medical terms (roga, aushadha, vaidya, kashaya, churna, ghrita, sneha). It is not AI-generated.
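The "dictionary hit rate" mentioned earlier in the thread is presumably the share of decoded tokens that appear in this wordlist. Here is a minimal Python sketch of that calculation, assuming exact string matching against the romanized entries. `hit_rate` is a hypothetical helper, and the unmatched tokens below are placeholders, not real decoder output.

```python
def hit_rate(tokens, wordlist):
    """Fraction of non-empty tokens that appear verbatim in the wordlist."""
    vocab = set(wordlist)
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)


# Wordlist entries are taken from the examples quoted in the thread.
wordlist = ["beheth", "behetha", "behethak", "roga", "vaidya"]
# "qqq1" and "zzz2" are placeholder non-words standing in for unmatched output.
decoded = ["beheth", "qqq1", "roga", "zzz2"]

rate = hit_rate(decoded, wordlist)
print(f"{rate:.0%}")
```

Because the wordlist includes full inflections, the matching rule (exact match, stemmed match, minimum token length) changes the number, so it is worth stating alongside any reported rate.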
That said, the dictionary being modern is a real limitation, as I acknowledged. And you're right to push on data quality. I should document the dictionary source explicitly in the paper. It currently doesn't say where it came from and it should.
thanks i will be offline till the morning but happy to answer more and any questions then thanks
pjburkshire > 7 hours ago
KamCode > 6 hours ago
(7 hours ago) pjburkshire Wrote: I would like to see Quire 1, f2v. Most people call this the waterlily page.
pjburkshire > 6 hours ago