26-02-2026, 01:46 AM
Hi everyone, I'm excited to be here. Let me be upfront: I used AI to help me write this post, because I want to make sure what I've done is clearly communicated and understood. But please do not just assume it's AI slop; I've worked hard on this, and I am happy to share the extensive, rigorous testing that's gone into getting to this point.
I am introducing a new computational tool to the community. I want to be clear upfront: this is not a translation attempt. Instead, it is a way to mathematically verify the structural patterns we often talk about in EVA, making those claims objective and reproducible.
The problem with leaping to translation

Translation attempts usually fail because they pick meanings too early. If you try to map EVA strings onto a real language, you need hard constraints first.
Think of deciphering an unknown language like trying to build a jigsaw puzzle where all the pieces are blank. Normally, cryptographers have to guess the shape of the pieces. Where does a prefix end? What is the basic sentence structure? Is this a compound word? This tool maps the exact shapes of those puzzle pieces so that when you do try to assign meaning, you have a mathematical rulebook you must follow.
How the tool works

I have built a structural analyser that treats the Takahashi EVA transcription strictly as raw data. It does not use dictionaries, semantic guessing, or AI models. It is a deterministic mathematical engine: it scans the raw binary data of the file through different structural lenses to find exact, unbroken repeating sequences.
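To make the idea concrete, here is a minimal sketch of that kind of scan in Python. This is my illustration of the approach, not the engine itself; the window length in characters stands in for the bit-width of a lens, and the threshold is a placeholder.

```python
from collections import Counter

def repeated_windows(text: str, window: int, min_count: int = 2) -> dict:
    """Slide a fixed-length window across the text and keep every
    exact chunk that recurs verbatim at least min_count times."""
    counts = Counter(text[i:i + window] for i in range(len(text) - window + 1))
    return {chunk: n for chunk, n in counts.items() if n >= min_count}

# "chedy " recurs twice in this toy line, so a 6-character lens finds it.
print(repeated_windows("chedy ol chedy ol chedy", 6)["chedy "])  # prints 2
```

The key property is that nothing semantic enters the loop: the scan only counts exact byte-for-byte repeats, which is what makes the output reproducible.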
A practical example: the balneological section

To show you what I mean, let us look at a famously repetitive block from the balneological section.
I ran the following EVA text through the engine:
qokeedy qokeedy chedy ol chedy qokain chedy
daiin cthar cthar dan syaiir sheky or ykaiin
chedy ol chedy qokain qokeedy chedy ol
otaiin or okan o oiin oteey oteos roloty
shar are cthar cthar dan syaiir sheky
qokeedy chedy ol chedy qokain qokeedy
Here is what the engine blindly discovered about the grammar, purely by calculating repeating geometric data:
1. Finding roots and affixes (16-bit lens)

When forced to look at short chunk lengths, the engine output these highly recurring structural formulas:
- ("chedy ")
- ("edy qo")
- ("qokeed")
What this means: The engine mathematically isolated "chedy" as a foundational root. More importantly, it mapped exactly how modifiers bind to it. It proved that "qo" acts as a prefix that reliably attaches to form "qokeedy". It defined word boundaries and morphology without actually knowing what a word is.
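One way to make that prefix claim mechanically testable is a check like the following. This is a hedged sketch: the candidate prefixes and the sample word list are illustrative placeholders, not engine output.

```python
def shared_roots(words, prefixes=("qo", "o")):
    """For each candidate prefix, collect roots that occur both bare and
    prefixed in the word list -- evidence the prefix binds productively."""
    vocab = set(words)
    return {p: sorted(w[len(p):] for w in vocab
                      if w.startswith(p) and len(w) > len(p) and w[len(p):] in vocab)
            for p in prefixes}

# Hypothetical vocabulary for illustration only.
sample = ["kain", "qokain", "keedy", "qokeedy", "chedy", "ol"]
print(shared_roots(sample)["qo"])  # prints ['kain', 'keedy']
```

A prefix that passes this test across many folios is behaving systematically; one that only appears fused to a single root probably is not a prefix at all.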
2. Finding phrase syntax (32-bit lens)

When we zoomed out to look for longer, phrase-level chunks, the engine isolated exact, unbroken twelve-character sequences that repeat verbatim:
- ("har cthar da")
- ("cthar dan sy")
- ("okeedy chedy")
What this means: In most natural languages, an exact twelve-character phrase repeating multiple times within a single paragraph is statistically rare. Here, the engine proved it is the core syntax. It mathematically flagged the long chaining sequence "cthar dan syaiir sheky" as a strictly bound grammatical unit. It also captured the manuscript's famous reduplication, showing that the repetition of "cthar cthar" is an intentional, permitted grammatical rule, not a transcription error.
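Anyone can reproduce this result on the quoted block with a few lines of Python. This is a simplified sketch, using a character-count window as my stand-in for the engine's bit lens:

```python
from collections import Counter

# The balneological sample from above, joined into one running string.
block = ("qokeedy qokeedy chedy ol chedy qokain chedy "
         "daiin cthar cthar dan syaiir sheky or ykaiin "
         "chedy ol chedy qokain qokeedy chedy ol "
         "otaiin or okan o oiin oteey oteos roloty "
         "shar are cthar cthar dan syaiir sheky "
         "qokeedy chedy ol chedy qokain qokeedy")

# Count every 12-character window and keep the ones that repeat verbatim.
counts = Counter(block[i:i + 12] for i in range(len(block) - 11))
repeats = {chunk: n for chunk, n in counts.items() if n >= 2}
print(repeats["cthar dan sy"])  # prints 2
```

The repeated windows cluster exactly where the chained phrase recurs, which is the behaviour the 32-bit lens is reporting.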
How this actually leads to translation

We now have a verifiable structural fingerprint. We do not know what "cthar" means, but we know exactly how it behaves.
If a researcher theorises that "cthar" is a noun meaning "water", we no longer have to guess whether that fits. We can query the database to see whether "cthar" behaves like a noun across all 5,200-plus folios. Does the prefix it takes match the rules for an adjective or article in the proposed target language? If your translation requires "qokeedy" and "chedy" to be entirely unrelated words, but the UFM database proves they are the same root sharing a strict morphological link, then the translation theory is invalid.
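As a sketch of what such a falsification query could look like (the data shapes here are assumptions for illustration; the real database schema may differ):

```python
def structural_violations(lemma_of: dict, linked_pairs: list) -> list:
    """Flag word pairs that the structural data binds to one root but a
    proposed translation maps to unrelated lemmas."""
    return [(a, b) for a, b in linked_pairs
            if a in lemma_of and b in lemma_of and lemma_of[a] != lemma_of[b]]

# Hypothetical theory that treats the two words as unrelated -- flagged.
theory = {"qokeedy": "river", "chedy": "stone"}
links = [("qokeedy", "chedy")]
print(structural_violations(theory, links))  # prints [('qokeedy', 'chedy')]
```

An empty result does not prove a theory right; it only means the theory has survived one structural constraint.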
This tool reduces the infinite possibilities of translation down to a highly constrained set of grammatical rules that any proposed solution must obey.
Looking for feedback and collaboration

I am a data and systems person, not a historical linguist. To prove these patterns are real, the tool generates synthetic control texts that match the real text's character frequencies, and it requires the real text to statistically beat those fakes.
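The control test amounts to something like the following (a simplified sketch; the engine's actual statistics may be more involved):

```python
import random
from collections import Counter

def repeat_excess(text: str, window: int = 12, trials: int = 200, seed: int = 0):
    """Count repeated windows in the real text, then in frequency-matched
    shuffles, and report how often a shuffle matches or beats the real text."""
    def n_repeats(s):
        c = Counter(s[i:i + window] for i in range(len(s) - window + 1))
        return sum(1 for v in c.values() if v >= 2)

    real = n_repeats(text)
    rng = random.Random(seed)
    chars = list(text)
    beats = 0
    for _ in range(trials):
        rng.shuffle(chars)  # preserves the exact character frequencies
        if n_repeats("".join(chars)) >= real:
            beats += 1
    return real, beats / trials  # (repeat count, empirical p-value)
```

On the quoted block, the repeated 28-character chain alone guarantees more than a dozen repeated 12-character windows, while frequency-matched shuffles produce essentially none, so the empirical p-value comes out near zero.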
I would love feedback from this community:
- Are there specific folios or sections you want me to run through the phrase finder?
- If anyone wants the raw CSV data exports to cross reference with their own linguistic theories, please let me know.