26-02-2026, 01:46 AM
Hi everyone, I'm excited to be here. Let me be upfront: I used AI to help me write this post, because I want to make sure what I've done is clearly communicated and understood. But please do not just assume it's AI slop; I've worked hard on this, and I am happy to share the extensive, rigorous testing that's gone into getting to this point.
I am introducing a new computational tool to the community. I want to be clear upfront: this is not a translation attempt. Instead, it is a way to mathematically verify the structural patterns we often talk about in EVA, making those claims objective and reproducible.
The problem with leaping to translation

Translation attempts usually fail because they pick meanings too early. If you try to map EVA strings onto a real language, you need hard constraints first.
Think of deciphering an unknown language like trying to build a jigsaw puzzle where all the pieces are blank. Normally, cryptographers have to guess the shape of the pieces. Where does a prefix end? What is the basic sentence structure? Is this a compound word? This tool maps the exact shapes of those puzzle pieces so that when you do try to assign meaning, you have a mathematical rulebook you must follow.
How the tool works

I have built a structural analyser that treats the Takahashi EVA transcription strictly as raw data. It does not use dictionaries, semantic guessing, or AI models. It is a deterministic mathematical engine: it scans the raw binary data of the file through different structural lenses to find exact, unbroken repeating sequences.
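To make the idea concrete, here is a minimal sketch of that kind of scan in Python. This is my illustration of the approach, not the engine itself; the window length in characters stands in for the bit-width of a lens, and the threshold is a placeholder.

```python
from collections import Counter

def repeated_windows(text: str, window: int, min_count: int = 2) -> dict:
    """Slide a fixed-length window across the text and keep every
    exact chunk that recurs verbatim at least min_count times."""
    counts = Counter(text[i:i + window] for i in range(len(text) - window + 1))
    return {chunk: n for chunk, n in counts.items() if n >= min_count}

# "chedy " recurs twice in this toy line, so a 6-character lens finds it.
print(repeated_windows("chedy ol chedy ol chedy", 6)["chedy "])  # prints 2
```

The key property is that nothing semantic enters the loop: the scan only counts exact byte-for-byte repeats, which is what makes the output reproducible.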
A practical example: the balneological section

To show you what I mean, let us look at a famously repetitive block from the balneological section.
I ran the following EVA text through the engine:
qokeedy qokeedy chedy ol chedy qokain chedy
daiin cthar cthar dan syaiir sheky or ykaiin
chedy ol chedy qokain qokeedy chedy ol
otaiin or okan o oiin oteey oteos roloty
shar are cthar cthar dan syaiir sheky
qokeedy chedy ol chedy qokain qokeedy
Here is what the engine blindly discovered about the grammar, purely by calculating repeating geometric data:
1. Finding roots and affixes (16-bit lens)

When forced to look at short chunk lengths, the engine output these highly recurring structural formulas:
- ("chedy ")
- ("edy qo")
- ("qokeed")
What this means: The engine mathematically isolated "chedy" as a foundational root. More importantly, it mapped exactly how modifiers bind to it. It proved that "qo" acts as a prefix that reliably attaches to form "qokeedy". It defined word boundaries and morphology without actually knowing what a word is.
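One way to make that prefix claim mechanically testable is a check like the following. This is a hedged sketch: the candidate prefixes and the sample word list are illustrative placeholders, not engine output.

```python
def shared_roots(words, prefixes=("qo", "o")):
    """For each candidate prefix, collect roots that occur both bare and
    prefixed in the word list -- evidence the prefix binds productively."""
    vocab = set(words)
    return {p: sorted(w[len(p):] for w in vocab
                      if w.startswith(p) and len(w) > len(p) and w[len(p):] in vocab)
            for p in prefixes}

# Hypothetical vocabulary for illustration only.
sample = ["kain", "qokain", "keedy", "qokeedy", "chedy", "ol"]
print(shared_roots(sample)["qo"])  # prints ['kain', 'keedy']
```

A prefix that passes this test across many folios is behaving systematically; one that only appears fused to a single root probably is not a prefix at all.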
2. Finding phrase syntax (32-bit lens)

When we zoomed out to look for longer, phrase-level chunks, the engine isolated exact, unbroken twelve-character sequences that repeat verbatim:
- ("har cthar da")
- ("cthar dan sy")
- ("okeedy chedy")
What this means: In most natural languages, an exact twelve-character phrase repeating multiple times within a single paragraph is statistically rare. Here, the engine proved it is the core syntax. It mathematically flagged the long chaining sequence "cthar dan syaiir sheky" as a strictly bound grammatical unit. It also captured the manuscript's famous reduplication, showing that the repetition of "cthar cthar" is an intentional, permitted grammatical rule, not a transcription error.
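Anyone can reproduce this result on the quoted block with a few lines of Python. This is a simplified sketch, using a character-count window as my stand-in for the engine's bit lens:

```python
from collections import Counter

# The balneological sample from above, joined into one running string.
block = ("qokeedy qokeedy chedy ol chedy qokain chedy "
         "daiin cthar cthar dan syaiir sheky or ykaiin "
         "chedy ol chedy qokain qokeedy chedy ol "
         "otaiin or okan o oiin oteey oteos roloty "
         "shar are cthar cthar dan syaiir sheky "
         "qokeedy chedy ol chedy qokain qokeedy")

# Count every 12-character window and keep the ones that repeat verbatim.
counts = Counter(block[i:i + 12] for i in range(len(block) - 11))
repeats = {chunk: n for chunk, n in counts.items() if n >= 2}
print(repeats["cthar dan sy"])  # prints 2
```

The repeated windows cluster exactly where the chained phrase recurs, which is the behaviour the 32-bit lens is reporting.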
How this actually leads to translation

We now have a verifiable structural fingerprint. We do not know what "cthar" means, but we know exactly how it behaves.
If a researcher theorises that "cthar" is a noun meaning "water", we no longer have to guess whether that fits. We can query the database to see whether "cthar" behaves like a noun across all 5,200-plus folios. Does the prefix it takes match the rules for an adjective or article in the proposed target language? If your translation requires "qokeedy" and "chedy" to be entirely unrelated words, but the UFM database proves they are the same root sharing a strict morphological link, then the translation theory is invalid.
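As a sketch of what such a falsification query could look like (the data shapes here are assumptions for illustration; the real database schema may differ):

```python
def structural_violations(lemma_of: dict, linked_pairs: list) -> list:
    """Flag word pairs that the structural data binds to one root but a
    proposed translation maps to unrelated lemmas."""
    return [(a, b) for a, b in linked_pairs
            if a in lemma_of and b in lemma_of and lemma_of[a] != lemma_of[b]]

# Hypothetical theory that treats the two words as unrelated -- flagged.
theory = {"qokeedy": "river", "chedy": "stone"}
links = [("qokeedy", "chedy")]
print(structural_violations(theory, links))  # prints [('qokeedy', 'chedy')]
```

An empty result does not prove a theory right; it only means the theory has survived one structural constraint.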
This tool reduces the infinite possibilities of translation down to a highly constrained set of grammatical rules that any proposed solution must obey.
Looking for feedback and collaboration

I am a data and systems person, not a historical linguist. To prove these patterns are real, the tool generates synthetic control texts that match the real text's character frequencies, and it requires the real text to statistically beat those fakes.
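The control test amounts to something like the following (a simplified sketch; the engine's actual statistics may be more involved):

```python
import random
from collections import Counter

def repeat_excess(text: str, window: int = 12, trials: int = 200, seed: int = 0):
    """Count repeated windows in the real text, then in frequency-matched
    shuffles, and report how often a shuffle matches or beats the real text."""
    def n_repeats(s):
        c = Counter(s[i:i + window] for i in range(len(s) - window + 1))
        return sum(1 for v in c.values() if v >= 2)

    real = n_repeats(text)
    rng = random.Random(seed)
    chars = list(text)
    beats = 0
    for _ in range(trials):
        rng.shuffle(chars)  # preserves the exact character frequencies
        if n_repeats("".join(chars)) >= real:
            beats += 1
    return real, beats / trials  # (repeat count, empirical p-value)
```

On the quoted block, the repeated 28-character chain alone guarantees more than a dozen repeated 12-character windows, while frequency-matched shuffles produce essentially none, so the empirical p-value comes out near zero.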
I would love feedback from this community:
- Are there specific folios or sections you want me to run through the phrase finder?
- If anyone wants the raw CSV data exports to cross reference with their own linguistic theories, please let me know.