02-11-2025, 12:19 AM
Since some will think this is a joke or AI trash, I ask one thing. My preprint is under review on Zenodo. Allow me the courtesy of waiting until that review is approved or denied; if approved, I'll post the link and you can decide. If denied, you can move this to GPT Garbage and I'll shut my mouth.
After a full-scale computational analysis of the Voynich text using the public EVA transcription by Stolfi (via voynich.nu), I can now demonstrate that Voynichese is not random, not a hoax, and not a cipher — it is a language system with measurable grammar. Over eighty thousand tokens were segmented into morphemes using unsupervised boundary tests, yielding roughly 4,700 unique roots and affixes. These distribute into a strict four-slot sequence (prefix → root → stem → postfix), consistent across herbal, astronomical, and recipe sections.
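For anyone who wants to poke at the segmentation step, here is a minimal sketch of one standard unsupervised boundary test (right branching entropy over character prefixes). It is an illustration rather than my exact pipeline; the threshold and the toy EVA-style tokens are placeholders.
[code]
# Minimal sketch of an unsupervised boundary test (branching entropy).
# Threshold and toy tokens are placeholders, not the values used in the study.
from collections import Counter, defaultdict
from math import log2

def right_branching_entropy(words):
    """Entropy of the next character given each word prefix."""
    successors = defaultdict(Counter)
    for w in words:
        w = w + "$"                       # end-of-word marker
        for i in range(1, len(w)):
            successors[w[:i]][w[i]] += 1
    entropy = {}
    for prefix, counts in successors.items():
        total = sum(counts.values())
        entropy[prefix] = -sum((c / total) * log2(c / total)
                               for c in counts.values())
    return entropy

def segment(word, entropy, threshold=1.5):
    """Insert a boundary wherever branching entropy spikes above threshold."""
    cuts = [0]
    for i in range(1, len(word)):
        if entropy.get(word[:i], 0.0) >= threshold:
            cuts.append(i)
    cuts.append(len(word))
    return [word[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]

# toy usage with EVA-style tokens
tokens = ["qokeedy", "qokedy", "okedy", "chedy", "shedy", "qokaiin", "okaiin"]
H = right_branching_entropy(tokens)
for t in tokens:
    print(t, "->", segment(t, H))
[/code]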
The key finding: every valid word form obeys the same internal rule chain. When the slot transitions are multiplied (0.540 × 0.463 × 0.320), the product is 0.080, matching the 8.0 % rate of grammatically complete words observed in the corpus (95 % CI: 0.078–0.082). Randomization and ablation tests destroy this ratio completely, showing that the structure is not a statistical coincidence.
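The arithmetic is easy to check with the numbers quoted above; note that multiplying the three transitions treats the slots as approximately independent, which is part of what the randomization tests probe.
[code]
# Quick check of the slot-transition product against the observed rate.
# The three probabilities and the 8.0 % figure are the ones quoted above.
p_prefix_root  = 0.540
p_root_stem    = 0.463
p_stem_postfix = 0.320

product = p_prefix_root * p_root_stem * p_stem_postfix
print(f"product of slot transitions: {product:.4f}")   # ~0.0800
print("observed complete-word rate: 0.080 (95% CI 0.078-0.082)")
[/code]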
Conditional entropy, KL divergence, and HMM syntax modeling all converge on the same conclusion: Voynichese exhibits predictive, rule-based morphology indistinguishable from natural-language behavior. The effect persists through control tests and holds across all sections of the manuscript.
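For readers who want to see what two of those statistics look like in code, here is a small sketch of conditional entropy over adjacent tokens and a KL-style divergence against the independence model you get by shuffling. The toy token list is illustrative only, and the HMM step is not shown here.
[code]
# Conditional entropy H(next | current) and KL divergence between the
# observed bigram distribution and the independence (shuffled) model.
# Toy tokens only; the real runs use the full EVA corpus.
from collections import Counter
from math import log2
import random

def conditional_entropy(tokens):
    """H(next | current) over adjacent token pairs, in bits."""
    pairs = list(zip(tokens, tokens[1:]))
    bigrams, firsts, n = Counter(pairs), Counter(x for x, _ in pairs), len(pairs)
    return -sum((c / n) * log2(c / firsts[x]) for (x, _), c in bigrams.items())

def kl_vs_independence(tokens):
    """KL( P(x,y) || P(x)P(y) ): how far real bigrams sit from a shuffled control."""
    pairs = list(zip(tokens, tokens[1:]))
    bigrams, n = Counter(pairs), len(pairs)
    firsts, seconds = Counter(x for x, _ in pairs), Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((firsts[x] / n) * (seconds[y] / n)))
               for (x, y), c in bigrams.items())

tokens = "daiin chedy qokeedy shedy daiin qokaiin chedy daiin shedy qokeedy".split()
shuffled = tokens[:]
random.shuffle(shuffled)
print("H(next|current) real:    ", round(conditional_entropy(tokens), 3))
print("H(next|current) shuffled:", round(conditional_entropy(shuffled), 3))
print("KL vs independence, real:", round(kl_vs_independence(tokens), 3))
[/code]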
Cross-section generalization (no randomization): a simple next-token model trained on the herbal section achieves ≈ 50 % top-1 accuracy on held-out astronomical and recipe lines, versus ≈ 29 % for a unigram baseline — evidence of genuine structural consistency.
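A minimal version of that cross-section test: fit a bigram next-token predictor on one section, score top-1 accuracy on lines from another, and compare against an always-predict-the-most-frequent-token baseline. The tiny section lists below are placeholders for the real herbal and astronomical splits.
[code]
# Bigram next-token predictor trained on one section, scored on another,
# versus a unigram (most frequent token) baseline. Section data here is
# a placeholder for the EVA transcription split by manuscript section.
from collections import Counter, defaultdict

def fit_bigram(lines):
    nxt = defaultdict(Counter)
    for line in lines:
        for a, b in zip(line, line[1:]):
            nxt[a][b] += 1
    return {a: c.most_common(1)[0][0] for a, c in nxt.items()}

def top1_accuracy(model, fallback, lines):
    correct = total = 0
    for line in lines:
        for a, b in zip(line, line[1:]):
            correct += (model.get(a, fallback) == b)
            total += 1
    return correct / total if total else 0.0

herbal = [["qokeedy", "chedy", "daiin"], ["shedy", "qokaiin", "chedy", "daiin"]]
astro  = [["qokeedy", "chedy", "daiin"], ["shedy", "chedy", "daiin"]]

model = fit_bigram(herbal)
fallback = Counter(t for line in herbal for t in line).most_common(1)[0][0]
print("bigram model accuracy:", round(top1_accuracy(model, fallback, astro), 3))
print("unigram baseline:     ", round(top1_accuracy({}, fallback, astro), 3))
[/code]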
Finally, position-based modeling shows that the grammar itself is spatially ordered: the relative probabilities of form-classes change smoothly with token position in each line, producing left-to-right grammatical zones that collapse to noise under randomization.
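A sketch of that positional profile: bin each line into relative positions and track the share of each form-class per bin. The classify() rule below is a stand-in keyed on the final character; the real form-classes come from the morphological model described above.
[code]
# Position-dependent form-class profile: relative class frequencies per
# line-position bin. classify() is a placeholder rule, not the study's model.
from collections import Counter, defaultdict

def classify(token):
    # placeholder form-class rule based on the token's final character
    return {"y": "A", "n": "B", "l": "C"}.get(token[-1], "D")

def positional_profile(lines, n_bins=5):
    """Form-class distribution per relative line position."""
    bins = defaultdict(Counter)
    for line in lines:
        for i, tok in enumerate(line):
            pos = min(int(n_bins * i / len(line)), n_bins - 1)
            bins[pos][classify(tok)] += 1
    return {pos: {cls: c / sum(counts.values()) for cls, c in counts.items()}
            for pos, counts in sorted(bins.items())}

lines = [["qokeedy", "chedy", "daiin", "chol"],
         ["shedy", "qokaiin", "daiin", "dal"],
         ["qokedy", "chedy", "okaiin", "chol"]]
for pos, dist in positional_profile(lines).items():
    print(pos, {k: round(v, 2) for k, v in sorted(dist.items())})
[/code]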
All work is reproducible using public data; raw statistics and code are privately archived for academic release. I’m seeking collaborators with linguistic expertise to help formalize and publish the results.
Forgive me if I have posted this in the wrong place — this is my first post here. And if you wish to comment, reply, or contact me, feel free. However, I’m not a linguist — I’m a truck driver. Consider your vocabulary accordingly.
Reference:
Data derived from the public EVA interlinear transcription archive by Jorge Stolfi (voynich.nu), accessed via Volker Tamagothi's extractor interface.
[attachment=11961]
Figure 1 - Voynichese Patterns Align with Natural Language, Not Random Text. Normalized conditional entropy: Voynich clusters tightly with the Latin Vulgate, while the shuffled text shoots toward randomness.
[attachment=11963]
Figure 2 - Token-length histogram (log-normal decay). Word-length stability across the corpus matches natural-language distributions, confirming internally consistent morphology.
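If anyone wants to reproduce the Figure 2 check, a log-normal fit to token lengths takes only a few lines; scipy is used just for the fit, and the token list stands in for the full corpus.
[code]
# Sketch of the Figure 2 check: a log-normal fit to token lengths.
# The token list is a placeholder for the ~80k-token EVA corpus.
import numpy as np
from scipy import stats

tokens = "daiin chedy qokeedy shedy qokaiin chol dal okaiin qokedy otedy".split()
lengths = np.array([len(t) for t in tokens], dtype=float)
shape, loc, scale = stats.lognorm.fit(lengths, floc=0)
print("mean token length:", round(lengths.mean(), 2))
print("log-normal sigma:", round(shape, 3), "median (scale):", round(scale, 3))
[/code]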
[attachment=11964]
Figure 3 - Voynich transition probability matrices. The structured corpus (left) shows coherent token-to-token dependencies absent in the randomized control (right), demonstrating non-random grammatical behavior.
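The matrices in Figure 3 are row-normalized bigram counts; one way to build them is sketched below, with the randomized control being the same computation on shuffled tokens.
[code]
# Row-normalized token-to-token transition matrix, as in Figure 3.
# The randomized control is the same matrix computed after shuffling.
import numpy as np

def transition_matrix(tokens):
    vocab = sorted(set(tokens))
    idx = {t: i for i, t in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        M[idx[a], idx[b]] += 1
    row_sums = M.sum(axis=1, keepdims=True)
    return vocab, np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)

tokens = "daiin chedy qokeedy chedy daiin shedy chedy daiin".split()
vocab, T = transition_matrix(tokens)
print(vocab)
print(T.round(2))
[/code]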
[attachment=11983]
Figure 4 - Position-Dependent Grammatical Structure in Voynichese. The relative probabilities of four anonymous form-classes shift smoothly with token position, forming stable left-to-right grammatical zones throughout the manuscript. In randomized controls these gradients vanish, confirming positional syntax and rule-governed word formation unique to genuine language systems.
