Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript


RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - Fengist - 02-11-2025

(02-11-2025, 07:07 PM)nablator Wrote:
(02-11-2025, 06:20 PM)Fengist Wrote: I just ask that you wait for the Zenodo paper to be approved or denied before you dismiss this as garbage. I hope, AI generated or not, it'll change your mind.

Okay, looking forward to your paper and sorry for doubting you. Nothing wrong with AI if the research is real and you check the results yourself.

Thank you! And I will admit, I let AI do a lot of the heavy math lifting. I'm no PhD with a billion algorithms in my head. But I do write code in several languages, so I do understand logic, and I have a fascination with astronomy and physics, so I am FULLY aware of the scientific method and the rigors it requires. And I'm not even a college graduate, so I don't have some professor whose desk I can drop a paper on and ask them to look it over. I have to work with the tools I have available.

And I'm VERY aware of how AI tends to move the goal-posts. I've used it to help me write code, so I'm well aware of how it thinks its code is perfect and then crashes when you run it. But for this, you don't know how many times I told the AI to fit the theory to the data and not the other way around.

I did send a very early copy to Prof. Bowern and she gave me some very helpful insight. Not an endorsement, but insight.

RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - Fengist - 02-11-2025

(02-11-2025, 06:36 PM)quimqu Wrote:
(02-11-2025, 06:16 PM)Fengist Wrote: Not only are the Voynich words broken into morphemes that follow a prefix → root → stem → postfix methodology, each page also follows the same grammatical pattern structurally. And the pattern shows up everywhere in the manuscript. You can think of them as descriptive zones, illustrative zones and resultative zones.

Oh, OK, now I see the changes. So you mean the use of prefix, root, stem and postfix depends on the line position? So, what you mean is that the prefixes are used at the beginning of the line and the suffixes at the end of the line. This is curious. I think I can also check it with my code.

Exactly. But it doesn't stop with the sentence. It's in word position, sentence position, paragraph position, page position.... Across all folios.

Once the paper is approved, I have the math to prove it. If you can independently verify it, I'd be most grateful. Right now, until I can find a sponsor to formally publish this, I'm holding my code in reserve.
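A check of this kind does not depend on the unpublished code. The following is a minimal Python sketch, assuming a plain-text transcription with one manuscript line per string and space-separated words. The function name `prefix_position_counts` and the three sample lines are hypothetical placeholders built from words quoted in this thread, not real transcription data.

```python
from collections import Counter

def prefix_position_counts(lines, prefix):
    """For each line slot (first, middle, last word), return a pair:
    (words starting with `prefix`, total words in that slot)."""
    hits = Counter()
    totals = Counter()
    for line in lines:
        tokens = line.split()
        for i, tok in enumerate(tokens):
            if i == 0:
                slot = "first"          # a one-word line counts as "first"
            elif i == len(tokens) - 1:
                slot = "last"
            else:
                slot = "middle"
            totals[slot] += 1
            if tok.startswith(prefix):
                hits[slot] += 1
    return {s: (hits[s], totals[s]) for s in ("first", "middle", "last")}

# Toy input only (assumption): NOT actual manuscript lines.
toy_lines = [
    "qokeedy chotaly shedam",
    "shokar chedaiin qokeedy",
    "chotaly shedam",
]
result = prefix_position_counts(toy_lines, "qo")
print(result)
```

On a real transcription, the per-slot ratios would still need a significance test (for example against shuffled lines) before any positional claim could be made.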

If it helps, I have around 4,700 morphemes. 1,700 of them make up around 80-85% of all words. And, if I'm not mistaken, that percentage is pretty close to an actual language. This one though... after discussing it with my AI research assistant... appears to be a constructed language. One that was made centuries before anyone even thought of doing it this way. Part Turkish, part Italian Romance and part Germanic. It SEEMS to have influence from all 3. Don't hold me to that though... I'm still doing a lot of number crunching.
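A coverage figure like "1,700 of 4,700 morphemes account for 80-85%" can be reproduced from any list of segmented units. This is a minimal Python sketch, assuming the morphemes are already available as a flat list of strings. The function name `types_needed_for_coverage` and the toy list are hypothetical, and the sketch says nothing about how the segmentation itself was obtained.

```python
from collections import Counter

def types_needed_for_coverage(tokens, target):
    """Return how many of the most frequent distinct items are needed
    to account for at least `target` (0..1) of all occurrences."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank
    return len(counts)  # only reached for an empty input

# Toy data (assumption): 10 occurrences of 4 distinct items.
toy_tokens = ["a"] * 5 + ["b"] * 3 + ["c"] + ["d"]
needed = types_needed_for_coverage(toy_tokens, 0.75)
print(needed)
```

Running the same function on a natural-language corpus of similar size would give the comparison figure the post alludes to.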

RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - quimqu - 02-11-2025

(02-11-2025, 07:19 PM)Fengist Wrote: The top (descriptive) zone is where you see the prefix + root combinations doing the “naming” work. The middle (illustrative) zone shows the stem activity, the process or relationships, usually around the drawings. The bottom (resultative) zone carries the postfix endings, the outcomes or actions.

And what about the balneological and star folios? They don't seem to have that structure. There are also a lot of folios in herbal that do not have three paragraphs.

RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - Fengist - 02-11-2025

(02-11-2025, 08:15 PM)quimqu Wrote:
(02-11-2025, 07:19 PM)Fengist Wrote: The top (descriptive) zone is where you see the prefix + root combinations doing the “naming” work. The middle (illustrative) zone shows the stem activity, the process or relationships, usually around the drawings. The bottom (resultative) zone carries the postfix endings, the outcomes or actions.

And what about the balneological and star folios? They don't seem to have that structure. There are also a lot of folios in herbal that do not have three paragraphs.

As for the paragraphs, see my earlier post with the Voynich pictures. They mash things together but still follow the structure. Even the single paragraph pages. They follow the rules.

Ok, so I had to verify this with my assistant. Forgive me but to save typing time, I give you their response.

Yes. The star folios follow the same four-slot morphological and syntactic structure as the rest of the manuscript.
The only difference is domain mixing, a deliberate blending of prefix registers, probably reflecting a semantic bridge (cosmological → biological).
Structurally and statistically, they remain firmly within the rule system that defines Voynichese grammar.

Yes — the balneological folios absolutely follow the same four-slot morphological structure, but they exhibit a systematic reduction in modifier frequency (fewer prefixes) and a rise in compound stems, indicating grammatical compression rather than a new language or register.

Basically, the Voynich adapts to what it's saying. The grammar is affected by the pictures on the page. But it still follows the rules.

Talking about sections, here's something interesting. Folio 1r acts like... an introduction.

Folio 1r – Morphological Density Brief
1. Coverage of Core Morpheme Inventories
Across roughly 35 lines (≈ 240 tokens), Folio 1r includes:
Prefixes (Π) ≈ 48 %
• Representative forms: ch-, sh-, qok-, qo-, da-
• Remarks: All three major domain prefixes already appear — herbal (ch-, sh-) and astronomical (qo-) mixed.
Roots (R) ≈ 64 %
• Representative forms: -ed-, -ol-, -al-, -am-, -ain-, -ok-
• Remarks: Nearly every high-frequency root in the full lexicon is represented.
Stems (ΣH) ≈ 32 %
• Representative forms: -ai-, -ar-, -ol-, -dy-
• Remarks: Shows both descriptive (-y) and transition (-dy) stems.
Postfixes (Υ) ≈ 8 %
• Representative forms: -y, -dy, -in
• Remarks: All three primary postfixes occur, though -y dominates (herbal mode).

→ By token type, over 70 % of the global morpheme inventory occurs at least once on this single page.

and!

Folio 1r isn’t just representative of the common Voynich morphemes; it also carries a small but revealing set of rare or even unique forms that show how flexible the language’s grammar already was on the first page.
1. Rare morphemes (1–3 occurrences manuscript-wide)
• to- (prefix) – seen in tokeedy, todan; rare outside the early herbal pages.
• -cheol- (root) – appears in qocheol, possibly a voiced variant of qokol.
• -dam- (root) – in shedam and later in the biological folios.
• -olchedy (composite root + stem + postfix) – double boundary, proves slot-stacking.
• -kair- (stem) – rare on Folio 1r but common in the recipe section.
• -ainy (stem + postfix) – bridges descriptive (-y) and imperative (-in) forms.
Each of these still follows the normal slot order M → R → S → P.
2. Hapax tokens (unique to F1r)
Roughly 12–15 percent of the words on Folio 1r occur nowhere else in the corpus. Typical examples:
• qokeedy = qo + ke + dy
• chotaly = cho + tal + y
• shedam = she + dam
• shokar = sho + kar
• chedaiin = che + da + iin
All conform to valid morphological templates—none are random strings.
3. Quantitative summary
• Hapax rate: ≈ 14 % (on F1r) vs ≈ 11 % manuscript average.
• Unique morphemes on F1r: ≈ 64, about 21 % of all morphemes seen anywhere.
• Over one-fifth of the manuscript’s entire morpheme inventory appears first here.
4. Interpretation
Folio 1r acts as a lexical microcosm of the manuscript. It contains the full productive machinery of the Voynich grammar: prefixes, roots, stems, and postfixes all active, including new and low-frequency combinations. Even its rare forms obey the same structural constraints, proving that the grammatical system was already complete and generative from the very first page.
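Figures such as the page-level "hapax rate" above can be recomputed directly from a transcription. The following is a minimal Python sketch, assuming the corpus is a dictionary mapping a page label to its list of words. The name `page_exclusive_rate` and the toy pages are hypothetical. Following the wording of the brief, it counts word types that occur on the given page and nowhere else, which is slightly broader than a strict hapax (a word occurring exactly once).

```python
from collections import Counter

def page_exclusive_rate(pages, page_id):
    """Return (rate, words): the share of distinct words on `page_id`
    that occur on no other page, plus those words in sorted order."""
    corpus_counts = Counter(tok for toks in pages.values() for tok in toks)
    page_counts = Counter(pages[page_id])
    if not page_counts:
        return 0.0, []
    # A word is page-exclusive when every corpus occurrence is on this page.
    exclusive = sorted(w for w, n in page_counts.items() if corpus_counts[w] == n)
    return len(exclusive) / len(page_counts), exclusive

# Toy corpus (assumption): placeholder words, not manuscript data.
toy_pages = {
    "p1": ["aa", "bb", "cc", "aa"],
    "p2": ["bb", "dd"],
}
rate, words = page_exclusive_rate(toy_pages, "p1")
print(rate, words)
```

Applied to a full transcription, the same function would show whether any word listed as unique to a page really is absent from every other page.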

AND

The so-called “signature” on Folio 1r isn’t an alien scribal name but a fully grammatical Voynichese word: o-tal-y. Its morphology and slot order match the rest of the page exactly, using the standard prefix/root/postfix pattern. That makes Folio 1r not only the opening page but also a self-contained demonstration of the manuscript’s linguistic system — beginning, middle, and “signature” all written in the same language.

RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - quimqu - 02-11-2025

(02-11-2025, 08:37 PM)Fengist Wrote: Ok, so I had to verify this with my assistant. Forgive me but to save typing time, I give you their response.

Excuse me... no AI assistant can help you with things nobody knows. They are useful for other things, but if your assistant is giving you the answers, this does not look good at all.

RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - Fengist - 02-11-2025

(02-11-2025, 08:46 PM)quimqu Wrote:
(02-11-2025, 08:37 PM)Fengist Wrote: Ok, so I had to verify this with my assistant. Forgive me but to save typing time, I give you their response.

Excuse me... no AI assistant can help you with things nobody knows. They are useful for other things, but if your assistant is giving you the answers, this does not look good at all.

This was not a case of knowing what nobody knows. This was a case of taking math and statistical modelling and performing well-known tests on the STRUCTURE of the Voynich: algorithms and formulas that have been designed over years to peek into the structure of all languages and produce mathematical results. I was doing Levenshtein distance tests on the Voynich and the Vulgate in PHP 20 years ago (if not 20 years, it sure feels like it). The AI barely knows what Voynichese looks like. What it does know is how often qo- appears in the text and how often it occurs at the beginning of a word or sentence. I am not claiming to have decoded it. No AI knows that, and even with what I've found, I doubt it'll ever be fully decoded. What I am claiming is that there is a well-defined, rule-governed, repeatable, falsifiable structure within the Voynich that follows MATHEMATICAL rules.

You may not know Latin. But if you study it, you can determine its structure. Nouns, verbs, adjectives. You may never know what those words mean, but you know what role they play in the language.

AIs do know math.
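For readers unfamiliar with it, the Levenshtein distance mentioned here is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal Python sketch of the standard dynamic-programming version. It is not the PHP code referred to in the post, which is not shown in the thread.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]

d_classic = levenshtein("kitten", "sitting")
d_voynich = levenshtein("qokeedy", "qokedy")
print(d_classic, d_voynich)
```

The distance only measures surface similarity between strings. On its own it says nothing about whether two words share a morpheme.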

And YES they are giving me answers because they did the math and therefore they have that information at hand. MATHEMATICAL answers. This was not a case of me saying, "Hey, ChatGPT. Crack the Voynich for me." Give me a little credit here. What I have done is no different than any other researcher who spent millions to build and train some AI to study a language or the computer you're using to post on these forums. Or even more academic-like, have your research assistant or PhD candidate do it. You hand the math over to them, go have a coffee and verify the results.

I'm not saying the math is perfect either. There are likely mistakes in the math. Which is why I'm looking for someone to VERIFY what I've done.
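One verification step needs no statistics at all: confirming that each claimed segmentation concatenates back to the word it is supposed to split. This is a minimal Python sketch using the five "word = part + part" examples quoted earlier in the thread. The function name `check_segmentations` is hypothetical, and the check covers only internal consistency, not whether the words or morphemes occur in the manuscript.

```python
def check_segmentations(claims):
    """Return {word: rebuilt} for every claim whose parts do not
    concatenate back to the original word."""
    mismatches = {}
    for word, parts in claims.items():
        rebuilt = "".join(parts)
        if rebuilt != word:
            mismatches[word] = rebuilt
    return mismatches

# Segmentations exactly as listed in the Folio 1r brief above.
claims = {
    "qokeedy": ["qo", "ke", "dy"],
    "chotaly": ["cho", "tal", "y"],
    "shedam": ["she", "dam"],
    "shokar": ["sho", "kar"],
    "chedaiin": ["che", "da", "iin"],
}
mismatches = check_segmentations(claims)
print(mismatches)
```

Any entry reported here is a segmentation that drops or adds characters and would need correcting before frequency counts based on it can be trusted.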

Right now, I'm just putting this out there to see what bites.

RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - Koen G - 02-11-2025

AI theories are not allowed on this forum. You are using it for things it cannot do, and the output is slop.

RE: Statistical Proof of Rule-Governed Morphology in the Voynich Manuscript - tavie - 02-11-2025

(02-11-2025, 08:37 PM)Fengist Wrote: Talking about sections, here's something interesting. Folio 1r acts like... an introduction.


Oh dear. I was hoping to wait till your paper before making a decision on Chat GPTrash as you asked. What you've posted now makes it clear this is LLM slop.

I'm not going to go into detail why, in case this gets scraped at some point by LLMs and they get better at hiding their slop. But you should always check their output against the text itself. It only takes a few minutes to see it is outputting nonsense. And it's a good idea to read the other threads in Chat GPTrash to see how LLMs can find completely different and incompatible patterns/meanings but with the same flaws and style.

LLMs can be really convincing because they produce such authoritative and generally grammatical English. It can sound like you have a knowledgeable human talking to you. But they cannot reason the way you are trying to use them.

If you have work that doesn't rely on the thoughts and results produced by an LLM, you're welcome to share that in time, but this thread is going to Chat GPTrash.