[Article] Strong evidence of a structured four-phase system in the Voynich Manuscript

Source: The Voynich Ninja (https://www.voynich.ninja), Forum: News, Thread: /thread-4650.html

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 01:44 PM)nablator Wrote:
(24-04-2025, 01:00 PM)Urtx13 Wrote: Do you know what a seed is?

A seed is just a number that lets you start the pseudo-random process from the same place every time. Without it, you’d get different results every run, like someone running 1 km, but always starting from a different city.

Yes, absolutely clear. I'm a software engineer. I've been doing this in my programs for decades, setting the random generator seed to be able to get the same result in a later run of course.

My problem is conceptual/philosophical: let me try to explain it.

If any calculation based on a sequence of random numbers produces something that compares amazingly well with reality, this is a property of the sequence of random numbers, not a property of reality.

It's like discovering the entire US Declaration of Independence coded in the decimals of pi at offset 14287468794577131. Yes, it's there somewhere, but what does the discovery mean? Absolutely nothing.
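The objection above can be illustrated with a minimal Python sketch. It searches seeds until the standard-library generator happens to spell a chosen string. The function name `first_seed_producing` and the target `"abc"` are hypothetical, chosen only for illustration; nothing here comes from the code discussed in the thread.

```python
import random
import string

def first_seed_producing(target, max_seed=1_000_000):
    """Return the first seed whose first len(target) random letters spell target."""
    for seed in range(max_seed):
        rng = random.Random(seed)
        text = "".join(rng.choice(string.ascii_lowercase) for _ in target)
        if text == target:
            return seed
    return None  # no seed in the searched range produced the target

# Hypothetical target; roughly one seed in 26**3 is expected to match.
seed = first_seed_producing("abc")
print(seed)
```

Some seed will be found if enough are tried, which says something about the size of the search and nothing about the target. Whether that criticism applies to a given analysis depends on whether the seed was searched for or fixed in advance.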

Hi Nablator,

Apologies for my tone. I’m not a native speaker, and sometimes I express things more bluntly than I intend.

That said, I think there’s a fundamental difference between a random coincidence and a replicable structure. Your pi analogy is clever, but I’d argue it doesn’t quite apply here. It's like saying that if a monkey typing randomly on a keyboard writes a Shakespeare sonnet, then Shakespeare was just a random artifact of the keyboard.

Why? Because we’re not talking about a hidden message in an irrational number. We’re talking about a statistically significant, falsifiable structure that emerges in a linguistic corpus — and holds across different tests, models, and even transcriptions.

As a linguist, I can tell you: when such structure appears in language-like sequences, it’s not random noise. It suggests intentionality — whether functional, symbolic, or linguistic.

That’s not mysticism. I believe it is strong enough to be a potential first step toward understanding the system behind it.

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 01:43 PM)ReneZ Wrote: A random number generator (RNG) creates an arbitrary sequence of numbers.

When two people run an experiment that includes a random number generator, they will not get exactly the same results. When the process being experimented on is well-behaved (not chaotic), the results will be similar but not the same.

If one wants to verify an experiment on a process, then one wants to make sure that two people get exactly the same results. This can be done if both people set the same 'seed' (starting point) for the RNG. (Note that there are also RNGs for which this will not work).

So people here are invited to use the seed 1405 in order to get exactly the same result. Using other seeds can be interesting in order to check that it creates similar results.

Exactly. That’s precisely why I fixed the seed at 1405: not because it’s “magic,” but to ensure full reproducibility of the results.
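A minimal sketch of both points (same seed gives identical results, other seeds give similar ones), using only Python's standard library. The function `estimate_mean` and the toy process (averaging uniform draws) are assumptions made for illustration; only the seed value 1405 comes from the thread.

```python
import random

def estimate_mean(seed, n_draws=10_000):
    """Average of n_draws uniform(0, 1) samples from a generator started at seed."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n_draws)) / n_draws

run_1 = estimate_mean(1405)   # seed mentioned in the thread
run_2 = estimate_mean(1405)   # same seed again
other_runs = [estimate_mean(s) for s in (1, 2, 3)]  # arbitrary other seeds

print(run_1 == run_2)                       # True: same seed, same result
print([round(x, 3) for x in other_runs])    # close to 0.5, but not identical
```

For a well-behaved process like this one, changing the seed moves the result only slightly. A result that appears for one seed and vanishes for others would be a warning sign.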

This is also the approach I follow in the second paper, currently in preparation. There, I systematically alter the configuration: changing the seed, shuffling the input, removing variables, and switching transcription sources. The goal is to test the robustness and specificity of the pattern under different conditions.

One of the most telling results is that the same four-phase structure emerges even when using the Takahashi transcription, which has fewer tokens and different segmentation rules. That reinforces the idea that we’re not dealing with a random artifact: the structure persists beyond the original data source.

So yes: setting a seed guarantees reproducibility, and breaking the configuration allows us to falsify the structure. The fact that it only holds under specific conditions is not a weakness. It’s the evidence.
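As a rough illustration of what a shuffle-based falsification check can look like, here is a generic label-permutation test in plain Python. It is a sketch under stated assumptions, not the pipeline from the paper: the statistic (`between_group_spread`), the function names, and the toy data are all hypothetical.

```python
import random
from statistics import mean

def between_group_spread(values, labels):
    """Largest group mean minus smallest group mean."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    means = [mean(vs) for vs in groups.values()]
    return max(means) - min(means)

def permutation_p_value(values, labels, n_perm=2000, seed=1405):
    """Share of label shuffles whose spread is at least the observed spread."""
    rng = random.Random(seed)
    observed = between_group_spread(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_spread(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy data: the value is built from the label, so structure exists by construction.
rng = random.Random(1405)
labels = [i % 4 for i in range(40)]
values = [label + rng.uniform(-0.1, 0.1) for label in labels]
p_value = permutation_p_value(values, labels)
print(p_value)
```

A small p-value only says the values are not independent of the labels. It does not say why they are related, which is the point debated in the rest of the thread.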

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 01:53 PM)oshfdk Wrote:
(24-04-2025, 01:38 PM)Urtx13 Wrote: Mathematical rigor? The analysis includes entropy metrics, LDA modeling, supervised classification, permutation tests, ablation studies, cross-checking...

Actually, the assortment of tools is one of the things that makes the whole pipeline hard to evaluate for me. From my point of view, mathematical rigor implies using the minimum number of tools and transformations to clearly validate your hypothesis. The fact that there was a need to stack a number of models and analytical tools on top of each other only makes it harder to identify potential problems.

I wonder if the cyclical nature of the folios was detected in past research on topic modeling in the Voynich Manuscript? E.g., [link not shown]

First you say that mathematical rigor is required, and I agree. But then you criticize the pipeline for using too many validation methods?

That’s a contradiction.

This idea that “rigor means using the minimum number of tools” is, respectfully, a misunderstanding of what scientific validation is. Rigor is not minimalism. Rigor means testing your hypothesis from multiple independent angles to see if it holds. Thank God aerospace engineers don't think like you: they actively look for redundancies to ensure robustness!

I could’ve stopped after one successful test. But if I had, academic reviewers would rightly ask:

"Did you cross-validate it? Did you test robustness? Did you use alternative data sources?"

So I did all that. And now, ironically, I’m told that using too many validation layers makes it less credible?

That’s not how science works. A result that survives entropy analysis, LDA structure, supervised classification, permutation tests, ablation, and even transcription shifts is a lot more meaningful than one that depends on a single metric.

Also, to your question: previous LDA studies like the one you linked don’t propose or test for cyclical semantic structure. They don’t link topic distributions to external agronomic or lunar data, and that’s the core novelty of this approach.

Happy to discuss more when the whole paper is out.

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - RadioFM - 24-04-2025

Just because seed no. 3,247,442 produces a paragraph of Shakespeare doesn't mean RandomShuffler.py has encoded Macbeth with that key LMAO

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 05:33 PM)RadioFM Wrote: Just because seed no. 3,247,442 produces a paragraph of Shakespeare doesn't mean RandomShuffler.py has encoded Macbeth with that key LMAO

It’s essential to clarify that what you’re suggesting is false. There’s a significant difference between finding an isolated, random fragment within a larger set that follows a pattern and identifying and testing that the entire set follows a pattern. With respect, it means you haven’t understood the process or how it works.

My process doesn’t rely on just a random seed. I have used validated statistical models and testing techniques to ensure that the whole pattern that appears is not random. What I’m doing is not about finding arbitrary matches; it’s about identifying statistical structures that can be replicated and that make sense in the context of the analyzed text.

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - tavie - 24-04-2025

For the few of us who only just learned this meaning of "seed" today, could you please explain your findings in layman's terms? What are the characteristics of each of the four phases? Could you list a folio for each phase and explain why it falls under that category?

(PS: welcome to the forum!)

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 10:48 PM)tavie Wrote: For the few of us who only just learned this meaning of "seed" today, could you please explain your findings in layman's terms? What are the characteristics of each of the four phases? Could you list a folio for each phase and explain why it falls under that category?

Of course! Let me try.

The seed is just a way to make sure that the code always starts from the same place. Without it, the code still runs correctly, but it starts from a different point each time, so the results can vary slightly.

Imagine a runner: it’s not the same to run 1 km every day starting from a different city than to run 1 km on the same route every day. Using a seed is like always running from the same starting line — it allows us to compare results fairly and reproducibly.

What did we find?

We discovered that the botanical section of the Voynich Manuscript exhibits a consistent internal structure, dividing the folios into four repeating phases. These phases emerge when we analyze the statistical patterns of the text: vocabulary diversity (entropy), topic modeling, and other linguistic features. The structure partially aligns with lunar and agricultural cycles. That’s not proof of meaning, but it strongly suggests that the content was structured intentionally and follows a time-based logic.
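For readers new to the term, "entropy" as a measure of vocabulary diversity can be sketched in a few lines of Python. This is the standard Shannon entropy over word frequencies. The function name and the sample tokens are illustrative assumptions, and the exact entropy metric used in the paper may differ.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the word-frequency distribution of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Illustrative tokens only, not taken from any particular folio.
repetitive = ["daiin", "daiin", "daiin", "daiin"]
varied = ["daiin", "chedy", "qokeedy", "shol"]

print(shannon_entropy(repetitive))  # 0.0: one word repeated, no diversity
print(shannon_entropy(varied))      # 2.0: four equally frequent words
```

A page that repeats a few words scores low, and a page with many different words scores high. Computing this per folio gives one number per folio that can then be compared across the proposed phases.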

This is important because the presence of a consistent internal structure suggests it was intentionally created. That means the Voynich Manuscript is not just random gibberish — it’s a system with internal logic.

The structure may reflect both the intended use of the manuscript and how its language works. What we’ve found doesn’t decode the text yet, but it opens up an entirely new path toward understanding it and eventually translating it.

If you’re curious about which folios fall under each phase, the easiest way is to run the code — the file classification_predictions.csv includes the assigned phase for every folio, based on the model’s output.

I wish I could help you understand it better! Please, keep asking if you have doubts.

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - RadioFM - 25-04-2025

EDIT: Your deterministic mapping from Lunar_Angle to Phase in generate_lunar_angles_seed.py makes the Lunar_Angle a trivial predictor to Phase - is Lunar_Angle a feature fed into your ML model?

Regardless of the previous EDIT, would you mind answering these questions?

The sweet spot is 4 phases then - any other number just doesn't yield such good results?
How's the performance with a single decision tree instead of a random forest? Unless I'm doing something wrong, scores seem to be good.
You're splitting the data 75% train vs 25% test on the shuffle/ablation/etc validation step w/ the same seed as you did back on training, correct?

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 25-04-2025

(25-04-2025, 03:46 AM)RadioFM Wrote: EDIT: Your deterministic mapping from Lunar_Angle to Phase in generate_lunar_angles_seed.py makes the Lunar_Angle a trivial predictor to Phase - is Lunar_Angle a feature fed into your ML model?

Regardless of the previous EDIT, would you mind answering these questions?

The sweet spot is 4 phases; any other number doesn't yield such good results?

How's the performance with a single decision tree instead of a random forest? Unless I'm doing something wrong, scores seem to be good.

You're splitting the data 75% for training and 25% for testing during the shuffle/ablation/etc. validation step, using the same seed as during training, correct?

Let's see:

“Lunar_Angle is a trivial predictor because it determines Phase?”

Yes — intentionally. Lunar Angle is a synthetic, deterministic mapping to a Phase. The goal is not to train a model to “guess” the phase. Instead, Phase acts as a fixed segmentation of the manuscript, consisting of 4 arbitrary but stable sections.

The actual goal is to test whether lexical entropy and topic distributions (from LDA) correlate with that segmentation. If they do, that suggests internal structure in the manuscript that aligns with this imposed 4-phase partition. That’s what we’re testing.

If Lunar_Angle didn’t predict Phase, something would be wrong with the code.
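To make "deterministic mapping" concrete, here is a minimal sketch that assumes the simplest possible rule: four equal 90-degree quadrants. The actual rule in generate_lunar_angles_seed.py is not shown in this thread, so both `phase_from_angle` and the quadrant boundaries are assumptions.

```python
def phase_from_angle(angle_deg):
    """Map an angle in degrees to a phase 0-3 by 90-degree quadrant (assumed rule)."""
    return int((angle_deg % 360) // 90)

for angle in (0, 89.9, 90, 200, 359, 360):
    print(angle, phase_from_angle(angle))
```

Under any rule of this kind, a model that is given the angle can recover the phase perfectly without looking at the text. Accuracy figures therefore say something about the text only when the angle is excluded from the features.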

“Why 4 phases? Have you tried other numbers?”

Yes, and either the entropy and topic signals collapse, or there’s no significant prediction above permutation baselines.

Only with four do we get a statistically significant structure: validated via permutation tests, shuffling, ablation, FFT, and autocorrelation.
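As an illustration of how autocorrelation can flag a period of four, here is a small plain-Python sketch on a toy series. The function and the series are assumptions for illustration; the thread does not show how the autocorrelation or FFT checks were actually run.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given lag."""
    m = mean(series)
    num = sum((series[i] - m) * (series[i + lag] - m)
              for i in range(len(series) - lag))
    den = sum((x - m) ** 2 for x in series)
    return num / den

# Toy per-folio feature that repeats exactly every four items.
series = [0, 1, 2, 3] * 10

print(round(autocorrelation(series, 4), 3))  # 0.9: strong match at lag 4
print(round(autocorrelation(series, 2), 3))  # -0.57: out of step at lag 2
```

For this to be evidence about the manuscript, the series has to be a feature measured from the text, such as per-folio entropy. Running it on the phase labels themselves would only recover the period that was imposed.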

“How does a single decision tree perform?”

It performs well — almost too well — because it immediately splits on Lunar_Angle.
But again, the aim isn’t model optimization. It’s to test feature informativeness.
So we also test:

Model accuracy without Lunar_Angle
Ablation of entropy and topics
Complete shuffle tests and random permutations

This lets us isolate which features carry a signal, and they do.
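The ablation idea can be sketched with a toy nearest-centroid classifier in plain Python. Everything here is hypothetical: the classifier, the even/odd split, and the data, in which `entropy` is pure noise by construction. The sketch shows what an ablation looks like when the text feature carries no signal and makes no claim about the real features.

```python
import random

def nearest_centroid_accuracy(rows, labels, features):
    """Fit class centroids on even-indexed rows, score on odd-indexed rows."""
    train = range(0, len(rows), 2)
    test = range(1, len(rows), 2)
    centroids = {}
    for label in sorted(set(labels)):
        members = [rows[i] for i in train if labels[i] == label]
        centroids[label] = [sum(r[f] for r in members) / len(members)
                            for f in features]

    def predict(row):
        def sq_dist(label):
            return sum((row[f] - c) ** 2
                       for f, c in zip(features, centroids[label]))
        return min(centroids, key=sq_dist)

    hits = sum(predict(rows[i]) == labels[i] for i in test)
    return hits / len(test)

rng = random.Random(1405)
labels = [(i // 2) % 4 for i in range(200)]  # every phase in both splits
rows = [
    # Angle sits well inside the quadrant of its phase; entropy is noise.
    {"Lunar_Angle": 90 * p + 45 + rng.uniform(-20, 20), "entropy": rng.random()}
    for p in labels
]

acc_full = nearest_centroid_accuracy(rows, labels, ["Lunar_Angle", "entropy"])
acc_ablated = nearest_centroid_accuracy(rows, labels, ["entropy"])
print(acc_full, acc_ablated)
```

With the angle included, accuracy is perfect whatever the other column contains. The informative number is the accuracy after the angle is removed, compared against the roughly 25% chance level for four balanced classes.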

“You’re using the same random seed across all splits?”

Of course!
We want the same train-test split across all experiments (original, permutation, shuffle, and ablation) to ensure fair and controlled comparisons.
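A minimal sketch of a seeded 75/25 split that comes out the same on every call, using only the standard library. The helper `train_test_indices` is hypothetical; the thread's code may well use a library routine for the same purpose.

```python
import random

def train_test_indices(n_items, test_fraction=0.25, seed=1405):
    """Return (train, test) index lists from a shuffle fixed by seed."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])

train_a, test_a = train_test_indices(100)
train_b, test_b = train_test_indices(100)

print(train_a == train_b and test_a == test_b)  # True: identical split
print(len(train_a), len(test_a))                # 75 25
```

Reusing one split keeps the comparison between the original, shuffled, and ablated runs controlled. It does not show how much the scores depend on that particular split, which repeating the experiment over several seeds would.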

TL;DR:

Phase is a fixed reference segmentation, not a target of interest.
Lunar_Angle trivially predicts Phase — that’s by design.
The point is: does the text itself align with that structure?
Our tests say yes — consistently, robustly, and reproducibly.

In short, I think you’re confusing the purpose of the experiment.

This isn’t about predicting Phase as if it’s a hidden label. Phase is a synthetic, deterministic segmentation — one that is intentionally created.

The goal is to test whether features like entropy or topic distribution align with that segmentation.
If they do (and they do, even without Lunar Angle), it suggests a latent structure in the manuscript.

So the mapping is deterministic, on purpose.
We’re not modeling truth. We’re testing alignment.

RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - nablator - 25-04-2025

(24-04-2025, 11:50 AM)Urtx13 Wrote: The real breakthrough is realizing that the Voynich text, when processed this way, contains a cyclical rhythm that aligns with that specific structure, something we wouldn’t expect from a meaningless or random text.

Random text (in the case of some kind of non-uniform randomness) still has to be produced somehow, and the workflow, whatever it was, could have resulted in a cyclical rhythm, especially since the work was probably done bifolio by bifolio, with the 4 pages of each bifolio in the same Currier language and by the same scribe. If the "phase" matched the progress of work on each bifolio, we would get a sequence of 0, 1, 0, 1, ... in the first half of quires and 2, 3, 2, 3, ... in the second half.
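The sequence described above can be generated with a short Python sketch. It assumes one reading of the scenario: each bifolio is written as four pages numbered 0 to 3 in order of work, with pages 0 and 1 on the leaf that falls in the first half of the quire and pages 2 and 3 on the conjugate leaf in the second half. The function name is hypothetical.

```python
def quire_phase_sequence(n_bifolios):
    """Per-page 'phase' in reading order for a quire of nested bifolios."""
    phases = []
    for leaf in range(2 * n_bifolios):
        in_first_half = leaf < n_bifolios
        recto, verso = (0, 1) if in_first_half else (2, 3)
        phases += [recto, verso]
    return phases

print(quire_phase_sequence(2))  # [0, 1, 0, 1, 2, 3, 2, 3]
```

A purely mechanical production order of this kind would also produce a repeating four-valued pattern across the pages.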