The Voynich Ninja
[Article] Strong evidence of a structured four-phase system in the Voynich Manuscript



Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

Hi all,

Over the past months, I’ve been working on a detailed statistical analysis of the Voynich Manuscript. It suggests a consistent finding: the manuscript appears to follow a structured four-phase system, encoded across the botanical section and possibly beyond.

I used lexical entropy, topic modeling (LDA), and a symbolic “lunar” assignment system to classify each folio into one of four cyclical phases. The key is: this structure holds up across the entire manuscript, and even hints at internal cycles within folios.
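The post does not say how the synthetic lunar angle becomes one of the four phases. The angle script's output does list an angle and a phase for each folio. Here is a minimal sketch of one plausible mapping, assuming the angle is in degrees and the cycle is cut into four equal 90° arcs. The function name `angle_to_phase` is hypothetical and does not come from the repository:

```python
def angle_to_phase(angle_deg, n_phases=4):
    """Map an angle in degrees to a phase index 0..n_phases-1 using equal arcs (assumed scheme)."""
    width = 360.0 / n_phases
    phase = int((angle_deg % 360.0) // width)
    # Guard against floating-point edge cases where angle % 360.0 rounds up to 360.0.
    return min(phase, n_phases - 1)


# Hypothetical folio angles, e.g. as they might appear in a (folio, angle, phase) table.
for angle in (10.0, 95.0, 200.0, 350.0):
    print(angle, angle_to_phase(angle))
```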

Here’s what stands out:
- A classifier trained on entropy + topic distribution + synthetic lunar angle reached 96.5% accuracy
- A 1000-run permutation test confirmed this is far from random (p < 0.001)
- External validation with real botanical flowering data reached 65–68% phase alignment (relaxed cyclic margin)
- Harmonic patterns emerged through the FFT and autocorrelation of topic dynamics
- The pattern is not dependent on any linguistic decoding — it emerges structurally
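The post does not define the "relaxed cyclic margin" used in the validation bullet. One common way to score agreement between cyclic labels, assumed here, counts a match whenever the predicted and reference phases are at most `margin` steps apart around the cycle. Under this rule, phase 3 and phase 0 are neighbours. The helper names are hypothetical:

```python
def cyclic_distance(a, b, n_phases=4):
    """Number of steps between two phases going the short way around the cycle."""
    d = abs(a - b) % n_phases
    return min(d, n_phases - d)


def alignment_rate(predicted, reference, margin=1, n_phases=4):
    """Fraction of items whose predicted phase is within `margin` steps of the reference."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(cyclic_distance(p, r, n_phases) <= margin for p, r in zip(predicted, reference))
    return hits / len(predicted)


print(alignment_rate([0, 1, 2, 3], [0, 2, 0, 0], margin=1))  # 0.75
print(alignment_rate([0, 1, 2, 3], [0, 2, 0, 0], margin=0))  # 0.25
```

Whatever margin the analysis actually used, it helps to report the chance baseline next to the score. In this sketch, with four phases and `margin=1`, three of the four possible phases count as a hit, so uniformly random labels would be expected to score about 75%.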

I’ve submitted this to arXiv (submission ID: `submit/6380387`, under moderation). 
I’m sharing it here because this forum includes some of the best critical minds on the Voynich. I’d love to know what you think, especially:
- Is it plausible that the manuscript encodes a symbolic or calendrical cycle, especially a botanical cycle related to moon phases?
- Could this framework be linked to ritual, agronomy, or spiritual cycles?
- Have others attempted something similar and seen weak but non-random patterns?

Code and data: [link]

Everything is fully reproducible (seed = 1405). No cherry-picking. 
I’m not claiming to crack the VM — just showing that, structurally, it’s more organized than expected by chance.

Looking forward to your thoughts.

— O. Cho 
Universitat Oberta de Catalunya


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - oshfdk - 24-04-2025

Hi and welcome!

requirements.txt is missing from the repository

Could you give a high-level plain-text description of what this is? I found your other post that explains the idea: [link]
But I'm not sure I understand it. Is the analysis based on word token distributions?

Also, how was this seed 1405 selected?


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - nablator - 24-04-2025

Welcome!

(24-04-2025, 09:25 AM)oshfdk Wrote: requirements.txt is missing from the repository

It's in the other repository: [link]


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 09:25 AM)oshfdk Wrote: Hi and welcome!

requirements.txt is missing from the repository

Could you give a high-level plain-text description of what this is? I found your other post that explains the idea: [link]
But I'm not sure I understand it. Is the analysis based on word token distributions?

Also, how was this seed 1405 selected?

Hi, and thanks for your message!

I’ve just added the requirements.txt file to the repository, so the full pipeline is now reproducible and ready to run: [link]


What the project does (in plain terms)


This pipeline tests whether the Voynich Manuscript’s botanical section exhibits a statistically significant cyclical structure.
It does not assume that the text encodes a phonetic language. Instead, it treats the glyph sequences as symbolic patterns and applies:
  • Token distribution analysis
  • Lexical entropy per folio
  • Topic modeling (LDA)
  • Supervised classification of four synthetic “phases”
  • Permutation tests and ablations to verify the structure
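The thread does not define "lexical entropy per folio" precisely. Here is a minimal sketch, assuming it means the Shannon entropy (in bits) of the word-token frequency distribution within each folio. The folio IDs and EVA tokens in the example are invented for illustration:

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the token frequency distribution (0.0 for no tokens)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over distinct tokens; written this way the result is never -0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_per_folio(folio_tokens):
    """Map each folio ID to the entropy of its word tokens."""
    return {folio: shannon_entropy(tokens) for folio, tokens in folio_tokens.items()}


# Toy input with invented folio IDs and tokens.
example = {
    "f1r": ["daiin", "daiin", "chol", "chor"],
    "f1v": ["qokeedy", "qokeedy", "qokeedy", "qokeedy"],
}
print(entropy_per_folio(example))  # {'f1r': 1.5, 'f1v': 0.0}
```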


The cyclical pattern is robust and reproducible — it emerges even when phase labels are shuffled or angle features are removed (and disappears when phase order is broken). This confirms that the structure is real, not random. Why is this happening? Who knows... 
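A permutation test can be sketched generically. Shuffle one of two paired sequences many times, recompute the statistic each time, and count how often the shuffled value is at least as large as the observed one. This is a minimal sketch with a hypothetical `agreement` statistic, not the repository's test. The add-one correction keeps the p-value from ever being exactly zero, so with 1000 permutations the smallest attainable value is 1/1001, just under 0.001:

```python
import random


def agreement(x, y):
    """Fraction of positions where the two label sequences match."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def permutation_p_value(x, y, statistic=agreement, n_permutations=1000, seed=1405):
    """One-sided permutation test: is statistic(x, y) larger than under shuffling of y?"""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(x, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction: the observed arrangement counts as one of the permutations.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


labels = [0, 1, 2, 3] * 25
observed, p = permutation_p_value(labels, labels)
print(f"observed agreement = {observed:.2f}, p = {p:.4f}")
```

What such a test rules out depends entirely on what is shuffled. Shuffling phase labels against folios tests a different hypothesis from shuffling folio order or the text itself.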

The agronomic interpretation, which links these phases to sowing, growth, flowering, and harvest, is still a hypothesis. However, given the statistical foundation, the fact that it correlates with known plant phenology and historical agricultural calendars lends this hypothesis strong support.


Why seed 1405?

The seed 1405 is arbitrary and used solely to ensure full reproducibility. Any other fixed number would work; this one was chosen to match the year 1405, which aligns with the possible calendar framework used in the analysis.
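Fixing the seed makes every random step repeat exactly. Here is a minimal sketch using Python's standard `random` module. `SEED` and `simulate_phases` are illustrative names, not taken from the repository. In NumPy and scikit-learn, the equivalent is passing the seed to `numpy.random.default_rng` or to an estimator's `random_state` parameter:

```python
import random

SEED = 1405


def simulate_phases(n, seed=SEED):
    """Draw n random phase labels (0-3) from a generator with its own fixed seed."""
    rng = random.Random(seed)  # a private generator avoids touching global random state
    return [rng.randrange(4) for _ in range(n)]


first = simulate_phases(10)
second = simulate_phases(10)
print(first == second)  # True: same seed, same sequence
```

Any fixed value gives reproducibility. Rerunning the analysis with a few other seeds is therefore a cheap way to check that a result does not depend on this particular one.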

Let me know if you’d like any details or to try running the model. Happy to collaborate or improve it further!

Best,
O. Cho


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - nablator - 24-04-2025

I installed Python 3.13.2 from the Microsoft Store (the only version offered; I don't know if it's suitable for this project).

pip install -r requirements.txt

ended with:
..\meson.build:78:0: ERROR: Unknown compiler(s): [['ifort'], ['gfortran'], ['flang-new'], ['flang'], ['pgfortran'], ['g95']]

I guess I need a Fortran compiler.

---

The new requirements.txt (with different content and version numbers) fails more quickly, with a long stack trace:
pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy==1.24.4 (from -r requirements.txt (line 1))
  Using cached numpy-1.24.4.tar.gz (10.9 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
ERROR: Exception:
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.13_3.13.1008.0_x64__qbz5n2kfra8p0\Lib\site-packages\pip\_internal\cli\base_command.py", line 106, in _run_wrapper
    status = _inner_run()
...
pip._vendor.pyproject_hooks._impl.BackendUnavailable: Cannot import 'setuptools.build_meta'

---

So I installed gfortran (in gcc-14.2.0-64.exe from [link]), removed the version numbers from requirements.txt, and now I get a different error:

      Run-time dependency pybind11 found: YES 2.12.1
      Run-time dependency scipy-openblas found: NO (tried pkgconfig)
      Run-time dependency openblas found: NO (tried pkgconfig and cmake)
      Run-time dependency openblas found: NO

      ..\scipy\meson.build:163:9: ERROR: Dependency lookup for OpenBLAS with method 'pkgconfig' failed: Pkg-config for machine host machine not found. Giving up.


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - oshfdk - 24-04-2025

(24-04-2025, 09:43 AM)Urtx13 Wrote: What the project does (in plain terms)

Sorry, but this is probably not plain enough for me; I'm still stumped :). Could you try explaining it in simpler terms, or, maybe better, let's try some Q&A?

1) You mention cycles, but I don't understand how these cycles relate to the text. Is it topics cycling through folios? Do we have to assume the present ordering of folios for these cycles to make sense? Are there cycles within folios?

2) Basically, as far as I can see, the model splits the text into 4 topics (the number of topics is imposed on the model), and then the pipeline tests whether the model can correctly identify the topic based on the tokens. I'm not sure what this proves exactly. I would assume that if you take any text separated into chunks and ask a model to split the chunks into 4 topics, the model will find some split based on token frequencies and will then successfully sort new chunks according to this split. I think I'm missing something here.
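Point 2 can be illustrated with a toy experiment on pure noise. This is not the thread's pipeline, and all names are hypothetical. Each "chunk" is a random 4-component vector, and its label is simply its dominant component (a stand-in for assigning a chunk to its top topic). A nearest-centroid classifier (a deliberately simple stand-in for the supervised models) is trained on half the chunks and evaluated on the other half:

```python
import random


def make_noise_dataset(n, dims=4, seed=1405):
    """Random feature vectors, labelled by each row's dominant component."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(dims)] for _ in range(n)]
    y = [max(range(dims), key=row.__getitem__) for row in X]
    return X, y


def fit_centroids(X, y):
    """Mean feature vector for each label (nearest-centroid training)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=sq_dist)


X, y = make_noise_dataset(800)
centroids = fit_centroids(X[:400], y[:400])
predictions = [predict(centroids, row) for row in X[400:]]
accuracy = sum(p == t for p, t in zip(predictions, y[400:])) / len(predictions)
print(f"held-out accuracy on pure noise: {accuracy:.2f} (chance is about 0.25)")
```

Held-out accuracy lands far above the 25% chance level even though the data contain no structure at all. The reason is that the labels were computed from the same features the classifier sees. High accuracy alone therefore cannot distinguish real structure from labels that are a function of the inputs.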


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 09:52 AM)nablator Wrote: I installed python 3.13.2 from Microsoft store (the only version proposed, I don't know if it's good for this project).

Hi!

Wow, long report! 

It looks like the error stems from using Python 3.13, which several of the pinned libraries (such as NumPy 1.24.4) do not support yet, so pip falls back to building them from source. I’d recommend switching to Python 3.10 or 3.11; both are stable and have been tested with this pipeline.

The requirements.txt has been updated and works correctly under those versions.
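The failures above came from installing the pinned requirements under an untested interpreter. A small guard at the top of a setup script can catch this before pip starts building packages from source. This is a minimal sketch; the supported set only reflects the versions recommended in this reply, and the helper names are hypothetical:

```python
import sys

# Versions recommended in this thread; not derived from checking each dependency.
SUPPORTED_MINOR_VERSIONS = {(3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """True if the (major, minor) part of version_info is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINOR_VERSIONS


def version_message(version_info=sys.version_info):
    """Human-readable verdict for the given interpreter version."""
    major, minor = version_info[0], version_info[1]
    if is_supported(version_info):
        return f"Python {major}.{minor}: OK"
    return (f"Python {major}.{minor} is not one of the tested versions; "
            "install 3.10 or 3.11 and reinstall requirements.txt in a fresh environment")


print(version_message())
```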

Let me know if switching Python solves it — happy to help further if needed!


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - Urtx13 - 24-04-2025

(24-04-2025, 10:20 AM)oshfdk Wrote:
(24-04-2025, 09:43 AM)Urtx13 Wrote: What the project does (in plain terms)

Sorry, but this is probably not plain enough for me; I'm still stumped :). Could you try explaining it in simpler terms, or, maybe better, let's try some Q&A?

1) You mention cycles, but I don't understand how these cycles relate to the text. Is it topics cycling through folios? Do we have to assume the present ordering of folios for these cycles to make sense? Are there cycles within folios?

2) Basically, as far as I can see, the model splits the text into 4 topics (the number of topics is imposed on the model), and then the pipeline tests whether the model can correctly identify the topic based on the tokens. I'm not sure what this proves exactly. I would assume that if you take any text separated into chunks and ask a model to split the chunks into 4 topics, the model will find some split based on token frequencies and will then successfully sort new chunks according to this split. I think I'm missing something here.


Yes,

There’s a 4-phase cycle repeating through the folios, like seasons.
We detect it using statistics, such as token entropy, topic modeling, and FFT (like finding beats in music), among others.
These cycles show up between folios and inside them, and they break when shuffled. So it’s not just the order — it’s baked into the content.
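The FFT step asks which repetition length dominates a sequence of per-folio values. This is a minimal sketch, assuming the input is one number per folio listed in the current manuscript order. For transparency it uses a plain discrete Fourier transform; `numpy.fft.rfft` computes the same coefficients much faster. The input values are invented:

```python
import cmath
import math


def power_spectrum(signal):
    """Power at frequencies k = 0..n//2 of the mean-removed signal (plain O(n^2) DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))) ** 2
        for k in range(n // 2 + 1)
    ]


def dominant_period(signal):
    """Period (in samples) of the strongest non-zero frequency."""
    power = power_spectrum(signal)
    k = max(range(1, len(power)), key=power.__getitem__)
    return len(signal) / k


# Hypothetical per-folio values that repeat every 4 folios.
values = [0, 1, 2, 3] * 8
print(dominant_period(values))  # 4.0
```

The spectrum is computed over the sequence exactly as given. Any cycle it finds therefore presupposes the folio order that was fed in, and reordering the folios changes the result.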

Why four topics?
Good point. We imposed four topics, but the important part is:

- Once we do that, the pattern is consistent, predictable, and resistant to noise. So it works.
- Randomizing destroys it.
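An autocorrelation can check whether randomizing destroys a pattern. A sequence with a 4-step cycle correlates strongly with itself shifted by four positions, and shuffling the same values removes that correlation. This is a minimal sketch on an invented sequence (not Voynich data); `autocorrelation` is a hypothetical helper:

```python
import random


def autocorrelation(signal, lag):
    """Sample autocorrelation at the given lag (mean-removed, normalised by the lag-0 sum of squares)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(v * v for v in centered)
    return sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance


values = [0, 1, 2, 3] * 10  # invented sequence with period 4
shuffled = values[:]
random.Random(1405).shuffle(shuffled)
print(round(autocorrelation(values, 4), 3))  # 0.9
print(round(autocorrelation(shuffled, 4), 3))
```

This measures structure in the order in which the values are presented. A shuffle test like this one therefore probes the sequence order specifically.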

We don’t yet know why it’s there. But the discovery is that the pattern is there.

So we’re not saying “this is what it means”; we’re showing that there’s a real symbolic structure, not gibberish. This is the perfect first step toward understanding the language. Proving that an internal pattern exists (with high accuracy) opens the door to finally solving the linguistic side and reading the text within its historical context.

Happy to go deeper if you like!


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - nablator - 24-04-2025

(24-04-2025, 10:33 AM)Urtx13 Wrote: The requirements.txt has been updated and works correctly under those versions.

Let me know if switching Python solves it — happy to help further if needed!

Yes!

Reinstalled from scratch. No problem with version 3.11. Thanks!

python --version
Python 3.11.9

I had to:
- create folders data/raw, data/processed, data/results
- move ZL3-n.txt to data/raw/ZL3a-n.txt

python preprocess_eva_seed.py
--> folio_token_ids.csv, tokens_per_folio.csv

python generate_lunar_angles_seed.py
--> lunar_folio_dates.csv (folio, angle, phase)

python compute_entropy.py
--> entropy.csv

python tfidf_lda_seed.py
--> lda_topic_distributions.csv

python supervised_models_seed.py
--> full_features.csv, classification_predictions.csv
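The manual steps above can be collected into a small driver. This is a minimal sketch, assuming the scripts sit in the repository root and read and write under `data/` as described in this post. The `run_pipeline` and `build_commands` helpers are hypothetical and not part of the repository:

```python
import subprocess
import sys
from pathlib import Path

# Script order and folder layout as reported in this thread; the runner itself is hypothetical.
STEPS = [
    "preprocess_eva_seed.py",
    "generate_lunar_angles_seed.py",
    "compute_entropy.py",
    "tfidf_lda_seed.py",
    "supervised_models_seed.py",
]
DATA_DIRS = ["data/raw", "data/processed", "data/results"]
INPUT_FILE = "data/raw/ZL3a-n.txt"


def build_commands(python=sys.executable):
    """One command per pipeline step, run with the given interpreter."""
    return [[python, script] for script in STEPS]


def run_pipeline(root=".", dry_run=True):
    """Create the data folders, check the transcription is in place, then run each step in order."""
    root = Path(root)
    for d in DATA_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    if not dry_run and not (root / INPUT_FILE).exists():
        raise FileNotFoundError(f"expected transcription at {root / INPUT_FILE}")
    for cmd in build_commands():
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, cwd=root, check=True)  # stop at the first failing step
```

Calling `run_pipeline()` only creates the folders and prints the commands. Calling `run_pipeline(dry_run=False)` from the repository root runs them in order and stops at the first step that exits with an error.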


RE: Strong evidence of a structured four-phase system in the Voynich Manuscript - oshfdk - 24-04-2025

(24-04-2025, 10:41 AM)Urtx13 Wrote: There’s a 4-phase cycle repeating through the folios, like seasons.
We detect it using statistics, such as token entropy, topic modeling, and FFT (like finding beats in music), among others.
These cycles show up between folios and inside them, and they break when shuffled. So it’s not just the order — it’s baked into the content.

Thank you for the clarification! Where exactly in the output of the code that you provided it's possible to see these cycles?