(24-04-2025, 10:20 AM)oshfdk Wrote:
(24-04-2025, 09:43 AM)Urtx13 Wrote: What the project does (in plain terms)
Sorry, but this is probably not plain enough for me; I'm still stumped. Could you try explaining it in simpler terms, or maybe better, let's try some Q&A?
1) You mention cycles, but I don't understand how these cycles relate to the text. Is it topics cycling through folios? Do we have to assume the present ordering of folios for these cycles to make sense? Are there cycles within folios?
2) Basically, as far as I can see, the model splits the text into 4 topics (the number of topics is imposed on the model), and then it is tested whether the model can correctly identify the topic based on the tokens. I'm not sure what this proves exactly. I would assume that if you take any text, separate it into chunks, and ask a model to split the chunks into 4 topics, the model will find some split based on token frequencies and will then successfully sort new chunks according to that split. I think I'm missing something here.
(24-04-2025, 10:48 AM)oshfdk Wrote:
(24-04-2025, 10:41 AM)Urtx13 Wrote: There’s a 4-phase cycle repeating through the folios, like seasons.
We detect it using statistics, such as token entropy, topic modeling, and FFT (like finding beats in music), among others.
These cycles show up between folios and inside them, and they break when shuffled. So it’s not just the order — it’s baked into the content.
Thank you for the clarification! Where exactly in the output of the code you provided is it possible to see these cycles?
The cycles become visible at several key points in the output:
- classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
- topic2_autocorr_detrended.png
This plot shows the autocorrelation of one topic across folios. The repeated wave-like pattern reveals a rhythmic structure — a clear sign of cyclic behavior.
- topic2_fft_detrended.png
This is the Fourier Transform of the topic activity. The spike in the frequency domain confirms a dominant, repeating cycle — again, not a random one.
- phenology_match_full.csv
Here we match the detected cycle to real-world plant blooming periods. We find alignment in over 65% of folios (±1 phase), which strengthens the interpretation as a calendar-like structure.
Also, you can always write some simple code to visualize it in figures or charts!
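To make the autocorrelation/FFT step concrete, here is a minimal sketch of how a dominant period can be read off a per-folio topic-weight series. The data below is synthetic and the function name is made up for illustration; this is not the project's actual pipeline or file format.

```python
import numpy as np

def dominant_cycle(series):
    """Detrend a per-folio topic-weight series, then estimate its
    dominant period from the FFT power spectrum."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))       # cycles per folio
    k = 1 + np.argmax(power[1:])          # skip the DC component
    return 1.0 / freqs[k]                 # period, in folios

# A synthetic 4-folio cycle plus noise should come out near 4:
rng = np.random.default_rng(0)
toy = np.sin(2 * np.pi * np.arange(80) / 4) + 0.2 * rng.normal(size=80)
print(round(dominant_cycle(toy)))         # -> 4
```

A spike at a single frequency bin, as in `topic2_fft_detrended.png`, corresponds to `power` being concentrated at one `k` here.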
(24-04-2025, 10:54 AM)Urtx13 Wrote: classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
How randomly generated lunar phases can match the results from the classification is a complete mystery to me.
Looking at tfidf_lda_seed.py, it processes only tokens_per_folio.csv, so it's completely deterministic, not random (the value of the seed should not impact the result of the classification, I hope). It generates lda_topic_distributions.csv. Then supervised_models_seed.py outputs the model accuracy score (0.9649) when comparing the prediction of the model from lda_topic_distributions with the random phases from lunar_folio_dates.csv, right?
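For readers trying to follow the pipeline nablator describes, its overall shape is roughly this. This is a toy reconstruction on synthetic data; the real scripts, their file formats, and their models may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1405)
# Toy stand-in for tokens_per_folio.csv: each "folio" draws its tokens
# from the vocabulary of one of four phases.
vocab = [["chol", "chor"], ["shedy", "shey"], ["daiin", "aiin"], ["qokedy", "qokeedy"]]
phases = rng.integers(0, 4, size=120)
docs = [" ".join(rng.choice(vocab[p], size=30)) for p in phases]

# tfidf_lda_seed.py step: tokens -> TF-IDF -> 4 LDA topic distributions
tfidf = TfidfVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(n_components=4, random_state=1405).fit_transform(tfidf)

# supervised_models_seed.py step: predict the phase label from the topics
acc = cross_val_score(LogisticRegression(max_iter=1000), topics, phases, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")  # high, since labels track vocabulary
```

Note that in this toy version the high accuracy is unremarkable: the labels were constructed to track the vocabulary, which is exactly the circularity concern being raised in the thread.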
(24-04-2025, 10:51 AM)Urtx13 Wrote:
(24-04-2025, 10:20 AM)oshfdk Wrote: [full post with questions 1) and 2) quoted above]
Not to mention that patterns like this don’t appear naturally or by chance. If such a structured cycle exists, it means someone intentionally designed it, which implies purpose, not randomness.
Also, I think you’re right to question whether setting four topics is arbitrary. It’s not. We tried multiple options and observed that four is the only number that yields a strong, consistent internal pattern, not just via topic modeling, but also in entropy trends, classifier performance (96.5%), autocorrelation, FFT peaks, and alignment with agronomic cycles.
So it’s not that we imposed four — instead, four emerged as the minimal number that produces a statistically significant and interpretable structure. Other values don’t reproduce the same coherence.
We can do a Q&A if you'd like.
P.S. I understand this is a forum and I’m new here, so maybe this is a normal tone. Still, I’d like to clarify that my intention is strictly academic and constructive. From now on, I’ll choose not to engage with any tone that feels dismissive or confrontational. There is no chunk or imposition. Thanks for understanding!
(24-04-2025, 11:32 AM)nablator Wrote: [quoted in full above] ...Then supervised_models_seed.py outputs the model accuracy score (0.9649) when comparing the prediction of the model from lda_topic_distributions with the random phases from lunar_folio_dates.csv, right?
Yes, you got it mostly right.
You’re correct that tfidf_lda_seed.py processes tokens_per_folio.csv and outputs a fully deterministic topic distribution (lda_topic_distributions.csv). Then, supervised_models_seed.py uses that distribution, along with entropy and synthetic angles, to classify the randomly assigned “phases” from lunar_folio_dates.csv.
And here’s the key part: if the “phases” were truly random, the classifier shouldn’t be able to predict them with 96.5% accuracy. That’s where the mystery turns into a meaningful pattern. But still, I was worried about overfitting, which is why we ran the permutation test and the ablation test.
We confirmed this wasn’t a fluke using a 1,000-run permutation test, and the structure collapses when the angles are shuffled. So the match isn’t due to overfitting or circular logic. It shows that the original angle-phase assignment (seeded but pseudo-random) aligns unusually well with the actual latent structure in the Voynich tokens, as if the random seed accidentally “found” a harmonic.
The real breakthrough is realizing that the Voynich text, when processed this way, contains a cyclical rhythm that aligns with that specific structure — something we wouldn’t expect from a meaningless or random text.
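For reference, a label-permutation test of the kind mentioned here usually has this shape. The sketch below runs on synthetic data with made-up names; it is not the project's actual code, and it uses a generic classifier rather than whatever supervised_models_seed.py uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def permutation_p_value(X, y, n_perm=99, seed=0):
    """Compare the real cross-validated accuracy against accuracies on
    shuffled labels; if the labels carry no signal, the real score
    should fall inside the shuffled (null) distribution."""
    rng = np.random.default_rng(seed)
    clf = LogisticRegression(max_iter=1000)
    real = cross_val_score(clf, X, y, cv=5).mean()
    null = [cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
            for _ in range(n_perm)]
    p = (1 + sum(s >= real for s in null)) / (1 + n_perm)
    return real, p

# Features that genuinely encode the label give a high score and a small p-value:
rng = np.random.default_rng(1)
y = rng.integers(0, 4, size=100)
X = np.eye(4)[y] + 0.5 * rng.normal(size=(100, 4))  # features leak the label
real, p = permutation_p_value(X, y)
print(f"accuracy={real:.2f}, p={p:.2f}")
```

scikit-learn also ships a ready-made `sklearn.model_selection.permutation_test_score` that implements the same idea.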
(24-04-2025, 10:51 AM)Urtx13 Wrote: Not to mention that patterns like this don’t appear naturally or by chance.
If such a structured cycle exists, it means someone intentionally designed it — which implies purpose, not randomness.
For me personally this doesn't work as a good argument. Patterns appear by chance all the time. And even when they don't appear by chance, they can appear due to factors other than purpose. For example, there can be differences in how a page was structured based on where in a quire it appears, which could produce cyclical features. Another very simple example, if adjacent pages refer to one another by using words like "on the right page" and "on the left page", then recto and verso pages will have slightly different vocabulary. If this is consistently used across the whole MS, an ML model can easily pick this up.
Also, does your method account for the possibility that the folios of the manuscript are not presently in their original order?
(24-04-2025, 10:51 AM)Urtx13 Wrote: classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
I've tried running the model again with seed 2025 (by running perl -pi -e 's/1405/2025/g' *.py) and compared classification_predictions.csv. In both cases there are apparent cycles, but they don't appear to match (even if we remapped the phases). Is this expected?
[attachment=10410]
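A concrete way to compare two runs "even if we remapped the phases" is to score their agreement under the best possible one-to-one relabelling of the phases. This sketch assumes phase sequences encoded as integers 0-3; the function name is made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_remap_agreement(a, b, k=4):
    """Fraction of folios on which two phase sequences agree, after
    choosing the one-to-one relabelling of the k phases that matches best."""
    conf = np.zeros((k, k), dtype=int)   # confusion matrix between the runs
    for x, y in zip(a, b):
        conf[x, y] += 1
    rows, cols = linear_sum_assignment(-conf)  # negate to maximize matches
    return conf[rows, cols].sum() / len(a)

a = [0, 1, 2, 3] * 10
b = [(x + 1) % 4 for x in a]          # the same cycle with phases shifted by one
print(best_remap_agreement(a, b))     # -> 1.0: identical up to relabelling
```

If the seed-1405 and seed-2025 predictions scored near 1.0 under this measure, they would describe the same cycle with renamed phases; a score near 0.25 means they genuinely disagree.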
(24-04-2025, 11:32 AM)nablator Wrote: [quoted in full above] ...when comparing the prediction of the model from lda_topic_distributions with the random phases from lunar_folio_dates.csv, right?
By the way, the lunar phases aren’t random every time — they’re generated with a fixed seed (1405) to ensure consistency. That’s why we can test whether the structure aligns with them meaningfully.
(24-04-2025, 11:52 AM)oshfdk Wrote: [quoted in full above] ...In both cases there are apparent cycles, but they don't appear to match (even if we remapped the phases), is this expected?
Yes — that’s precisely what we expect. Great that you tested it!
The seed controls the generation of synthetic lunar phases. Changing it from 1405 to 2025 scrambles the phase assignments across folios, so we’re no longer testing against the “true” structure we discovered.
So why do you still see a kind of cycle?
Because any clustering will always find some internal order — that’s just how unsupervised learning works. But the real test is whether it consistently aligns with an external structure (in our case, the lunar phases we hypothesize). That only happens at seed 1405.
When you change the seed, the model still tries to fit the data, but the alignment becomes coincidental or weak. We’ve tested this systematically through:
- Accuracy drops after permutation or shuffling
- Loss of autocorrelation peaks
- Loss of alignment with phenological data
- No coherence in FFT harmonic signals
The table you shared beautifully illustrates this: the 1405 seed yields coherent predictions; 2025 doesn’t.
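The "loss of autocorrelation peaks" control can be illustrated in a few lines. The data below is synthetic and purely illustrative, not the project's code: a lag-4 peak in a cyclic series collapses once the folio order is shuffled.

```python
import numpy as np

def lag_autocorr(x, lag):
    """Normalized autocorrelation of a series at a given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)
# One spike every 4 "folios", plus noise: a clean 4-phase cycle.
cyclic = np.tile([1.0, 0.0, 0.0, 0.0], 25) + 0.1 * rng.normal(size=100)
shuffled = rng.permutation(cyclic)

print(f"lag-4 autocorrelation, ordered:  {lag_autocorr(cyclic, 4):+.2f}")   # near +0.9
print(f"lag-4 autocorrelation, shuffled: {lag_autocorr(shuffled, 4):+.2f}") # near 0
```

This is the shape of the claim being made: the peak depends on the ordering, so shuffling the order is a meaningful control for order-dependent structure (though not, by itself, for label selection effects).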
(24-04-2025, 11:52 AM)oshfdk Wrote: [quoted in full above] ...In both cases there are apparent cycles, but they don't appear to match (even if we remapped the phases), is this expected?
Thanks for your comment. You’re right that patterns can appear by chance or due to layout. That’s why we explicitly tested for it.
We didn’t just find a pattern. We showed that only one configuration (seed = 1405) produces a coherent semantic structure with high accuracy, low autocorrelation, and alignment with independent data. All control runs (random seeds, permutations, folio reordering, no-angle tests, etc.) fail to reproduce it. So this isn’t about “patterns happen” — this is about patterns that don’t happen under any other configuration.
We also checked within individual folios, to rule out recto/verso or binding effects. The structure persists locally.
So yes — skepticism is healthy. But science advances by testing, not guessing. We’ve tested. The result holds.
Happy to show plots or share code if helpful.
(24-04-2025, 11:54 AM)Urtx13 Wrote: The table you shared beautifully illustrates this: the 1405 seed yields coherent predictions; 2025 doesn’t.
If I understand it correctly, you tried various splits and various seeds, and the 4-way split with seed 1405 showed the best results? I'm not in academia, so I'm not sure if this is methodologically sound. I mean, given only a limited number of folios, there is bound to be some combination that shows a pattern.
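This concern is the standard multiple-comparisons worry, and it is easy to quantify on toy data (synthetic and purely illustrative): if you scan many random labelings and keep the seed that scores best, the apparent best accuracy drifts above chance even though every labelling is noise, so a "special seed" claim has to be judged against that selection effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 4))   # pure-noise "topic distributions" for 48 folios

best = 0.0
for seed in range(200):        # shop over 200 candidate labelling seeds
    # Balanced random 4-phase labelling, 12 folios per phase:
    y = np.random.default_rng(seed).permutation(np.repeat(np.arange(4), 12))
    score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    best = max(best, score)

print(f"chance = 0.25, best of 200 random labelings = {best:.2f}")
```

How far above chance the best seed can drift grows with the number of seeds tried and shrinks with the number of folios, which is why reporting the size of the search (how many seeds and splits were tried) matters as much as the winning score.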
(24-04-2025, 11:54 AM)Urtx13 Wrote: Because any clustering will always find some internal order — that’s just how unsupervised learning works. But the real test is whether it consistently aligns with an external structure (in our case, the lunar phases we hypothesize). That only happens at seed 1405.
I understand a little better, thanks. 1405 is special.
I was confused because you wrote earlier:
"The seed 1405 is arbitrary and used solely to ensure full reproducibility. Any other fixed number would work — this one was chosen to match the year 1405, which aligns with the possible calendar framework used in the analysis"