Urtx13 > 24-04-2025, 10:51 AM
(24-04-2025, 10:20 AM)oshfdk Wrote: (24-04-2025, 09:43 AM)Urtx13 Wrote: What the project does (in plain terms)
Sorry, but this is probably not plain enough for me; I'm still stumped. Could you try explaining it in simpler terms, or maybe better, let's try some Q&A?
1) You mention cycles, but I don't understand how these cycles relate to the text. Is it topics cycling through folios? Do we have to assume the present ordering of folios for these cycles to make sense? Are there cycles within folios?
2) As far as I can see, the model splits the text into 4 topics (the number of topics is imposed on the model), and then it is tested whether the model can correctly identify the topic from the tokens. I'm not sure what this proves exactly. I would assume that if you take any text separated into chunks and ask a model to split the chunks into 4 topics, the model will find some split based on token frequencies and will then successfully sort new chunks according to that split. I think I'm missing something here.
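The concern in point 2 can be sketched with pure noise. This is only an illustration under stated assumptions, not the project's code: the random "topic mixes" stand in for something like lda_topic_distributions.csv, the label is simply the dominant topic, and the nearest-centroid classifier and the function names are hypothetical stand-ins.

```python
import random

def random_topic_mix(rng, k):
    """Draw one structureless 'topic mix' (flat Dirichlet) for a chunk."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def noise_classification_accuracy(n_chunks=400, k=4, seed=0):
    rng = random.Random(seed)
    # Assumption: chunks carry no real structure, only random topic mixes.
    mixes = [random_topic_mix(rng, k) for _ in range(n_chunks)]
    # Labels are derived from the same features (the dominant topic).
    labels = [max(range(k), key=lambda j: m[j]) for m in mixes]

    half = n_chunks // 2
    # Fit one centroid per label using the first half only.
    centroids = {}
    for c in range(k):
        members = [mixes[i] for i in range(half) if labels[i] == c]
        if members:
            centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x):
        # Nearest centroid by squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    # Score on the held-out second half.
    hits = sum(predict(mixes[i]) == labels[i] for i in range(half, n_chunks))
    return hits / (n_chunks - half)

accuracy = noise_classification_accuracy()
print(f"held-out accuracy on pure noise: {accuracy:.2f} (chance would be 0.25)")
```

If the labels are a function of the same features the classifier sees, held-out accuracy far above chance is expected even when the input has no structure at all, so a high score alone would not show that a cycle exists.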
Urtx13 > 24-04-2025, 10:54 AM
(24-04-2025, 10:48 AM)oshfdk Wrote: (24-04-2025, 10:41 AM)Urtx13 Wrote: There’s a 4-phase cycle repeating through the folios, like seasons.
We detect it using statistics, such as token entropy, topic modeling, and FFT (like finding beats in music), among others.
These cycles show up between folios and inside them, and they break when shuffled. So it’s not just the order — it’s baked into the content.
Thank you for the clarification! Where exactly in the output of the code you provided is it possible to see these cycles?
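One way to make "they break when shuffled" concrete is a permutation test on the folio order. The sketch below is an assumption-laden illustration, not the project's method: it takes one phase label per folio in the present folio order, uses a hypothetical statistic (how often a folio has the same label as the folio four positions later), and compares it with the same statistic on shuffled orders.

```python
import random

def lag_match_rate(labels, lag=4):
    """Fraction of positions whose label equals the label `lag` steps later."""
    pairs = list(zip(labels, labels[lag:]))
    return sum(a == b for a, b in pairs) / len(pairs)

def shuffle_p_value(labels, lag=4, n_shuffles=1000, seed=0):
    """Permutation test: how often does a shuffled order look as periodic?"""
    rng = random.Random(seed)
    observed = lag_match_rate(labels, lag)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if lag_match_rate(shuffled, lag) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_shuffles + 1)

# Hypothetical input: a perfectly repeating 4-phase sequence over 40 folios.
observed, p_value = shuffle_p_value([0, 1, 2, 3] * 10)
print(observed, p_value)
```

A test like this depends entirely on the present ordering of the folios, so a small p-value would only say that the labels are arranged periodically in that ordering, not why.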
nablator > 24-04-2025, 11:32 AM
(24-04-2025, 10:54 AM)Urtx13 Wrote: classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
Urtx13 > 24-04-2025, 11:43 AM
Not to mention that patterns like this don’t appear naturally or by chance.
If such a structured cycle exists, it means someone intentionally designed it — which implies purpose, not randomness.
Urtx13 > 24-04-2025, 11:50 AM
(24-04-2025, 11:32 AM)nablator Wrote: (24-04-2025, 10:54 AM)Urtx13 Wrote: classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
How randomly generated lunar phases can match the results of the classification is a complete mystery to me.
Looking at tfidf_lda_seed.py, it processes only tokens_per_folio.csv, so it's completely deterministic, not random (the value of the seed should not impact the result, I hope). It generates lda_topic_distributions.csv. Then supervised_models_seed.py outputs the model accuracy score (0.9649) when comparing the prediction of the model from lda_topic_distributions with the random phases from lunar_folio_dates.csv, right?
oshfdk > 24-04-2025, 11:52 AM
(24-04-2025, 10:51 AM)Urtx13 Wrote: Not to mention that patterns like this don’t appear naturally or by chance.
If such a structured cycle exists, it means someone intentionally designed it — which implies purpose, not randomness.
(24-04-2025, 10:51 AM)Urtx13 Wrote: classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
Urtx13 > 24-04-2025, 11:54 AM
(24-04-2025, 11:52 AM)oshfdk Wrote: (24-04-2025, 10:51 AM)Urtx13 Wrote: Not to mention that patterns like this don’t appear naturally or by chance.
If such a structured cycle exists, it means someone intentionally designed it — which implies purpose, not randomness.
For me personally this doesn't work as a good argument. Patterns appear by chance all the time. And even when they don't appear by chance, they can appear due to factors other than purpose. For example, a page may be structured differently depending on where in a quire it appears, which could produce cyclical features. Another very simple example: if adjacent pages refer to one another using phrases like "on the right page" and "on the left page", then recto and verso pages will have slightly different vocabulary. If this is done consistently across the whole MS, an ML model can easily pick it up.
Also, does your method account for the possibility that the folios of the manuscript are not presently in their original order?
(24-04-2025, 10:51 AM)Urtx13 Wrote: classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
I've tried running the model again with seed 2025 (by running perl -pi -e 's/1405/2025/g' *.py) and compared classification_predictions.csv. In both cases there are apparent cycles, but they don't appear to match, even if we remap the phases. Is this expected?
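The "even if we remap the phases" comparison can be made explicit by trying every relabeling of the four phases and keeping the best agreement. This is a minimal sketch with hypothetical names; it takes plain lists of phase labels, because the column layout of classification_predictions.csv is not shown in this thread.

```python
from itertools import permutations

def best_remapped_agreement(run_a, run_b, n_phases=4):
    """Best agreement between two labelings over all relabelings of run_a.

    Returns (score, mapping), where mapping[i] is the run_b label that
    phase i of run_a is renamed to.
    """
    best_score, best_map = -1.0, None
    for perm in permutations(range(n_phases)):
        score = sum(perm[a] == b for a, b in zip(run_a, run_b)) / len(run_a)
        if score > best_score:
            best_score, best_map = score, perm
    return best_score, best_map

# Hypothetical example: the same cycle, with phase numbers shifted by two.
seed_1405 = [0, 1, 2, 3, 0, 1, 2, 3]
seed_2025 = [2, 3, 0, 1, 2, 3, 0, 1]
print(best_remapped_agreement(seed_1405, seed_2025))  # → (1.0, (2, 3, 0, 1))
```

With four phases there are only 24 relabelings, so the exhaustive search is cheap. A best score near 0.25 would suggest the two runs are unrelated; a score near 1.0 would mean the same partition under different names.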
Urtx13 > 24-04-2025, 12:02 PM
(24-04-2025, 11:52 AM)oshfdk Wrote: (24-04-2025, 10:51 AM)Urtx13 Wrote: Not to mention that patterns like this don’t appear naturally or by chance.
If such a structured cycle exists, it means someone intentionally designed it — which implies purpose, not randomness.
For me personally this doesn't work as a good argument. Patterns appear by chance all the time. And even when they don't appear by chance, they can appear due to factors other than purpose. For example, a page may be structured differently depending on where in a quire it appears, which could produce cyclical features. Another very simple example: if adjacent pages refer to one another using phrases like "on the right page" and "on the left page", then recto and verso pages will have slightly different vocabulary. If this is done consistently across the whole MS, an ML model can easily pick it up.
Also, does your method account for the possibility that the folios of the manuscript are not presently in their original order?
(24-04-2025, 10:51 AM)Urtx13 Wrote: classification_predictions.csv
The file shows that the model can accurately predict which of the four phases a folio belongs to, with an accuracy of over 96%. That would be impossible if there were no internal cycle to detect.
I've tried running the model again with seed 2025 (by running perl -pi -e 's/1405/2025/g' *.py) and compared classification_predictions.csv. In both cases there are apparent cycles, but they don't appear to match, even if we remap the phases. Is this expected?
oshfdk > 24-04-2025, 12:04 PM
(24-04-2025, 11:54 AM)Urtx13 Wrote: The table you shared beautifully illustrates this: the 1405 seed yields coherent predictions; 2025 doesn’t.
nablator > 24-04-2025, 12:08 PM
(24-04-2025, 11:54 AM)Urtx13 Wrote: Because any clustering will always find some internal order; that’s just how unsupervised learning works. But the real test is whether it consistently aligns with an external structure (in our case, the lunar phases we hypothesize). That only happens at seed 1405.
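A claim that alignment appears at one seed only can be put in context by scanning many seeds. The sketch below rests on assumptions that are not confirmed by the thread: the external phases are taken to be drawn at random from the seed, the predictions are a hypothetical fixed list of 40 labels, and scan_seeds is an illustrative helper, not part of the project's scripts.

```python
import random

def scan_seeds(predicted, seeds, n_phases=4):
    """Agreement between fixed predictions and phases drawn from each seed."""
    scores = {}
    for seed in seeds:
        rng = random.Random(seed)
        # Assumption: the external labels depend on nothing but the seed.
        phases = [rng.randrange(n_phases) for _ in predicted]
        hits = sum(p == q for p, q in zip(predicted, phases))
        scores[seed] = hits / len(predicted)
    return scores

predicted = [i % 4 for i in range(40)]  # hypothetical stand-in predictions
scores = scan_seeds(predicted, range(2000))
mean_score = sum(scores.values()) / len(scores)
best_seed = max(scores, key=scores.get)
print(f"mean agreement {mean_score:.3f}, best seed {best_seed}: {scores[best_seed]:.3f}")
```

Under these assumptions the average agreement sits near the 0.25 chance level, while the best of many seeds scores noticeably higher by selection alone. A result that holds at one seed but not at others is therefore hard to separate from that selection effect without a check on seeds chosen in advance.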