Urtx13 > 25-04-2025, 01:19 PM
(25-04-2025, 01:09 PM)oshfdk Wrote: (25-04-2025, 12:56 PM)Urtx13 Wrote: I can’t continue this discussion until you understand how this works.
I'm quite comfortable no longer discussing your hypothesis with you and I will not be offended if you ignore my further comments.
But if you don't mind, I'll remain active in this thread, since other people provide comments that I find valuable and interesting and worth talking over.
RobGea > 25-04-2025, 02:12 PM
Urtx13 > 25-04-2025, 02:24 PM
(25-04-2025, 02:12 PM)RobGea Wrote: Hi Urtx13 and welcome.
Is there a ball-park timescale for when your paper will be available to read?
Also, thank you for explaining your work and your findings; it is much appreciated.
Rafal > 25-04-2025, 09:22 PM
RadioFM > 25-04-2025, 10:18 PM
Quote:No. Segmentation is not purely random. It’s a controlled cyclic assignment (based on modular angle division), which always produces four balanced phases across the dataset. It is not possible to end up with only one Phase.
Fair enough. I double-checked and, true, the labelling is done evenly on the dataset and then shuffled, effectively giving 25% of the folios Phase 0, a different (random) 25% of folios Phase 1, and so on. But it's only balanced on the whole dataset when accounting for training+test; there's no guarantee of balance after the 75-25 split. With just 4 phases this is no big deal, of course, but with more than 4 the imbalance can take a toll on the results and worsen scores.
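Here's a minimal sketch of the split point (toy folio count and seed, stdlib only, not your actual pipeline):
Code:
import random
from collections import Counter

random.seed(1)                              # arbitrary seed, illustration only
n_folios = 104                              # hypothetical folio count
phases = [i % 4 for i in range(n_folios)]   # balanced cyclic assignment: 0,1,2,3,0,...
random.shuffle(phases)

cut = int(0.75 * n_folios)                  # 75-25 train/test split
train, test = phases[:cut], phases[cut:]
print(Counter(train))                       # per-phase counts in the training part...
print(Counter(test))                        # ...and in the test part: not guaranteed 25% each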
Quote:No. The model is not trained to predict Phase using the Lunar Angle. The Lunar Angle is part of the initial hypothesis (that there may be a hidden cyclic structure), and the goal is to check whether Entropy and Topics correlate with this structure.
Is the 90-ish % accuracy you mention the score given by supervised_models_seed.py? That is the accuracy of the model's predictions.
Quote:“You’re overfitting by not splitting the data into Train/Validation/Test.”
No. This is not a predictive model intended for production. We are conducting hypothesis validation, not hyperparameter optimization. Cross-validation would be relevant in a different context.
It's not about models for production. The validation set is there to avoid overfitting, which is just as important in academia. And there IS hyperparameter optimization.
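For reference, this is the three-way split I mean (a sketch with made-up sizes and placeholder data; plain scikit-learn, not your scripts):
Code:
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 6)            # placeholder features (entropy, topic weights, ...)
y = np.random.randint(0, 4, 100)      # placeholder 4-class phase labels

# Hold out the test set first, then carve a validation set out of the remainder.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.2, random_state=0)
# Tune hyperparameters against (X_val, y_val); report accuracy once, on (X_test, y_test).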
Quote:No. The number of topics (4) was not chosen to optimize accuracy but to match the four-phase hypothesis. We did not test multiple values of n_topics to pick the best one.
Oh yes you did:
Quote:Also, I think you’re right to question whether setting four topics is arbitrary. It’s not. We tried multiple options and observed that four is the only number that yields a strong, consistent internal pattern, not just via topic modeling, but also in entropy trends, classifier performance (96.5%), autocorrelation, FFT peaks, and alignment with agronomic cycles.
Quote:No. This assumes a continuous optimization landscape (e.g., with loss gradients), which does not apply. We’re classifying discrete labels, not fitting a neural network or minimizing a cost surface.
(...)
P.S. Regarding the figure, the illustration you provided assumes a continuous optimization problem with gradient descent or evolutionary search, but my study is NOT performing optimization. The seed was fixed BEFORE analysis, and no hyperparameter search was conducted. The model is NOT climbing any surface. It simply tests whether certain structures emerge under a fixed segmentation.
The picture was meant as an illustration, but you can be sure that when accounting for all parameters and hyperparameters there are indeed local optima in the solution space. And your study is performing optimization, because it involves training a machine learning model. Still, I don't think this is the reason you're getting good results, just a pitfall I can see you falling into.
Quote:To sum up, you are confusing hypothesis validation with model optimization.
Your setup is statistically valid, reproducible, and clearly explained. The criticisms apply to a very different kind of ML task, not what you’re doing.
You are using the fact that you trained an ML model to predict some labels as evidence that there was a pattern to learn in the first place.
My point still stands that you are using a trivial predictor: when ablating Lunar Angle, you claim the accuracy drops to about 28% (close to the 25% expected from random prediction). Lunar Angle is a trivial predictor of Phase because no matter how the dataset and the Lunar Angles are shuffled, if folio f1 gets a Lunar Angle of 92º (or whatever angle, depending on the seed), the following lines of code will execute to give the 'correct' Phase to folio f1, against which you'll end up measuring your accuracy:
(generate_lunar_angles_seed.py)
Code:
# Map each folio's lunar angle (degrees in [0, 360)) to one of four phases
if 0 <= lunar_angle < 90:
    phase = 0  # Lluna Nova (New Moon)
elif 90 <= lunar_angle < 180:
    phase = 1  # Quart Creixent (First Quarter)
elif 180 <= lunar_angle < 270:
    phase = 2  # Lluna Plena (Full Moon)
elif 270 <= lunar_angle < 360:
    phase = 3  # Quart Minvant (Last Quarter)
The AI models will have the entropy, the topics and the 'Lunar Angle' for each and every folio, and they'll eventually converge on the best way of predicting the correct Phase: by looking at the Lunar Angle. That's how you get such an extraordinarily high accuracy.
Based on the track record of this thread, you'll likely disagree with my remarks and claim that I know nothing and that I'm mixing up circular logic with validation. Would you please download the code again into a separate folder and run the same study, but this time ablating all features EXCEPT Lunar Angle, and see whether you also get >85% accuracy? Or try running it on some gibberish text; MarcoP once uploaded an OCR of the Codex Seraphinianus to GitHub, I think that ought to do it.
Urtx13 > 25-04-2025, 11:32 PM
(25-04-2025, 09:22 PM)Rafal Wrote: Hello Urtx13
You should definitely describe your findings in more layman's terms. You have quite a diverse audience here: some people are good at Latin, some at cryptology, some at medieval handwriting, some at botany, and so on. You cannot just expect people to download and run your Python scripts, and you cannot expect people to know your statistical techniques.
At some point your findings would have to be verified by experts to rule out mistakes or statistical artifacts, but to gain momentum and get people interested you must first show something interesting and convincing to an educated layman.
As I understand it, you don't have any translation of the text, even a partial one. Instead, your tests detected some regularities in the manuscript text. Is that right?
What is the nature of these regularities? Tell us more, in simple words.
Urtx13 > 25-04-2025, 11:41 PM
(25-04-2025, 10:18 PM)RadioFM Wrote: Quote:No. Segmentation is not purely random. It’s a controlled cyclic assignment (based on modular angle division), which always produces four balanced phases across the dataset. It is not possible to end up with only one Phase.
Fair enough. I double-checked and, true, the labelling is done evenly on the dataset and then shuffled, effectively giving 25% of the folios Phase 0, a different (random) 25% of folios Phase 1, and so on. But it's only balanced on the whole dataset when accounting for training+test; there's no guarantee of balance after the 75-25 split. With just 4 phases this is no big deal, of course, but with more than 4 the imbalance can take a toll on the results and worsen scores.
Quote:No. The model is not trained to predict Phase using the Lunar Angle. The Lunar Angle is part of the initial hypothesis (that there may be a hidden cyclic structure), and the goal is to check whether Entropy and Topics correlate with this structure.
Is the 90-ish % accuracy you mention the score given by supervised_models_seed.py? That is the accuracy of the model's predictions.
Quote:“You’re overfitting by not splitting the data into Train/Validation/Test.”
No. This is not a predictive model intended for production. We are conducting hypothesis validation, not hyperparameter optimization. Cross-validation would be relevant in a different context.
It's not about models for production. The validation set is there to avoid overfitting, which is just as important in academia. And there IS hyperparameter optimization.
Quote:No. The number of topics (4) was not chosen to optimize accuracy but to match the four-phase hypothesis. We did not test multiple values of n_topics to pick the best one.
Oh yes you did:
Quote:Also, I think you’re right to question whether setting four topics is arbitrary. It’s not. We tried multiple options and observed that four is the only number that yields a strong, consistent internal pattern, not just via topic modeling, but also in entropy trends, classifier performance (96.5%), autocorrelation, FFT peaks, and alignment with agronomic cycles.
Quote:No. This assumes a continuous optimization landscape (e.g., with loss gradients), which does not apply. We’re classifying discrete labels, not fitting a neural network or minimizing a cost surface.
(...)
P.S. Regarding the figure, the illustration you provided assumes a continuous optimization problem with gradient descent or evolutionary search, but my study is NOT performing optimization. The seed was fixed BEFORE analysis, and no hyperparameter search was conducted. The model is NOT climbing any surface. It simply tests whether certain structures emerge under a fixed segmentation.
The picture was meant as an illustration, but you can be sure that when accounting for all parameters and hyperparameters there are indeed local optima in the solution space. And your study is performing optimization, because it involves training a machine learning model. Still, I don't think this is the reason you're getting good results, just a pitfall I can see you falling into.
Quote:To sum up, you are confusing hypothesis validation with model optimization.
Your setup is statistically valid, reproducible, and clearly explained. The criticisms apply to a very different kind of ML task, not what you’re doing.
You are using the fact that you trained an ML model to predict some labels as evidence that there was a pattern to learn in the first place.
My point still stands that you are using a trivial predictor: when ablating Lunar Angle, you claim the accuracy drops to about 28% (close to the 25% expected from random prediction). Lunar Angle is a trivial predictor of Phase because no matter how the dataset and the Lunar Angles are shuffled, if folio f1 gets a Lunar Angle of 92º (or whatever angle, depending on the seed), the following lines of code will execute to give the 'correct' Phase to folio f1, against which you'll end up measuring your accuracy:
(generate_lunar_angles_seed.py)
Code:
# Map each folio's lunar angle (degrees in [0, 360)) to one of four phases
if 0 <= lunar_angle < 90:
    phase = 0  # Lluna Nova (New Moon)
elif 90 <= lunar_angle < 180:
    phase = 1  # Quart Creixent (First Quarter)
elif 180 <= lunar_angle < 270:
    phase = 2  # Lluna Plena (Full Moon)
elif 270 <= lunar_angle < 360:
    phase = 3  # Quart Minvant (Last Quarter)
The AI models will have the entropy, the topics and the 'Lunar Angle' for each and every folio, and they'll eventually converge on the best way of predicting the correct Phase: by looking at the Lunar Angle. That's how you get such an extraordinarily high accuracy.
Based on the track record of this thread, you'll likely disagree with my remarks and claim that I know nothing and that I'm mixing up circular logic with validation. Would you please download the code again into a separate folder and run the same study, but this time ablating all features EXCEPT Lunar Angle, and see whether you also get >85% accuracy? Or try running it on some gibberish text; MarcoP once uploaded an OCR of the Codex Seraphinianus to GitHub, I think that ought to do it.
Urtx13 > 25-04-2025, 11:50 PM
Bluetoes101 > 26-04-2025, 01:32 AM
RobGea > 26-04-2025, 02:45 AM