The Voynich Ninja
Nordico Paradigm - A quantitative framework for Voynich analysis - Printable Version

Thread: Nordico Paradigm - A quantitative framework for Voynich analysis (/thread-5195.html)



RE: Nordico Paradigm - A quantitative framework for Voynich analysis - Jorge_Stolfi - 01-01-2026

(01-01-2026, 12:27 AM)aliben Wrote: this kind of precise feedback is exactly why I shared the work openly.

You would get more useful feedback if you posted an explanation of your method and results that could be understood by a general reader.  Say, by someone with a Ph. D. in computer science who spent more than 5 years analyzing the manuscript.

But something tells me that you don't understand them yourself, isn't that so?


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - asteckley - 01-01-2026

(01-01-2026, 12:41 AM)Jorge_Stolfi Wrote:
(01-01-2026, 12:27 AM)aliben Wrote: this kind of precise feedback is exactly why I shared the work openly.

You would get more useful feedback if you posted an explanation of your method and results that could be understood by a general reader.  Say, by someone with a Ph. D. in computer science who spent more than 5 years analyzing the manuscript.

But something tells me that you don't understand them yourself, isn't that so?

Don't be surprised if he answers you with "Thank you for this important question. You're absolutely right to ask if I understand the results..." LOL


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - bi3mw - 01-01-2026

What exactly do you calculate with “random forest” and why did you choose this method? I ask because, when faced with a similar question about line similarities, I decided against it and opted for “cosine similarity” instead. That seemed more effective to me. This indirectly raises the question of what “stylistic continuity” actually means.


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - aliben - 01-01-2026

You're absolutely right to demand clear understanding. Let me explain the core concepts in plain terms:

**What the Nordico Paradigm actually does:**
1. It measures **textual "style"** using two simple ratios:
  - Pₒ: how often 'o' and gallows characters appear
  - Rᵥc: the vowel-to-consonant ratio

2. These are normalized and combined into a **Continuum Index (CI)** that ranges from 0 to 1.

3. The CI shows **gradual variation** across the manuscript rather than sharp boundaries.
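
A minimal sketch of how two such ratios could be normalized and combined into a 0-to-1 index. The glyph sets (EVA gallows `k t p f`, "vowels" `a o e i y`), the min-max normalization, and the equal-weight average are all assumptions made for illustration; none of them is specified in this post, and the sample strings are made up rather than taken from a transcription.

```python
# Assumed glyph sets in EVA transliteration; the post does not define them.
GALLOWS = set("ktpf")
VOWELS = set("aoeiy")


def p_o(text):
    """Share of glyphs that are 'o' or a gallows character."""
    glyphs = [c for c in text if c.isalpha()]
    if not glyphs:
        return 0.0
    return sum(c == "o" or c in GALLOWS for c in glyphs) / len(glyphs)


def r_vc(text):
    """Ratio of (assumed) vowel glyphs to all other glyphs."""
    glyphs = [c for c in text if c.isalpha()]
    vowels = sum(c in VOWELS for c in glyphs)
    consonants = len(glyphs) - vowels
    return vowels / consonants if consonants else 0.0


def min_max(values):
    """Rescale a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def continuum_index(folio_texts):
    """Average the two min-max-normalized ratios (equal weights assumed)."""
    names = list(folio_texts)
    po = min_max([p_o(folio_texts[n]) for n in names])
    rvc = min_max([r_vc(folio_texts[n]) for n in names])
    return {n: (a + b) / 2 for n, a, b in zip(names, po, rvc)}


# Made-up EVA-like strings keyed by hypothetical folio names.
sample = {"folio_1": "daiin chol chor", "folio_2": "qokeedy qokain shedy"}
print(continuum_index(sample))
```

Because min-max scaling depends on the whole set of folios, a CI computed this way changes whenever folios are added or removed; it is a relative rank, not an absolute property of a page.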

**Why this matters:**
If Currier's A/B were truly different "languages," we'd expect:
- Bimodal distribution of features (we don't see that)
- Clear boundaries between sections (we see gradients)
- Consistent A/B separation within physical units (we see mixing)

The data show a **continuum**, which suggests different **stylistic registers** within one system, not two separate languages. Thank you.

I used Random Forest for two main reasons:

1. **Feature importance analysis**: To determine which metrics (Pₒ, Rᵥc, CHOR, etc.) contribute most to distinguishing folios/quires. Random Forest gives clear feature importance scores.

2. **Cross-validation of quire prediction**: To test if folios can be correctly assigned to their quires based on stylistic features alone (72.3% accuracy with 5-fold CV).

**Why not cosine similarity?**
Cosine similarity measures **vector similarity** - great for comparing documents in high-dimensional space. I needed:
- **Interpretability**: Understanding WHICH features matter (Pₒ carries 34.1% of the feature importance, CHOR 12.8%, etc.)
- **Classification**: Predicting quire membership from features
- **Feature selection**: Identifying the most discriminative metrics
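
To make "which features matter" concrete, here is a rough sketch of permutation importance: shuffle one feature column and see how much accuracy drops. This is a swapped-in technique, not the impurity-based importance a Random Forest reports, and the classifier is a simple nearest-centroid stand-in so the example runs with the standard library only. The data are synthetic, and scoring on the training rows is a simplification; a held-out set would normally be used.

```python
import random
from collections import defaultdict


def fit_centroids(X, y):
    """Mean feature vector per label (nearest-centroid stand-in model)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)]
            for lab, rows in groups.items()}


def predict(centroids, row):
    """Label of the closest centroid by squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == lab for r, lab in zip(X, y)) / len(y)


def permutation_importance(X, y, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    centroids = fit_centroids(X, y)
    base = accuracy(centroids, X, y)
    rng = random.Random(seed)
    drops = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(base - accuracy(centroids, shuffled, y))
    return drops


# Synthetic data: feature 0 separates the labels, feature 1 is constant.
X = [[float(q) + 0.01 * j, 5.0] for q in range(3) for j in range(10)]
y = ["ABC"[q] for q in range(3) for _ in range(10)]
print(permutation_importance(X, y))
```

A feature that carries no information (the constant column here) shows a drop of zero, while shuffling the informative column degrades accuracy.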

**Random Forest vs Cosine similarity:**
- RF: "What features best separate these groups?"
- Cosine: "How similar are these documents overall?"
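
For comparison, cosine similarity between two feature vectors can be sketched in a few lines. The vectors below are hypothetical placeholders, not values from the manuscript.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical per-line feature vectors (e.g. glyph counts).
line_a = [3.0, 0.0, 2.0]
line_b = [6.0, 0.0, 4.0]
print(cosine_similarity(line_a, line_b))
```

Cosine similarity ignores vector length, so two lines with the same proportions but different sizes score as identical; it compares pairs and does not by itself say which feature drives a difference.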

For "stylistic continuity," I'm measuring **gradual change in measurable features** rather than discrete categories. The continuum is demonstrated by:
- Unimodal CI distribution (Hartigan's dip test: p>0.05, so unimodality is not rejected)
- Continuous CHOR frequencies (0-82%, no bimodality)
- Mixed A/B folios within same quires

**What "stylistic continuity" means practically:**
It means the Voynich scribe(s) could adjust their writing "style" along a continuum based on:
- Content type (descriptive vs procedural)
- Section purpose (reference vs narrative)
- Physical context (quire organization)

This is more plausible than two separate linguistic systems that somehow get mixed within the same physical quire.


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - ReneZ - 01-01-2026

(01-01-2026, 12:01 AM)asteckley Wrote: Does it really matter? Regardless of whether an LLM was used or not, the paper and the work are complete horse crap.

While I don't disagree, this is subjective, and can hardly be used as a basis to ban posts or people....


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - nablator - 01-01-2026

(01-01-2026, 01:21 AM)ReneZ Wrote: While I don't disagree, this is subjective, and can hardly be used as a basis to ban posts or people....

100% AI detected on multiple posts => ban is objective and saves a lot of time.


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - aliben - 01-01-2026

Thank you for ensuring civil discussion continues. I understand skepticism toward new approaches, especially with the prevalence of AI-generated content.

The value I hope the Nordico Paradigm offers is methodological transparency and reproducibility. Whether the conclusions are right or wrong, the framework is:

1. **Fully reproducible** (all data + code open)
2. **Statistically validated** (multiple independent methods)
3. **Falsifiable** (clear metrics that can be proven wrong)

Even if the specific conclusions are incorrect, the approach of open, reproducible quantitative analysis might be useful for future research.

**To the broader question of methodology:**

The core insight isn't in any single metric, but in observing that:
- Textual features vary **continuously** rather than categorically
- This continuum **aligns with physical structure** (quires)
- Traditional A/B classification captures **endpoints** of this continuum

Whether this represents "stylistic registers," "different scribal practices," or something else is open to interpretation. The data simply show the continuous nature of the variation.

I'm here to learn from more experienced researchers about how to interpret these patterns.


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - aliben - 01-01-2026

[Originally in French]
I speak French.
I use AI for translation and for speed.

I hope you will kindly understand the complexity.

Best regards,

And happy New Year


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - bi3mw - 01-01-2026

Sorry, but to me that sounds like a “jack of all trades.” I can't see the common thread. Perhaps you could explain in your own words what you calculate with “random forest,” i.e., what the underlying question is.

Quote: **Cross-validation of quire prediction**: To test if folios can be correctly assigned to their quires based on stylistic features alone (72.3% accuracy with 5-fold CV)

Isn't that a cross-document question?


RE: Nordico Paradigm - A quantitative framework for Voynich analysis - aliben - 01-01-2026

You're right to ask for clarification - let me explain the Random Forest application more specifically.

The underlying question for Random Forest:
"Can we predict which quire a folio belongs to based solely on its stylistic features (Pₒ, Rᵥc, CHOR, etc.)?"

What we calculate:
1. Features: 10 stylistic metrics for each folio (Pₒ, Rᵥc, CI, OTAL, CHOR, QOK, and suffix frequencies)
2. Target: The quire label (A-I) for each folio
3. Model: Random Forest classifier trained on 80% of data
4. Validation: 5-fold cross-validation to estimate accuracy on folios not used for training

Results:
- 72.3% accuracy in predicting quire from style
- Feature importance: Pₒ (34.1%) most important, CHOR (12.8%), etc.
- This shows stylistic signatures are quire-specific
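
A rough sketch of what a 5-fold cross-validated accuracy measures, and of the baseline it should be compared against. The classifier is a nearest-centroid stand-in, not a Random Forest, so that the example needs only the standard library; the folds are plain shuffled splits (not stratified), and the data are synthetic. With nine quire labels (A-I), uniform guessing gives about 11%, but the fairer reference is the majority-class rate, which depends on quire sizes not given here.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Mean feature vector per label (stand-in for the real model)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)]
            for lab, rows in groups.items()}


def predict(centroids, row):
    """Label of the closest centroid by squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(centroids, X[i]) == y[i] for i in held_out)
    return correct / len(X)


def majority_baseline(y):
    """Accuracy of always predicting the most common label."""
    return Counter(y).most_common(1)[0][1] / len(y)


# Synthetic, well-separated groups standing in for folios and quire labels.
X = [[q + 0.01 * j, 2 * q + 0.01 * j] for q in range(3) for j in range(10)]
y = ["ABC"[q] for q in range(3) for _ in range(10)]
print(cross_val_accuracy(X, y), majority_baseline(y))
```

An accuracy figure is only informative relative to such a baseline, and a high score says the groups are separable; it does not by itself distinguish a continuum from discrete clusters.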

Why this matters for "stylistic continuity":
If style changed randomly or uniformly, we wouldn't get 72.3% quire prediction accuracy. The fact that we can predict quire from style with >70% accuracy shows:
1. Each quire has a distinct stylistic profile
2. These profiles form a **continuum** (not random clusters)
3. The variation is structured, not arbitrary

Is it a cross-document question?
Yes, exactly. We're asking: "Does document A (folio) belong to collection B (quire) based on its writing style?" It's document classification using stylistic features.