Hello all,
Over the past few days, I’ve been experimenting with Non-negative Matrix Factorization (NMF) to detect topics across the Voynich manuscript, and to explore how these topics correlate with Currier languages A and B as well as the scribe hands.
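For anyone who wants to try something similar, the core pipeline can be sketched in a few lines with scikit-learn. This is only an illustrative sketch, not my exact code: the toy `paragraphs` list below stands in for the real transcribed Voynich paragraphs, and the word forms are just EVA-style placeholders.

```python
# Sketch of an NMF topic-modeling pipeline (illustrative, not the exact code used).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Placeholder paragraphs; in practice these would be the transcribed paragraphs.
paragraphs = [
    "daiin shedy qokeedy chedy",
    "chol daiin shol cthy",
    "qokain shedy okeedy qokeedy",
]

# Treat any whitespace-separated token as a word.
vectorizer = TfidfVectorizer(token_pattern=r"\S+")
X = vectorizer.fit_transform(paragraphs)

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)  # paragraph-by-topic weights
H = nmf.components_       # topic-by-word weights
```

`W` gives each paragraph's topic mixture and `H` gives each topic's word profile; both are non-negative, which is what makes the topics directly interpretable.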
Surprisingly (or perhaps not), the topics detected are clearly aligned with both the Currier classification and the identified scribal hands—even more strongly with the hands than with the language groups.
To determine the optimal number of topics, I evaluated a range of values using several metrics:
- Pseudo-perplexity: An approximation of how well the model predicts unseen data. Lower values generally indicate better topic quality.
- Topic coherence: Measures how semantically related the top words in each topic are. Higher coherence typically means more interpretable topics.
- Topic overlap (Jaccard similarity): Measures how many of the top words are shared between topics. Lower overlap is better, indicating more distinct topics.
- Number of unique high-weight words: Tracks how many distinct informative words are used across topics.
Based on this evaluation, 11 topics emerged as the most meaningful. Note that paragraphs containing only one or two words were excluded from the analysis to avoid noise.
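As an example of one of these metrics, here is a small sketch of how topic overlap can be computed from the NMF components: take the top-k words of each topic and average the pairwise Jaccard similarities. The tiny `H` and `vocab` below are made-up values purely for illustration.

```python
import numpy as np

def top_words(H, vocab, k=10):
    """Return the set of top-k words for each topic (row of H)."""
    return [set(vocab[np.argsort(row)[::-1][:k]]) for row in H]

def mean_jaccard(topsets):
    """Average pairwise Jaccard similarity between topics' top-word sets."""
    scores = []
    for i in range(len(topsets)):
        for j in range(i + 1, len(topsets)):
            inter = len(topsets[i] & topsets[j])
            union = len(topsets[i] | topsets[j])
            scores.append(inter / union)
    return sum(scores) / len(scores)

# Toy example: two perfectly distinct topics -> zero overlap.
vocab = np.array(["a", "b", "c", "d"])
H = np.array([[3.0, 2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 3.0]])
sets = top_words(H, vocab, k=2)
overlap = mean_jaccard(sets)  # 0.0 for these toy topics
```

Lower average overlap means the chosen number of topics produces more distinct word profiles, which is one of the signals that pointed me to 11 topics.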
Here's a heatmap showing how topic presence evolves across folios. Each row is a topic, each column is a folio, and the color intensity shows how strongly the topic is represented:
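The matrix behind such a heatmap is just the per-paragraph topic weights averaged per folio. A minimal sketch, with made-up weights and hypothetical folio IDs:

```python
import numpy as np

# Hypothetical inputs: W holds per-paragraph topic weights (from NMF),
# folio_ids maps each paragraph to its folio.
W = np.array([
    [0.9, 0.1],
    [0.7, 0.3],
    [0.2, 0.8],
])
folio_ids = ["f1r", "f1r", "f1v"]

folios = sorted(set(folio_ids))
profile = np.column_stack([
    W[[i for i, f in enumerate(folio_ids) if f == fol]].mean(axis=0)
    for fol in folios
])
# profile[t, j] = mean weight of topic t on folio j;
# this topics-by-folios matrix is what the heatmap displays.
```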
I then examined how well the detected topics aligned with:
- Currier languages A and B
- Identified scribal hands
The results are shown in the following scatter plots:
To quantify these correlations, I ran a Chi-squared (χ²) test of independence between the topic assignments and each categorical variable. A lower p-value (approaching 0) indicates stronger evidence that the topic distribution is not independent of the given variable.
Here are the results:
p-value (topic vs language): 8.693479383813543e-33
p-value (topic vs hand): 2.8408917882824174e-119
As you can see, the p-value is much lower for the scribe hand, indicating that the detected topics are even more tightly linked to who wrote the paragraph than to the Currier language classification.
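For reference, this kind of test is a one-liner with SciPy once you have a contingency table of topic assignments against the categorical variable. The table below is entirely made up to illustrate the shape of the computation; it is not my actual data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = dominant topic per paragraph,
# columns = scribal hand. Counts are invented for illustration; the strong
# diagonal mimics a tight topic-hand association.
table = np.array([
    [30,  2,  1],
    [ 3, 25,  4],
    [ 1,  5, 28],
])

chi2, p, dof, expected = chi2_contingency(table)
# A diagonal-heavy table like this yields a very small p-value.
```

In my analysis the table was built by cross-tabulating each paragraph's dominant topic against its Currier language (and, separately, its scribal hand).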
These findings suggest that topic modeling not only helps cluster content by lexical features, but also reflects deeper structural patterns of authorship and writing practices in the manuscript. It supports the idea that different scribes may have introduced or emphasized different "topics", even when writing in the same Currier language.
I'd be very interested to hear your interpretations or see comparisons with other modeling approaches.