17-10-2025, 05:32 PM
The interesting part is what happens inside each topic. The model’s boundaries are not absolute: some folios and even single paragraphs show mixed patterns, and the transition between A and B is gradual rather than abrupt.
This suggests we’re not seeing two distinct “languages”, but rather two registers or writing modes of the same underlying system (perhaps evolving over time, or reflecting different scribal habits or styles).
The UMAP projection below (by paragraph) shows this: each dot is a paragraph, positioned by overall word co-occurrence patterns.
[attachment=11725]
Even though the LDA model was forced to find only two main topics (which, without being told about them, correspond roughly to Currier A and B), the UMAP projection clearly shows three large distinct arms or regions, and even 8-9 smaller ones.
This suggests internal variation inside both A and B: perhaps different scribal habits or styles, subject or topic sections, or chronological phases.
The Voynich script doesn’t behave as a clean binary system, but as a continuum with multiple internal clusters.
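For anyone who wants to try a paragraph-level comparison themselves, here is a minimal sketch of the kind of input such a projection works from. It assumes plain word-count vectors per paragraph, which is a simplification and not necessarily the features used for the plot above; the tokens are toy examples, not real transliteration data, and the helper names are made up for illustration. A dimensionality-reduction method such as UMAP would then be run on vectors like these.

```python
from collections import Counter
from math import sqrt


def paragraph_vectors(paragraphs):
    """Turn each paragraph (a list of word tokens) into a word-count vector."""
    vocab = sorted({w for p in paragraphs for w in p})
    vectors = []
    for p in paragraphs:
        counts = Counter(p)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy tokens for illustration only (assumption: not taken from any real folio).
toy = [
    ["daiin", "chol", "chor", "daiin"],        # "A-like" vocabulary
    ["chol", "daiin", "shol"],                 # "A-like" vocabulary
    ["qokeedy", "shedy", "qokedy", "shedy"],   # "B-like" vocabulary
    ["shedy", "chol", "qokeedy", "daiin"],     # mixed paragraph
]

vocab, vecs = paragraph_vectors(toy)
sims = [[cosine(u, v) for v in vecs] for u in vecs]

for i, row in enumerate(sims):
    print(i, [round(s, 2) for s in row])
```

In this toy setup the mixed paragraph is partly similar to both groups, which is the sort of in-between behaviour that would show up as a gradual transition rather than two separate blobs.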