Dear all,
In this new paper, a companion to the testable signatures paper we recently shared on this forum, we test whether Currier's idea of two distinct "languages" (A and B) in the Voynich manuscript holds up under modern statistical scrutiny. We do this in two complementary ways, using character-pair ratios (such as how often 'd' appears versus 'l' on a given page):
- First, a generative (unsupervised) model looks at the raw character counts with no knowledge of Currier's labels and asks: how many distinct groups does the data itself support? It independently selects two groups and assigns pages in a way that substantially overlaps with Currier's A/B split.
- Second, and perhaps more importantly, a predictive model tests whether knowing a page's A/B label actually lets you forecast its character statistics on unseen pages. The result: it predicts held-out page labels at 89.2% accuracy in character-pair ratios on text the model has never seen.
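For readers who want to get a feel for the predictive test, here is a minimal sketch of the "does the label help forecast the statistics?" direction. It is not the paper's model and does not reproduce the 89.2% figure. The toy pages, the helper names (`log_ratio`, `leave_one_out_errors`) and the pseudocount of 0.5 are all assumptions made up for illustration. Each page is held out in turn, and its 'd'/'l' log-ratio is predicted once from same-label pages and once from all remaining pages.

```python
import math

def log_ratio(text, a="d", b="l", pseudo=0.5):
    """Log of count(a)/count(b) on one page; the pseudocount avoids division by zero."""
    return math.log((text.count(a) + pseudo) / (text.count(b) + pseudo))

# Hypothetical toy pages with Currier-style labels. These strings are invented
# for illustration and are NOT taken from the paper's dataset.
pages = [
    ("A", "daiin chol chor shol daiin cthol"),
    ("A", "chol dol shol chor cthol daiin"),
    ("A", "shol chol dain chol otol chor"),
    ("B", "qokeedy shedy qokedy chedy dar"),
    ("B", "chedy qokeedy ody shedy qokedy"),
    ("B", "shedy qokedy dy chedy lchedy"),
]

def mean(xs):
    return sum(xs) / len(xs)

def leave_one_out_errors(pages):
    """Mean squared error when predicting each held-out page's log-ratio from
    (a) the mean of same-label training pages, (b) the mean of all training pages."""
    feats = [(lab, log_ratio(txt)) for lab, txt in pages]
    err_label, err_global = 0.0, 0.0
    for i, (lab, y) in enumerate(feats):
        train = feats[:i] + feats[i + 1:]          # the held-out page is excluded
        pred_label = mean([v for l, v in train if l == lab])
        pred_global = mean([v for _, v in train])
        err_label += (y - pred_label) ** 2
        err_global += (y - pred_global) ** 2
    n = len(feats)
    return err_label / n, err_global / n

mse_label, mse_global = leave_one_out_errors(pages)
print(f"label-aware MSE: {mse_label:.3f}, label-blind MSE: {mse_global:.3f}")
```

If the A/B label carries real information about the ratio, the label-aware error should come out lower than the label-blind one on held-out pages, which is the case for this toy data.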
The A/B distinction is not just a pattern Currier saw: a model rediscovers it blind, and it survives predictive cross-validation.
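To illustrate what "rediscovering it blind" can look like, here is a deliberately simplified sketch: a one-dimensional two-means clustering that never sees the labels, followed by a comparison with them. This is a stand-in technique, not the generative model from the paper, and it fixes the number of groups at two rather than selecting it from the data. The log-ratio values and function names are hypothetical.

```python
def two_means_1d(values, iters=50):
    """Split 1-D values into two clusters with Lloyd's algorithm,
    starting from the minimum and maximum as initial centres."""
    c0, c1 = min(values), max(values)
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, a in zip(values, assign) if a == 0]
        g1 = [v for v, a in zip(values, assign) if a == 1]
        new0 = sum(g0) / len(g0) if g0 else c0
        new1 = sum(g1) / len(g1) if g1 else c1
        if (new0, new1) == (c0, c1):   # centres stopped moving
            break
        c0, c1 = new0, new1
    return assign

def agreement(assign, labels):
    """Fraction of pages where cluster ids match the labels, taking the better
    of the two possible cluster-to-label mappings."""
    hits = sum((a == 0) == (lab == "A") for a, lab in zip(assign, labels))
    return max(hits, len(labels) - hits) / len(labels)

# Hypothetical per-page log-ratios and labels, invented for illustration.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
ratios = [-1.1, -0.6, -0.3, 0.9, 1.3, 2.4, 2.4, 0.2]

clusters = two_means_1d(ratios)        # the labels are not used here
print(agreement(clusters, labels))     # 0.75 on this toy data
```

The labels enter only at the very end, when the blind grouping is scored against them. A substantial but imperfect overlap, as in this toy case, is the kind of outcome described above.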
The A/B label is the dominant axis of variation, but it only explains about 29% of inter-page variance. There's a lot of structure left to account for!
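One common way to put a number on "variance explained by a label" is the ratio of between-group to total sum of squares (eta-squared). The sketch below shows that calculation for a single feature. Whether the paper uses this measure or a multivariate one is an assumption on my part, and the values are toy numbers, not results.

```python
def variance_explained(labels, values):
    """Between-group sum of squares divided by total sum of squares (eta-squared)."""
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = 0.0
    for lab in set(labels):
        group = [v for l, v in zip(labels, values) if l == lab]
        ss_between += len(group) * (sum(group) / len(group) - grand) ** 2
    return ss_between / ss_total

# Hypothetical feature values for six pages, three per label.
labels = ["A", "A", "A", "B", "B", "B"]
values = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0]

share = variance_explained(labels, values)
print(f"{share:.2f}")   # 0.27: the label explains about 27% of the variance here
```

The remaining share is the within-group spread, which is the "structure left to account for" once the A/B split has been removed.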
The dataset and methodology will be disclosed in two steps: first to a closed group of specialists, then made generally available later this year.
I hope you enjoy reading it.