Most of my initial Voynich analysis focused on the most common tokens (I suppose that is the easiest way to start analysing the text). But on their own they don't give us much information, so I started wondering whether the opposite approach might actually be more informative.
Instead of looking at the global vocabulary, I analysed rare and semi-rare tokens. I did not use hapax legomena, since many of those could just be scribal noise or transliteration errors. I used tokens that appear only a few times across the manuscript. My reasoning was simple: rarer tokens are potentially more specific, and therefore easier to trace between folios. Also, because they do recur in the MS, they are probably less affected by errors than hapaxes.
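A minimal sketch of this selection step in Python follows. It assumes the transcription is already split into one token list per folio. The `folios` dictionary, the function name, the toy tokens and the count thresholds are illustrative assumptions, not values from the analysis:

```python
from collections import Counter

def semi_rare_tokens(folios, min_count=2, max_count=5):
    """Tokens whose total count across all folios lies in [min_count, max_count].

    min_count=2 drops hapax legomena; max_count caps how "common" a token may be.
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    counts = Counter(tok for tokens in folios.values() for tok in tokens)
    return {tok for tok, n in counts.items() if min_count <= n <= max_count}

# Hypothetical toy folios with EVA-style tokens (not real transcription data).
folios = {
    "pA": ["daiin", "qokedy", "shol", "otchy"],
    "pB": ["daiin", "qokedy", "cheol"],
    "pC": ["daiin", "otchy", "ykchdy"],
}

print(sorted(semi_rare_tokens(folios, min_count=2, max_count=2)))  # ['otchy', 'qokedy']
```

Counting total occurrences is one choice. Counting the number of folios a token appears in (document frequency) would be an equally reasonable alternative.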
I built page-to-page networks based on shared rare tokens. The result was surprisingly structured: most pages remain weakly connected, but a few behave like hubs that link otherwise distant lexical communities.
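One way to build such a network is sketched below. Two folios are linked when they share rare tokens, and the edge weight is the number of distinct rare tokens they share. Hubs are then the folios with the highest weighted degree. The weighting scheme, the use of weighted degree as the hub measure, and all names and toy data are assumptions made for illustration:

```python
from collections import Counter
from itertools import combinations

def rare_token_network(folios, rare):
    """Page-to-page edges weighted by the number of distinct rare tokens two folios share."""
    vocab = {f: set(toks) & rare for f, toks in folios.items()}
    edges = {}
    for a, b in combinations(sorted(vocab), 2):
        shared = len(vocab[a] & vocab[b])
        if shared:  # keep only pairs that actually share a rare token
            edges[(a, b)] = shared
    return edges

def weighted_degree(edges):
    """Sum of edge weights per folio; high values are hub candidates."""
    deg = Counter()
    for (a, b), w in edges.items():
        deg[a] += w
        deg[b] += w
    return deg

# Hypothetical toy data: pH shares rare tokens with two pages that share none with each other.
folios = {
    "pH": ["okal", "shedy", "qotal", "dar"],
    "p1": ["okal", "qotal", "chol"],
    "p2": ["shedy", "dar", "chol"],
    "p3": ["chor"],
}
rare = {"okal", "shedy", "qotal", "dar"}

edges = rare_token_network(folios, rare)
print(edges)                                  # {('p1', 'pH'): 2, ('p2', 'pH'): 2}
print(weighted_degree(edges).most_common(1))  # [('pH', 4)]
```

Note that p1 and p2 both contain "chol", but it is not in the rare set, so they stay unlinked. In this toy, pH connects two pages that would otherwise be isolated from each other.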
The strongest case was f86v.
This is especially interesting because it is a pure text folio: it has no obvious visual structure such as zodiac diagrams or herbal labels. Yet it repeatedly emerges as one of the most connected pages in the manuscript when rare-token overlap is analysed.
What caught my attention is that the page does not connect strongly to just one section. Instead, it shares selective vocabulary with several areas of the manuscript, especially Marginal stars, but also Biological, Herbal, Cosmological and others.
Another thing that stood out is the behaviour of the Marginal stars section itself. Several folios from this section repeatedly emerge as hubs or dense local connectors. This suggests that the section may contain a relatively coherent but highly reused layer of vocabulary, possibly acting as a bridge between otherwise more isolated parts of the manuscript.
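A simple measure can separate a transversal hub from a folio that is only dense inside its own section. It counts how many distinct *other* sections each folio is linked to, ignoring edge weights. The sketch below assumes the edges are already computed as a dict keyed by folio pairs. The folio IDs and section assignments are hypothetical:

```python
from collections import defaultdict

def sections_reached(edges, section_of):
    """For each folio, the set of *other* sections it shares rare tokens with."""
    reached = defaultdict(set)
    for a, b in edges:  # iterating a dict yields its (a, b) keys
        if section_of[a] != section_of[b]:
            reached[a].add(section_of[b])
            reached[b].add(section_of[a])
    return reached

# Hypothetical folio IDs and section labels, for illustration only.
section_of = {"p1": "Text-only", "p2": "Marginal stars", "p3": "Herbal",
              "p4": "Biological", "p5": "Herbal"}
edges = {("p1", "p2"): 3, ("p1", "p3"): 1, ("p1", "p4"): 2, ("p3", "p5"): 4}

reached = sections_reached(edges, section_of)
best = max(reached, key=lambda f: len(reached[f]))
print(best, sorted(reached[best]))  # p1 ['Biological', 'Herbal', 'Marginal stars']
```

In this toy, the p3–p5 link has the heaviest weight but stays inside one section, so it contributes nothing here. A folio like p1, with lighter links spread over several sections, ranks first.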
Main rare-token hubs detected in the network:
| Folio | Section | Observed behaviour |
|---|---|---|
| f86v | Text-only | strongest transversal hub |
| fRos | Cosmological | cross-section connector |
| f111r | Marginal stars | dense lexical hub |
| f113r | Marginal stars | dense lexical hub |
| f115v | Marginal stars | highly connected |
| f108v | Marginal stars | bridge-like behavior |
| f72v | Zodiac | unexpectedly connected |
| f67r | Astronomical | distant lexical links |
| f89r | Pharmaceutical | cross-section overlap |
| f76v | Biological | emerges at relaxed thresholds |
The heatmap below shows one example. Each colored signal corresponds to links between the text-only folio and different manuscript sections, using only semi-rare tokens. The x-axis follows token order inside the folio itself.
The important point is that the signals are sparse and selective. The page does look like a connector between different lexical neighborhoods (maybe references to text within the MS?).
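A position-by-section signal of this kind can be computed as a binary matrix. Rows are sections, columns are token positions in the folio, and a cell is set when the token at that position is rare and also occurs in that section. The function name, the binary (rather than count-based) cells and the toy data below are assumptions for illustration:

```python
def position_signal(target_tokens, other_folios, section_of, rare):
    """Binary section-by-position matrix for one folio.

    Row = section, column = token position in the target folio. A cell is 1 when
    the token at that position is rare and also occurs somewhere in that section.
    The target folio itself must not be included in other_folios.
    """
    sections = sorted({section_of[f] for f in other_folios})
    vocab_by_section = {s: set() for s in sections}
    for f, toks in other_folios.items():
        vocab_by_section[section_of[f]].update(toks)
    matrix = {s: [0] * len(target_tokens) for s in sections}
    for i, tok in enumerate(target_tokens):
        if tok in rare:
            for s in sections:
                if tok in vocab_by_section[s]:
                    matrix[s][i] = 1
    return matrix

# Hypothetical toy input.
target = ["daiin", "okal", "chol", "qotal", "dar"]
others = {"p1": ["okal", "chey"], "p2": ["qotal"], "p3": ["dar", "okal"]}
section_of = {"p1": "Herbal", "p2": "Marginal stars", "p3": "Biological"}
rare = {"okal", "qotal", "dar"}

for section, row in position_signal(target, others, section_of, rare).items():
    print(f"{section:15} {row}")
```

The rows can then be stacked and passed to any heatmap routine (for example matplotlib's `imshow`). Mostly-zero rows with a few isolated ones correspond to the sparse, selective signals described above.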
The Rosettes foldout also behaves in a similarly unusual way. It does not dominate the network as strongly as f86v, but it repeatedly appears as a transversal connector between distant parts of the manuscript.
I am not claiming this proves that the text-only folio is a summary page or an index. That would be too speculative. But I do think these results support a weaker idea: some folios seem to reuse selective vocabulary drawn from several textual communities across the manuscript.
Maybe the most common tokens tell us how the text is generated, while the less common ones tell us how the manuscript is organized. I also think the existence of hubs is hard to explain under a fully random distribution of rare tokens. If these tokens were placed randomly across the manuscript, I would not expect them to accumulate repeatedly around specific folios that act as lexical connectors between otherwise distant pages.
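The "random placement" intuition can be turned into a permutation test. The sketch below shuffles all rare-token occurrences among the rare-token slots, so each token's total count and each folio's number of rare slots are preserved. It then checks how often a shuffled manuscript produces a hub at least as strong as the observed one. Using the maximum weighted degree as the test statistic, the shuffling scheme, and all names and toy data are assumptions. Other null models (for example, ones that also preserve section structure) would be equally defensible:

```python
import random
from collections import Counter
from itertools import combinations

def max_hub_weight(folios, rare):
    """Largest weighted degree in the shared-rare-token network."""
    vocab = {f: {t for t in toks if t in rare} for f, toks in folios.items()}
    deg = Counter()
    for a, b in combinations(vocab, 2):
        w = len(vocab[a] & vocab[b])
        deg[a] += w
        deg[b] += w
    return max(deg.values(), default=0)

def permutation_test(folios, rare, n_iter=1000, seed=0):
    """Return (observed statistic, add-one smoothed empirical p-value)."""
    rng = random.Random(seed)
    observed = max_hub_weight(folios, rare)
    pool = [t for toks in folios.values() for t in toks if t in rare]
    slots = {f: sum(t in rare for t in toks) for f, toks in folios.items()}
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pool)  # random reassignment of rare occurrences to slots
        shuffled, i = {}, 0
        for f, k in slots.items():
            shuffled[f] = pool[i:i + k]
            i += k
        if max_hub_weight(shuffled, rare) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)

# Hypothetical toy data: "hub" shares one rare token with each of four pages.
folios = {
    "hub": ["a", "b", "c", "d", "x"],
    "p1": ["a", "y"],
    "p2": ["b", "y"],
    "p3": ["c", "z"],
    "p4": ["d", "w"],
}
rare = {"a", "b", "c", "d"}

obs, p = permutation_test(folios, rare, n_iter=500, seed=1)
print(f"observed max weighted degree = {obs}, p = {p:.3f}")
```

On a toy this small, random shuffles recreate the hub fairly often, so the p-value is large. The test only becomes informative when run on the full set of folios.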
I think this offers a new way of checking for relationships between folios and sections.