What I wanted to know next was whether Voynich bursts behave like ordinary local repetition, or whether they are better understood as a system of local reuse with variation.
The Voynich still stands out immediately at the structural level. Nearly 69% of its tokens of length ≥ 3 fall inside bursts, compared with 27% in Culpepper, 16% in De docta ignorantia, and only 7% in the Alchemical Herbal text. Its bursts are also much larger, with a mean size of 15.6 tokens and a maximum of 162. In the control texts, bursts are much smaller: mean sizes run from 3.7 to 6.4 tokens, and the largest single burst is 73 tokens (Culpepper).
But the most important result is not just that Voynich bursts are bigger. It is how they behave internally.
If you look at prediction inside bursts, the local neighborhood matters a lot, but the past does not beat the future. In other words, the current token is strongly constrained by nearby tokens, yet there is no clear left-to-right generation signal. The best predictor is not the previous context alone, but the unordered local pool. That pushes against a simple chain model where one token directly produces the next one.
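To make the past/future/bag comparison concrete, here is a minimal sketch of one way such hit rates could be computed. It is an assumption on my part that a "hit" means the current token also occurs, exactly, somewhere in the window; the function name `window_hit_rates` and the default window of 10 tokens are illustrative, not the code behind the numbers reported here.

```python
def window_hit_rates(tokens, window=10):
    """Exact-match hit rates for three hypothetical predictors.

    past:   the token also occurs among the `window` tokens before it
    future: the token also occurs among the `window` tokens after it
    bag:    the token occurs on either side (unordered local pool)
    """
    n = len(tokens)
    if n == 0:
        return {"past": 0.0, "future": 0.0, "bag": 0.0}

    past_hits = future_hits = bag_hits = 0
    for i, tok in enumerate(tokens):
        before = tokens[max(0, i - window):i]
        after = tokens[i + 1:i + 1 + window]
        in_past = tok in before
        in_future = tok in after
        past_hits += in_past
        future_hits += in_future
        bag_hits += in_past or in_future

    return {
        "past": past_hits / n,
        "future": future_hits / n,
        "bag": bag_hits / n,
    }


# Toy example, not real transcription data.
rates = window_hit_rates("daiin chol daiin chedy daiin okal".split())
```

One property of this particular formulation is worth keeping in mind: with exact matching and a symmetric window, every pair of consecutive occurrences of a word that fall within the window contributes one past hit and one future hit, so the two rates come out identical by construction. If the predictors were defined differently (for example, by guessing a single most likely token from each side), that identity would not apply.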
The memory tests point in the same direction. Exact reuse in the Voynich is relatively low, but reuse by similarity is very high. Within the previous 10 tokens, exact reuse is only about 0.12, yet reuse through Levenshtein ≤ 2 rises to about 0.84, and reuse at the family level to about 0.64. So the system is not mainly repeating the same word. It is reusing the same local variant space.
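A reuse rate of this kind can be sketched as follows. This is a simplified illustration under stated assumptions: `reuse_rate`, the `min_len` filter, and the choice to compare each token against all of the previous `window` tokens are my own naming and design for the example, not necessarily the exact procedure used for the figures above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def reuse_rate(tokens, window=10, max_dist=0, min_len=3):
    """Share of tokens that have a match within `max_dist` edits
    among the previous `window` tokens.

    max_dist=0 gives exact reuse; max_dist=2 gives reuse by Levenshtein <= 2.
    Tokens shorter than `min_len` are not scored (an assumption of this sketch).
    """
    hits = total = 0
    for i, tok in enumerate(tokens):
        if i == 0 or len(tok) < min_len:
            continue
        total += 1
        context = tokens[max(0, i - window):i]
        if any(levenshtein(tok, prev) <= max_dist for prev in context):
            hits += 1
    return hits / total if total else 0.0


# Toy example, not real transcription data.
sample = "daiin chedy daiin shedy qokeey chedy".split()
exact = reuse_rate(sample, max_dist=0)
near = reuse_rate(sample, max_dist=1)
```

On the toy sample, the near-match rate is higher than the exact rate because `shedy` counts as a reuse of `chedy` once one edit is allowed. That is the same kind of gap, in miniature, as the one between the exact and Levenshtein columns for the Voynich.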
That is where the contrast with normal texts becomes interesting. Culpepper and De docta ignorantia show more exact reuse than the Voynich, but they do not show the same combination of very large bursts, long paths, and extremely strong variant-based reuse. The Voynich seems less repetitive in the literal sense, but more repetitive in terms of families and near-neighbors.
Family clustering also helps. Once tokens are grouped into tight local families, the Voynich does not collapse into noise. Instead, it reveals a handful of large lexical centers such as chedy, daiin, qokeey, chol, and okal, each with many nearby variants. That suggests that a substantial part of the text may be generated not from isolated word forms, but from active local families that remain available for reuse over long spans.
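For readers who want to experiment, one simple way to form such families is to link any two word types within a small edit distance and take the connected components. The sketch below does that with a union-find structure. The threshold `max_dist=1`, the single-linkage rule, and the convention of labelling each family by its most frequent member are assumptions made for illustration; the grouping used for the results here may differ.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def build_families(tokens, max_dist=1):
    """Group word types into families by single linkage on edit distance.

    Returns a dict mapping each family's most frequent member (its
    "center") to the sorted list of all members.
    """
    types = sorted(set(tokens))
    parent = {t: t for t in types}

    def find(t):
        # Follow parent links to the root, compressing the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Pairwise comparison is quadratic in the number of types;
    # fine for a sketch, slow for a full vocabulary.
    for i, a in enumerate(types):
        for b in types[i + 1:]:
            if abs(len(a) - len(b)) <= max_dist and levenshtein(a, b) <= max_dist:
                parent[find(b)] = find(a)

    counts = Counter(tokens)
    groups = {}
    for t in types:
        groups.setdefault(find(t), []).append(t)

    # Label each family by its most frequent member (ties broken alphabetically).
    return {
        max(members, key=lambda m: (counts[m], m)): sorted(members)
        for members in groups.values()
    }


# Toy example, not real transcription data.
families = build_families(
    ["chedy", "shedy", "chedy", "chey", "daiin", "dain", "daiin", "okal"]
)
```

Note that single linkage can chain: in the toy example `chey` and `shedy` are two edits apart but end up in the same family because both are one edit from `chedy`. Whether that chaining is desirable depends on how tight the families are meant to be.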
So the emerging picture is this. Natural texts certainly have bursts, and they also have local similarity. But in those texts, much of the effect is tied to ordinary lexical repetition and familiar morphological clustering. In the Voynich, the effect is much more dominated by persistent local neighborhoods of similar forms. It looks less like ordinary repetition, and more like controlled movement inside a dense local variant pool.

| Corpus | Burst share (len ≥ 3) | Mean burst size | Max burst size |
| --- | --- | --- | --- |
| Voynich | 0.6877 | 15.60 | 162 |
| Culpepper English | 0.2707 | 6.41 | 73 |
| De docta ignorantia Latin | 0.1613 | 4.19 | 26 |
| Alchemical Herbal Latin | 0.0675 | 3.67 | 9 |

| Corpus | Mean path length | Long paths ≥ 5 | Long paths ≥ 8 |
| --- | --- | --- | --- |
| Voynich | 5.34 | 0.4426 | 0.2096 |
| Culpepper English | 4.05 | 0.2877 | 0.0877 |
| De docta ignorantia Latin | 3.00 | 0.1003 | 0.0104 |
| Alchemical Herbal Latin | 2.67 | 0.0280 | 0.0000 |

| Corpus | Exact reuse, last 10 | Reuse by Lev ≤ 1, last 10 | Reuse by Lev ≤ 2, last 10 | Family reuse, last 10 |
| --- | --- | --- | --- | --- |
| Voynich | 0.1171 | 0.4313 | 0.8358 | 0.6433 |
| Culpepper English | 0.4222 | 0.4943 | 0.7970 | 0.7287 |
| De docta ignorantia Latin | 0.3408 | 0.4104 | 0.7234 | 0.5962 |
| Alchemical Herbal Latin | 0.1818 | 0.2375 | 0.6804 | 0.5249 |

| Corpus | Bag predictor exact | Past predictor exact | Future predictor exact | Interpretation |
| --- | --- | --- | --- | --- |
| Voynich | 0.1998 | 0.1171 | 0.1171 | Strong local constraint, but no temporal directionality |
| Culpepper English | 0.5896 | 0.4222 | 0.4222 | Local pool matters more than ordered sequence |
| De docta ignorantia Latin | 0.5448 | 0.3408 | 0.3408 | Same pattern, weaker local density than Voynich |
| Alchemical Herbal Latin | 0.3226 | 0.1818 | 0.1818 | Small local pools, limited burst structure |

So after this second pass, I would say that the Voynich does not look like a text that simply repeats words, and it does not look like a straightforward left-to-right rewrite chain either. It looks more like a system that keeps a dense local pool of related forms active and keeps drawing from it with small variations. That still does not tell us the exact mechanism, but the core behavior seems to be local reuse with controlled variation, not ordinary repetition and not simple sequential derivation.