To compare the Voynich Manuscript with ordinary texts, and Voynich languages A and B with each other, I built word co-occurrence graphs (window of 5 tokens) for each corpus (≈20,000 tokens each, except Voynich A at ≈11,000, since no more text is available). I had found that the metrics depend too strongly on graph size (number of nodes), so I restricted this comparison to texts of similar length.
For every graph I computed key network metrics: clustering coefficient (C), average path length (L), small-worldness (σ), degree inequality (Gini), eigenvector concentration, and resilience under targeted attacks. As noted, all texts were normalized in size to keep the metrics comparable.
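For anyone who wants to try something similar, here is a minimal sketch of how such a graph and most of these metrics can be computed with NetworkX. It is only an illustration, not necessarily the exact implementation behind the table: the file name `corpus.txt`, the window handling, and the reading of "eigenvector concentration" as the top node's share of total eigenvector centrality are assumptions.

```python
import networkx as nx
import numpy as np

def cooccurrence_graph(tokens, window=5):
    """Undirected graph linking word types that occur within `window` tokens
    of each other (here: each token is paired with the 4 tokens that follow it)."""
    G = nx.Graph()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                G.add_edge(w, v)
    return G

def gini(values):
    """Gini coefficient of a sequence of non-negative values (degree inequality)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

tokens = open("corpus.txt", encoding="utf-8").read().split()     # placeholder corpus file
G = cooccurrence_graph(tokens, window=5)
G = G.subgraph(max(nx.connected_components(G), key=len)).copy()  # largest component only

C = nx.average_clustering(G)                       # clustering coefficient C
L = nx.average_shortest_path_length(G)             # average path length L
gini_deg = gini([d for _, d in G.degree()])        # degree Gini
eig = nx.eigenvector_centrality(G, max_iter=1000)
eig_conc = max(eig.values()) / sum(eig.values())   # assumed definition of "eigen conc."
```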
Here are the results:
| Corpus | C | L | σ | Degree Gini | Eigen. conc. | Resilience ½ |
|---|---|---|---|---|---|---|
| LA In Psalmum Exposito | 0.63 | 2.48 | 4.40 | 0.55 | 0.0031 | 0.22 |
| FR La reine Margot | 0.69 | 2.26 | 3.03 | 0.62 | 0.0056 | 0.21 |
| VOY Marco's Markov | 0.65 | 2.46 | 3.19 | 0.62 | 0.0031 | 0.22 |
| EN Romeo and Juliet | 0.70 | 2.26 | 2.84 | 0.63 | 0.0053 | 0.23 |
| DE Simpl. Simpl. | 0.67 | 2.35 | 3.43 | 0.60 | 0.0039 | 0.21 |
| Voynich EVA | 0.66 | 2.60 | 3.95 | 0.58 | 0.0037 | 0.18 |
| Voynich EVA A | 0.65 | 2.56 | 4.18 | 0.53 | 0.0042 | 0.18 |
| Voynich EVA B | 0.67 | 2.58 | 3.60 | 0.59 | 0.0044 | 0.17 |
| T.Timm | 0.68 | 2.24 | 2.78 | 0.63 | 0.0069 | 0.24 |
| CA Tirant lo Blanch | 0.70 | 2.12 | 3.00 | 0.62 | 0.0085 | 0.22 |
| Voynich CUVA | 0.65 | 2.65 | 4.15 | 0.57 | 0.0033 | 0.18 |
| LA De docta ignorantia | 0.64 | 2.31 | 3.28 | 0.60 | 0.0063 | 0.26 |
| ES Lazarillo | 0.71 | 2.19 | 3.47 | 0.58 | 0.0079 | 0.19 |
| Naibe Cipher | 0.62 | 2.33 | 3.23 | 0.58 | 0.0040 | 0.29 |
| VI Vietnamese Stolfi | 0.67 | 2.16 | 2.20 | 0.62 | 0.0063 | 0.31 |
What we can see:
- All Voynich sections have a higher σ (small-worldness) than the ordinary texts (the Latin Psalm commentary being the one exception), meaning stronger local clustering and weaker global connectivity. (Note: here σ was computed against degree-preserving random graphs, not simple Erdős–Rényi graphs: each random control graph keeps the same degree distribution as the original, which makes σ more reliable for linguistic networks and is why the values are lower than in previous posts. A sketch of this computation follows the list.)
- Average path length L is slightly longer: information (or adjacency) travels less efficiently across the graph, suggesting that the network is more fragmented into local clusters with fewer bridging links connecting them.
- Degree Gini is somewhat lower: there are no strong “hub” tokens comparable to function words in real languages, indicating a more uniform connectivity pattern in which all nodes have similar importance instead of a few dominating the network.
- Eigenvector concentration and resilience are both lower: influence is spread evenly rather than hierarchically, implying that the network lacks dominant central nodes and fragments more quickly when connections are removed under targeted attack (sketches of both computations follow the list).
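Regarding the σ values in the first bullet, here is a sketch of the degree-preserving comparison, assuming NetworkX's built-in `sigma` (its random references are built by double edge swaps, so every node keeps its degree; the `niter`/`nrand` values below are arbitrary and kept small because the routine is slow on graphs of this size):

```python
import networkx as nx

# Continuing with the graph G built above.
# sigma = (C / C_rand) / (L / L_rand), where C_rand and L_rand come from
# degree-preserving rewirings (double edge swaps), not Erdős–Rényi graphs.
sigma = nx.sigma(G, niter=10, nrand=5, seed=42)
print(f"small-worldness σ = {sigma:.2f}")
```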
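And for the resilience column, a sketch of one way to measure resilience under targeted attack (here: the fraction of nodes that must be removed, highest-degree first, before the largest connected component falls below half of the original size):

```python
import networkx as nx

def resilience_half(G):
    """Fraction of nodes removed (highest current degree first) before the
    largest connected component drops below half the original node count."""
    H = G.copy()
    n = H.number_of_nodes()
    removed = 0
    while H and len(max(nx.connected_components(H), key=len)) >= n / 2:
        hub = max(H.degree, key=lambda kv: kv[1])[0]  # remove the current biggest hub
        H.remove_node(hub)
        removed += 1
    return removed / n
```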
I show here a couple of plots:
[attachment=11854] [attachment=11853] [attachment=11852]
Natural languages form networks with a few highly connected function words and many peripheral ones.
The Voynich graphs are more uniform and modular: dense local clusters, few bridges, and a fragile large-scale structure.
As argued throughout the thread, this pattern fits a system built from repetitive combinatorial rules rather than a natural linguistic process, with structural dependencies extending across at least a five-token window rather than just immediate bigram links.