Again, quite a dense post. So I will first summarize the findings, and then, if you are interested, you can dig into the dense part of the post.
I have calculated normalized KPIs for Voynich texts, natural-language texts and generated texts. Across all metrics, the Voynich Manuscript behaves very differently from real languages. Its words tend to repeat in tight, predictable patterns instead of spreading naturally through the text. Unlike normal writing, where some words act as connectors or carry more weight, all Voynich words play almost the same role. The result feels organized and rule-based, but not like any language used for communication.
Now to the dense part
To compare the Voynich Manuscript with ordinary texts, I first built a graph for each text. Nodes represent tokens and edges represent co-occurrences. However, many graph metrics depend on graph size. Larger graphs naturally have more edges, higher degree, and different path lengths, so direct comparison would be misleading.
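For anyone who wants to play with this, here is a minimal sketch of one way to build such a graph, assuming adjacent tokens are linked (the `cooccurrence_graph` name and the `window` parameter are just illustrative, not from any standard library):

```python
# Minimal sketch: nodes are word types, edges link words that co-occur
# within a small window (window=1 means only adjacent words are linked).
import networkx as nx

def cooccurrence_graph(tokens, window=1):
    G = nx.Graph()
    G.add_nodes_from(set(tokens))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            u, v = w, tokens[j]
            if u == v:
                continue
            # accumulate a weight so repeated co-occurrences are counted
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1
            else:
                G.add_edge(u, v, weight=1)
    return G

# Example: G = cooccurrence_graph("daiin chol daiin shol chol".split())
```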
To fix this, I normalized all key performance indicators (KPIs). Each metric was divided by the value expected from a random or degree-preserving null model with the same number of nodes and links. This way, clustering coefficients, path lengths, modularity, and other values become comparable across texts of different sizes.
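The normalization step can be sketched roughly like this, assuming a degree-preserving null model built by edge swaps and averaged over a handful of randomizations (the helper names and the number of swaps/randomizations below are arbitrary choices for illustration):

```python
# Sketch of KPI normalization against a degree-preserving null model.
import networkx as nx

def null_model_average(G, metric, n_random=20, seed=0):
    """Average metric(G_rand) over degree-preserving randomizations of G."""
    values = []
    for i in range(n_random):
        R = G.copy()
        # rewire edges while keeping every node's degree fixed
        nx.double_edge_swap(R, nswap=4 * R.number_of_edges(),
                            max_tries=40 * R.number_of_edges(), seed=seed + i)
        values.append(metric(R))
    return sum(values) / len(values)

def normalized_kpi(G, metric, **kwargs):
    """Divide the observed metric by its null-model expectation."""
    return metric(G) / null_model_average(G, metric, **kwargs)

# e.g. normalized clustering: normalized_kpi(G, nx.average_clustering)
```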
After normalization, the indicators reflect the internal structure of each text rather than its length. This allows a fair comparison between the Voynich graph and the other texts, independently of their size.
So, here are the plots, which I will try to update over the following days with the new texts that other Voynich Ninja members have asked me to analyze.
This plot compares modularity with h1, the average next-word unpredictability per token. While modularity shows how strongly the text divides into clusters of words that frequently co-occur, h1 captures how much variation each word allows in what follows it. Natural languages occupy the center, combining moderate modularity with a healthy range of continuations per word, structured yet flexible. Artificial and shuffled texts show lower modularity and different h1, indicating looser or more random connections. The Voynich variants stand out, especially EVA A and EVA, with very high modularity and moderate h1. This means each Voynich word tends to appear in fixed, repetitive contexts rather than freely combining with others. In linguistic terms, the text behaves like a system of rigid word sequences, highly organized but locally predictable, suggesting a rule-based construction rather than natural language syntax.
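A minimal sketch of one way to compute the two axes of this plot: here I take h1 as the entropy of the next-word distribution, averaged over word types (a token-weighted average is the other natural option), and modularity from a greedy community partition. The function names are illustrative only.

```python
# Sketch of h1 (next-word unpredictability) and modularity.
import math
from collections import Counter, defaultdict
from networkx.algorithms import community

def h1_next_word_entropy(tokens):
    """Average, over word types, of the entropy of what can follow each word."""
    following = defaultdict(Counter)
    for w, nxt in zip(tokens, tokens[1:]):
        following[w][nxt] += 1
    entropies = []
    for counts in following.values():
        total = sum(counts.values())
        entropies.append(-sum((c / total) * math.log2(c / total)
                              for c in counts.values()))
    return sum(entropies) / len(entropies)

def graph_modularity(G):
    """Modularity of a greedy community partition of the co-occurrence graph."""
    parts = community.greedy_modularity_communities(G)
    return community.modularity(G, parts)
```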
This plot compares modularity with H1, the global unigram entropy that reflects the diversity of the vocabulary and how evenly words are distributed. Modularity is the same as in the first plot, and we see clearly three outliers in terms of H1 entropy: both of Timm's texts (unshuffled and shuffled) have extreme H1 entropy, while Voynich EVA A has a surprisingly low H1 entropy. This suggests that the Voynich A section is unusually repetitive at the token level, using a very restricted set of symbols or words compared to both natural and artificial texts. Its high modularity combined with such low entropy implies that its word co-occurrence network is extremely clustered, with strong internal repetition and limited cross-linking between word groups. In other words, Voynich A behaves like a tightly organized subsystem with recurrent local patterns rather than a flexible linguistic system.
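H1 here is just the standard Shannon entropy of the word-frequency distribution; a minimal sketch:

```python
# Sketch of the global unigram entropy H1 (Shannon entropy of word frequencies).
import math
from collections import Counter

def unigram_entropy(tokens):
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```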
This plot compares σ (small-world index), which measures how efficiently a network balances local clustering and global reach, with C₍rand₎, the average clustering expected from a random network preserving the same degree distribution. Natural languages form two groups, one around 0.22 C₍rand₎ and the other around 2.8. Artificial or shuffled texts mostly stay near that range (especially Timm's text) or drop lower, showing disrupted structure. The Voynich variants, however, especially EVA A and EVA (total), reach the highest σ values (>4) while maintaining the lowest C₍rand₎, indicating extreme small-world organization: highly clustered local patterns far stronger than any random expectation. (Note that the Latin In Psalmum Expositio has very similar values to Voynich.) This reinforces the idea that the Voynich text is internally coherent and self-reinforcing, producing a network that is both tightly knit and unusually segregated, unlike any natural or mechanically generated language in the comparison.
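A minimal sketch of one way to estimate these quantities, assuming the standard definition σ = (C/C₍rand₎)/(L/L₍rand₎) with the random expectations averaged over degree-preserving rewirings; it also returns L₍rand₎, which the next plot uses. Paths are measured on the largest connected component.

```python
# Sketch of the small-world index sigma = (C / C_rand) / (L / L_rand).
import networkx as nx

def _giant(G):
    """Largest connected component (path lengths need a connected graph)."""
    return G.subgraph(max(nx.connected_components(G), key=len))

def small_world_sigma(G, n_random=10, seed=0):
    Gc = _giant(G)
    C = nx.average_clustering(Gc)
    L = nx.average_shortest_path_length(Gc)
    C_rand_vals, L_rand_vals = [], []
    for i in range(n_random):
        R = Gc.copy()
        nx.double_edge_swap(R, nswap=4 * R.number_of_edges(),
                            max_tries=40 * R.number_of_edges(), seed=seed + i)
        C_rand_vals.append(nx.average_clustering(R))
        L_rand_vals.append(nx.average_shortest_path_length(_giant(R)))
    C_rand = sum(C_rand_vals) / n_random
    L_rand = sum(L_rand_vals) / n_random
    return (C / C_rand) / (L / L_rand), C_rand, L_rand
```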
This plot compares the small-world index (σ) with L₍rand₎, the average path length expected from a random degree-preserving network. In linguistic terms, L₍rand₎ represents how easily any two words would connect by chance, while σ measures how much more efficiently the real text achieves both clustering and connectivity. Natural languages cluster around σ ≈ 3 and L₍rand₎ ≈ 2.5, showing a stable balance between local cohesion and global reach. Artificial texts like Markov or Timm's simulations vary but remain below that stability line. The Voynich variants, especially EVA A and both full Voynich texts (EVA and CUVA), again occupy the upper-right corner (again together with the Latin In Psalmum Expositio), showing the highest σ values even at similar path lengths. This means the Voynich word network connects globally as efficiently as natural language, yet maintains far stronger local clustering. The result reinforces a consistent pattern: Voynich behaves like a hyper-small-world system, structurally optimized and internally repetitive, distinct from both human language and random models.
This plot compares the resilience fraction to half, which measures how much of the network must be removed before its main component loses half of its nodes, with the Gini degree, which reflects inequality in how connections are distributed among nodes. The Voynich variants stand at the bottom of the resilience axis: their networks are more uniform but collapse faster when nodes are removed. This means the Voynich text lacks dominant hubs yet depends heavily on many small, tightly interlinked groups (e.g. daii, chol, etc.). In other words, its structure is homogeneous but fragile, reinforcing the picture of a highly ordered system with repetitive local patterns rather than a robust, hierarchically organized linguistic network.
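A sketch of these two quantities under simple assumptions: resilience measured by removing nodes in random order until the giant component drops below half of the original node count (a targeted, hub-first removal order is the other natural choice), and Gini degree as the standard Gini coefficient of the degree sequence.

```python
# Sketch of resilience-to-half (random removal order assumed) and Gini degree.
import random
import networkx as nx

def resilience_fraction_to_half(G, seed=0):
    rng = random.Random(seed)
    n = G.number_of_nodes()
    H = G.copy()
    order = list(H.nodes())
    rng.shuffle(order)
    for removed, node in enumerate(order, start=1):
        H.remove_node(node)
        giant = max((len(c) for c in nx.connected_components(H)), default=0)
        if giant < n / 2:
            return removed / n
    return 1.0

def gini_degree(G):
    """Gini coefficient of the degree sequence (0 = uniform, 1 = maximal inequality)."""
    degrees = sorted(d for _, d in G.degree())
    n, total = len(degrees), sum(degrees)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * d for i, d in enumerate(degrees))
    return (2 * weighted) / (n * total) - (n + 1) / n
```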
This plot compares eigenvector concentration, which measures how much network influence is dominated by a few central nodes, with the Gini degree, which shows how unevenly connections are distributed. The Spanish and Catalan texts occupy the upper area, with relatively high eigenvector concentration, meaning a few highly connected words (like articles or prepositions) anchor the text's structure. Artificial and shuffled texts are more dispersed, reflecting weaker central hubs. The Voynich variants (and the Latin In Psalmum Expositio) cluster at the bottom, with low eigenvector concentration, indicating that influence is spread evenly and no nodes dominate. This uniformity suggests that the Voynich word network lacks linguistic hierarchy: for example, there are no functional equivalents to “the” or “and.” Instead, its connectivity is flat and homogeneous, consistent with a self-similar system where all tokens play structurally similar roles rather than forming grammatical or semantic hierarchies typical of real language.
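One simple way to read "eigenvector concentration" is the share of total eigenvector centrality held by the most central nodes; the sketch below uses that top-share version purely as an illustration (the `top` cutoff is arbitrary).

```python
# Sketch of an eigenvector-concentration measure: share of total eigenvector
# centrality held by the `top` most central nodes.
import networkx as nx

def eigenvector_concentration(G, top=10):
    cent = nx.eigenvector_centrality_numpy(G)
    values = sorted(cent.values(), reverse=True)
    return sum(values[:top]) / sum(values)
```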
From a linguistic perspective, the Voynich Manuscript behaves unlike any known language. Its words form tightly bound groups that repeat with remarkable regularity, showing internal consistency but little grammatical flexibility. Unlike natural languages, which rely on a few common function words to link diverse terms, the Voynich text distributes its words evenly, with no clear hierarchy or connectors. Its structure is highly organized and self-contained: words co-occur in predictable clusters, yet the text maintains coherence across the whole manuscript. This gives the impression of a system driven by formal rules rather than meaning: a closed combinatorial code where repetition and pattern replace syntax and semantics. In fact, this is what our human intuition tells us when we see the same words repeating throughout the MS (qokeedy, daiin, etc.).
As always, any thoughts are welcome.