The Voynich Ninja
The structure of the Voynich text and how it may be generated - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: The structure of the Voynich text and how it may be generated (/thread-5500.html)

Pages: 1 2 3


RE: The structure of the Voynich text and how it may be generated - quimqu - 07-04-2026

I tried to go one step further and check whether the local similarity structure actually contains any directional signal. The idea was simple: if the text is generated as a chain (A → B → C), we should be able to detect some asymmetry between past and future, or at least recover the correct order more often than chance.

I tested this in several ways: comparing past vs. future context, using weighted similarity, checking the persistence of changes, and trying to reconstruct local triples.
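A minimal sketch of the past-vs-future comparison, assuming a simple best-match rule over a symmetric context window (my reconstruction of the idea, not the actual code):

```python
def levenshtein(a, b):
    # standard dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def directional_bias(tokens, window=3):
    """Fraction of tokens whose closest form (by edit distance) lies
    strictly in the past vs. strictly in the future context window."""
    past_wins = future_wins = ties = 0
    for i in range(window, len(tokens) - window):
        past = min(levenshtein(tokens[i], t) for t in tokens[i - window:i])
        future = min(levenshtein(tokens[i], t) for t in tokens[i + 1:i + 1 + window])
        if past < future:
            past_wins += 1
        elif future < past:
            future_wins += 1
        else:
            ties += 1
    total = past_wins + future_wins + ties
    return past_wins / total, future_wins / total
```

Equal past and future fractions indicate no directional bias; a chain-like process should tilt one side.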

The result is quite consistent across all tests. There is no detectable directionality in the local similarity signal.

Test | What it measures | Result | Conclusion
Past vs future (best match) | Whether tokens are closer to previous or next context | ~31% past vs ~31% future | No directional bias
Weighted directional score | Same as above, but weighting by distance | ~49.5% vs ~50.5% | Almost perfect symmetry
Chain reconstruction (A→B→C) | Whether real order scores better than permutations | 31% real vs 26% shuffle | Very weak signal, near random
Persistence of changes | Whether edits from A→B persist in C | 0.462 vs 0.459 (shuffle) | No real accumulation
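For reference, the chain-reconstruction test can be sketched as follows. This is a hypothetical reimplementation, not the script behind these numbers; note that with three tokens each ordering ties with its reversal, so chance level is roughly one in three, which is why 31% reads as near-random.

```python
from itertools import permutations

def levenshtein(a, b):
    # standard dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def order_recovery_rate(tokens):
    """Fraction of consecutive triples whose true order achieves the lowest
    chain cost d(x1, x2) + d(x2, x3) among all orderings of the triple."""
    wins = total = 0
    for i in range(len(tokens) - 2):
        a, b, c = tokens[i:i + 3]
        true_cost = levenshtein(a, b) + levenshtein(b, c)
        costs = [levenshtein(x, y) + levenshtein(y, z)
                 for x, y, z in permutations((a, b, c))]
        if true_cost == min(costs):
            wins += 1  # note: a chain and its reversal always tie
        total += 1
    return wins / total if total else 0.0
```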

Taken together, these results suggest that local similarity does not encode direction. Past and future contexts are essentially interchangeable from the point of view of form similarity.

This does not mean the text was not written sequentially. It only means that direction cannot be recovered from local edit-distance structure.

If anything, the behaviour looks more like selection within a dense space of compatible forms than a step-by-step transformation chain.


RE: The structure of the Voynich text and how it may be generated - quimqu - 08-04-2026

I’ve been trying to build a very simple generative model that reproduces some of the statistical properties of the Voynich text. The idea is not to decode anything, just to see how far a local generative mechanism can go.

The current version mixes a few ingredients. It copies and mutates recent tokens, occasionally samples from globally compatible “families” of words, and adds a small amount of controlled innovation. It also reuses the real line and paragraph structure, so positional effects are not invented but inherited from the manuscript.
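My understanding of that mechanism, as a toy sketch (the probabilities, glyph inventory, and memory size here are illustrative guesses, not the actual model):

```python
import random

EVA_ALPHABET = "acdefghiklmnopqrsty"  # illustrative glyph inventory, not the full EVA set

def mutate(tok):
    # one random substitution, insertion, or deletion
    i = random.randrange(len(tok))
    op = random.choice("sid" if len(tok) > 1 else "si")
    if op == "s":
        return tok[:i] + random.choice(EVA_ALPHABET) + tok[i + 1:]
    if op == "i":
        return tok[:i] + random.choice(EVA_ALPHABET) + tok[i:]
    return tok[:i] + tok[i + 1:]

def generate(real_lines, families, p_copy=0.45, p_mutate=0.35):
    """For each slot in the real line grid: copy a recent token, mutate one,
    or sample a compatible form from a word family. Line lengths are reused
    from the manuscript, so positional structure is inherited, not invented."""
    recent, out_lines = [], []
    for line in real_lines:
        out = []
        for _ in line:
            r = random.random()
            if recent and r < p_copy:
                tok = random.choice(recent[-10:])          # local memory: plain copy
            elif recent and r < p_copy + p_mutate:
                tok = mutate(random.choice(recent[-10:]))  # copy with a small edit
            else:
                tok = random.choice(families)              # draw from a compatible family
            out.append(tok)
            recent.append(tok)
        out_lines.append(out)
    return out_lines
```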

Below is a comparison between the Voynich (EVA) and the generated text after mapping from CUVA back to EVA:

Feature | Voynich | Generated
tokens | 38262 | 38111
types | 8743 | 6882
type_token_ratio | 0.2285 | 0.1806
hapax_share_types | 0.7195 | 0.2448
mean_word_len | 5.0898 | 5.7704
repeat_rate | 0.0077 | 0.0079
H1_char | 3.8892 | 3.8852
H2_char | 3.1339 | 3.2351
H3_char | 2.7986 | 2.9418
H_conditional | 2.3786 | 2.5850
line_initial_len_delta | 0.3876 | 0.6570
line_start_js | 0.2813 | 0.2443
line_end_js | 0.4557 | 0.4120
line_initial_gallows | 0.5447 | 0.7137
paragraph_initial_gallows | 0.9176 | 0.8541
global_density_lev2 | 0.0700 | 0.0364
local_density_lev2 | 0.1046 | 0.0546
gap_local_global | 0.0346 | 0.0182

At this point the model already reproduces the global shape of the text quite well. Entropy values are close, the repetition rate matches, and line-level constraints such as the initial-glyph bias and paragraph behaviour are largely captured. In other words, a combination of local memory and positional constraints already explains a good part of the surface structure.

Where it still fails is in the lexical field. The vocabulary is too small, the proportion of hapax is far too low, and the overall Levenshtein connectivity is weaker than in the Voynich. The generated text either reuses forms too much or creates new ones that are not well integrated into the dense network of similar words.

The next step I’m working on is to push “controlled innovation” further. The goal is to generate more new word forms, but not arbitrarily. They need to remain close to existing families and land in regions of the space that are already dense. The Voynich seems to combine constant novelty with strong local similarity, and that balance is exactly what the current model still misses.
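One possible way to implement that balance (a sketch of my own, not the next version of the model): accept a newly mutated form only if it already has several neighbours in the existing lexicon, i.e. only if it lands in a region that is already dense.

```python
def levenshtein(a, b):
    # standard dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def accept_innovation(candidate, lexicon, k=3, radius=2):
    """Accept a new form only if at least k existing words sit within
    edit distance <= radius (k and radius are placeholder values).
    The length check is just a cheap lower bound on the edit distance."""
    hits = 0
    for w in lexicon:
        if abs(len(w) - len(candidate)) <= radius and levenshtein(w, candidate) <= radius:
            hits += 1
            if hits >= k:
                return True
    return False
```

A rule like this produces constant novelty while forcing every new form to stay close to existing families, which is the combination the real text seems to exhibit.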

I will keep you informed.


RE: The structure of the Voynich text and how it may be generated - magnesium - 09-04-2026

I’m so interested in seeing how this progresses; congratulations. Adding a mild spacing ambiguity to your model as a final gloss is probably enough to boost the number of hapax legomena. For example, at the very end of generating a 38,000-token text, you could randomly omit 2-3% of all spaces, representing either scribal cramming or modern mis-transliteration. You could even add a line-position bias in that, where the probability of dropping a space increases from left to right, consistent with the observed left-to-right space narrowing in the VMS.
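A sketch of that final gloss (the 2-3% rate and the linear left-to-right ramp are the suggested values above, not fitted ones):

```python
import random

def drop_spaces(line, base=0.02, extra=0.02):
    """Randomly merge adjacent tokens, with the drop probability rising
    linearly from `base` at line start to `base + extra` at line end,
    mimicking scribal cramming or transliteration ambiguity."""
    tokens = line.split()
    if len(tokens) < 2:
        return line
    out = [tokens[0]]
    n_gaps = len(tokens) - 1
    for g, tok in enumerate(tokens[1:]):
        p = base + extra * (g / max(1, n_gaps - 1))
        if random.random() < p:
            out[-1] += tok        # merge: drop the space
        else:
            out.append(tok)
    return " ".join(out)
```

Each dropped space fuses two known forms into a rare compound, so even a small rate should raise the hapax share noticeably.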


RE: The structure of the Voynich text and how it may be generated - Rafal - 09-04-2026

Do I understand correctly that you are now trying to make an algorithm that would:

- be executed by a computer
- generate Voynichese-like text
- produce text with statistical properties similar to the Voynich Manuscript

Ideally it should not be too tedious. Remember that the Voynich author didn't have a computer  Wink

And let's suppose you build something like that. Do you think it will be proof of anything?


RE: The structure of the Voynich text and how it may be generated - imre555 - 09-04-2026

How can I get involved in deciphering the Voynich manuscript??


RE: The structure of the Voynich text and how it may be generated - magnesium - 09-04-2026

(Yesterday, 12:16 PM)Rafal Wrote: Do I understand correctly that you are now trying to make an algorithm that would:

- be executed by a computer
- generate Voynichese-like text
- produce text with statistical properties similar to the Voynich Manuscript

Ideally it should not be too tedious. Remember that the Voynich author didn't have a computer  Wink

And let's suppose you build something like that. Do you think it will be proof of anything?

Importantly, any algorithmic replication of Voynichese could simply mean that the algorithm approximates a manual, intuitive method. These kinds of generative models also stand to sharpen hypotheses for whether and how the VMS text can convey meaning. Once quimqu’s algorithm is fully described, it will be interesting to see if there’s a way to sneak in a “payload” stream of plaintext letters or phonemes.


RE: The structure of the Voynich text and how it may be generated - Rafal - 09-04-2026

Quote:it will be interesting to see if there’s a way to sneak in a “payload” stream of plaintext letters or phonemes.

I have thought about it a bit.
My feeling is that a very verbose cipher (like your Naibbe) is almost indistinguishable from gibberish.
If you use several options and several letters for a single letter of the plaintext, and/or use a lot of nulls, then your text can have any statistical properties you like, because you can arrange your ciphertext however you want. You simply have freedom with the redundant letters.

This way the signal may be lost in the noise unless you know exactly where to search for it.
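A toy illustration of that freedom (made-up glyph groups and rates, nothing to do with the actual Naibbe tables): several options per plaintext letter plus freely interleaved nulls let the encipherer steer the statistics while keeping the text decipherable.

```python
import random

# hypothetical table: several ciphertext options per plaintext letter
TABLE = {"a": ["ok", "qok"], "b": ["che", "she"], "c": ["dy", "edy"]}
NULLS = ["ol", "ar", "al"]          # meaningless filler groups

def encipher(plaintext, p_null=0.3):
    out = []
    for ch in plaintext:
        while random.random() < p_null:
            out.append(random.choice(NULLS))    # pad freely with nulls
        out.append(random.choice(TABLE[ch]))    # pick any option for the letter
    return " ".join(out)

def decipher(ciphertext):
    # invert: skip nulls, map each remaining group back to its letter
    rev = {g: ch for ch, opts in TABLE.items() for g in opts}
    return "".join(rev[g] for g in ciphertext.split() if g not in NULLS)
```

Round-tripping always recovers the plaintext, yet two encipherments of the same text can look very different: that is exactly the freedom with redundant letters.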

If quimqu one day publishes his algorithm, I could even try to sneak some meaning into it  Wink


RE: The structure of the Voynich text and how it may be generated - magnesium - 09-04-2026

(Yesterday, 02:51 PM)Rafal Wrote:
Quote:it will be interesting to see if there’s a way to sneak in a “payload” stream of plaintext letters or phonemes.

I have thought about it a bit.
My feeling is that a very verbose cipher (like your Naibbe) is almost indistinguishable from gibberish.
If you use several options and several letters for a single letter of the plaintext, and/or use a lot of nulls, then your text can have any statistical properties you like, because you can arrange your ciphertext however you want. You simply have freedom with the redundant letters.

This way the signal may be lost in the noise unless you know exactly where to search for it.

If quimqu one day publishes his algorithm, I could even try to sneak some meaning into it  Wink

There is freedom, but only so much if you're trying to use syntactically coherent language as input and keep that language decipherable (without resorting to huge amounts of ambiguity in the encoding scheme). Voynichese places very tight constraints, IMO, on the most probable ways even a verbose cipher could be constructed, in terms of how much plaintext language an individual token can transmit on average.


RE: The structure of the Voynich text and how it may be generated - DG97EEB - 09-04-2026

(Yesterday, 12:16 PM)Rafal Wrote: Do I understand correctly that you are now trying to make an algorithm that would:

- be executed by a computer
- generate Voynichese-like text
- produce text with statistical properties similar to the Voynich Manuscript

Ideally it should not be too tedious. Remember that the Voynich author didn't have a computer  Wink

And let's suppose you build something like that. Do you think it will be proof of anything?

I agree this is the challenge... It doesn't prove anything. I've created probably 70 generators of different kinds, and they look like real Voynichese and score well on Bowern-Gaskill tests and others. I've built them with a cipher, without a cipher, with copy/mutate, without, and in the end it tells you absolutely nothing about what the text says or how it was created.


RE: The structure of the Voynich text and how it may be generated - quimqu - 09-04-2026

(Yesterday, 12:16 PM)Rafal Wrote: Ideally it should not be too tedious. Remember that the Voynich author didn't have a computer  Wink

Hi Rafal,

I thought the same: it should not be so difficult, if it was created 600 years ago. But it is hard to get a model that reproduces all the Voynich features at once. I absolutely agree, though, that human "art" or creativity is not well modeled by machines. That leads me to a point: if it were a language, models could reproduce and analyse it quite well. If it is a human creation, it might be easy for us humans to reproduce and difficult to model in code.

(Yesterday, 12:16 PM)Rafal Wrote: Do you think it will be proof of anything?

No. And I am not claiming to prove anything either...