The structure of the Voynich text and how it may be generated - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: The structure of the Voynich text and how it may be generated (/thread-5500.html)
RE: The structure of the Voynich text and how it may be generated - quimqu - 07-04-2026

I tried to go one step further and check whether the local similarity structure actually contains any directional signal. The idea is simple: if the text is generated as a chain (A → B → C), we should be able to detect some asymmetry between past and future, or at least recover the correct order more often than chance. I tested this in several ways: comparing past vs future context, using weighted similarity, checking the persistence of changes, and trying to reconstruct local triples. The result is consistent across all tests: there is no detectable directionality in the local similarity signal.
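One version of the past-vs-future comparison could be sketched roughly as follows. This is a hypothetical illustration, not quimqu's actual code, and the sample tokens are invented, not taken from the manuscript:

```python
# Sketch of a past-vs-future symmetry test: for each token, compare its
# edit distance to the previous token against its edit distance to the
# next one. A chain-like (directional) process should make one side
# systematically closer; a result near zero means past and future
# contexts are interchangeable.

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    n = len(b)
    prev = list(range(n + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        prev = cur
    return prev[n]

def directional_asymmetry(tokens: list[str]) -> float:
    """Mean of (distance to past neighbour - distance to future neighbour).
    Values near 0 mean no recoverable direction in the local signal."""
    diffs = [edit_distance(tokens[i], tokens[i - 1])
             - edit_distance(tokens[i], tokens[i + 1])
             for i in range(1, len(tokens) - 1)]
    return sum(diffs) / len(diffs)

# Toy run with invented EVA-like tokens:
sample = ["daiin", "dain", "qokeedy", "okeedy", "chedy", "shedy"]
print(round(directional_asymmetry(sample), 3))
```

A real test would of course run over the full transliteration and compare the observed asymmetry against shuffled baselines, but the core measurement is this simple.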
Taken together, these results suggest that local similarity does not encode direction. Past and future contexts are essentially interchangeable from the point of view of form similarity. This does not mean the text was not written sequentially. It only means that direction cannot be recovered from local edit-distance structure. If anything, the behaviour looks more like selection within a dense space of compatible forms than a step-by-step transformation chain.

RE: The structure of the Voynich text and how it may be generated - quimqu - 08-04-2026

I’ve been trying to build a very simple generative model that reproduces some of the statistical properties of the Voynich text. The idea is not to decode anything, just to see how far a local generative mechanism can go. The current version mixes a few ingredients. It copies and mutates recent tokens, occasionally samples from globally compatible “families” of words, and adds a small amount of controlled innovation. It also reuses the real line and paragraph structure, so positional effects are not invented but inherited from the manuscript. Below is a comparison between the Voynich (EVA) and the generated text after mapping from CUVA back to EVA:
At this point the model already reproduces the global shape of the text quite well. Entropy values are close, the repetition rate is correct, and line-level constraints such as the initial glyph bias and paragraph behavior are largely captured. In other words, a combination of local memory and positional constraints already explains a good part of the surface structure.

Where it still fails is in the lexical field. The vocabulary is too small, the proportion of hapax legomena is far too low, and the overall Levenshtein connectivity is weaker than in the Voynich. The generated text either reuses forms too much or creates new ones that are not well integrated into the dense network of similar words.

The next step I’m working on is to push “controlled innovation” further. The goal is to generate more new word forms, but not arbitrarily: they need to remain close to existing families and land in regions of the space that are already dense. The Voynich seems to combine constant novelty with strong local similarity, and that balance is exactly what the current model still misses. I will keep you informed.

RE: The structure of the Voynich text and how it may be generated - magnesium - 09-04-2026

I’m so interested in seeing how this progresses; congratulations. Adding a mild spacing ambiguity to your model as a final gloss is probably enough to boost the number of hapax legomena. For example, at the very end of generating a 38,000-token text, you could randomly omit 2-3% of all spaces, representing either scribal cramming or modern mis-transliteration. You could even add a line-position bias, where the probability of dropping a space increases from left to right, consistent with the observed left-to-right space narrowing in the VMS.
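The space-dropping gloss described above is easy to prototype. The sketch below is one possible reading of the suggestion, with illustrative numbers rather than values fitted to the manuscript, and invented sample tokens:

```python
import random

def drop_spaces(line, base_rate=0.025, rng=None):
    """Delete word spaces at random, with probability rising left to right.

    The deletion probability grows linearly from 0 at the left margin to
    2 * base_rate at the right margin, so the average rate over a line is
    roughly base_rate (e.g. 0.02-0.03 for the suggested 2-3%). All numbers
    here are illustrative, not fitted to the VMS."""
    rng = rng or random.Random()
    out = []
    for i, ch in enumerate(line):
        if ch == " ":
            position = i / max(len(line) - 1, 1)   # 0.0 = left edge, 1.0 = right
            if rng.random() < 2 * base_rate * position:
                continue                           # merge the neighbouring tokens
        out.append(ch)
    return "".join(out)

# Toy run on an invented EVA-like line (high rate to make merges visible):
line = "daiin chedy qokeedy shedy okaiin chol daiin otedy"
print(drop_spaces(line, base_rate=0.3, rng=random.Random(42)))
```

Each merged pair is a new surface form, so even a low rate applied to a 38,000-token text would create a tail of one-off tokens, which is exactly where hapax counts come from.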
RE: The structure of the Voynich text and how it may be generated - Rafal - 09-04-2026

Do I understand correctly that you are now trying to make an algorithm that would:

- be executed by a computer
- generate Voynichese-like text
- produce text with statistical properties similar to those of the Voynich Manuscript

Ideally it should not be too tedious. Remember that the Voynich author didn't have a computer. And let's suppose you build something like that. Do you think it will be proof of anything?

RE: The structure of the Voynich text and how it may be generated - imre555 - 09-04-2026

How can I get involved in deciphering the Voynich manuscript?

RE: The structure of the Voynich text and how it may be generated - magnesium - 09-04-2026

(Yesterday, 12:16 PM) Rafal Wrote: Do I understand correctly that you are now trying to make an algorithm that would...

Importantly, any algorithmic replication of Voynichese could simply mean that the algorithm approximates a manual, intuitive method. These kinds of generative models also stand to sharpen hypotheses for whether and how the VMS text can convey meaning. Once quimqu’s algorithm is fully described, it will be interesting to see if there’s a way to sneak in a “payload” stream of plaintext letters or phonemes.

RE: The structure of the Voynich text and how it may be generated - Rafal - 09-04-2026

Quote: it will be interesting to see if there’s a way to sneak in a “payload” stream of plaintext letters or phonemes.

I have thought about it a bit. My feeling is that a very verbose cipher (like your Naibbe) is almost indistinguishable from gibberish. If you use several options, and several ciphertext letters, for a single letter of the plaintext, and/or use a lot of nulls, then your text can have almost any statistical properties, because you can arrange your ciphertext in any way you like. You simply have freedom with the redundant letters.
This way the signal may be lost in the noise unless you know exactly where to search for it. If quimqu one day publishes his algorithm, I could even try to sneak some meaning into it.
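A toy version of the homophones-plus-nulls scheme described above might look like this. The token tables are invented for illustration; this is not the Naibbe cipher or any actual VMS proposal:

```python
import random

# Each plaintext letter maps to several possible ciphertext tokens
# (homophones), and meaningless null tokens are interleaved freely.
# The encoder's free choices (which homophone, where to put nulls) are
# exactly the "freedom with redundant letters" discussed above.
TABLE = {
    "a": ["daiin", "dain"],
    "b": ["chedy", "shedy"],
    "c": ["qokeedy", "okeedy"],
}
NULLS = ["chol", "shol", "dal"]

def encipher(plaintext, null_rate=0.3, rng=None):
    rng = rng or random.Random()
    out = []
    for ch in plaintext:
        if rng.random() < null_rate:
            out.append(rng.choice(NULLS))   # insert a meaningless filler
        out.append(rng.choice(TABLE[ch]))   # pick one homophone at random
    return out

# Decipherment just inverts the table and skips anything not in it.
INVERSE = {tok: letter for letter, toks in TABLE.items() for tok in toks}

def decipher(tokens):
    return "".join(INVERSE[t] for t in tokens if t in INVERSE)

ct = encipher("abcab", rng=random.Random(1))
print(decipher(ct))  # → abcab
```

This also illustrates magnesium's counterpoint: the round trip only works because the receiver knows the table and can tell nulls from homophones, so the more statistical freedom the encoder exploits, the more shared convention decipherment requires.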
RE: The structure of the Voynich text and how it may be generated - magnesium - 09-04-2026

(Yesterday, 02:51 PM) Rafal Wrote: Quote: it will be interesting to see if there’s a way to sneak in a “payload” stream of plaintext letters or phonemes.

There is freedom, but only so much if you're trying to use syntactically coherent language as an input and keep that language decipherable (without resorting to huge amounts of ambiguity in the encoding scheme). Voynichese places very tight constraints, IMO, on the most probable ways even a verbose cipher could be constructed, in terms of how much plaintext language can be transmitted on average by an individual token.

RE: The structure of the Voynich text and how it may be generated - DG97EEB - 09-04-2026

(Yesterday, 12:16 PM) Rafal Wrote: Do I understand correctly that you are now trying to make an algorithm that would...

I agree this is the challenge: it doesn't prove anything. I've created probably 70 generators of different kinds, and they look like real Voynichese and score well on Bowern-Gaskill tests and others. I've built them with a cipher, without a cipher, with copy/mutate, and without, and in the end it tells you absolutely nothing about what the text says or how it was created.

RE: The structure of the Voynich text and how it may be generated - quimqu - 09-04-2026

(Yesterday, 12:16 PM) Rafal Wrote: Ideally it should not be too tedious. Remember that the Voynich author didn't have a computer.

Hi Rafal, I thought the same: it should not be so difficult if it was created 600 years ago. But it is difficult to get a model that gives all the Voynich features at once. And I absolutely agree that human "art" or creativity is not well modeled by machines.
That leads me to a point: if it were a language, models could reproduce and analyse it quite well. If it is a human creation, it might be easy for us humans to reproduce and difficult to model in code.

(Yesterday, 12:16 PM) Rafal Wrote: Do you think it will be a proof of anything?

No. And I am not pretending to prove anything either...