The structure of the Voynich text and how it may be generated - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: The structure of the Voynich text and how it may be generated (/thread-5500.html)
RE: The structure of the Voynich text and how it may be generated - ReneZ - 10-04-2026

(09-04-2026, 12:16 PM)Rafal Wrote: Remember that the Voynich author didn't have a computer

This is the difficult part. A human does not follow a strict pattern. The computer can be told not to follow a strict pattern too, but the result will never be even remotely similar to the human output. They will always diverge quickly. In that situation it is quite complicated to find indicators (extracted from the two texts) that show whether one is on the right path or not.

RE: The structure of the Voynich text and how it may be generated - Rafal - 10-04-2026

Quote: But I absolutely agree that the human "art" or creativity is not well modeled by machines.

Creativity is a form of human thinking. And actually we still don't know or understand how humans think, or what exact process produces a given result. We developed AI that can mimic the final results of some human creative processes, but it almost surely reaches these results in a totally different way than humans do.

Take chess as an example. Computers playing chess analyze millions of positions using the minimax algorithm. Do people playing chess do the same? Rather not, certainly not consciously. They often rely on this blurry thing called intuition. Yet the final result is similar to a machine playing chess.

I believe that if the Voynich text doesn't have meaning and was generated in some way, then it was generated in some semi-systematic way. Not fully systematic and not fully random, something in between. The scribe had a set of rules and common tricks but was free to choose Rule1, Rule2 or Rule3 at any given moment, and at other moments was just improvising. Recreating it with an algorithm may be hard, and even if you get something similar you can never be sure that it wasn't done another way.
But of course I am really curious to see your results!
RE: The structure of the Voynich text and how it may be generated - quimqu - 10-04-2026

I post an example of text output:

sokeey chear qot aleedy chokeey cheor cheal alched chochor rchedar otoar cheol sorshey lcheedy alchey alched olo otoar dytchy chdol chochor ototar olkeedy oldaiin eeety ykeor lchedy dodl qod polchedy teeoar chody dod chedain daiinokshody cheedy teeoar chckheed alshedy oinoly qoka tchoar cheeog ykaly oeeody okchey khey olaiin pcheocphy dosg tchoar oteey oaiin pcheocphy tolkeeedy qoteeod dosg loiin opalkar polaiin polkeey ctheor sheekain chedar chokey tolkeeedy pchomotor kalol okyytaiin olshdy shdalo qokain dyoty ychocphy shoeey kydain lchey shdaly lshechy qokcho ockhhy keshar okal ykeey ytodaly shoshy skaiiodar scho araiiin chol kydain chlol lteey arom oraiiin olaiiin cthar chopchy kedar ykeey cheedy shtor pchody yty qototeeey qokchedy sorchy chckhy otair dkeey chcthy qokeey dsheedal qody pcheety qofchedy oto osary chedy ckhey pchomotor qokshedy okey chtaldy cphol cheol qotes dcheedy qedy kan ctheety oteeg qotas qokshedy ear yteod ctho toar okoraldy kar dar qotes tochady chtaly qody qokeody daral shokol okchaldy chotal chdam lkarshar okoraldy qokeody okary dalam dokedy

Note that the Unicode character at the start of line 7 is one of the special glyphs from the EVA transliteration.

The model reproduces many surface-level statistical and positional properties of the Voynich text, but fails to capture its core lexical structure: a highly productive system that generates many new word forms while keeping them densely interconnected.

The words are not drawn from a fixed bag of words. The model starts from an attested vocabulary, but new tokens are continuously created through small mutations of existing ones. These candidates are then selected based on local context and compatibility with similar forms, and finally projected back onto the known vocabulary space. This creates a system where words are partly reused, but also systematically varied, rather than simply sampled independently from a predefined list.

Text is generated token by token using a hybrid mechanism. At each step, the model either copies and mutates a recent token, samples from a compatible "family" of similar forms, or introduces controlled innovations that are snapped back to attested vocabulary. Generation is constrained by a real line skeleton, so line length, position in line, and paragraph-initial context directly influence which forms can appear. Additional biases enforce known properties such as longer words and higher gallows frequency at line and paragraph starts.

This setup reproduces the external structure of the text well: line lengths, repetition rate, basic entropy, and most positional effects (line start/end distributions, paragraph-initial patterns). However, it fails to reproduce the internal lexical geometry.
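The hybrid mechanism described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not quimqu's actual model: the function names (`mutate`, `snap`, `generate`), the equal probability of the three modes, and the `window` size are all hypothetical, and the line skeleton and positional biases are left out entirely.

```python
import random

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def mutate(word, alphabet, rng):
    """Apply one random single-character edit (substitute, insert, or delete)."""
    op = rng.choice(["sub", "ins", "del"])
    if op == "ins":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(alphabet) + word[i:]
    i = rng.randrange(len(word))
    if op == "sub":
        return word[:i] + rng.choice(alphabet) + word[i + 1:]
    return word[:i] + word[i + 1:] if len(word) > 1 else word

def snap(candidate, vocab):
    """Project a candidate onto the nearest attested form (ties: alphabetical)."""
    return min(vocab, key=lambda v: (edit_distance(candidate, v), v))

def generate(vocab, n_tokens, seed=0, window=8):
    """Generate n_tokens (>= 1) words; every output is an attested vocabulary form."""
    rng = random.Random(seed)
    vocab = sorted(set(vocab))
    alphabet = sorted(set("".join(vocab)))
    out = [rng.choice(vocab)]
    while len(out) < n_tokens:
        recent = rng.choice(out[-window:])          # pick a recently written token
        mode = rng.choice(["copy_mutate", "family", "innovate"])  # assumed equal odds
        if mode == "copy_mutate":
            cand = mutate(recent, alphabet, rng)
        elif mode == "family":
            # "family" here = attested forms within edit distance 1 (an assumption)
            cand = rng.choice([v for v in vocab if edit_distance(v, recent) <= 1])
        else:
            # a larger innovation: two edits, then snapped back below
            cand = mutate(mutate(recent, alphabet, rng), alphabet, rng)
        out.append(snap(cand, vocab))
    return out
```

Because every candidate is snapped back to the attested vocabulary, a generator of this shape can never produce a form outside the list it starts from.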
Vocabulary size is too small, hapax rate is far too low, and both global and local Levenshtein connectivity are weaker than in Voynich. The model captures how the text looks, but not how it generates many new yet tightly related word forms. For example, the word daiin appears only as part of two longer words.

RE: The structure of the Voynich text and how it may be generated - Jorge_Stolfi - 11-04-2026

(10-04-2026, 06:53 PM)quimqu Wrote: The model reproduces many surface-level statistical and positional properties of the Voynich text.

Quimqu, I see that you are taking seriously the theory by Thorsten & Timm that the VMS text was generated by some complicated random process with some kind of feedback ("copy and mutate"). Even if you don't accept my claim about the Starred Parags being a transcription of the Shennong Bencaojing (see the Chinese theory thread), I must warn you that this approach is fundamentally flawed, for at least these reasons:
The only way to show that the VMS is "meaningless" would be to find a deterministic algorithm that was much shorter than the VMS and generated exactly the same text (not just a text with similar statistics). Then one could say that the information content of the whole book is that short algorithm. That would be the case, for example, if one proved that the text of the VMS is the digits of Pi encoded in a simple scheme (like a Roman numeral for each three decimal digits). That was the case of the mysterious book with large tables of letters that got John Dee totally trapped into bogus conversations with a large language model angels in a crystal ball.

All the best, --stolfi

RE: The structure of the Voynich text and how it may be generated - quimqu - 11-04-2026

(11-04-2026, 02:46 AM)Jorge_Stolfi Wrote: I see that you are taking seriously the theory by Thorsten & Timm that the VMS text was generated by some complicated random process with some kind of feedback ("copy and mutate").

Hello Jorge, not really. This is the sort of process that led me to this kind of model. I don't think it is necessarily the real thing, but we can interpolate like:
The thing is that I started detecting bursts, and then I tried some kind of model that explains the text features summarized in this [link]. So I am not getting into how the words are created; I intend to understand their structure, that is, how they are positioned in the text. What I saw is that, depending on the position, some families of words tend to appear more than others, as they seem to depend on the local context. Again, this can also be a feature of languages, as far as I understand. Also, I don't claim that the generated text is Voynichese; it is just a simulation. Note that points 2, 3, 4 and 7 might just be results of what is written, but I am not saying that it is a random generation or gibberish. What the numbers really say is that there is some sort of positional and contextual constraint, but again, this could be a feature of a language, who knows?

RE: The structure of the Voynich text and how it may be generated - DG97EEB - 11-04-2026

(11-04-2026, 08:52 AM)quimqu Wrote: (11-04-2026, 02:46 AM)Jorge_Stolfi Wrote: I see that you are taking seriously the theory by Thorsten & Timm that the VMS text was generated by some complicated random process with some kind of feedback ("copy and mutate").

That's the point though. It's just a simulation, and there are a seemingly infinite number of simulations, none of which gets you any closer to finding meaning. Your next step will be to look at what the seed might be, and you'll consider plain text, then different kinds of ciphered texts, and eventually conclude that all of them work but none of them quite works the way it should if it were the actual answer....
RE: The structure of the Voynich text and how it may be generated - Rafal - 11-04-2026

Quote: It is mathematically impossible to prove that a string is "random", or even to present evidence that it is "probably random".

Not necessarily. Let's take a hypothetical long string of numbers 1-6 written in a medieval book. We made calculations and observed that:
- all numbers have very similar frequencies
- there aren't any longer sequences repeating

In such a case we can propose a scenario that it was probably generated by throwing a six-sided die.

RE: The structure of the Voynich text and how it may be generated - Jorge_Stolfi - 11-04-2026

(11-04-2026, 11:55 AM)Rafal Wrote: Let's take a hypothetical long string of numbers 1-6 written in a medieval book. We made calculations and observed that:

That would be a possibility, but those observations do not prove it, and do not even make it likely. The sequence of digits of Pi in base 6 has those properties too, but it is not random. (Same for sqrt(2), or the digits of most irrational real numbers in any base.) And if you take any text, even a very repetitive one, and encode it with the Vigenère cipher using the digits of Pi as the key, you will get a ciphertext that has those same properties too.

(When I was getting my Masters in applied math, a friend of mine was doing his thesis on a simple process for generating infinite strings that were "cube-free" -- that did not have substrings that repeat three times in a row. For instance the string 0010100110100110100110101 is not "cube-free" because it contains 010011 three times in a row. The problem was inspired by a rule of chess that declares a draw if the players repeat the same sequence of moves three times in a row. I forgot the details, but it was something like this. You need at least three letters in the alphabet, say A, B, C.
You start with a single A and then do repeated passes where each letter is replaced by a specific string of those three letters. Like A -> ABCA, B -> CA, C -> BAC. Thus you get A, then ABCA, then ABCACABACABCA, then ABCACABACABCABACABCACAABCABACABCACABACABCA, etc. I just guessed these particular replacements and they probably don't work, but with the right rules one gets arbitrarily long strings that are cube-free. Possibly they have equal numbers of As, Bs, and Cs.

This "iterated verbose substitution cipher" process is a very simple procedure that someone like Tartaglia or Fibonacci could have thought of and played around with, well before the 15th century. It is less complicated than the process that was used by the anonymous nerd who created those tables that baffled John Dee (although, AFAIK, those were probably created in the 16th century). The point is that it is actually easier to generate non-random deterministic strings that "look random" under simple statistical tests than to generate truly random sequences...

At some point the other Masters students in the department gave that guy, as a birthday present, a fake Master's thesis, properly formatted and bound -- including the official cardboard thesis cover, fake examiners' signatures, etc -- titled "The Problem of the Abacas", with every one of its 100 pages filled with a random string of A, B, and C.)

RE: The structure of the Voynich text and how it may be generated - Rafal - 11-04-2026

Quote: The sequence of digits of Pi in base 6 has those properties too, but it is not random

Well, I must agree with that. I still hope that if the Voynich is a semi-mechanically generated text without meaning, then it may at some point be possible to prove it. But it may indeed be hard.

RE: The structure of the Voynich text and how it may be generated - quimqu - 11-04-2026

I did a new experiment today, not with the model, but trying to focus on the generation of similar words, starting with daiin, the most used word in the MS.
I started from a simple version of the similar-token generation idea. If words like daiin are produced by making small variations of nearby forms, then its closest variants, the words at Levenshtein distance ≤ 1, should also tend to appear in similar surroundings. So I took the immediate neighbours of daiin in form, and I compared the words that come before and after them. I also allowed context to continue across line breaks when the paragraph continues, so the comparison was not artificially cut by the layout.

The first result was negative for a naïve version of the idea. The exact neighbouring words of these variants are usually not very similar to the exact neighbouring words of daiin. In other words, just because two words look almost the same does not mean they sit in the same local context.

Then I relaxed the test. Instead of asking whether the neighbours are exactly the same, I asked whether the neighbours are themselves also close variants, again with Levenshtein ≤ 1. This is a fairer test for a model based on repeated local variation. Under this looser comparison, the overlap rises a lot. So the variants of daiin do not usually share the same exact neighbours, but they often share neighbours from the same broad family of similar forms.

That is the important point. A free and almost random variation model would suggest that close variants should circulate in almost the same environments. What I see is more constrained than that. The system does seem to keep words inside a broader field of similar contexts, but not in a loose or fully interchangeable way.

Some variants are clearly closer to daiin than others, and they are closer in different directions. In particular, dain stands out more on the previous-word side, while aiin stands out more on the next-word side. So even inside the immediate neighbourhood of daiin, the closest forms are not behaving in exactly the same way. They are partly related, but they are not simple drop-in substitutes.

A simpler way to say it: Changing a word into a very similar one does not give you the same neighbours. But it often gives you neighbours that also look similar. So the system is not freely swapping words. It stays inside small groups of related forms.
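The exact and relaxed comparisons described in this post can be sketched as below. This is a minimal illustration, not the code behind the reported results: the function names, the use of a Jaccard ratio for the exact test, and the definition of the relaxed score (share of a variant's neighbours that lie within distance 1 of some neighbour of the target) are all assumptions. `tokens` is assumed to be one paragraph's words as a single stream, so context runs across line breaks.

```python
from collections import defaultdict

def within_one(a, b):
    """True if the Levenshtein distance between a and b is at most 1."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                      # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:   # skip the common prefix
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]    # exactly one substitution
    return a[i:] == b[i + 1:]            # exactly one inserted character in b

def contexts(tokens):
    """Map each word to the sets of words seen immediately before and after it."""
    prev_of, next_of = defaultdict(set), defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        next_of[left].add(right)
        prev_of[right].add(left)
    return prev_of, next_of

def exact_overlap(a, b):
    """Jaccard overlap of two neighbour sets (identical word forms only)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def relaxed_overlap(a, b):
    """Share of words in a that have some word in b within distance <= 1."""
    return sum(any(within_one(x, y) for y in b) for x in a) / len(a) if a else 0.0

def compare_variants(tokens, target="daiin"):
    """Score every attested distance-1 variant of target on both sides."""
    prev_of, next_of = contexts(tokens)
    rows = {}
    for w in sorted(set(tokens)):
        if w != target and within_one(w, target):
            rows[w] = {
                "prev_exact": exact_overlap(prev_of[w], prev_of[target]),
                "prev_relaxed": relaxed_overlap(prev_of[w], prev_of[target]),
                "next_exact": exact_overlap(next_of[w], next_of[target]),
                "next_relaxed": relaxed_overlap(next_of[w], next_of[target]),
            }
    return rows
```

Keeping the previous-word and next-word scores separate is what would let an asymmetry like the one reported for dain and aiin show up. Running it on a real transliteration would need per-paragraph token streams, which are not included here.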