The Voynich Ninja


If the "copy" and "modify" parts of the algorithm are turned off, leaving only a "generate from scratch" step, it reduces to a Markov chain of order zero -- that just outputs words from a fixed probability distribution, without regard for previous outputs.

With the right "generate" algorithm, such a generator can reproduce some statistical properties of the VMS text, such as Zipf's law, per-word word entropy, word structure, and glyph and glyph-pair distributions. But it cannot produce any word-pair distribution other than (random first word) x (random second word). That is, the next-word entropy would be the same as the plain word entropy.
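To make the entropy point concrete, here is a minimal sketch of an order-zero generator. The vocabulary and weights are made-up toy values (not measured VMS frequencies), and both entropies are simple plug-in estimates, so the two numbers should agree only up to sampling noise.

```python
import math
import random
from collections import Counter

def word_entropy(items):
    """Plug-in estimate of the entropy (in bits) of a sequence of hashable items."""
    n = len(items)
    counts = Counter(items)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def next_word_entropy(words):
    """Plug-in estimate of H(next | previous) = H(pair) - H(previous), in bits."""
    pairs = list(zip(words, words[1:]))
    return word_entropy(pairs) - word_entropy(words[:-1])

# Hypothetical toy vocabulary and weights, chosen only for illustration.
VOCAB = ["daiin", "ol", "chedy", "aiin", "shedy"]
WEIGHTS = [8, 5, 4, 4, 3]

# Order-zero generator: every word is drawn independently of all earlier ones.
rng = random.Random(0)
text = rng.choices(VOCAB, weights=WEIGHTS, k=20000)

h_word = word_entropy(text)
h_next = next_word_entropy(text)
print(f"H(word) = {h_word:.3f} bits, H(next | previous) = {h_next:.3f} bits")
```

With independent draws, knowing the previous word tells you nothing about the next one, so the conditional entropy cannot drop below the plain word entropy except through estimation bias.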

As I wrote earlier, even with a significant copy probability, the word-pair distribution should gradually tend to this state, since each time the source pointer is reset, another random word pair is added to the repertoire of pairs available for subsequent copying.

(25-08-2025, 01:45 PM)magnesium Wrote: interesting to sweep across a wider parameter space for this generator: randomly iterate the code’s various threshold parameters, sweep across a wide range of initializing lines

It seems to me unlikely that any tinkering with the algorithm will lead to any leap in understanding of the manuscript. If you read Torsten Timm's papers you will read that his 'simple process for random text generation' just reproduces 'the key statistical properties' of the language of the manuscript. There is no claim that his method can truly reproduce the text of the manuscript.

(25-08-2025, 08:45 PM)dashstofsk Wrote: It seems to me unlikely that any tinkering with the algorithm will lead to any leap in understanding of the manuscript. If you read Torsten Timm's papers you will read that his 'simple process for random text generation' just reproduces 'the key statistical properties' of the language of the manuscript. There is no claim that his method can truly reproduce the text of the manuscript.

Ah good. Someone finally noticed that.

(25-08-2025, 08:45 PM)dashstofsk Wrote: just reproduces 'the key statistical properties' of the language of the manuscript.

except that it doesn't even do that, which is Stolfi's point here.

(24-08-2025, 08:46 AM)Jorge_Stolfi Wrote:
(23-08-2025, 08:23 PM)Torsten Wrote: There was no need to invent an artificial “gibberish generation” mechanism. As D’Imperio already observed [...]

She was not stating a fact. She was proposing her version of the "hoax" theory. Which, in general terms, apparently is the same as yours. Which has the same problems as yours.

Actually, D’Imperio specifically described the practice of concealing an encrypted message within a longer dummy text. In doing so, she noted that scribes confronted with such a task would typically generate meaningless text by repeating parts of neighboring strings with various small changes.

This is what D'Imperio wrote ([link]):
[attachment=11339]

(24-08-2025, 08:46 AM)Jorge_Stolfi Wrote:
Torsten Wrote: Consequently, a scribe attempting to generate language-like gibberish would, sooner or later, abandon the laborious task of perpetual invention in favor of the far easier strategy of reduplicating and adapting previously written material — and would ultimately adhere to this approach consistently.

Note my emphasis. The problem is that the "adapting" is far from a simple step. Voynichese words have a very restricted structure, so the "adapting" must be random but such that it preserves that structure. At this point the gibberish generation method is not much easier than generating each word from scratch (as Rugg had proposed), and is totally not "natural".

The modifications in question are not random, nor do they disrupt the underlying word structure. Instead, they follow systematic patterns—most commonly the substitution of one or more glyphs with similar ones. For example, it is considerably easier for a scribe to transform chaiin into shaiin, or daiin into dain, than to invent entirely new word forms from scratch. This process preserves structural consistency while naturally generating the kinds of variation observed in the Voynich text.
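As a rough illustration of what such a modification step could look like, here is a minimal sketch. The rule table is hypothetical and is not the rule set from the published papers; it only includes enough pairs to cover the two examples above, and it works on plain transliteration characters rather than on glyphs.

```python
# Hypothetical rule table: pairs of glyph groups treated as "similar".
# These pairs are illustrative assumptions, not the published rule set.
SIMILAR = [("ch", "sh"), ("ii", "i"), ("k", "t")]

def variants(word):
    """Return every word reachable by applying one rule at one position."""
    out = set()
    for a, b in SIMILAR:
        # Each rule can be applied in both directions.
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                out.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    out.discard(word)
    return out

print(sorted(variants("chaiin")))  # includes "shaiin"
print(sorted(variants("daiin")))   # includes "dain"
```

Because every rule swaps one glyph group for a similar one, the output keeps the overall shape of the source word, which is the point being made about structural consistency.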

Additionally, for a text created by self-citation, it seems a logical assumption that the scribe also used aesthetically motivated design rules for glyph selection, in order to harmonize the overall appearance of the text (see [link], p. 10).

(24-08-2025, 08:46 AM)Jorge_Stolfi Wrote: In fact (if I read you correctly), your justification for your proposed method is that it creates the repetitiousness that you claim to see in the VMS; which is a clue that the text is gibberish. Wouldn't the Author have worried about this last fact?

Paraphrasing your argument: "The VMS text has statistical properties X, Y, and Z, where Z is 'repetitiousness'. Here is an algorithm that generates gibberish with properties X, Y and Z. Therefore the VMS must be gibberish."

That is not an accurate representation of my argument. I would kindly ask you to refer directly to my published papers rather than speculate about my position, as they set out the hypothesis and its justification in detail.

(24-08-2025, 08:46 AM)Jorge_Stolfi Wrote: How could a "parameterless" Mutate function produce these asymmetric word frequencies?

My hypothesis is directly grounded in the observed word frequencies of Voynichese. The so-called “parameterless” aspect of the self-citation method does not ignore these asymmetries; rather, it seeks to explain them as natural outcomes of the iterative copying and modification process itself, without the need for externally imposed parameters.

See for instance [link], p. 6:

Quote:The respective frequency counts confirm the general principle: high-frequency tokens also tend to have high numbers of similar words. This is illustrated in greater detail in Figure 3: "isolated" words (i.e. unconnected nodes in the graph) usually appear just once in the entire VMS while the most frequent token <daiin> (836 occurrences) has 36 counterparts with edit distance 1. Note that most of these "isolated" words can be seen as concatenations of more frequent words (e.g. <polcheolkain>=<pol>+<cheol>+<kain>). This characteristic dependence of token frequency from word similarity is just another manifestation of the long-range correlations that have been uncovered and discussed by several researchers throughout the last decade.
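The "counterparts with edit distance 1" in that quote can be counted along these lines. This is only a sketch: it uses plain Levenshtein distance on transliteration characters, which may differ from how the paper counts glyphs, and the mini-vocabulary is a hypothetical stand-in for the real word list.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def neighbours(word, vocabulary):
    """Words in the vocabulary at edit distance exactly 1 from `word`."""
    return sorted(w for w in vocabulary if levenshtein(word, w) == 1)

# Hypothetical mini-vocabulary, for illustration only.
VOCAB = {"daiin", "dain", "aiin", "saiin", "daiir", "chedy", "polcheolkain"}

print(neighbours("daiin", VOCAB))         # several close counterparts
print(neighbours("polcheolkain", VOCAB))  # an "isolated" word: no neighbours
```

Running the same count over a full transliteration, and plotting neighbour count against token frequency, would be one way to check the dependence the quote describes.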

(19-08-2025, 08:24 PM)R. Sale Wrote: Someone well educated, potentially well-traveled, and not artistically gifted. What does the VMs artist know? All the things from swallowtail merlons to nebuly lines to mythical Melusine and the Agnus Dei.

It's not a hoax. It is a trick. In particular examples, the VMs cosmos and White Aries, trickery is an intentional part of the illustration. The papelonny patterns are in place, but the trick fails if the reader can't do the canting.

The VMs was constructed as a puzzle, a hiding place for something, but it is unclear if the remaining parts are sufficient to recover that information.

R. Sale, I hope you’re right, because (like a lot of us here, I ween) I love a good mystery, with many trails of tasty breadcrumbs. Just for the sake of playing devil’s advocate though, have you ever seen the film The Usual Suspects (https://www.imdb.com/title/tt0114814/)? Spoiler alert: the story-within-a-story that makes up most of the film’s plot is full of strange, highly specific, sometimes credibility-stretching details and references to other things, such that it creates an intriguing rabbit hole that makes you want to know more about why and how those specific details fit together. The twist ending is both mind-blowing and, paradoxically, incredibly anticlimactic at the same time: the story-within-a-story was made up on the spot by the practiced liar / con artist played by Kevin Spacey, using nothing but the random assortment of words and names around the room where the police are interrogating him.

What if our VMs creators were charlatans who somehow had access to some rich book collector or monastery's collection, and assembled their highly specific iconographic references and memetic trails that lead nowhere, from nothing more than what happened to be in the books they had access to, when they could do their work?

Although the VMs examples are few in number, once the corresponding historical elements have been found, the extent of VMs artistic trickery starts to become clearer. However, that does not yet reveal whether there is something of value to be found, or whether the purpose is merely trickery for the sake of deception. If we ever reach the rainbow's end (and given the VMs as it is, we may not), will there be a pot of gold or just a puddle of mud?

It still seems to me that what would be more helpful is a list of known scams, known confidence operations, known "fake" relics.

You are trying to put yourself into the mind of the people that did this, but making value judgements according to what you think is reasonable rather than what has been historically shown to be done, whether it seems reasonable to you or not.

I lean toward it being a religiously inspired prop because there was so much of that happening in the Middle Ages. But I can't dismiss the possibility that it might encode some secret meaning, because that was also a part of the medieval mindset. I love the music of the time (OK, "love" is a strong word), but the symbolism they embedded into everything is certainly cause for pause.

My third theory of its usage would be some kind of secret society, inside the monastery where it was produced: my assumption there being that it was produced in a monastery. But still basically encoding nonsense.

(25-08-2025, 08:30 PM)Jorge_Stolfi Wrote: If the "copy" and "modify" parts of the algorithm are turned off, leaving only a "generate from scratch" step, it reduces to a Markov chain of order zero -- that just outputs words from a fixed probability distribution, without regard for previous outputs.

With the right "generate" algorithm, such a generator can reproduce some statistical properties of the VMS text, such as Zipf's law, per-word word entropy, word structure, and glyph and glyph-pair distributions. But it cannot produce any word-pair distribution other than (random first word) x (random second word). That is, the next-word entropy would be the same as the plain word entropy.

As I wrote earlier, even with a significant copy probability, the word-pair distribution should gradually tend to this state, since each time the source pointer is reset, another random word pair is added to the repertoire of pairs available for subsequent copying.

You are assuming the existence of a fixed seed text from which words are copied and modified. That is not the case. In the self-citation model, the Voynich text functions simultaneously as both the source and the outcome of the copying process. This recursive dynamic—where new words continually draw on previously generated ones—is precisely what the “self” in self-citation refers to.

This is also why the process differs fundamentally from a zero-order Markov chain. A Markov chain simply samples words from a fixed probability distribution without regard to prior outputs. By contrast, in the self-citation model, the distribution itself is continuously reshaped by the ongoing act of copying and modification. In other words, the mechanism is history-dependent: each step reflects and reinforces earlier choices, creating the asymmetries and local clustering observed in the Voynich text. In this way, the gradual evolution from Currier language A to B can be understood as an automatic side effect of the self-citation method.

Note: The algorithm requires only a minimal seed (e.g., a single line of text) to initialize. The specific choice of seed is irrelevant; what matters are the procedures for selecting source words from the previously generated text and the rules governing their modification. In our implementation, we used line f103v.P.9 of the VMS as seed—<pchal shal shorchdy okeor okain shedy pchedy qotchedy qotar ol lkar>—to generate a corpus of more than 10,000 words. The resulting text contained 7,678 Voynich words (70%) and 3,156 non-Voynich words (30%). These figures alone illustrate that the self-citation method can naturally yield text with Voynich-like properties.
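For readers who want to see the recursive idea in code, here is a deliberately simplified sketch. It is not the published implementation: the substitution pairs, the `p_modify` parameter, and the uniform choice of source word from the whole text so far are all assumptions made for illustration. Only the seed line is taken from the note above.

```python
import random

# Seed line quoted above (f103v.P.9).
SEED_LINE = ("pchal shal shorchdy okeor okain shedy pchedy "
             "qotchedy qotar ol lkar").split()

# Hypothetical substitution pairs; not the rule set from the papers.
SIMILAR = [("ch", "sh"), ("k", "t"), ("ii", "i"), ("ol", "or")]

def modify(word, rng):
    """Apply one randomly chosen applicable substitution, or return the word as-is."""
    options = [(a, b) for a, b in SIMILAR if a in word]
    options += [(b, a) for a, b in SIMILAR if b in word]
    if not options:
        return word
    src, dst = rng.choice(options)
    return word.replace(src, dst, 1)  # change only the first occurrence

def self_cite(n_words, p_modify=0.5, seed=0):
    """Grow a text in which every new word is copied from the text itself."""
    rng = random.Random(seed)
    text = list(SEED_LINE)
    while len(text) < n_words:
        # The source pool is the text so far, including generated words,
        # so the sampling distribution changes with every step.
        source = rng.choice(text)
        text.append(modify(source, rng) if rng.random() < p_modify else source)
    return text

text = self_cite(200)
print(" ".join(text[:30]))
```

The design point to notice is that there is no fixed word distribution anywhere in the code: the only "table" is the growing text, which is what makes the process history-dependent rather than order-zero.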
