The Voynich Ninja

Full Version: Was the Voynich text generated by using the self-citation-method?
Klaus Schmeh has published a blog post about the Voynich manuscript:

Quote:Timm’s method involves first creating a few pieces of text and then repeating them many times with changes. This procedure, which can be called “self-citation,” sounds extremely simple, but is apparently able to reproduce many features of the Voynich manuscript text. It also explains the fact that the content of the manuscript has some properties of natural language, but to all appearances is not written in such (thanks at this point to the linguist Jan Henrik Holst, who recently made this clear to me again).
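For readers who want something concrete to poke at, here is a minimal toy sketch of the "create a few pieces of text, then repeat them with changes" idea described in the quote. It is not Timm and Schinner's actual software: the function names, the glyph alphabet, the seed words, the 20-word look-back window and the change probability are all assumptions made purely for illustration.

```python
import random


def mutate(word, alphabet, rng):
    """Return a copy of `word` with one glyph substituted, dropped, or added.

    The remaining glyphs keep their relative order. Deletion is skipped for
    words of one or two glyphs, which fall through to insertion instead.
    """
    i = rng.randrange(len(word))
    op = rng.choice(["substitute", "delete", "insert"])
    if op == "substitute":
        return word[:i] + rng.choice(alphabet) + word[i + 1:]
    if op == "delete" and len(word) > 2:
        return word[:i] + word[i + 1:]
    return word[:i] + rng.choice(alphabet) + word[i:]


def self_citation(seed_words, n_words, alphabet="acdehiklnoqrsty",
                  change_prob=0.7, seed=0):
    """Toy self-citation: start from seed words, then keep copying recent words.

    Each new word is copied from the last 20 words already written (an assumed
    window) and is modified with probability `change_prob`.
    """
    rng = random.Random(seed)
    text = list(seed_words)
    while len(text) < n_words:
        source = rng.choice(text[-20:])
        if rng.random() < change_prob:
            text.append(mutate(source, alphabet, rng))
        else:
            text.append(source)
    return text[:n_words]


if __name__ == "__main__":
    # Illustrative seed words only; any short strings would do.
    print(" ".join(self_citation(["daiin", "chedy", "qokeedy"], 30)))
```

Even a toy like this produces runs of similar-looking words, which is the property the quote points to; whether it reproduces any other feature of the manuscript text would have to be measured.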
Klaus Schmeh is a blogger I’ve followed, and even back in 2014, something about his choice of words and overall vibe made me suspect he was unusually impressed with Torsten Timm’s solution. Sure enough. It’s a similar feeling to a public figure coming out as atheist; I’m not usually surprised, if it’s someone whose opinions I’ve read or heard a lot of. Unstated, but hinted at, is that Klaus has been trying to think of a way to falsify Torsten’s theory since Torsten first published it, and is now concluding he is unable to offer a good argument against it. And that is indeed how science works: the ideas we tentatively accept as “true” are those that withstand all efforts to knock them down.

I still want to look at the raw data from Claire Bowern’s experiment, and see it subjected to the same statistical tests as both the output of Torsten’s code and the original VMs. If Torsten is right, and the 40 student experimenters indeed did match both real and synthetic Voynich Manuscripts statistically, I’d have to say that would go a long way to selling me on the self-citation hypothesis.
I still have great difficulty imagining someone writing a 200+ page manuscript (including illustrations) without the text having any informational content whatsoever. To demonstrate that such a thing is possible, the author would probably have been satisfied with a much less comprehensive work.

I am not saying that Timm's solution is not possible, but just because it cannot be refuted does not make it indisputably correct.
(15-09-2021, 01:33 PM)bi3mw Wrote: I still have great difficulty imagining someone writing a 200+ page manuscript (including illustrations) without the text having any informational content whatsoever. To demonstrate that such a thing is possible, the author would probably have been satisfied with a much less comprehensive work.

I am not saying that Timm's solution is not possible, but just because it cannot be refuted does not make it indisputably correct.

Of course not. It’s a very hard theory to test reliably, and so to rule either in or out. What this means is that, until or unless someone proffers a meaningful decryption of the manuscript, the stochastically generated nonsense hypothesis will likely always remain on the table. But, by the same token, the continued viability of this hypothesis will likely never be enough to rule out or deter speculation about meaningful information in the VMs text.

If someone were motivated by the prospect of much money or prestige, I could imagine him breaking up this stochastic pseudo-text generation task into small daily chunks — a folio or two per day, perhaps — and quitting for the day when the task became too mentally taxing or inefficient. Of course, the same could be said about encoding real information using a novel and repetitive encryption method.

I’d be especially interested in seeing how the diversity of glyphs, ngrams, and both types and tokens changes in each of Bowern’s students’ specimens, from beginning to end, as the tedium of the task increased. I’d then like to see these metrics compared to similar-sized specimens of the VMs text. If evidence of a similar statistical pattern to this “faking fatigue” were found cyclically in the VMs, then I would have to agree that Bowern’s experiment not only failed to falsify Timm’s hypothesis, but unwittingly supported it.
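To make the kind of measurement I have in mind concrete, here is a rough sketch that cuts a specimen into consecutive chunks and reports a few diversity figures per chunk, so that a drift from beginning to end would show up. The function name, the chunk size and the choice of metrics (type/token ratio, distinct glyphs, distinct within-word bigrams) are my own assumptions, not anything taken from Bowern's experiment or Timm's paper.

```python
def diversity_profile(tokens, chunk_size):
    """Compute simple diversity metrics for consecutive, non-overlapping chunks.

    `tokens` is a list of word strings in writing order. A trailing partial
    chunk is ignored so that all chunks are the same size and comparable.
    """
    profile = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        # Bigrams are counted inside words only, never across word breaks.
        bigrams = {w[i:i + 2] for w in chunk for i in range(len(w) - 1)}
        profile.append({
            "start": start,
            "type_token_ratio": len(set(chunk)) / len(chunk),
            "distinct_glyphs": len(set("".join(chunk))),
            "distinct_bigrams": len(bigrams),
        })
    return profile


if __name__ == "__main__":
    sample = "daiin chedy daiin daiin daiin daiin".split()
    for row in diversity_profile(sample, chunk_size=3):
        print(row)
```

A steady fall in these numbers within each writing session would be one possible signature of "faking fatigue"; the same profile could then be run over similar-sized stretches of a VMs transcription for comparison.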
Like I said before though, much of this depends on prior knowledge. If you know the VM has certain problems, then it becomes possible to either avoid or emulate those in one's pseudo writing. Ideally, the exercise should be done by people with a blank slate. Unless of course the test is whether it is possible to knowingly copy the VM's features, but that would in itself not prove or disprove much.
My objection to Timm's theory is essentially the same as before: any writer needs to adhere to a set of rules in the creation or alteration of words. Without them many patterns in the text would not appear. The writer in Timm's theory could have written words with any structure whatsoever, yet they quite consistently kept to the same structure. And it's not enough for his theory to simply state that rules exist and model them into the output. They must be described and explained: what are the rules and why do they exist?

It is not the case, as the article believes, that Timm's theory is rejected for being unsatisfying on a human level, but because it's incomplete on an explanatory level.
(15-09-2021, 05:35 PM)Emma May Smith Wrote: The writer in Timm's theory could have written words with any structure whatsoever, yet they quite consistently kept to the same structure ... what are the rules and why do they exist?

Quote:The rules to modify a source word normally don’t affect the order of the glyphs. This is one reason for the observation that the words in the VMS share the same rigid word structure. Additionally, for a text created by self-citation it appears as a logical assumption that the scribe also used aesthetically motivated design rules for glyph selection, in order to harmonize the overall appearance of the text. These rules specify when two glyphs may follow one another. Currier wrote: “There seem to be very strong constraints in combinations of symbols; only a very limited number of letters occur with each other in certain positions of a ‘word’.” Coarsely speaking, the shape of a glyph must be compatible with the shape of the previous one, and is also influenced by its position within a word or a line. (Timm and Schinner, 2020, p.10. Emphasis mine.) 

Just to play devil's advocate, what I hear Torsten Timm saying here is that the rigid vord structure was designed so that, with minimum time and effort, any vord could be easily changed into another one, without making its similarity to its predecessor obvious. I could draw an analogy to an improvisational jazz musician, who is given a time signature and a key as a rigid set of rules, and once practiced at playing an instrument within these constraints, can rapidly vary the melody and still sound good.

Another interpretation of "aesthetically motivated design" I could see is creating and sticking to a strict set of glyph placement rules in order to give the impression of "too orderly to be nonsense".

But you do make a good point: Torsten's description of his autocopy algorithm is weak in its explanation of:

  1. Why this particular glyph set, and these specific rigid rules for arranging them? Why would a whole new writing system even be needed to write gibberish by the autocopy method that looks convincingly like real language? It seems like more trouble than it's worth, and not a time-saver.
  2. How is the autocopy algorithm executed? One vord at a time left-to-right across a line, or pepper a page with variations on one self-cited vord, and then repeat until all the lines are filled in? This isn't clear at all from his description, and I have a hard time imagining either method being particularly quick, without a lot of advance practice.
The problem with an explanation which relies on the scribe being "aesthetically motivated" is that it effectively puts the process in a closed box we can't open. We can't judge aesthetics objectively, so will never know why [lk] is so much more pleasing than [lt], or just what bothered the scribe about [pe] and [fe], or why [ych] is pretty at the start of a line but ugly elsewhere...
I see this passage in Schmeh's blog post as a significant evolution in Torsten's views:

Quote:What was not clear to me until now, but what Torsten Timm pointed out to me, is that the self-citation method can also be used unconsciously.

In the Cryptologia paper, Timm and Schinner wrote that the scribe was "executing" the algorithm; they only referred to unconscious processes ("spontaneous impulses") to explain the differences between the output of their software and actual Voynichese. The algorithm is now seen as a partial model of an unconscious cognitive process: I find this idea much more interesting.

I also share Schmeh's interest in Bowern's experiments about the spontaneous generation of pseudo-language. In order to investigate unconscious processes, Claire's cognitive experiments look like the way to go. I hope she will publish more details about the data she gathered or that other equally qualified researchers will pursue her line of investigation.


A wonderful recent contribution by Patrick added new depth to what we know about the structure of Voynichese. Patrick created a system that visualizes the behaviour of word patterns in lines. Each word token in a line is mapped into one of 10 slots according to its position inside a line (1=line-start, 10=line-end) and positional frequencies are plotted. This shows a number of phenomena that not only affect the first and last words in lines, but display "smooth" preferences across the whole length of a line. Something similar can also be observed in poetry, where each line really is "a functional unit". Of course, we have no idea of why these patterns appear in the VMS: paragraph layout suggests that poetry can be excluded.
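As a rough illustration of the slot mapping described above, here is a small sketch. The exact formula Patrick uses is not given in this thread, so the linear rescaling with rounding, the handling of one-word lines, and the "share of tokens in a slot that start with a given prefix" measure are all assumptions of mine, as are the function names.

```python
from collections import Counter


def slot_of(index, line_length, n_slots=10):
    """Map a 0-based word index in a line to a slot from 1 to n_slots.

    Assumed mapping: linear rescaling, rounded half up, so the first word
    lands in slot 1 and the last word in slot n_slots. A one-word line is
    put in slot 1 by convention.
    """
    if line_length == 1:
        return 1
    return 1 + int(index * (n_slots - 1) / (line_length - 1) + 0.5)


def prefix_share_by_slot(lines, prefix, n_slots=10):
    """For each occupied slot, the fraction of its tokens starting with `prefix`."""
    totals, hits = Counter(), Counter()
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            slot = slot_of(i, len(words), n_slots)
            totals[slot] += 1
            if word.startswith(prefix):
                hits[slot] += 1
    return {s: hits[s] / totals[s] for s in range(1, n_slots + 1) if totals[s]}


if __name__ == "__main__":
    # Two made-up lines in an EVA-like spelling, for illustration only.
    demo = ["chol daiin shedy", "chor shol"]
    print(prefix_share_by_slot(demo, "ch"))
    print(prefix_share_by_slot(demo, "sh"))
```

Plotting these per-slot shares for two prefixes side by side gives a picture of the same general kind as the positional plots discussed here, though the details of the original visualization may well differ.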

The plots below are based on Patrick's method. They show the behaviour of the ch- and sh- prefixes in VMS Quire 20 and Timm and Schinner's generated text (TTAS), compared with the t- and s- prefixes in Shakespeare's sonnets and in Dickens' "The Old Curiosity Shop" (in a version where paragraphs were typographically split into lines of similar length). The two English texts were shortened to be close to the other samples (~10K words).

[Figure: positional plots of the ch- and sh- prefixes (VMS Quire 20, TTAS) and of the t- and s- prefixes (Shakespeare, Dickens)]

The VMS and Shakespeare plots show well-defined preferences for different line positions and relatively smooth ratio lines. On the other hand, Timm and Schinner's software and the typographical lines of Dickens' novel result in flat plots, where the "ratio" line has no discernible trend.

Personally, I doubt that the reason for the subtle positional preferences pointed out by Patrick is a set of rules consciously applied by the scribe: I believe that (as in the case of poetry) these preferences are unconscious and could be described by some kind of grammar (or, equivalently?, by some kind of algorithm). In the case of the VMS, this hypothetical grammar might or might not be related to the structure of an underlying natural or artificial language; in any case, trying to formally describe Voynichese line-grammar looks like an interesting task. It could be seen as a higher-level step of what Stolfi did with Voynichese morphology.

If we had access to a reliable corpus of hand-written pseudo-language, it would be interesting to see if the line-patterns studied by Patrick can also be found there.
(16-09-2021, 04:16 PM)Emma May Smith Wrote: The problem with an explanation which relies on the scribe being "aesthetically motivated" is that it effectively puts the process in a closed box we can't open. We can't judge aesthetics objectively, so will never know why [lk] is so much more pleasing than [lt], or just what bothered the scribe about [pe] and [fe], or why [ych] is pretty at the start of a line but ugly elsewhere...

Emma, I hear you making two distinct points:
  1. Torsten’s autocopy self-citation algorithm is logically circular. A just-so story. It invites special pleading along the lines of “That’s just what the guy felt like doing”.
  2. The odds of a scribe or small team of scribes in the early 1400s writing >200 pages of asemic writing with all the statistical properties of the VMs text are so vanishingly small as to be justifiably ruled out.
I’m with you on point number 1. I agree with Timm and Schinner that a stochastic process generating asemic writing could account for the VMs text. They’ve led me to the opinion that a VMs text without meaning is the proper null hypothesis for all attempts to find meaningful information in it. It’s another thing entirely, though, to claim that their algorithm was that which generated the VMs. That’s an easy claim to defend, when the steps to the algorithm are vaguely described and stretchy enough to fit any composition of a large amount of uniform and convincing asemic writing, and when the burden of proof isn’t even on you to begin with.

I disagree with point number 2. As Marco’s post alludes to, the idea that the output of the task of producing long-form random gibberish one line at a time will show statistically significant trends across human populations — which are complex, consistent, and anything but random — is not at all far-fetched to me. It’s likely that some or all of these hypothetical statistical metrics also correlate with certain types of meaningful information strings. But I could see long-form deliberate pseudo-language having a few distinct profiles of statistical metrics, with very good positive and negative predictive value for it.

A problem here is priming. As of now, too little is known about which environmental factors have large effects on any given person’s attempts at long-form asemic writing, let alone about which, if any, of these constraints the VMs text’s composer(s) worked under.