(oshfdk is a little more concise, but I wrote this before seeing their reply, and it gets at a few more things.)
(31-05-2026, 12:57 PM)Dunsel Wrote: Torsten Timm has pointed out, it's likely a very human method which is not going to be duplicated in code easily.
One might even say it's not going to replicate in code easily!
No one is arguing that there aren't clusters of words within short edit distance of one another, nor that a process which captures that fact can't [edited from "can"] model the text's statistics. It is interesting that those edit networks plus those constraints carry enough information to reconstitute those features, and that result serves as a broad-ranging rebuttal to a number of statistical analyses that had been purported to suggest meaningfulness.
This is a different claim:
(31-05-2026, 12:57 PM)Dunsel Wrote: Interpretation: This table shows that with the exception of 1 token, all other tokens are a product of copy/mutate from f1r.
When you claim the tokens in f1v are the product of such and such a process, you are no longer claiming to have modeled them. You are claiming to know as a matter of historical fact what the scribe did. This is an extremely strong claim for the banal reason that it's hard to prove what someone did 600 years ago, but there are several other factors that make inferring history from the model hard.
These models are in a sense "closely held" to the source text. At risk of broadsiding Torsten, copy-mutate was selected specifically because it reproduces the short edit distances. To the limits of his and his coauthor's argument, that is fine, but it is hardly an emergent property (and, without rereading his paper closely for it, I don't recall him asserting that it is). Likewise, the fact that both your models force words through Voynich-like networks of words, at rates derived from summary statistics gleaned from the manuscript, means that they are likely mutating through spaces of words that have similar properties. The reason Torsten's paper has force, and why it has the conclusions it does, is that it shows this information is sufficient to produce a Voynich-like text, and so an auto-citation process with substantially similar properties would be expected to result in a statistically Voynich-like text. Timm and Schinner do not claim the text in the Voynich was produced that way, merely that it is a serious possibility; in fact, they rule out a very literal interpretation of their model! It is entirely possible that their models and yours depend enough on patterns that arose from an underlying process interacting with meaningful text that they are reproducing the patterns without reproducing the method. Ruling this out is very hard.
A genuine path forward for these models would be to show that they depend less on the Voynich's summary statistics. Your analysis of the gallows letters, which I do not believe is shared by Timm and Schinner's model, is a case in point. If it could be shown that gallows letters were appearing more often at line start because of some simple rule that did not immediately imply the distribution, i.e. that the distribution was emergent, it would lend credence to the idea that these features depend on simple rules undergirding copy-mutate rather than on distributions arising in the Voynich for unproved reasons. Your argument (section 4.2 of your paper) is explicitly that these features were inserted based on observations of the manuscript, so you have defined the mutate process to have these properties, and you are actually assuming the conclusion when you claim to have explained them.
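To make the circularity concrete, here is a toy sketch in Python. It is not your model, nor Timm and Schinner's: the vocabulary, the function names, and the two probabilities are all invented for illustration, and the probabilities are arbitrary numbers, not measurements from the manuscript. The generator is handed a line-start gallows probability as a parameter, and the "result" it produces is that same number back, up to sampling noise.

```python
import random

GALLOWS = ("k", "t", "p", "f")  # EVA gallows glyphs
# Toy vocabularies, assumed for illustration only.
PLAIN_WORDS = ["daiin", "chol", "shedy", "qokeedy", "ol", "aiin"]
GALLOWS_WORDS = ["kchy", "tchor", "pchedy", "fachys"]


def make_line(n_words, p_start, p_else, rng):
    """Build one line; the gallows bias at position 0 is a direct input."""
    line = []
    for i in range(n_words):
        p = p_start if i == 0 else p_else
        pool = GALLOWS_WORDS if rng.random() < p else PLAIN_WORDS
        line.append(rng.choice(pool))
    return line


def gallows_rate(lines, position):
    """Fraction of lines whose word at `position` begins with a gallows."""
    hits = sum(line[position][0] in GALLOWS for line in lines)
    return hits / len(lines)


def simulate(p_start, p_else, n_lines=2000, n_words=8, seed=0):
    """Generate text, then 'measure' the bias that was put in by hand."""
    rng = random.Random(seed)
    lines = [make_line(n_words, p_start, p_else, rng) for _ in range(n_lines)]
    return gallows_rate(lines, 0), gallows_rate(lines, 3)


if __name__ == "__main__":
    # 0.6 and 0.1 are arbitrary placeholder values, not Voynich statistics.
    start_rate, mid_rate = simulate(p_start=0.6, p_else=0.1)
    print(f"line-start gallows rate: {start_rate:.2f}")
    print(f"mid-line gallows rate:   {mid_rate:.2f}")
```

The measured rates track the parameters because they were defined to. A match of this kind shows the parameterisation is sufficient to reproduce the statistic; it says nothing about why the statistic has that value in the source.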
In fairness to everyone, it may be the case that there is no simple rule underlying the gallows distribution. It could be that the scribes liked them at line start because they looked like capital letters at the start of lines in humanist manuscripts. But this is what I'm driving at. If that's all there is to it, the best we can do may be to say that their process had a bias towards gallows in line-start words, and we may not be able to formally separate premise (we observe the bias) from conclusion (it is a product of an arbitrary choice).
By the by, if you are in fact "reproducing" the Voynich, failure to incorporate Currier's curve-line system observations and similar analyses seems like fair game to me. It also strikes me as the kind of "observable orthographic structure" you say the model addresses in section 7.6 of your paper. This is largely an aside to the main point here, which is that I don't think you're proving your interpretation, but I'm not clear why statistics about letter bases are not part of the orthographic structures in the Voynich.