The Voynich Ninja

Full Version: Discussion of "A possible generating algorithm of the Voynich manuscript"
(25-10-2019, 07:49 PM)Torsten Wrote:
  • I have executed this type of test in chapter 3 "Evidence" in Timm 2014 for two control samples: "In this paper all words occurring seven times and all words occurring eight times are used as two separate control samples. ..." (Timm 2015, p. 12ff).
  • I have presented the statistics for the example in Figure 1: The core network for page f108 contains 189 out of 277 words (=68.2 %). They represent 402 tokens out of 491 tokens (=81.8 %). Similar subnets can be constructed for all pages, but, of course, they are more instructive for pages containing lots of text. We even provide network graphs for all pages containing at least some lines in section 1.3, "Graphs for individual pages" (see Additional Materials).
  • I have presented the network graph for a single page as well as for the whole text (see Timm & Schinner 2019, p. 4).
  • I have published the network graphs for every page in the VMS. You can just open the Gephi projects for any page you want and calculate the number of word types and word tokens within the core network with an edit distance of 1 (a minimal sketch of this calculation appears after this list): 
  • For instance, the core network for page f108 contains 189 out of 277 words (=68.2 %). They represent 402 tokens out of 491 tokens (=81.8 %).  
  • The core network for a second page contains 239 out of 354 words (=67.5 %). They represent 485 tokens out of 613 tokens (=79.1 %). 
  • The core network for a third page contains 172 out of 267 words (=66.9 %). They represent 305 tokens out of 394 tokens (=77.4 %).
  • ...
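To make the coverage figures above concrete, here is a minimal sketch of how such a count could be approximated for one page. It is not taken from the papers or the Gephi projects: it simply counts word types that have at least one other type within edit distance 1, which is only a rough stand-in for the connected core network described in the paper, and the input file name and word-list format are assumptions.

Code:
from collections import Counter
from itertools import combinations

def edit_distance(a, b):
    # plain Levenshtein distance (dynamic programming)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def coverage(tokens, max_dist=1):
    # Count word types that have at least one other type within max_dist,
    # and the share of tokens these types represent. This is only a rough
    # stand-in for the connected "core network" of the paper.
    counts = Counter(tokens)
    types = list(counts)
    linked = set()
    for a, b in combinations(types, 2):
        if abs(len(a) - len(b)) <= max_dist and edit_distance(a, b) <= max_dist:
            linked.update((a, b))
    return len(linked), len(types), sum(counts[t] for t in linked), len(tokens)

# Hypothetical input: one page of a transliteration, whitespace-separated words.
page_tokens = open("f108_words.txt").read().split()
nt, tt, nk, tk = coverage(page_tokens)
print(f"{nt}/{tt} word types ({100*nt/tt:.1f} %), {nk}/{tk} tokens ({100*nk/tk:.1f} %)")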
If you can't accept these answers for any reason, you should explain why. Just ignoring the answers and looping back to your question doesn't help.

I already did. The existence of a "network" of words with small edit distances is not in any way evidence that the text was generated by picking previous words and changing them.

The test I proposed can show exactly whether this is how it was done.
Not only "whether" but also details of the method (if it works):
- which edit distance was typically applied?
- how much can be explained with an edit distance of 1, 2, 3, etc?

If this were my theory, I would be extremely curious about these things.
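For what it is worth, a minimal sketch of the kind of test proposed above could look like the following: for every token, record the smallest edit distance to any token in a window of previously written tokens, and tabulate how often each distance occurs. The window size of 100 tokens and the input file name are arbitrary assumptions, not part of anyone's published method.

Code:
from collections import Counter

def edit_distance(a, b):
    # plain Levenshtein distance (dynamic programming)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lookback_histogram(tokens, window=100):
    # For each token (after the first), the smallest edit distance to any of
    # the 'window' preceding tokens.
    hist = Counter()
    for i, tok in enumerate(tokens[1:], 1):
        recent = tokens[max(0, i - window):i]
        hist[min(edit_distance(tok, r) for r in recent)] += 1
    return hist

# Hypothetical input file: a transliteration, whitespace-separated words.
tokens = open("vms_words.txt").read().split()
hist = lookback_histogram(tokens)
total = sum(hist.values())
for d in sorted(hist):
    print(f"minimum edit distance {d}: {100 * hist[d] / total:5.1f} % of tokens")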
From the 3rd grade German book.
"You there, who want to take that one there as your husband, and you there, who take that one there as your wife." Blah blah blah.
You there, whom you want to bring there as a husband, and you there, whom you bring there as a wife. Blah blah blah.
Written exactly the way you shouldn't do it.
But that is exactly what happens in the old books.
"Combinations like, and then must / but then can / but then is / because but then ........."
Combinations like, and then must / but then can / but then is / because then ...........
Three or four combinations are not as rare as you think.

I don't know what it's like in English, but if I have German text in the VM, I also have to think in this direction.
Aga, I can't understand you. Sorry. Can you translate into English, please?
@Paris
Interesting: if I have the website translator turned on, it also translates my post back into German, even though I actually posted in English. Sorry.

From the 3rd grade German book.
"You there, who want to take that one as your man, and you there, who want to take that one as your wife." Blah blah blah.
Written exactly the way you shouldn't.
But that is exactly what happens in the old books.
"Combinations like, and then must / but then can / but then is / because but then .........."
Three or four combinations are not as rare as you think.

I don't know what it's like in English, but if I have German text in VM, I also have to think in this direction.
The self-citation process describes well the 'text' in the VM. It is a kind of mirror that reflects a reality. The scribe observes that living reality and copies its movement. That reality is none other than the movement of celestial objects. He does not copy it directly from the sky but from a volvelle or other mechanical device that reproduces that movement.
  This hypothesis would explain the features of the script described in the autocopist theory. Words are not words but chains of interchangeable symbols. Each symbol has an astronomical meaning, so similarly spelled tokens are near each other because they depend in some way on each other. 
  The regularity of the celestial movements would also explain the regularities that we observe in the script. The scribe is not free when writing symbol strings. He reproduces a mechanism, although that mechanism is somewhat flexible, which would explain the Currier A and B variants.
(27-10-2019, 07:50 AM)ReneZ Wrote: I already did. The existence of a "network" of words with small edit distances is not in any way evidence that the text was generated by picking previous words and changing them.

When it comes to the evidence, we actually say: "Nevertheless, an exhaustive scan of the parameter space (involving thousands of automatically analyzed text samples) verified the overall stability of the proposed algorithm. About 10–20% of the parameter space even yields excellent numerical conformity (< 10% relative error) with all considered key features of the real VMS text (entropy values, random walk exponents, token length distribution, etc.)." (Timm & Schinner 2019, p. 16).
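As an illustration only (not the procedure from the paper), a check of "numerical conformity with key features" could be sketched as below: compare a generated sample with a transliteration on two simple features, single-character entropy and mean word length, and report relative errors. The file names are placeholders.

Code:
import math
from collections import Counter

def char_entropy(text):
    # Shannon entropy (bits per character) of the single-character distribution,
    # ignoring whitespace.
    counts = Counter(c for c in text if not c.isspace())
    n = sum(counts.values())
    return -sum(k / n * math.log2(k / n) for k in counts.values())

def mean_word_length(text):
    words = text.split()
    return sum(map(len, words)) / len(words)

def relative_error(value, reference):
    return abs(value - reference) / reference

# Placeholder file names: a transliteration of the VMS and one generated sample.
vms = open("vms_words.txt").read()
sample = open("generated_sample.txt").read()

for name, feature in [("character entropy", char_entropy),
                      ("mean word length", mean_word_length)]:
    err = relative_error(feature(sample), feature(vms))
    print(f"{name}: {100 * err:.1f} % relative error")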

So far you haven't said anything about the provided statistics. Please note the following comment by Prescott H. Currier: "The validity of text produced by any method at all must, I think, be judged against this statistical background".

When it comes to the network of similar words, we actually describe it as an observation: "The global VMS graph, as well as the corresponding subnetworks for individual folios, give more evidence for a fundamental connection between token frequency, number of similar tokens, and position within the text" (Timm & Schinner 2019, p. 6).

The self-citation method is built on these observations (see pp. 14-18). Until now you accept "that similar tokens appear near each other", but you haven't said anything about the subnetworks for individual folios or about the connection between token frequency and number of similar tokens.
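For readers who want to probe the claimed connection between token frequency and the number of similar tokens themselves, here is a minimal, unoptimised sketch (not from the paper) that correlates each word type's frequency with the count of other types within edit distance 1. The input file is a placeholder, and Pearson correlation is an arbitrary choice of measure.

Code:
from collections import Counter

def edit_distance(a, b):
    # plain Levenshtein distance (dynamic programming)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Placeholder input: the word list of a single folio (whole-text runs are slow
# with this unoptimised pairwise comparison).
tokens = open("f108_words.txt").read().split()
counts = Counter(tokens)
types = list(counts)
freq, similar = [], []
for t in types:
    near = sum(1 for u in types
               if u != t and abs(len(t) - len(u)) <= 1 and edit_distance(t, u) <= 1)
    freq.append(counts[t])
    similar.append(near)
print("Pearson r(frequency, number of similar types):", round(pearson(freq, similar), 3))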

Instead of taking the time to reason through the argument, you just denied the conclusions. Moreover, you ask for an observation on which the self-citation method is built as evidence for the correctness of the self-citation method.


(27-10-2019, 07:50 AM)ReneZ Wrote: The test I proposed can show exactly whether this is how it was done.

You actually asked "to check each word in the MS if there is a recent (how far back?) similar word (which max. edit distance?) from which it could be derived". In other words, instead of checking the details of the argumentation you ask for more details. At the same time you complain that my papers "are very long....". Sorry, but this doesn't say anything about the validity of the argumentation in favor of the self-citation method.

Moreover, since the self-citation method is based on this type of observation, we even provide details for individual pages. For example, the most frequent tokens on folio f108 "are <qokeedy>, <qokedy>, and <okedy>, each one appearing 16 times. ... The [network graph], built around the three [most frequent tokens] of folio [f108] (restricted to their 33 most similar tokens), gives a first impression of an existing deep correlation between frequency, similarity, and spatial vicinity of tokens within the VMS text" (Timm & Schinner 2019, p. 5). The subnetwork of words with an edit distance of 1 covers 81.8 % of the text on folio f108. Therefore, the answer to your question for folio f108 is that for more than 80 % of the tokens (at least on the same folio) there is a previous similar word (with an edit distance of 1). Please note, the corresponding value for Currier A is 82 %, for Currier B 85.5 %, and for the whole VMS 84.7 %.
I didn't mean to say that the long papers are bad. Quite the contrary: they contain a lot of valuable statistics. It is just that their length makes it more time-consuming to ingest it all.

The reason for all my questions and doubts can be explained as follows.

Were the auto-copying hypothesis presented as a hypothesis that is still being analysed further, then I would have no problem whatsoever. It is of interest. The statistics are of interest.
If it is presented as "the way in which the MS was produced" and evidence/proof that the MS is meaningless, then I want to understand better.

From the very first mention of the idea I considered that this process is too arbitrary and does not explain some of the most important properties of the text. I still think so, but this is a qualitative judgement and not sufficient to decide whether it can be true or not.

The near-repetitions are explained, the appearance of Eva-f and p primarily (but not at all exclusively) in top lines of paragraphs is explained, but the word structure is not. And there are a few other things, already mentioned before.

So I try to imagine how this would have arisen.
This puts some constraints on how the system is initialised and how the changes are applied.

The changes during auto-copying are clearly not at all arbitrary

The discussion is quite similar to that in 2004 or so when Gordon Rugg presented his method.
To take into account the known properties of the text, the method will require numerous dedicated adaptations, and here is where Occam's razor comes into play. Not that it is proof of anything. It is a sign.
"ReneZ"
So I try to imagine how this would have arisen.
This puts some constraints on how the system is initialised and how the changes are applied.

The changes during auto-copying are clearly not at all arbitrary

  I have the same opinion.
Therefore, I asked Timm a question earlier (I didn't get an answer): why are the word changes constrained so that they comply with the rule that certain characters cannot be written one after another? Why complicate the procedure?
I finally got around to reading the entirety of Torsten Timm's paper, the follow-up work he has published as a companion / continuation of his hypothesis, and this thread. I have also read the blog of Brian Cham and the posts of Stephen Carlson on this forum, who, last I've heard, are supportive of it. I've also taken special note of Koen's statistical analyses that include control groups of known language texts, a transcript of the VMS, and an output of TT's algorithm, all controlled for text length. It's these graphs which are perhaps the most striking of all, as most appear to show the VMS and TT's gibberish clustering together in their statistical behavior, away from a different cluster that includes real language samples. How to best interpret Koen's data is a matter of some debate, but to my untrained eye, it almost seems like TT's bot behaves "even more VMS-y than the VMS"; the statistical anomalies include most of those seen for the VMS, but even more anomalous in that same way. If this is indeed a valid trend, correctly interpreted, I would say this lends support to TT's theory. Does it prove it beyond a reasonable doubt? Definitely not. But it is evidence in his favor.

I think the best challenge to Koen's data would be to find specimens of symbolic language known to have meaning, contemporaneous to the VMS, and similar in text length, which cluster near the VMS (and TT's bot) when subjected to the same statistical analyses. Methinks this could be a difficult hunt.

On the other hand, I'm sure this isn't the last algorithm for reverse-engineering VMSish nonsense we'll see; I'm currently working on one now, involving dice. I'll happily hang it up here for target practice when I finish building it, and would be happy to see its output subjected to the same statistical analyses that Koen and Julian Bunn have used.

There are a number of relatively recent works composed of English-ish sounding gibberish, good enough to fool someone who doesn't speak English into thinking they're hearing real English. The Italian song "Prisencolinensinainciusol" and the short film "Skwerl" are the best known examples. Both of these were entirely scripted, and the scripts could easily be compared statistically to a similar-length text of real English. The results would be hard to interpret as significant when the specimens are so short. Still, if the creators of either work, or any similar one, were to use the same algorithm to create a much longer text, it might be worth seeing how its statistical properties differ from those of a work in real English.

Is there any precedent, from any time period prior to the advent of computers, of anyone producing voluminous amounts of highly structured asemic writing? The only one I can think of is Hamptonese; statistical analyses have been performed on Hamptonese, and it's not entirely clear that it's meaningless or asemic. If any good examples of human-generated, large-volume, data-mineable meaningless text are available for comparison, it might be worth asking how and why such works were created. If a method comparable in technique to TT's autocopy algorithm was involved, that would certainly lend support to him.

There is very much a limit to the explanatory power of TT's autocopy hypothesis, and I think Mr. Timm admits this quite readily. If no example of a similar algorithm producing a similar product can be found, that doesn't disprove it. But it raises the number and size of assumptions required to accept it, which is not good news either. If we accept that the VMS is utterly unique in its execution, that's a lot to explain, especially in a time and place where unique creations and new and unique ways of doing things were not the cultural norm.

I hear Mr Timm also admitting repeatedly another major limitation to his study: his text was composed by a bot, the VMS by a human. When attempting to run a simple algorithm repeatedly, a human mind gets bored and — consciously and unconsciously — introduces levels of complexity and subtle patterns in the "random" output. A bot doesn't do this, and can't [yet] model this property of the human mind with any accuracy. Is this enough to explain the subtle clustering of vords and vord-pieces by page and apparent topic? Maybe. Or maybe not.

If there were grant money available for anything VMS related (ha!), it would be interesting to pay some teams of starving university students who'd never heard of the VMS to receive a facsimile copy of the VMS with the text removed, and actually run TT's algorithm by hand (with a real quill and iron gall ink) until the facsimile is filled with text. I would keep metrics on how much time a single scribe could typically put into this effort in one sitting before becoming too tired or bored to continue. I'd want to analyze the output of each writing session, to see if certain statistical properties typified the text written near the start, versus text composed near quitting time. Then look at the real VMS: Are there signs of a similar statistical shift every X pages or paragraphs or so, of the scribe struggling with the algorithm and wanting to just get the stupid thing over with for the day? That would definitely lend support to TT's idea.

I don't think TT's conclusions to date are the last word on the VMS, and I'm still going to chase the possibility that there is meaning locked in there. But I give him credit: he has accomplished the objective he set for himself, which was simply to show that generating a large amount of nonsense text that looks and feels like real language could have been possible at the time the VMS was created. Mr. Timm's own beliefs (that the VMS's text is indeed meaningless) need to be separated from the very parsimonious conclusion of his scholarly work, because he has not shown that such an event happened or was even likely to have happened, and he never claimed he did.
(29-10-2019, 10:27 AM)Wladimir D Wrote: "ReneZ"
So I try to imagine how this would have arisen.
This puts some constraints on how the system is initialised and how the changes are applied.

The changes during auto-copying are clearly not at all arbitrary

  I have the same opinion.
Therefore, I asked Timm a question earlier (I didn't get an answer): why are the word changes constrained so that they comply with the rule that certain characters cannot be written one after another? Why complicate the procedure?

Please test it yourself. Fill an empty page with text by writing or typing each word. Always start with the word 'repeat'.
In case A) just repeat the word 'repeat' until the page is full. Typos are not allowed.
In case B) repeat previously written words and always replace at least one letter with a different one (a minimal sketch of this case follows below). Typos and repeated words are allowed.
In case C) use arbitrary letter sequences of length 6 without copying a previously written word and without repeating the word generation method. Typos and repeated words are not allowed.
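As an illustration of case B only, here is a deliberately simplistic sketch: it starts from the word 'repeat' and keeps copying a previously written word while replacing one letter. It is not Timm's actual self-citation algorithm; the Latin alphabet and the single-letter replacement rule are assumptions made purely for the exercise.

Code:
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"   # assumption: plain Latin letters

def mutate(word):
    # Copy a word and replace one randomly chosen letter with a different one,
    # so that the copy always differs by at least one letter.
    i = random.randrange(len(word))
    new = random.choice([c for c in ALPHABET if c != word[i]])
    return word[:i] + new + word[i + 1:]

def case_b(n_words, seed_word="repeat"):
    written = [seed_word]
    while len(written) < n_words:
        source = random.choice(written)   # pick any previously written word
        written.append(mutate(source))    # repeated results are allowed
    return written

random.seed(0)
print(" ".join(case_b(40)))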