The Voynich Ninja

Full Version: Discussion of "A possible generating algorithm of the Voynich manuscript"
(03-05-2020, 08:30 AM)ReneZ Wrote: ...

René, you are so convinced that the self-citation method must be wrong that you reject anything I say. It doesn't even matter why you reject it.
 
You didn't bother to use the search function at the top of this page to search for the term 'small changes'. You also didn't bother to check the details of Pelling's [link] about my app, or to take my [link] or even your [link] into account.
(02-05-2020, 08:10 AM)ReneZ Wrote: So let's do the test. I have taken folio f58r. This has a lot of text in three paragraphs.

For every word, one can make the comparison with all previous words, and find the one with the smallest distance.
Using the most recent ZL transliteration (version 1r), and ignoring uncertain spaces (*), this has 344 words. Of these, exactly 25% have a minimum L distance of three or more.

Hi Rene,
I have spent a few hours trying to compare the ZL transliteration with the output of Timm and Schinner's software. As always, I may have made errors along the way.
Though I used the ZL file you mentioned, I could not perfectly replicate your results: for f58r, I find 344 words, 83 of which have no previous word at a Levenshtein distance of less than 3. The exact 25% you found corresponds to 86 words, so I apparently missed 3 somewhere. I attach my list of 83 distant couples, in case you or anybody else wants to look into the details.
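
For anyone who wants to redo the counting, the computation is roughly the following (a simplified Python sketch, not my actual script; the function names are illustrative, and the word list is assumed to have already been extracted from the ZL file, with uncertain spaces handled beforehand):

Code:
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for cb in b:
            curr.append(min(prev[len(curr)] + 1,                 # deletion
                            curr[-1] + 1,                        # insertion
                            prev[len(curr) - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def distant_words(words, threshold=3):
    """Words on a page whose minimum distance to any *previous* word on the
    same page is >= threshold; the first word is skipped (no predecessor)."""
    return [w for i, w in enumerate(words) if i > 0
            and min(levenshtein(w, p) for p in words[:i]) >= threshold]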

I have made the same measurement for the different "pages" in Timm and Schinner's text (the blue squares). The plot below shows the overall results.

[attachment=4288]

The TTAS-generated text lies below the cloud of Voynich data: it seems to have about half the "distant" words that appear in Voynich pages with similar word counts (about 15 vs about 30). Both distributions show considerable variability, and both sides of f78 could fall within the range produced by the algorithm.
If my results are not totally messed up, it would seem that f58r has an exceptionally high rate of distant words.
Thanks Marco!

I checked your list against mine, and it coincides almost exactly. The three differences are the following:

1) the very first word (korcholfy), which you correctly did not include. (However, it is in a paragraph top line, so it plays no role for the test.)

2) the last word in line 4, which ends with a 'weird character'. I think you have it as 'sholala', which has an L distance of 2 from sholar, but with the extra character it becomes 3.

3) the first word of line 21 (tcheos). I have L distance 3 w.r.t. cheoly, but perhaps you counted {ch'}ees in line 19 as chees, which would give only two.

Overall, these incidental cases do not have a significant impact.

However, it seems unfortunate that I picked a page that turns out to be exceptional in terms of disconnected words....
(04-05-2020, 08:02 PM)Torsten Wrote: René, you are so convinced that the self-citation method must be wrong that you reject anything I say. It doesn't even matter why you reject it.
 
You didn't bother to use the search function at the top of this page to search for the term 'small changes'. You also didn't bother to check the details of Pelling's [link] about my app, or to take my [link] or even your [link] into account.

Torsten, yes, I think that the self-citation method does not adequately explain the text in the Voynich MS, and I am also convinced that this is not 'how it was done'. 

That is my opinion, of course, just as your opinion is the opposite.
However, like you, I do my best to explain what my opinion is based on.

The many statistics that are used are of course not opinions. These are reproducible data.

I may have introduced the term 'small changes' myself, but it has also been used by Emma, whose opinions expressed near the start of this thread coincide in many respects with mine.
I think that this terminology is not essential to the discussion.

By the way, I am happy that you posted the link to your GitHub repository, because it includes many plots that I wanted to see again but did not know where to find.
Just as a second check, I have used the latest Takeshi transliteration (version 0d), counted uncertain spaces as spaces (not sure if there are any in this file), and repeated the exercise for f58v, which seems to have only half the number of 'distant' words compared to f58r (according to Marco's plot).

Counting all lines, there are 38 words with L >= 3. This is just over 10%.
Ignoring top lines of paragraphs, that leaves only 26, or 7.8%.

Interesting that these two pages, which are the two sides of the same leaf and appear so similar, are so different in this respect.

Again referring to Marco's plot, the large 'blob' of pages in the lower-left part seems to have a ratio of 25% on average, so f58r is similar but f58v is not.
(05-05-2020, 12:59 PM)ReneZ Wrote: Torsten, yes, I think that the self-citation method does not adequately explain the text in the Voynich MS, and I am also convinced that this is not 'how it was done'.

That is my opinion of course, just like your opinion is the opposite.

When it comes to an undeciphered writing system, there are only two options: you can either read your own rules into the text, or you can reconstruct the rules specific to that writing system. If you choose the second option, you have to limit yourself to a plain description of the writing system, no matter what the outcome is.
This seems to be the right place to continue the discussion on the auto-copy hypothesis.

(This post has a general introduction first, and then (further down) some new statistics that argue against the auto-copy hypothesis.)

My thoughts are completely in line with those of Emma, stated near the start of this thread, namely that there is a fundamental difference between:
- observing that similar words tend to appear near each other
- saying that the Voynich MS was created by arbitrarily writing words similar to previous words

The result does not provide evidence of the intent of the 'actor'. We cannot judge just from the statistics what process generated them.

It is understood that the evidence presented by Timm (and Schinner) lies with the computer algorithm for generating Voynich look-alike text, but the details of this algorithm have not been shown here, so there remain two important questions:
- how complicated is this algorithm?
- does it include any anachronistic elements?

One way to look at this is to observe the appearance of the most frequent word, daiin.

Obviously, one word has to be the most frequent, so it makes no difference which particular word it is.

However, now there are two conflicting hypotheses:
1) This word is the most frequent because it is a representation of some very frequent element (not further specified) of an encoded source text
2) This word is the most frequent because it most naturally appears by the auto-copy hypothesis, i.e. by making small modifications of earlier words.

This is something that can be tested.

I have looked at a transliteration of each individual page of the MS, and compared the first occurrence of 'daiin' with the preceding text. (This is something I did several months ago, but I had not yet reported on it.) The complete list is included in a spoiler tag below.

The page is indicated by a file name starting with a two-character code, which some people may recognise and others may not, but that is not too relevant.
Then there is a number, which indicates the position of the word's first appearance on this page. If it says 20, it means that the first occurrence of daiin is the 20th word on this page.

The last two columns show the Levenshtein distance to the most similar previous word on this page, and what this word was. Note that all distances were computed based on Eva.
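
In rough terms, the procedure behind the list is something like the following (a simplified Python sketch for illustration only, not the script I actually used; 'pages' stands for whatever per-page Eva word lists one extracts from the transliteration):

Code:
def lev(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance on Eva strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for cb in b:
            curr.append(min(prev[len(curr)] + 1, curr[-1] + 1,
                            prev[len(curr) - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def first_daiin(pages, target="daiin"):
    """For each page, report the position of the first 'daiin' and the most
    similar preceding word on that page, with its Eva Levenshtein distance."""
    for name, words in sorted(pages.items()):
        if target not in words:
            continue
        pos = words.index(target)                     # 0-based index
        if pos == 0:
            print(f"{name}  1  (no preceding word)")
            continue
        dist, nearest = min((lev(target, w), w) for w in words[:pos])
        print(f"{name}  {pos + 1}  {dist}  {nearest}")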

Let me first include a graph of the frequency by Levenshtein distance.

[attachment=4727]

daiin is a 5-character word (in Eva) and in about 1/3 of the cases, the Levenshtein distance is 3 or higher, meaning that the word is mostly different from the one it was supposed to be copied from.

I can only invite people to look through the detailed list.
When I do this, I can only conclude that in a large number of cases it is not reasonable that the word daiin, the most frequent word in the MS, was copied from a very different word earlier on the page; rather, it simply appeared because it is a frequent word for some other reason.

To discuss a hypothesis it would be necessary to look into the details of my argumentation and to point to concrete mistakes. Only in this way would it be possible to correct those mistakes and thereby improve our knowledge about the Voynich manuscript. But René Zandbergen only wants to deny what is unthinkable in his eyes. To make it less obvious that he didn't go into the details of our argumentation, René falsely writes that 'the details of the algorithm have not been shown'. Not only was the algorithm published in our paper, but the source code of an implementation of the algorithm is also available at [link].

Then René Zandbergen argues that a text generated by the self-citation method must look different. To demonstrate this claim he comes up with his own version of the self-citation method. He assumes that someone copying words would only apply 'small modifications'. This assumption does not correspond to any statement I have made in my papers! I have also pointed out multiple times ([link]) that such an assumption is simply wrong. I can only encourage everybody interested in our argumentation to read our paper(s) and to check for yourselves what we say and why (see Timm & Schinner 2020, p. 9f and Timm 2015, p. 16).

In René Zandbergen's truncated version, a word can only be generated by adding 'small modifications of earlier words'. In our paper we present three concrete modification rules: 'replacing one or more glyphs by similar ones', 'adding or removing a prefix', and 'combining two source words to create a new word' (Timm & Schinner 2020, p. 9f). Nothing is said about 'small modifications', and nothing is said about small edit distances either. René Zandbergen already wrote in an earlier post that it is his idea that only 'small changes' could be applied ([link]). Besides the fact that René Zandbergen is referring to his own idea, and therefore to his own version of the self-citation method, he also ignores the possibility of combining two source words to create a new word, and of splitting a word generated this way into its parts (see Timm & Schinner 2020, p. 9f and also Timm 2015, p. 16). This way his results suggest that it would be hard to come up with <daiin> from suggested source words like <chodaiin>, <fchodaiin>, or <paiindaiin>. But since the sequence 'daiin' is already part of these words, it would only be necessary to copy the subgroup <daiin>.
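
To make the difference concrete, the three rules allow operations of roughly the following kind. This is only a toy Python illustration: the glyph pairs and prefixes are placeholders, and it says nothing about the probabilities and tables of the published algorithm, whose actual source code is linked above.

Code:
import random

# Toy illustration of the three kinds of modification rules named above.
# SIMILAR and PREFIXES are placeholder examples, not the published tables.
SIMILAR = [("in", "iin"), ("o", "a"), ("k", "t")]
PREFIXES = ["qo", "o", "ch"]

def modify(source: str, other: str) -> str:
    rule = random.choice(["substitute", "prefix", "combine"])
    if rule == "substitute":
        # replace one glyph group by a similar one (if present)
        a, b = random.choice(SIMILAR)
        return source.replace(a, b, 1) if a in source else source
    if rule == "prefix":
        # remove an existing prefix, otherwise add one
        for p in PREFIXES:
            if source.startswith(p) and len(source) > len(p):
                return source[len(p):]
        return random.choice(PREFIXES) + source
    # combine two source words into a new word; copying a subgroup such as
    # <daiin> out of a longer word like <chodaiin> is the same step in reverse
    return source[: max(1, len(source) // 2)] + other[len(other) // 2:]

Depending on the rule drawn, modify('chol', 'daiin') could for example return 'chal' (glyph substitution), 'ol' (prefix removal), or 'chiin' (combination); the last case already shows that the rules are not limited to small edit distances.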

René further assumes that someone who was creative enough to invent his own text-generation method, and who had also invented his own script, would lack the creativity to come up with more than just 'small modifications'. In my eyes this idea is wrong. In any case, the level of repetition in a text generated with René Zandbergen's version would be eye-catching. To make the level of repetition less obvious, it would be necessary to add further modifications. So even someone starting with René Zandbergen's idea of applying only 'small modifications' would adjust the number of modifications until the generated text looked sufficient in his eyes.

Last but not least, it is also impossible to copy a word from a folio if that folio is still empty. Each time the scribe started to write on a new, empty folio, he had no other option than to copy at least some initial words from a previous folio (see Timm & Schinner 2020, p. 11 and [link], p. 16). René Zandbergen doesn't take this fact into account. His own statistics show that if <daiin> is within the first 10 word tokens, only in rare instances can a similar word token be found. René Zandbergen lists 42 folios where <daiin> is one of the first 10 word tokens. The average edit distance (computed on EVA) for these 42 folios is 3.9, whereas the average for all other 163 folios is 1.7. Therefore these 42 folios where <daiin> is one of the first 10 word tokens shouldn't be counted. Instead of 81 %, now 92 % of the instances of <daiin> are only three or fewer modifications away from a previous word. And instead of 74 %, now 84 % of the instances of <daiin> are only one or two modifications away from a previous word (93 folios or 57 % with 1 modification, 42 folios or 26 % with 2 modifications, 15 folios or 9 % with 3 modifications, 11 folios or 7 % with 4 modifications, and 2 folios or 1 % with 5 modifications).
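
The re-count itself is straightforward; a small Python sketch is given below for illustration ('folios' is assumed to be a list of (position, distance) pairs read off from René Zandbergen's list, and the names are illustrative, not taken from our software):

Code:
from collections import Counter

def retabulate(folios, cutoff=10):
    """Drop folios where <daiin> is among the first `cutoff` word tokens
    (little or no previous text to copy from), then tabulate the remaining
    minimum edit distances."""
    kept = [dist for pos, dist in folios if pos > cutoff]
    counts = Counter(kept)
    total = len(kept)
    for dist in sorted(counts):
        share = 100 * counts[dist] / total
        print(f"{dist} modification(s): {counts[dist]} folios ({share:.0f} %)")
    small = sum(n for d, n in counts.items() if d <= 3)
    print(f"three or fewer modifications: {100 * small / total:.0f} %")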

The statistics for <daiin> presented by René Zandbergen don't contradict anything we write in our paper. On the contrary, René's calculations also demonstrate that <daiin> occurs together with similar tokens on the same folios. We describe this observation as context-dependent self-similarity (Timm & Schinner 2020, p. 3). There are ten folios (f1r, f7r, f10v, f14r, f32r, f32v, f35v, f37v, f38v, and f45r) where the words <daiin> and <dain> are two of the three most frequent tokens. But there are also folios where <daiin> is used together with <aiin> (f41v, f46r, f55v, f89v2, f105v, and f114r) or <saiin> (f2r, f16r, and f90r2). There is also a systematic difference between the two Currier languages: in Currier A the words <daiin> and <dain> are frequently used together on the same folios, whereas in Currier B <daiin> is more frequently used together with <aiin> (Timm & Schinner 2020, p. 3f). It is noteworthy that René Zandbergen refuses to discuss the observations we present in our paper for the word <daiin>, or to refer to any other observation we describe in one of our papers (see [link], [link], [link] and [link]).

In our paper we describe the Voynich manuscript as context-dependent and self-similar (see Timm & Schinner 2020, p. 3ff). This means that it is possible to describe local variations of repetition. Most of the time, repetition and patterns of local variation are seen as distinct features of the text. But it is possible to describe glyphs typical for a certain position within a word, a line, or a paragraph as repetitive elements. In the same way, it is possible to describe glyph sequences typical for Currier A as well as glyph sequences typical for Currier B (see Timm & Schinner 2020, p. 6). Word tokens do not only occur multiple times, they also occur together with similar word tokens. Sometimes this is more than obvious, as in 'kol chol chol kor chol sho chol' on folio f1r. The general tendency in the Voynich manuscript is that words occur together with similar ones: "when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail" (Timm & Schinner 2020, p. 3).
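
The quoted statement about the three most frequent words is easy to check against any transliteration. The following is a small Python sketch for illustration only, taking 'differ in only one detail' to mean an edit distance of 1: 'pages' maps page names to word lists, and 'lev' is any edit-distance function like the ones sketched earlier in this thread.

Code:
from collections import Counter

def top3_similarity(pages, lev):
    """Fraction of pages on which two of the three most frequent word types
    are at an edit distance of exactly 1 from each other."""
    hits = counted = 0
    for words in pages.values():
        top = [w for w, _ in Counter(words).most_common(3)]
        if len(top) < 3:
            continue
        counted += 1
        if any(lev(a, b) == 1 for i, a in enumerate(top) for b in top[i + 1:]):
            hits += 1
    return hits / counted if counted else 0.0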

Local variation and repetition are not only connected to each other, they depend on each other. Words sharing the same glyph sequences do not only prefer a certain position within a line; they also have something in common with word tokens sharing the same line, the same paragraph, the same folio, and, on a larger scale, with word tokens in Currier A and B. In other words, the observed patterns are part of a larger system. Typical for this system is that patterns observed on a smaller scale also repeat on a wider scale. There are glyphs typical for the start/end of a word, and there are also glyphs typical for the start/end of a line. There are also glyphs typical for the first line of a paragraph. In the same way, it is possible to describe glyph sequences typical for a certain paragraph, folio, or quire, as well as for Currier A and B. Based on these observations we argue: "These results not only cement the hypothesis that the VMS text is an artificial structure rather than natural language, but also provide constraints for the generating algorithm" (Timm & Schinner 2020, p. 8).
I raised a question about this topic in a different thread, and RenegadeHealer helpfully pointed me to this long (in time and volume) thread. I am not qualified to evaluate all of the statistical methods that Torsten, Rene, and Marco discuss: while I have plenty of advanced mathematical training, it was focused on areas of mathematics other than statistics, unfortunately. But I believe I can make a general comment and observation, perhaps a meta-observation, about the discussion:

Setting aside the details of the statistical points and arguments that Torsten, Rene, and Marco are raising, I observe one other major difference between Torsten's claim and Rene and Marco's arguments: Torsten claims that he has made THE discovery that shows the world how the Voynich manuscript text was actually composed. Rene and Marco are making arguments of their own, but they are not claiming to have THE solution to this question about the ms. This is a BIG difference.

If Torsten were to succeed in persuading the scholarly community (however that may be defined in relation to the enigmatic subject of the Voynich ms) to agree with him and adopt a consensus view that his theory of the method of creation of the Voynich ms text is correct, that would be a BIG DEAL in the academic and scholarly world. It would get a lot of attention in the press globally, and for example the Wikipedia page on the Voynich ms would then state that it is a meaningless text composed in a particular manner, as first demonstrated by Torsten Timm, et al.

Torsten's claim thus rises to the level of significance of any other claim to have deciphered the Voynich ms. It is a claim to have THE solution to understanding the whole manuscript. As Rene has said in the context of discussing one such previous claim that I made, "Extraordinary claims require extraordinary evidence." In other words, the burden of proof is on Torsten, much more so than it is on Rene, Marco, or others, because Torsten is the one who is making the more extraordinary claim.

Perhaps all of the above observations are or should be self-evident to those who are participating in and following this discussion. But I thought it might be useful to add these comments here at this stage of the discussion, to give some context and perspective to readers who are trying to make sense of the intense debate that has been carried on between Torsten and other participants in this thread.
It's an interesting philosophical question. As has been said before, for Torsten to prove his claim is impossible, since it is like proving a negative. I cannot prove that there is no ghost standing behind you.

What Torsten can do, of course, is demonstrate that his method is possible.

It can also become more acceptable in a number of ways, for example if documents are found that show a similar process being used or devised. Moreover, one might say that a "nonsense" solution becomes more attractive whenever other options fail.