The Voynich Ninja

Full Version: Discussion of "A possible generating algorithm of the Voynich manuscript"
(23-10-2019, 05:29 AM)ReneZ Wrote: One of my main problems with the method is that it is not very well formulated, and this does not allow testing / verifying it. Just an example: if a recipe for a cake gives a list of ingredients (eggs, flour, etc.) and then says: "put all ingredients together and apply heat", then the recipe is too unspecific.


The main problem in formulating an answer to your concerns is that doing so would first require knowing precisely what your concerns are.

Just an example: in your last paper you write about asking the right question: "The standard question about the Voynich MS is: 'What does it say?' This question may not have an answer. It may not say anything. ... The right question should be: 'How was it done?', because this question definitely has an answer. It was most certainly 'done' one way or another, also if the text is meaningless" ([link]). But when it comes to research asking "How was it done?", you write only: "Examples of people who are doing very different things are Rugg (2004) and Timm and Schinner (2019)" ([link]).

(23-10-2019, 05:29 AM)ReneZ Wrote: What I tried to get from re-reading is how specific the 'prescription' of the auto-copying is. In the most recent paper, there are suggestions about the type of changes applied, but so far I could not get anything on 'how far back' the author/scribe was looking. I have not looked into the code of the application, but there must clearly be some assumptions in there.


On page 10 in Timm & Schinner 2019 we write:
"As for the actual selection process of source words, it is clear from the results of Section 2 (as well as simply suggested by the scribe's convenience) that they are to be chosen at least from the same page. Because it is handy to copy a word from the same position some lines above (see Timm 2014, p. 18), our implementation of the algorithm includes a mechanism that selects (with a given probability) even tokens from the previous line at the same writing position." (Timm & Schinner 2019, p. 10).
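The selection mechanism quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted description, not the authors' implementation; the probability value and the data layout (a page as a list of token lists, including the partially written current line) are assumptions made for the sketch.

```python
import random

def choose_source_word(page_lines, line_idx, pos, p_prev_line=0.5):
    """Pick a source token for self-citation.

    Usually a token already written on this page is chosen, but with
    probability p_prev_line the token of the previous line at the same
    writing position is preferred (cf. the quoted passage).  The value
    of p_prev_line is an arbitrary placeholder, not the authors' setting.
    page_lines is assumed to include the (partial) current line."""
    prev = page_lines[line_idx - 1] if line_idx > 0 else []
    if prev and pos < len(prev) and random.random() < p_prev_line:
        return prev[pos]
    # otherwise any token already written on the page is a candidate source
    pool = [w for line in page_lines[:line_idx] for w in line]
    pool += page_lines[line_idx][:pos]
    return random.choice(pool) if pool else None
```

With `p_prev_line=1.0` the sketch always copies from the previous line at the same position; with `0.0` it always falls back to the pool of tokens already on the page.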

(23-10-2019, 05:29 AM)ReneZ Wrote: What also seems to be missing is the initialisation. How was it started? This may seem a trivial detail, but again there must be something in the code for that.


On page 10 in Timm & Schinner 2019 we write:
"When starting to write on an empty page, it is necessary to choose some initial words from another one, in order to initialize the algorithm. 
Footnote: There was a similar problem for the author of the VMS every time he/she was starting a new (empty) page. In such a case it was probably useful to use another page as source. There is some evidence that the scribe preferred the last completed sheet for this purpose (see Timm 2014, p. 16)." (Timm & Schinner 2019, p. 10).

(23-10-2019, 05:29 AM)ReneZ Wrote: It is not written down specifically (and it is again something that I wanted to double-check) but it seems to be implied that every word in the MS (after the initialisation) is the result of auto-copying. That is, there are no words that are 'new seeds' or incidental re-initialisations.
(I have asked earlier in this thread about this, but I think that this question was understood in a different way).
In any case, these points make clear that the initialisation procedure is too important not to mention.


I have answered your question [link].

It seems that you start from the idea that it is possible to distinguish between a creative initialization phase and a static copying phase, and that there is therefore something like a fixed starting set of seed words and copying rules. This is the way a computer program would work, but it is a misunderstanding of the way a human mind would execute such a method.

First, after a word is written down it becomes a potential source for generating new words. In this way every word token has some impact on the text generation method, or, in your words, every written word also re-initializes the method a bit.

Secondly, keep in mind that "the VMS was created by a human writer who had complete freedom to vary some details of the generating algorithm on the spur of the moment" (Timm & Schinner 2019, p. 19). It was always possible for the scribe to add new ideas to the text generation process; for instance, he added new glyph shapes or tried new ways to generate a word. For a complete set of modification rules it would be necessary to reconstruct every thought the scribe had some 500 years ago.

Just some examples:
  • The 'x'-glyph only occurs on certain pages (see [link], p. 13).
  • Some ways to modify words are used frequently, others change over time, and some are used only a limited number of times. See for instance the word <chey>. In Currier A it occurs mainly beside words like <shey> or <chy>, and in Currier B beside words like <chedy>, <shey> and <shedy> (see [link]).
  • Or see the repetition patterns for 'okeoke' and 'oteote'. There are only four pages within the Astronomical and Zodiac sections using such glyph sequences (see [link], p. 9):
      <f68v1.C.2> okey ... okeokey
      <f70v2.R3.1> okey ... oteotey ... oteoteotsho
      <f71r.R1.1> oky ... okeoky oteody ... okeokeokeody
      <f72v2.R2.1> otey ... oteotey oteoldy

(23-10-2019, 05:29 AM)ReneZ Wrote: Then, if this assumption (no new seeds) is true, one could verify the auto-copying hypothesis by checking for each word in the MS if there is a recent (how far back?) similar word (which max. edit distance?) from which it could be derived.


It would also be necessary to consider modification rule three (see Timm & Schinner 2019, p. 10). Rule three is about combining two source words. If the scribe used <ol> and <chedy> to generate words like <olchedy>, it simply doesn't make much sense to calculate the edit distance between <ol> and <olchedy> (see also the answer given [link]).

(23-10-2019, 05:29 AM)ReneZ Wrote: This seems to be the most basic test of the method, but I remember no evidence that this was even attempted. Again, something I would still need to check in the earlier papers. They are very long....


Please see chapter 3 "Evidence" in Timm 2014. In this chapter I describe this type of test for two control samples: "In this paper all words occurring seven times and all words occurring eight times are used as two separate control samples. ..." (Timm 2014, p. 12ff).

In Timm & Schinner 2019 we also describe a test for the whole VMS: we check whether it is possible to describe the text as a network of similar words. The outcome of this test was:

"How does this situation change when we look at the entire VMS? Figure 2 shows the resulting network, connecting 6,796 out of 8,026 words (= 84.67%). Again, an edge indicates that two words differ by just one glyph. The longest path within this network has a length of 21 steps, substantiating its surprisingly high connectivity" (Timm & Schinner 2019, p. 5).

If it comes to 'isolated' words within the network the result is as follows:
"The respective frequency counts confirm the general principle: high-frequency tokens also tend to have high numbers of similar words. This is illustrated in greater detail in Figure 3: 'isolated' words (i.e., unconnected nodes in the graph) usually appear just once in the entire VMS, while the most frequent token <daiin> (836 occurrences) has 36 counterparts with edit distance 1. Note that most of these 'isolated' words can be seen as concatenations of more frequent words (e.g. <polcheolkain> = <pol> + <cheol> + <kain>)" (Timm & Schinner 2019, p. 5).
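As a sketch of how such an edit-distance-1 network can be rebuilt from a transcription, the following connects word types that differ by a single character. This is an illustration, not the code behind Figure 2, and treating each transcription character as one glyph is a simplification of how EVA glyphs are counted; the sample word list is invented.

```python
def differ_by_one_glyph(a, b):
    """True if a and b are at edit distance exactly 1
    (one substitution, insertion, or deletion)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    i, j, edits = 0, 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1; j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1; j += 1      # substitution
        elif len(a) > len(b):
            i += 1              # deletion from a
        else:
            j += 1              # insertion into a
    edits += (len(a) - i) + (len(b) - j)
    return edits == 1

def word_network(types):
    """Edges of the similarity network over a set of word types."""
    types = sorted(set(types))
    return [(a, b) for k, a in enumerate(types)
                   for b in types[k + 1:] if differ_by_one_glyph(a, b)]
```

From the resulting edge list one can compute the connected share of the vocabulary and the longest path, the two figures quoted above.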

For the whole manuscript only 229 word types exist (229 out of 8,027 types = 2.85%) which differ by more than two glyphs from all other word types occurring in the VMS (see [link]). Two typical words of this kind are <okeokeokeody> and <okeeolkcheey>. All 229 types occur only once, and it is possible to split each of them into two or more words that also occur in the VMS: for instance, <okeokeokeody> can be split into three words (<okeo keo keody>) and <okeeolkcheey> into two (<okeeol kcheey>). In other words, I was unable to find a single word that can't be explained by the 'self-citation' hypothesis.
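The splitting check described here (whether a rare type can be written as a concatenation of attested words) is easy to mechanize with a small dynamic program. A minimal sketch with an invented toy lexicon; this is not the authors' code.

```python
def splits_into_known_words(word, lexicon):
    """True if word is a concatenation of two or more lexicon words,
    e.g. polcheolkain = pol + cheol + kain."""
    n = len(word)
    can = [False] * (n + 1)   # can[i]: word[:i] splits into lexicon words
    can[0] = True
    for i in range(1, n + 1):
        can[i] = any(can[j] and word[j:i] in lexicon for j in range(i))
    # require at least two parts: the final cut must lie strictly inside
    return any(can[j] and word[j:] in lexicon for j in range(1, n))
```

Running this over the 229 isolated types against the full VMS vocabulary would reproduce the claim above mechanically.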
Thanks Torsten, that is quite helpful.

The pseudo-code suggests that all paragraph-initial lines are generated from earlier paragraph-initial lines and all other lines are generated from earlier text on the same page.

This would make the proposed test (how many words actually comply with this?) quite feasible to execute.
One could check how much edit distance is needed to explain the Voynich MS text under the auto-copy hypothesis.
Of course, one has to apply reason when interpreting such results. If, for a small edit distance, only 5% of words don't fit, then this should not be considered a show-stopper. However, we have not seen such an analysis yet.
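The proposed test can be sketched directly: for each token, look back over a window of recently written tokens and ask whether some word within a maximum edit distance exists. The window size and distance threshold below are arbitrary placeholders, precisely the open parameters ("how far back?", "which max. edit distance?") under discussion.

```python
def levenshtein(a, b):
    """Plain edit distance (substitution, insertion, deletion)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def explainable_fraction(tokens, window=50, max_dist=1):
    """Fraction of tokens with a possible 'source' among the preceding
    `window` tokens at edit distance <= max_dist."""
    hits = 0
    for i, w in enumerate(tokens):
        recent = tokens[max(0, i - window):i]
        if any(levenshtein(w, r) <= max_dist for r in recent):
            hits += 1
    return hits / len(tokens) if tokens else 0.0
```

Sweeping `window` and `max_dist` over a transcription would yield exactly the kind of compliance figures being asked for here.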

The 'evidence' of the network of words does not really support the auto-copy hypothesis, because the existence of such a network is a property of the vocabulary, not of the text.

The existence of word patterns (which helps to explain the network of words) is also a property of the vocabulary, not of the text.

The difference between vocabulary and text is critical.
It is two-fold:
1) the ordering of the words
2) the frequency distribution of the words

If one takes a text and shuffles all words around arbitrarily, the first point is changed, the second is not changed, the text becomes meaningless but the vocabulary stays the same.

The network of words also stays the same. The existence of this network of words is therefore completely independent of the question whether a text is meaningful or not.
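This invariance is easy to demonstrate in a few lines: shuffling a token sequence changes the word order but leaves the vocabulary and the frequency distribution untouched, and therefore also any similarity network built from the word types (toy data, for illustration only).

```python
import random
from collections import Counter

tokens = "okey daiin chey okey shey daiin chey chey".split()

shuffled = tokens[:]
random.shuffle(shuffled)

# Word order changes, but vocabulary and frequencies do not ...
assert set(shuffled) == set(tokens)
assert Counter(shuffled) == Counter(tokens)
# ... so any network built from the set of word types is identical too.
```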

However, the existence of such a network is a necessary condition for the auto-copy with small edit distances to work. Since English does not have a network with small edit distances, an English text will not look like the product of an auto-copying process.

However, if one changes the English text by mapping words 1-to-1 with Voynichese words, it *might* look like one.

By re-organising this mapping, it is possible to make it into a text that looks like auto-copying. This was already shown before.
Everything I have now read about word frequency seeks an explanation where none is needed.
Since I have been reading books from the 1400s, it is also clear to me that the word frequency is absolutely normal. From today's point of view one would reduce a whole page to five lines; the result of the text of one page = "2 spoons of sugar in the tea".
After reading 10 pages, I can already predict 80% of what comes on pages 11 and 12.
This seems a bit weird, but the VM matches the spelling of that period almost 1 to 1.
(25-10-2019, 08:16 AM)ReneZ Wrote: Since English does not have a network with small edit distances, an English text will not look like the product of an auto-copying process.

However, if one changes the English text by mapping words 1-to-1 with Voynichese words, it *might* look like one.

By re-organising this mapping, it is possible to make it into a text that looks like auto-copying. This was already shown before.

Hi Rene,
would the process you describe also result in occurrences of exact reduplication (like daiin daiin, qokedy qokedy qokedy etc)? If so, could you please detail how this is obtained, or point me to where it was discussed?
Hi Marco,

this discussion is based on the assumption that the Voynich MS 'words' are really words.
So, the full repeats are assumed to be full repeats of words, and in that case they must already exist as repeats in the source text. They would not be introduced in the process I was describing.

Of course, in a meaningless text, the 'words' are not words - they are nothing :-)
(25-10-2019, 08:16 AM)ReneZ Wrote: The pseudo-code suggests that all paragraph-initial lines are generated from earlier paragraph-initial lines and all other lines are generated from earlier text on the same page.
This would make the proposed test: how many words actually comply with this, quite feasible to execute.

I have already answered this point in my last post. Anyway, even if you rephrase the "it must work like a computer simulation" argument, it is still wrong (see [link]). We don't argue that the text was created by a computer program, and we also don't argue that our program is able to simulate the complexity of human behavior.

See page 11 in Timm & Schinner 2019:
"However, note that any algorithmic description of the VMS must be seen as just one of many possible realizations. After all, the original VMS was not created by a computer program; the scribe had complete freedom to implement random personal esthetic preferences, spontaneous impulses, or even idiosyncrasies. The scope of this work is not the 'elemental deconstruction' of the VMS to an exact (and complete) set of rules. We rather demonstrate the feasibility to algorithmically create a text as rich and complex as the VMS, using the strikingly simple self-citation method" (Timm & Schinner 2019, p. 11).


(25-10-2019, 08:16 AM)ReneZ Wrote: One could check how much edit distance is needed to explain the Voynich MS text under the auto-copy hypothesis.
Of course, one has to apply reason to interpret such results. If only 5% of words don't fit, for the case of a small edit distance, then this should not be considered a show-stopper. However, we have not seen such an analysis yet.
 

I have already given the reference to chapter 3 "Evidence" in Timm 2014. In this chapter I describe this type of test for two control samples: "In this paper all words occurring seven times and all words occurring eight times are used as two separate control samples. ..." (Timm 2014, p. 12ff).


(25-10-2019, 08:16 AM)ReneZ Wrote: The 'evidence' of the network of words does not really support the auto-copy hypothesis, because the existence of such a network is a property of the vocabulary, not of the text.

It is one of the basic facts about the VMS that there are local variants. For instance, Prescott H. Currier wrote in 1976: "These features are to be found generally in the other Sections of the manuscript although there are always local variations" ([link]). You even write it yourself: "From this figure it seems more likely that the observed feature is page-related" ([link]).

See page 3 in Timm & Schinner 2019:
"Most interestingly, there appears to even exist an inherent relation between word similarity, with respect to the string edit distance and context: when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail. ... However, all pages containing at least some lines of text do have in common that pairs of frequently used words with high mutual similarity appear." (Timm & Schinner 2019, p. 3).

Moreover, we have demonstrated a network graph for a single page as well as for the whole text (see Timm & Schinner 2019, p. 4). In both cases the result is the same: the text of the VMS is [link]. In other words, it simply doesn't matter if you create a network graph for a single page, a bifolio, a quire in Currier A or B, or the whole VMS.

See for instance the statistics for the example in Figure 1: the core network for page [link] contains 189 out of 277 words (= 68.2%), representing 402 out of 491 tokens (= 81.8%). Similar subnets can be constructed for all pages, but, of course, they are more instructive for pages containing lots of text. We even provide network graphs for all pages containing at least some lines in 1.3 "Graphs for individual pages" (see Additional Materials).


(25-10-2019, 08:16 AM)ReneZ Wrote: However, if one changes the English text by mapping words 1-to-1 with Voynichese words, it *might* look like one.
By re-organising this mapping, it is possible to make it into a text that looks like auto-copying. This was already shown before.

I have answered this point [link]: "There is no contradiction between your experiment ...". You even say it yourself: "I think one can only analyse all proposed solutions by themselves and see if they 'work'" ([link]).

Anyway, if you use a word dictionary substitution cipher to map the words 1-to-1, this would not affect the word frequencies or the word order. It would still be possible to identify high-frequency words as function words, and grammar rules for words depending on each other. It is simply not possible to explain facts such as the line behaving as a functional entity, or the correlation between frequency and word similarity, with a 1-to-1 dictionary cipher.

See for instance the word frequencies for tokens similar to <qokeedy>, <qotey>, <qokeey>, and <otedy>:
qokey   (107)  okey   (63)   otey   (57)   qotey   (24)
qokeey  (308)  okeey  (177)  oteey  (140)  qoteey  (42)
qokeedy (305)  okeedy (105)  oteedy (100)  qoteedy (74)
qokedy  (272)  okedy  (118)  otedy  (155)  qotedy  (91)

In other words: if similar tokens share the same meaning, sequences like the one in line <f108v.P.39> would only repeat the same information multiple times; and if similar tokens do not share the same meaning, you would need a dictionary containing every word token instead of every word type. But then the dictionary would be as long as the text, and knowing the dictionary would be sufficient to reconstruct the source text.
<f108v.P.39> qokeedy qokeedy qokeedy qotey qokeey qokeey otedy
(25-10-2019, 01:51 PM)ReneZ Wrote: the full repeats are assumed to be full repeats of words, and in that case they must already exist as repeats in the source text. They would not be introduced in the process I was describing.

Thank you, Rene!
I find it difficult to imagine that Voynichese can be based on a word-to-word correspondence with a common European language. The pure frequency of reduplication is something I have never seen in European texts. Also, the words that frequently reduplicate are the most frequent words in the language (see the graphs [link]). The hypothetical English text should have several occurrences of each of "the the", "and and", "to to"....

Something I like in Torsten's approach is that he treats exact reduplication and quasi-reduplication as related phenomena: my impression is that some kind of relation must exist. 

Also, I think that in Voynichese similar words often tend to behave similarly and are somehow interchangeable.

For instance (using Takahashi's transcription) the sequence qokaiin okaiin occurs twice, as does its reverse okaiin qokaiin.
Both words appear a few times before otaiin:
okaiin otaiin 7 times
qokaiin otaiin 3 times

All these words also appear in reduplicated form:
qokaiin qokaiin 2 times
okaiin okaiin 4 times
otaiin otaiin 3 times

In English, consecutive words tend to belong to different parts-of-speech, e.g. an adjective and a noun, or a verb and a preposition: one wouldn't expect consecutive words to be interchangeable. If similar words systematically corresponded to consecutive words in an English text, I would expect them to systematically behave differently.
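Counts like those above are straightforward to extract from a transcription. A minimal sketch over a toy token list (not Takahashi's transcription, which the figures above are based on):

```python
from collections import Counter

def adjacent_repeats(tokens):
    """Count exact reduplications: adjacent identical tokens,
    like 'daiin daiin'."""
    return Counter(a for a, b in zip(tokens, tokens[1:]) if a == b)

def adjacent_pairs(tokens, first, second):
    """How often `second` immediately follows `first`."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if (a, b) == (first, second))
```

Run over a full transcription, these two helpers reproduce both the reduplication counts and the "okaiin otaiin"-style co-occurrence counts quoted above.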
(25-10-2019, 04:03 PM)Torsten Wrote:
(25-10-2019, 08:16 AM)ReneZ Wrote: The pseudo-code suggests that all paragraph-initial lines are generated from earlier paragraph-initial lines and all other lines are generated from earlier text on the same page.
This would make the proposed test: how many words actually comply with this, quite feasible to execute.

I have already answered this point in my last post. Anyway, even if you rephrase the "it must work like a computer simulation" argument, it is still wrong (see [link]). We don't argue that the text was created by a computer program and we also don't argue that our program is able to simulate the complexity of human behavior.

I'm afraid that that is your over-simplification. I do not believe at all that the Voynich MS text can be simulated by a computer programme because it was definitely generated by a human being some 600 years ago.

What I did write was that you use your App as a key element in the evidence. To quote your paper:

Quote: The text sample investigated throughout this section was created with Algorithm 1. The parameter setup as well as the sample size (approximately 10,000 tokens) correspond to the VMS "Recipes" section (f103r-f116v). Figures 6 and 7 show that this text reproduces both of Zipf's laws as exactly as the VMS (or any natural language).

and

Quote: The result for the generated text sample can be seen in Figure 8. The self-citation process provides an excellent qualitative and even quantitative explanation for the asymptotic deviation (observed in the VMS) from the straight random result F(l) = l^0.5 (which characterizes natural language). On the other hand, it also reproduces the close-range structure of the VMS text, i.e. the n-gram entropy values, with high accuracy.

With respect to:

(25-10-2019, 04:03 PM)Torsten Wrote:
(25-10-2019, 08:16 AM)ReneZ Wrote: The pseudo-code suggests that all paragraph-initial lines are generated from earlier paragraph-initial lines and all other lines are generated from earlier text on the same page.
This would make the proposed test: how many words actually comply with this, quite feasible to execute.

I have already answered this point in my last post.

... do you mean the proposed test? I consider it the only possible proof that you could be right, and it would not be hard for you to do.

One other point I would like to respond to:
Quote: if similar tokens share the same meaning,

... I don't see any reason to believe that. This is not the case in any language I know of. Change a single letter and the meaning changes completely. Even in tonal languages, just change the tone and the meaning changes completely.

Again a more fundamental problem with the auto-copying process: everything that I see in the Voynich MS text speaks about planning. There is nothing arbitrary about it.

When thinking about the auto-copy process: of all the single-edit-distance variations that would be possible for a word like <qokeey>, only a very small percentage actually exist and are used consistently.
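This is easy to quantify: one can enumerate every string at edit distance 1 from a word over a given glyph alphabet and compare against the attested vocabulary. A sketch, using a handful of EVA letters as a stand-in alphabet; the real glyph inventory (and how EVA characters map to glyphs) is a modelling choice, not something this sketch settles.

```python
def single_edit_variants(word, alphabet):
    """All strings at edit distance exactly 1 from word
    over the given alphabet."""
    out = set()
    for i in range(len(word)):
        out.add(word[:i] + word[i + 1:])              # deletions
        for g in alphabet:
            out.add(word[:i] + g + word[i + 1:])      # substitutions
    for i in range(len(word) + 1):
        for g in alphabet:
            out.add(word[:i] + g + word[i:])          # insertions
    out.discard(word)
    return out
```

Even over a modest alphabet, <qokeey> has well over a hundred one-edit neighbours; intersecting that set with the transcription's vocabulary shows how few are actually attested.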
Marco wrote:

Quote:The pure frequency of reduplication is something I have never seen in European texts.

I am also not aware of examples. However, I can think of many possible reasons, mostly along the lines that the Voynich MS "words" do not represent words.
(25-10-2019, 05:56 PM)ReneZ Wrote: What I did write was that you use your App as a key element in the evidence.

"The scope of this work is not the 'elemental deconstruction' of the VMS to an exact (and complete) set of rules" (Timm & Schinner 2019, p. 11). In other words, the app is a simulation of the method. It demonstrates that with this method it is possible to generate a text that "reproduces some of the statistical key properties of the Voynich manuscript; in particular, both of Zipf's laws are fulfilled" (Timm & Schinner 2019, p. 1).


(25-10-2019, 05:56 PM)ReneZ Wrote:
(25-10-2019, 04:03 PM)Torsten Wrote:
(25-10-2019, 08:16 AM)ReneZ Wrote: The pseudo-code suggests that all paragraph-initial lines are generated from earlier paragraph-initial lines and all other lines are generated from earlier text on the same page.
This would make the proposed test: how many words actually comply with this, quite feasible to execute.

I have already answered this point in my last post.

... do you mean the proposed test? I consider it the only possible proof that you could be right, and it would not be hard for you to do.
  • I have executed this type of test in chapter 3 "Evidence" in Timm 2014 for two control samples: "In this paper all words occurring seven times and all words occurring eight times are used as two separate control samples. ..." (Timm 2014, p. 12ff).
  • I have presented the statistics for the example in Figure 1: the core network for page [link] contains 189 out of 277 words (= 68.2%), representing 402 out of 491 tokens (= 81.8%). Similar subnets can be constructed for all pages, but, of course, they are more instructive for pages containing lots of text. We even provide network graphs for all pages containing at least some lines in 1.3 "Graphs for individual pages" (see Additional Materials).
  • I have presented the network graph for a single page as well as for the whole text (see Timm & Schinner 2019, p. 4).
  • I have published the network graphs for every page in the VMS. You can just open the gephi projects ([link]) for any page you want and calculate the number of word types and word tokens within the core network with an edit distance of 1:
  • For instance, the core network for page [link] contains 189 out of 277 words (= 68.2%), representing 402 out of 491 tokens (= 81.8%).
  • The core network for page [link] contains 239 out of 354 words (= 67.5%), representing 485 out of 613 tokens (= 79.1%).
  • The core network for page [link] contains 172 out of 267 words (= 66.9%), representing 305 out of 394 tokens (= 77.4%).
  • ...
If you can't accept these answers for some reason, you should explain why. Simply ignoring the answers and looping back to your question doesn't help.


(25-10-2019, 05:56 PM)ReneZ Wrote: Again a more fundamental problem with the auto-copying process: everything that I see in the Voynich MS text speaks about planning. There is nothing arbitrary about it.

A word generated by self-citation is at the same time the result of the text generation method and a possible source for generating new words. The text is therefore, in the first place, repetitive. That you interpret this repetition as carefully planned is just an overinterpretation on your side.


(25-10-2019, 05:56 PM)ReneZ Wrote: ... I don't see any reason to believe that. This is not the case in any language I know of. Change a single letter and the meaning changes completely. Even in tonal languages, just change the tone and the meaning changes completely.

You just missed the point. Nobody has argued that a single letter cannot change the meaning of a word in a natural language. Anyway, in your last post you have now argued against your own idea that "if one changes the English text by mapping words 1-to-1 with Voynichese words, it *might* look like one" ([link]).