The Voynich Ninja

A key to understand the VMS
Quote:Which graphing method are you using in gephi?

Hello Tom,

First I use [link] to order the nodes, then I use Fruchterman-Reingold to distribute the nodes within a circle.
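
For readers who want to try the same kind of layout outside Gephi, here is a minimal sketch using networkx, which also implements the Fruchterman-Reingold algorithm. The edge list is invented for illustration and is not Torsten's actual graph data.

Code:
import networkx as nx

# Minimal sketch: lay out a small word-similarity graph with the
# Fruchterman-Reingold algorithm (the layout mentioned above).
# The edge list is illustrative, not the real VMS similarity network.
edges = [("daiin", "aiin"), ("daiin", "dain"), ("aiin", "ain"), ("dain", "ain")]
G = nx.Graph(edges)

# networkx exposes Fruchterman-Reingold as fruchterman_reingold_layout
# (an alias of spring_layout); positions land inside the unit square.
pos = nx.fruchterman_reingold_layout(G, seed=42)

for node, (x, y) in pos.items():
    print(f"{node:6s} x={x:+.3f} y={y:+.3f}")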
Quote:2. My knowledge of different language falls short here, but there are languages other than Indo-European ones which approach the relevant statistics of Voynichese to a much greater extent.

Hello Koen,

One key feature of language is repetitive phrases. They do not exist in the VMS (see [link]).

Quote:3. And many of these similar words are very frequent as well, like he, she, we.

My statement was that the more frequent a word is, the more similar words exist for it. Your statement that similar words also exist in natural languages is a different statement.

Quote:4. They may be lists of dialectal variations of plant names, for example, or names in related languages.

My statement was about the whole manuscript (see [link]). The page [link] was only an example.

Quote:5. Someone defending the natural language hypothesis might say that they are literally different languages or different dialects of the same language.

Between two languages or dialects I would expect a break, not a smooth transition from one language to the other (see [link]).

Quote:6. This is again a problem in and of itself. It must be taken into account by all theories.

For the auto-copying hypothesis such a feature is expected. For the first line on a page there is no previous line to copy from; therefore it is necessary to use another page as a source. The source word for the first word in each line can only be found within the previous lines. Since the first and the last word in each line are easy to spot, the most obvious way is to pick them as a source for generating a word at the beginning or at the end of a line. Features occurring at the end or at the beginning of a line are therefore copied more often.
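
A minimal sketch of this line-wise source selection, written for illustration only (it is not Torsten's published algorithm, and the mutation rule and seed words are invented):

Code:
import random

random.seed(0)

def mutate(word):
    """Change one glyph at a random position (illustrative rule only)."""
    glyphs = "oaiydklrchest"
    i = random.randrange(len(word))
    return word[:i] + random.choice(glyphs) + word[i + 1:]

def next_line(previous_lines, length=6):
    """Copy the first/last word of a new line from first/last words of
    earlier lines, and the words in between from arbitrary earlier words."""
    line = []
    for position in range(length):
        if position == 0:
            source = random.choice([l[0] for l in previous_lines])
        elif position == length - 1:
            source = random.choice([l[-1] for l in previous_lines])
        else:
            source = random.choice([w for l in previous_lines for w in l])
        line.append(mutate(source))
    return line

lines = [["daiin", "chedy", "qokeey", "shedy", "otar", "okaiin"]]
for _ in range(4):
    lines.append(next_line(lines))
print("\n".join(" ".join(l) for l in lines))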

Moreover, it is to be expected that the scribe would sometimes run out of space at the end of a line. Therefore it would be no surprise if the last letters in a line were occasionally squeezed into the available space. However, there are no such crowded places in the VMS. If the scribe was able to select words that fitted into the available space, such a feature is no surprise anymore.

Quote:7. And there's often repetition, for example to express plurals.

My statement was about unique words and not about repetition.

Quote:8. We're not as certain about the lack of corrections as we once were.

Indeed, some later changes exist. What I mean is the lack of places where something was deleted.

If your intention is to hide that only similar words are copied all the time, it is a mistake to repeat something. The easiest way to remove a repetition is to change it. One feature of the script used for the VMS is that in many cases one additional quill stroke is enough to change a glyph into another one. For instance, it would easily be possible to change "ch" into "Sh" or "e" into "s". (See [link].)
@Torsten
Quote:I use Fruchterman-Reingold to distribute the nodes within a circle.
How do I choose the circle? Thanks!
Tom, please stay on-topic. Ask for IT help elsewhere, such as in the questions-to-experts section or via PM to Torsten.
I see in the (interesting) work of Torsten four aspects. In the discussion these are occasionally mixed, or different people prefer to concentrate on different aspects. It may therefore be useful to identify the four clearly.

1) There is the observation that words in the Voynich MS tend to be very similar. The large majority of words form a big family where all words in this family can be converted to other words by a single change (addition, deletion or modification of 1 character). The number of changes needed to go from one word to any other is called the edit distance, defined using the Eva transcription alphabet.
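
For concreteness, here is a small sketch of both definitions: the Levenshtein edit distance over EVA strings, and the edges of the edit-distance-1 "family". The word list is a tiny illustrative sample, not the full vocabulary.

Code:
def edit_distance(a, b):
    """Levenshtein distance: minimal number of insertions, deletions
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Illustrative sample of EVA words; a real run would use a transcription file.
words = ["daiin", "aiin", "dain", "odaiin", "saiin", "chedy", "shedy"]

# Edges of the edit-distance-1 family graph.
family_edges = [(w1, w2) for i, w1 in enumerate(words)
                for w2 in words[i + 1:] if edit_distance(w1, w2) == 1]
print(family_edges)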

2) There is another observation that words with an edit distance of 1 tend to cluster together in the MS.

3) There is the question whether the above features are the result of an intentional design, or only a by-product of the manner in which the text was generated.

Here we have a minor ambiguity. Points 1 and 2 concern different aspects of the text, and point 3 may refer to one or the other, or to both. The 'auto-copying hypothesis', as far as I understand it, argues that both 1 and 2 are intentional.

4) There is the question whether the text is meaningful or not.

The auto-copying hypothesis would also seem to argue that the text is meaningless.

I think that it is important to separate this highly contentious question from the observations on which it is based. This tentative conclusion may be wrong, but there is certainly something of value in the observations. So let's look at them again.

Point 1 concerns the vocabulary of the MS. There seems to be a rule for how words can be formed.

Point 2 concerns the way these words are combined in the text. Basically a rule about syntax.

Both 1 and 2 are observations that can be verified by statistics. Points 3 and 4 are quite different, in that they represent an interpretation, or possible explanation, of either or both.

Point 1 is not too controversial. The interesting plots (and the modified version by Marco) basically show this 'family' of words in a big cloud. At the same time, the argument of Emma presented above is important. The changes related to edit distance 1 are not arbitrary. They follow relatively strict rules.

Point 2 is already more tricky. Anyone who has looked at the text intensively (especially anyone who has transcribed the text) will have noticed this, so it is there. But how much? To what extent? To the extent that it is unusual?
Torsten has shown some clear examples, so again, it is there.
But what if one takes an arbitrary page? Can one establish some kind of 'score' for the use of similar words?
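
One possible (purely illustrative) way to compute such a score: for every token on a page, check whether a word within the preceding N tokens lies within edit distance 1, and report the fraction of tokens for which this holds. The token list below is invented, and the measure itself is a sketch, not one proposed in the thread.

Code:
def within_one_edit(a, b):
    """True if a and b differ by at most one insertion, deletion or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                        # ensure len(a) <= len(b)
    for i in range(len(a)):
        if a[i] != b[i]:
            # one substitution, or one insertion in b, must fix the rest
            return a[i + 1:] == b[i + 1:] or a[i:] == b[i + 1:]
    return True                            # b is a with one extra trailing glyph

def similarity_score(tokens, window=10):
    """Fraction of tokens with a near-identical word in the preceding window."""
    hits = sum(
        1 for i, word in enumerate(tokens)
        if any(within_one_edit(word, prev) for prev in tokens[max(0, i - window):i])
    )
    return hits / len(tokens) if tokens else 0.0

page = "daiin shedy qokeey qokey chedy daiin dain okaiin otaiin".split()
print(f"similar-word score: {similarity_score(page):.2f}")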

The following thought experiment may illustrate why points 1 and 2 are very different.
Just imagine that someone has a plain text with less than 10,000 different words. He sorts these words according to decreasing frequency, and represents the most frequent word by 1, the second most frequent by 2, etc. to 9999.
If this plain text is then converted word by word to these numbers, we end up with a new text that exactly fits point 1. All words can be converted to other words by a single change (edit distance 1).

However, the objection of Emma does not apply in this case. Changes can be made arbitrarily, at any point. To solve that, let's now assume that he does not write 1,2,3, .... 9999 but uses roman numerals instead: I, II, III ... Again, the edit distance 1 is preserved throughout the entire vocabulary, but in addition, changes are subject to very specific rules.

However, in both cases there is nothing at all that would explain point 2: the collocation of similar words.
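
A short sketch of this thought experiment, using a toy plain text and exactly the encoding described above: frequency rank as a decimal number, then as a Roman numeral.

Code:
from collections import Counter

# Toy plain text; a real experiment would use a full book.
text = "the cat sat on the mat and the dog sat on the cat".split()

# Rank words by decreasing frequency: most frequent word -> 1, next -> 2, ...
ranks = {w: r for r, (w, _) in enumerate(Counter(text).most_common(), start=1)}

def roman(n):
    """Roman numeral for n (sufficient for vocabularies below 4000 words)."""
    values = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
              (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
              (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for v, s in values:
        while n >= v:
            out += s
            n -= v
    return out

print(" ".join(str(ranks[w]) for w in text))    # decimal encoding
print(" ".join(roman(ranks[w]) for w in text))  # Roman-numeral encoding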

What this means is:
Whether point 1 is a correct observation or not does not say anything about whether the text is meaningful or not.
The critical point for this question is only point 2.
The reason why points 1 and 2 should occur (if they do) is not yet clear, but worth exploring further.
Hello Rene,

 thank you for your productive response. 

Quote:The 'auto-copying hypothesis', as far as I understand it, argues that both 1 and 2 are intentional.

I argue that 1 is an unintended side effect of 2. Since the words are copied from each other, no other outcome is possible. If the words are copied, it doesn't matter how many glyphs were changed while copying a word. In fact, most of the time the scribe changed two or more glyphs, or he used parts of two or more words to generate a new word. But the more complex the rule set for generating a word is, the more unlikely it becomes that this word can be generated again. Such a rule set would therefore probably result in a word used only once (a hapax legomenon). On the other hand, the simpler a rule or feature is, the easier it is to repeat it. And since the words are copied again and again, each copied word increases the chance that this feature is copied again. This is the reason that words with many similar ones are frequent, while words for which no similar ones exist occur only once.
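
A small way to test this claim on any transcription (sketched here with an invented token list) is to count, for each word type, how many other types lie at edit distance 1 and compare frequent words with hapax legomena:

Code:
from collections import Counter
from string import ascii_lowercase

# Illustrative token list; a real test would read a full VMS transcription.
tokens = ("daiin daiin daiin dain aiin saiin chedy shedy chedy "
          "qokeey qokey qokeedy lkl").split()
counts = Counter(tokens)

def edits1(word):
    """All strings at edit distance 1 from word (deletions, substitutions, insertions)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    subs = {a + c + b[1:] for a, b in splits if b for c in ascii_lowercase}
    inserts = {a + c + b for a, b in splits for c in ascii_lowercase}
    return (deletes | subs | inserts) - {word}

for word, freq in counts.most_common():
    neighbours = edits1(word) & set(counts)
    print(f"{word:8s} freq={freq}  edit-1 neighbours={len(neighbours)}")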

Quote:The auto-copying hypothesis would also seem to argue that the text is meaningless. 

It is not required that the VMS is meaningless under the autocopy hypothesis. In fact, even under the autocopy hypothesis it is possible that the VMS contains meaning. For instance, with the [link] it would be possible to use autocopied words to transport information. For such a cipher it is, for instance, conceivable that a word could stand for a plaintext letter.

However, there are other arguments speaking against meaning in the VMS. The first is the weak word order. Even if one VMS word stands for one plaintext letter, repeated plaintext words should result in repeated sequences of VMS words. Such sequences are missing in the VMS. In a text using language, words should be used because of their meaning, and relations between words should be expressed by grammatical rules. The only relation I was able to find in the VMS is that similarly spelled words are used near each other. If a word in the VMS is written only because it is similar to a previously written one, this speaks against meaning in my eyes.
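
The missing-sequence argument is easy to check on a transcription: count repeated word n-grams and see whether any sequence of two or three words recurs. A minimal sketch (with an invented token list) follows.

Code:
from collections import Counter

# Illustrative token list; a real check would use the full transcription.
tokens = "daiin chedy qokeey daiin chedy otar shedy daiin dain".split()

def repeated_ngrams(tokens, n):
    """Word n-grams that occur more than once, with their counts."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in grams.items() if c > 1}

print("repeated bigrams: ", repeated_ngrams(tokens, 2))
print("repeated trigrams:", repeated_ngrams(tokens, 3))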

Another argument is that the autocopying method is only efficient for generating a text if this text has no meaning. To copy a word and at the same time encode some meaning in it is less efficient and not fail-safe.

Last but not least, the author of the VMS made some effort to hide that the words are copied from each other. This indicates that he wanted to conceal that the VMS consists only of similar words.

Quote:However, the objection of Emma does not apply in this case. Changes can be made arbitrarily, at any point.

Yes, changes can be made arbitrarily, but it is only possible to copy one word at a time. It is an error to assume that all possible changes can be applied at the same time. Every decision to copy a particular word has some influence on the text, since every word is at the same time the result of the copying process and a possible source for further copying steps. Therefore it is an error to assume that every change you can think of must exist in the VMS. While writing the VMS the scribe generated a network of 6837 similar words. To argue that he failed to add words 6838 and 6839, which you would add to the network, is not a valid argument in my eyes.
(03-02-2017, 09:52 PM)Torsten Wrote: For me the text generation method became obvious while looking at rare words. See for instance the word "lkl". This word occurs only 9 times in the VMS, but that doesn't mean it is evenly distributed. In fact there is one page with three instances of "lkl". This page is [link], and since three instances of "lkl" can be found on this page, it was available three times as a source for generating other words. This is the reason that the words "lklor", "kl" and "lkol" also exist on page [link]. Besides "lkl", only 6 other words contain the sequence "lkl". One of these six words is the word "lklor" on page f105v. This is not a coincidence! See "lkl" / "talkl" on page [link] and "lklcheol" / "lkl" on page f115v.

This seems to be a normal property of meaningful texts.  Here's a post by Jorge Stolfi where he looks for patterns like this in War of the Worlds:

[link]

Quote:Along the same vein, these words occur exactly twice, at most 20
tokens apart:

VMS:

   FIRST    LAST    SPAN WORD
   -----   -----   ----- ---------
     644     650       6 damo
    6713    6731      18 olchdaiin
   16965   16969       4 cheteey
   27756   27757       1 qoekol
   36106   36116      10 otaraiin
   36462   36465       3 chtl

WotW:

   FIRST    LAST    SPAN WORD
   -----   -----   ----- ---------
     789     791       2 generation
    5156    5159       3 a~screwin'
    6428    6430       2 joint
    6531    6538       7 ugly
    6532    6539       7 brutes
    6914    6915       1 flutter
    8207    8214       7 novelty
   11097   11104       7 fringe
   15267   15268       1 aloo
   17366   17370       4 shield
   18391   18403      12 streamed
   18414   18422       8 pillars
   23663   23665       2 rows
   29884   29897      13 uppermost
   31994   31997       3 losing
   33642   33655      13 deepened
   34090   34094       4 girls
   35385   35399      14 ellen
   35568   35571       3 garrick
   36993   36997       4 stampede
   37729   37738       9 midland


Note that these include the successive repeats "qoekol qoekol",
"flutter flutter" and "aloo aloo".

As before, the scrambled files had no hits --- i.e. all words with two
occurrences were spread wider than 20 tokens.

In conclusion, both books contain many words that are confined
to specific sections, far more than expected by chance.
Of course this does not prove anything, but is yet another
constraint on proposed theories.

Surprisingly, the WotW has more "lumpy" words than the VMS. This may
be due to the fact that different sections are partly mixed in the
VMS.  I need to find a metric of "lumpiness" that degrades more
gracefully with such block-scrambling.
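
For anyone who wants to repeat the check quoted above on another text, here is a sketch of the core step: list the word types that occur exactly twice with their two occurrences at most 20 tokens apart. The token list is a tiny invented example.

Code:
from collections import defaultdict

def close_doubles(tokens, max_span=20):
    """Words occurring exactly twice, with both occurrences within max_span tokens."""
    positions = defaultdict(list)
    for i, w in enumerate(tokens):
        positions[w].append(i)
    return [(w, p[0], p[1], p[1] - p[0])
            for w, p in positions.items()
            if len(p) == 2 and p[1] - p[0] <= max_span]

tokens = "qokol qokol daiin chedy otar daiin shedy qokeey chtl ol chtl".split()
print("FIRST  LAST  SPAN  WORD")
for word, first, last, span in close_doubles(tokens):
    print(f"{first:5d} {last:5d} {span:5d}  {word}")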
Hello Sam,

Quote:This seems to be a normal property of meaningful texts.

The same argument was used by Montemurro and Zanette in the paper [link]. Montemurro and Zanette also argue with respect to context dependency: "Words that are related by their semantic contents tend to co-occur along the text." [link]. In this paper they come to the conclusion that this observation "suggests the presence of a genuine linguistic structure" in the VMS. Unfortunately, the level of context dependency for the VMS is higher than expected for a linguistic system. Therefore, context dependency alone is not enough to allow the conclusion that the patterns found must be the result of a genuine linguistic structure [see [link]].

In fact, co-occurrence is normal for a [link] [see [link]]. Therefore this property only demonstrates that the VMS also uses a self-referencing system. The autocopying hypothesis also results in a self-referencing system: "The connection between consecutive lines and between similar glyph groups exists because the text is a copy of itself. The statistical features of the text can be explained by the hypothesis that the author of the VMS was using the described self-referencing system to generate the text. This text generation mechanism also explains the observation that for common glyph groups almost all spelling variations occur. The use of different spelling variations is no coincidence, because the scribe was generating the text by varying glyph groups already written." [link]
I want to point out to Torsten and ReneZ what I found regarding the first two letters of most Voynich vords and how they could be behaving. Is it possible that the first two letters indicate syntax, i.e. whether a vord will be a verb, noun, adjective or conjunction? There is a pattern here where many Voynich vords retain the same first two letters (see the table below and the sketch after it).


ykaiin   (45) okaiin   (212) qokaiin   (262) qotaiin   (79) otaiin   (154) ytaiin   (43) yaiin   ( 6)
ykain    (10) okain    (144) qokain    (279) qotain    (64) otain    ( 96) ytain    (13) yain    (--)
ykair    ( 8) okair    ( 22) qokair    ( 17) qotair    ( 6) otair    ( 21) ytair    ( 3) yair    ( 2)
ykar     (36) okar     (129) qokar     (152) qotar     (63) otar     (141) ytar     (26) yar     ( 2)
ykal     (16) okal     (138) qokal     (191) qotal     (59) otal     (143) ytal     (19) yal     ( 1)
ykam     ( 5) okam     ( 26) qokam     ( 25) qotam     (12) otam     ( 47) ytam     (13) yam     (--)
ykor     (10) okor     ( 34) qokor     ( 36) qotor     (29) otor     ( 46) ytor     (14) yor     ( 2)
ykol     (14) okol     ( 82) qokol     (104) qotol     (47) otol     ( 86) ytol     (12) yol     ( 2)
yky      (18) oky      (102) qoky      (147) qoty      (87) oty      (115) yty      (24) yy      ( 1) 
ykey     ( 8) okey     ( 63) qokey     (107) qotey     (24) otey     ( 57) ytey     (13) ychey   (17)
ykeey    (58) okeey    (177) qokeey    (308) qoteey    (42) oteey    (140) yteey    (28) ycheey  (24)
ykchy    (22) okchy    ( 39) qokchy    ( 69) qotchy    (63) otchy    ( 48) ytchy    (19) ychy    ( 4)
ykshy    ( 2) okshy    ( 19) qokshy    ( 10) qotshy    ( 5) otshy    (  4) ytshy    ( 3) yshy    ( 1)
       
ykedaiin ( 1) okedaiin (  3) qokedaiin (  3) qotedaiin ( 3) otedaiin (  3) ytedaiin (--) ychaiin (--) ochaiin ( 1)
ykedain  (--) okedain  (  3) qokedain  (  4) qotedain  ( 1) otedain  (  2) ytedain  (--) ychain  ( 4) ochain  (--)
ykedar   ( 1) okedar   (  6) qokedar   (  8) qotedar   ( 3) otedar   ( 11) ytedar   ( 3) ychar   ( 2) ochar   ( 2)
ykedal   (--) okedal   (  7) qokedal   (  3) qotedal   ( 3) otedal   (  4) ytedal   (--) ychal   (--) ochal   (--)
ykedam   (--) okedam   (  3) qokedam   (  3) qotedam   (--) otedam   (  3) ytedam   (--) ycham   ( 1) ocham   (--)
ykedor   ( 1) okedor   (  3) qokedor   (---) qotedor   ( 2) otedor   (  1) ytedor   (--) ychor   (16) ochor   ( 6)
ykedol   (--) okedol   (---) qokedol   (  1) qotedol   ( 1) otedol   (  3) ytedol   (--) ychol   (12) ochol   ( 5)
ykedy    (23) okedy    (118) qokedy    (272) qotedy    (91) otedy    (155) ytedy    (24) ychedy  (13) ochedy  ( 8)
ykeedy   (30) okeedy   (105) qokeedy   (305) qoteedy   (74) oteedy   (100) yteedy   (28) ycheedy ( 7) ocheedy ( 1)
ykchdy   ( 8) okchdy   ( 21) qokchdy   ( 56) qotchdy   (23) otchdy   ( 39) ytchdy   (10) ychdy   ( 2) ochdy   ( 1)
ykshdy   (--) okshdy   (  1) qokshdy   (  4) qotshdy   ( 3) otshdy   (  3) ytshdy   (--) yshdy   (--) oshdy   ( 1)
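
A quick way to see how strong this prefix pattern is: group word types by their first two EVA letters and check which endings recur across different prefixes. The sketch below uses only a handful of counts copied from the table; a real run would use the complete word list.

Code:
from collections import Counter, defaultdict

# A few counts copied from the table above (illustrative subset only).
counts = {"ykaiin": 45, "okaiin": 212, "otaiin": 154,
          "ykar": 36, "okar": 129, "otar": 141,
          "ykal": 16, "okal": 138}

by_prefix = defaultdict(dict)
for word, freq in counts.items():
    by_prefix[word[:2]][word[2:]] = freq    # group by the first two letters

# How many different two-letter prefixes does each ending follow?
suffix_spread = Counter(s for endings in by_prefix.values() for s in endings)
for suffix, n_prefixes in suffix_spread.most_common():
    print(f"-{suffix}: follows {n_prefixes} different two-letter prefixes")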
(04-02-2017, 07:19 PM)Torsten Wrote: Unfortunately, the level of context dependency for the VMS is higher than expected for a linguistic system. Therefore, context dependency alone is not enough to allow the conclusion that the patterns found must be the result of a genuine linguistic structure [see [link]].

I don't see any basis for this claim.

Quote:In fact, co-occurrence is normal for a [link] [see [link]]. Therefore this property only demonstrates that the VMS also uses a self-referencing system. The autocopying hypothesis also results in a self-referencing system: "The connection between consecutive lines and between similar glyph groups exists because the text is a copy of itself. The statistical features of the text can be explained by the hypothesis that the author of the VMS was using the described self-referencing system to generate the text. This text generation mechanism also explains the observation that for common glyph groups almost all spelling variations occur. The use of different spelling variations is no coincidence, because the scribe was generating the text by varying glyph groups already written." [link]

Have you produced any text that has this property?  I don't see why auto-copying should produce it.

Basically, all of your arguments boil down to identifying some property of the VMS text and then:

1) Asserting, without any evidence, that it cannot be a property of a natural language text

2) Asserting, without any evidence, that auto-copying would produce that property