The Voynich Ninja

Full Version: A possible generating algorithm of the Voynich manuscript
Pages: 1 2 3 4 5 6 7
(30-05-2019, 09:17 PM)Torsten Wrote:
(30-05-2019, 08:29 PM)ReneZ Wrote: It would also be possible to produce some key statistics like:
which percentage of words is a modification with edit distance 1 of a recent word, for different definitions of 'recent'. 

This is explained on page 5: "Figure 2 shows the resulting network, connecting 6,796 out of 8,026 words (=84.67%)."

(30-05-2019, 08:29 PM)ReneZ Wrote: This is a key value. If this percentage is high, say 80%, then the theory is clearly describing a relevant fraction of the text. If it is low, say 20%, then the theory is *not* describing a relevant fraction of the text. In fact, the vast majority of the text would remain unexplained by the theory.

There is only one giant network connecting all frequently used word types. Only 229 word types (=2.85 %) differ by more than two glyphs from all other word types. They occur only once. Moreover, even for these 229 word types it is usually possible to split them into two or more words that also occur in the VMS. Two words of this kind are <okeokeokeody> and <okeeolkcheey>. It is, for instance, possible to split these two words into <okeo>+<keo>+<keody> and <okeeol>+<kcheey>. There is simply no word that is not similar to at least one other word. The key value you are asking for is therefore at least 97.15 %.
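
The splitting argument above can be checked mechanically. The following is a minimal sketch, not the method used in the paper: it assumes a plain set of EVA word types as vocabulary (here a hypothetical five-word toy set taken from the examples in the post) and a hypothetical `min_len` parameter to avoid trivial one-glyph pieces.

```python
from functools import lru_cache


def split_into_known_words(word, vocabulary, min_len=2):
    """Return one split of `word` into two or more vocabulary words, or None.

    `vocabulary` and `min_len` are illustrative assumptions; a real check
    would use the full list of word types from a transcription.
    """
    vocab = set(vocabulary)

    @lru_cache(maxsize=None)
    def solve(start):
        # Returns a tuple of pieces covering word[start:], or None.
        if start == len(word):
            return ()
        for end in range(start + min_len, len(word) + 1):
            piece = word[start:end]
            if piece in vocab:
                rest = solve(end)
                if rest is not None:
                    return (piece,) + rest
        return None

    result = solve(0)
    # A genuine "split" needs at least two parts.
    if result is None or len(result) < 2:
        return None
    return list(result)


# Toy vocabulary built only from the pieces named in the post.
toy_vocab = {"okeo", "keo", "keody", "okeeol", "kcheey"}
print(split_into_known_words("okeokeokeody", toy_vocab))  # ['okeo', 'keo', 'keody']
print(split_into_known_words("okeeolkcheey", toy_vocab))  # ['okeeol', 'kcheey']
```

With a large vocabulary of short words almost any string can be split somehow, so the share of splittable words is only informative relative to a baseline (for example, shuffled strings).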

This is not what I meant. The network plots show the end result (or in fact the starting point), but I am interested in the process.

We don't know if the text of the Voynich MS was based on some source text or is meaningless, but we know for sure that it was 'generated' some 600 years ago. This applies either way. It may have been generated using some random process or it may have been generated by manipulating a text.

Your various papers suggest that we will learn what the process was, but while this is described vaguely (taking recent previous words, and modifying them), the evidence that this happened is not there.
The network graph does not show the process.
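
The process-oriented statistic asked for here (the share of words that are an edit-distance-1 modification of a recent word, for several definitions of "recent") could be computed along the following lines. This is a minimal sketch under stated assumptions: tokens are plain EVA strings in reading order, "recent" means "among the preceding `window` tokens", and exact repeats are not counted as modifications. The function names are hypothetical.

```python
def is_one_edit(a, b):
    """True if a and b differ by exactly one insertion, deletion or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equally long) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra glyph in b at position i


def share_explained_by_recent(tokens, window):
    """Fraction of tokens at edit distance 1 from one of the preceding `window` tokens."""
    if not tokens:
        return 0.0
    hits = 0
    for i, token in enumerate(tokens):
        recent = tokens[max(0, i - window):i]
        if any(is_one_edit(token, r) for r in recent):
            hits += 1
    return hits / len(tokens)


# Toy sequence (not real manuscript order), evaluated for two window sizes.
toy_tokens = ["tchey", "daiin", "chey"]
for window in (1, 2):
    print(window, share_explained_by_recent(toy_tokens, window))
```

Running the same function on a shuffled copy of the token list would give a baseline against which the manuscript order can be compared.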
(31-05-2019, 10:50 AM)ReneZ Wrote: This is not what I meant. The network plots show the end result (or in fact the starting point), but I am interested in the process.

The process is explained in chapter 3. Text generation: "... the generation of a token from an existing word pool is a two-step process: select a source word and modify it, following specific rules ..."
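
To make the quoted two-step description concrete, here is a toy sketch of "select a source word and modify it". It is not the algorithm from the paper: the modification rules (`PREFIXES`, `SUBSTITUTIONS`, dropping the first glyph), the `window` size and the seed words are all hypothetical placeholders chosen for illustration.

```python
import random

# Hypothetical rule tables; the paper's actual rules are not reproduced here.
PREFIXES = ["q", "o", "d", "t", "p"]
SUBSTITUTIONS = {"k": "t", "t": "k", "a": "o", "o": "a", "r": "l", "l": "r"}


def modify(word, rng):
    """Step 2: apply at most one simple modification to the source word."""
    choice = rng.choice(["copy", "prefix", "substitute", "drop_first"])
    if choice == "prefix":
        return rng.choice(PREFIXES) + word
    if choice == "substitute":
        positions = [i for i, glyph in enumerate(word) if glyph in SUBSTITUTIONS]
        if positions:
            i = rng.choice(positions)
            return word[:i] + SUBSTITUTIONS[word[i]] + word[i + 1:]
    if choice == "drop_first" and len(word) > 2:
        return word[1:]
    return word  # plain copy, or a rule that did not apply


def generate(seed_words, n_tokens, window=10, seed=0):
    """Grow a token list by repeatedly copying and modifying a recent token."""
    rng = random.Random(seed)
    text = list(seed_words)
    while len(text) < n_tokens:
        source = rng.choice(text[-window:])  # step 1: select a recent source word
        text.append(modify(source, rng))
    return text


print(" ".join(generate(["daiin", "chey", "okeedy"], 20)))
```

Because every new token is derived from one of the last `window` tokens, similar words cluster locally by construction, which is the property the thread is debating.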

See also the examples given in my last two posts:
For instance <chedy> is introduced on page f32r. The word before <chedy> is <tchey>. To introduce <chedy> it was only necessary to repeat <chey> and to add a <d> before <y>. Just click on the link to page [link].

For instance, the word <daiiny> occurs only [link]. At first sight it seems unclear whether it is a misspelled variant of the more common word <daiin> or a new word.

Another example is the use of the <m>-glyph on page [link]. If you search for new words on page [link], you will find, for instance, the words <sheoldam>, <tsheoarom> and <pcheoldom>. These three words are similar to each other and they occur only once within the VMS. Moreover, <tsheoarom> and <pcheoldom> are used as paragraph-initial words. There are only seven paragraph-initial words using a final <m>-glyph in the whole VMS. This is the way new words occur within the VMS.

(31-05-2019, 10:50 AM)ReneZ Wrote: We don't know if the text of the Voynich MS was based on some source text or is meaningless, but we know for sure that it was 'generated' some 600 years ago. This applies either way. It may have been generated using some random process or it may have been generated by manipulating a text.

Your various papers suggest that we will learn what the process was, but while this is described vaguely (taking recent previous words, and modifying them)

This is a concrete description: Use your eyes to select a token or parts of tokens and modify them. It's just as simple as that. 

(31-05-2019, 10:50 AM)ReneZ Wrote: ... the evidence that this happened is not there.

The evidence is that this simple method is all you need to explain the VMS. See chapter 2: "... not only frequency and similarity of tokens are correlated, but also similarity and relative position".

See also [link], chapter 7, "The text generation method".

See also section [link] "Graphs for individual pages" on "[link]". You can convince yourself that tokens with high structural similarity consistently tend to appear in close vicinity of each other. Just click on the links to voynichese.com in section [link].

(31-05-2019, 10:50 AM)ReneZ Wrote: The network graph does not show the process.

The reference to the network graph was the answer to your question "which percentage of words is a modification with edit distance 1 of a recent word, for different definitions of 'recent'."
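
For readers who want to reproduce a figure of the kind quoted from page 5 (6,796 of 8,026 word types connected), the following is a minimal sketch of one way to measure connectivity at edit distance 1. It assumes that "connected" means "belongs to the largest connected component", which the thread does not confirm, and it treats word types as plain EVA strings.

```python
from collections import defaultdict


def largest_component_share(word_types):
    """Share of word types in the largest component of the edit-distance-1 graph."""
    words = sorted(set(word_types))
    if not words:
        return 0.0
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    def union(a, b):
        parent[find(a)] = find(b)

    by_pattern = defaultdict(list)
    for w in words:
        for i in range(len(w)):
            # Words sharing a pattern differ by one substitution at position i.
            # "\0" is used as wildcard because it cannot occur in a transcription.
            by_pattern[w[:i] + "\0" + w[i + 1:]].append(w)
            # A word and its one-glyph deletion are one insertion/deletion apart.
            shorter = w[:i] + w[i + 1:]
            if shorter in parent:
                union(w, shorter)
    for group in by_pattern.values():
        for other in group[1:]:
            union(group[0], other)

    sizes = defaultdict(int)
    for w in words:
        sizes[find(w)] += 1
    return max(sizes.values()) / len(words)


toy_types = ["daiin", "dain", "daiiny", "chey", "chedy", "tchey", "qokeedy"]
print(largest_component_share(toy_types))  # 3 of 7 toy types
```

The pattern-and-deletion trick avoids comparing every pair of word types, which keeps the sketch usable on a vocabulary of several thousand types.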
(31-05-2019, 11:15 AM)Torsten Wrote: The reference to the network graph was the answer to your question "which percentage of words is a modification with edit distance 1 of a recent word, for different definitions of 'recent'."
It is only relevant if "recent" is defined as "on the same page". For Figure 4 (page 8), is the distance in lines counted across pages, in the order in which they appear in the (TT) transcription, or inside pages only?
(31-05-2019, 01:27 PM)nablator Wrote:
(31-05-2019, 11:15 AM)Torsten Wrote: The reference to the network graph was the answer to your question "which percentage of words is a modification with edit distance 1 of a recent word, for different definitions of 'recent'."
It is only relevant if "recent" is defined as "on the same page". For Figure 4 (page 8), is the distance in lines counted across pages, in the order in which they appear in the (TT) transcription, or inside pages only?

They are counted across pages, in the order in which they appear in the transcription. It is therefore visible that pages in Currier B (a) contain more text than pages in Currier A (b). It is also visible that words in Currier B are on average longer than in Currier A (see also graph 18 in [link], p. 28). This means that the effect appears smaller than it otherwise would, and that it would be possible to improve the significance of the graph. Nevertheless, the graph is already of high significance.

Note: See also graphs 2 and 3 in [link], p. 10, and tables 2, 3, and 4 in [link], p. 3f.
(31-05-2019, 02:21 PM)Torsten Wrote: They are counted across pages, in the order in which they appear in the transcription. It is therefore visible that pages in Currier B (a) contain more text than pages in Currier A (b). It is also visible that words in Currier B are on average longer than in Currier A (see also graph 18 in [link], p. 28). This means that the effect appears smaller than it otherwise would, and that it would be possible to improve the significance of the graph. Nevertheless, the graph is already of high significance.

One plausible way to improve the significance of the observed effect would be to reorder the transcription by bifolio (it will be a test of the hypothesis that the scribe(s) wrote the 4 pages of a bifolio in sequence before starting a new one) and maybe also find the optimal order of bifolios (as there have been several more or less credible speculations that some bifolios were bound in the wrong order.)

Very interesting! Thanks.

Edit: BTW, I have a slightly different number of connected word types at edit distance 1 than the one on page 5 (6,796 out of 8,026 words). I removed all words with unclear glyphs and kept everything else in the [link] of the TT transcription. Did you use a different version or filter? I will check this weekend whether the one downloaded from [link] is identical. Is there some special rule for edit distance, such as EVA-ii counting as 1 glyph, or is it just plain EVA string edit distance?
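
The difference between the two conventions asked about here can be illustrated with a short sketch. It compares plain character-level Levenshtein distance with a variant in which certain EVA sequences are first merged into single glyph units. The list `MULTI_GLYPH_UNITS` is a hypothetical choice for illustration, not a rule taken from the paper.

```python
# Hypothetical multi-character units, longest first so that greedy matching works.
MULTI_GLYPH_UNITS = ["iii", "cth", "ckh", "cph", "cfh", "ii", "ch", "sh"]


def to_glyphs(word, units=MULTI_GLYPH_UNITS):
    """Greedily split an EVA string into glyph units."""
    glyphs = []
    i = 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                glyphs.append(unit)
                i += len(unit)
                break
        else:
            glyphs.append(word[i])  # no multi-character unit matched here
            i += 1
    return glyphs


def levenshtein(a, b):
    """Classic edit distance; works on strings and on lists of glyph units."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


for a, b in [("daiin", "dan"), ("chedy", "edy"), ("chedy", "shedy")]:
    plain = levenshtein(a, b)
    by_glyph = levenshtein(to_glyphs(a), to_glyphs(b))
    print(a, b, "plain:", plain, "glyph units:", by_glyph)
```

Pairs such as <daiin>/<dan> are two edits apart as plain strings but one edit apart once <ii> is a single unit, so the choice of convention can change the number of connected word types.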
As a side note, what is the length distribution of vords generated with the proposed algorithm? Does it match what is observed with the VMS (binomial)?
(31-05-2019, 03:22 PM)Anton Wrote: As a side note, what is the length distribution of vords generated with the proposed algorithm? Does it match what is observed with the VMS (binomial)?

1) Since the algorithm is generating new words by modifying existing ones the outcome must be a network of similar words. This also implies that similar words have similar length.

2) See figure 9 on page 15: "The line is a Gaussian fit with mean 5.73 and width 3.02. The corresponding values for the VMS 'Recipes' section are 6.09 and 3.30, respectively."
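
A comparison of this kind can be reproduced on any word list with a few lines of code. The sketch below computes the length histogram, mean and standard deviation, and a binomial reference distribution with the same mean. It rests on several assumptions: length is counted in EVA characters, the binomial's `n_max` is a free parameter chosen by the user, and the plain standard deviation reported here is not necessarily the same quantity as the "width" of the Gaussian fit quoted from the paper.

```python
from collections import Counter
from math import comb, sqrt


def length_stats(words):
    """Return (histogram of lengths, mean length, standard deviation)."""
    lengths = [len(w) for w in words]
    n = len(lengths)
    mean = sum(lengths) / n
    variance = sum((x - mean) ** 2 for x in lengths) / n
    return Counter(lengths), mean, sqrt(variance)


def binomial_reference(mean, n_max):
    """Binomial probabilities for lengths 0..n_max with the given mean."""
    p = mean / n_max
    return {k: comb(n_max, k) * p ** k * (1 - p) ** (n_max - k)
            for k in range(n_max + 1)}


# Toy word list; a real check would use all word types or tokens of a section.
toy_words = ["daiin", "chedy", "chey", "ol", "qokeedy"]
histogram, mean, sd = length_stats(toy_words)
reference = binomial_reference(mean, n_max=10)
for k in sorted(histogram):
    observed = histogram[k] / len(toy_words)
    print(k, round(observed, 3), round(reference[k], 3))
```

Whether the statistic is taken over word types or over tokens changes the result, so the same choice has to be made for the manuscript and for the generated text.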

Did you read the argumentation given in the paper?
Torsten, I have not read your paper, but what I like about your analysis is the freedom you give to the scribe. Of course there is a rigid word structure and there are many regularities. We can all see that, but there is also a certain degree of freedom in the creation of glyph chains, and that's why there are no repeated phrases.
Your theory reinforces my idea that the script is not a verbal language but a symbolic code. Each glyph is a symbol, something that we can represent mentally, and the scribe has some freedom to combine the symbols while respecting certain rigid rules.
(31-05-2019, 03:22 PM)Anton Wrote: As a side note, what is the length distribution of vords generated with the proposed algorithm? Does it match what is observed with the VMS (binomial)?

Quote: See page 2: "We use the algorithm to create a 'facsimile' of the VMS 'Recipes' section."

For further explanation, see [link].
(31-05-2019, 04:58 PM)Torsten Wrote: Did you read the argumentation given in the paper?

No, I haven't. In fact, I haven't read the paper, to begin with.

(31-05-2019, 05:40 PM)bi3mw Wrote:
(31-05-2019, 03:22 PM)Anton Wrote: As a side note, what is the length distribution of vords generated with the proposed algorithm? Does it match what is observed with the VMS (binomial)?

Quote: See page 2: "We use the algorithm to create a 'facsimile' of the VMS 'Recipes' section."

For further explanation, see [link].

Thanks, I have not seen the development in that thread yet.