The Voynich Ninja

Full Version: A possible generating algorithm of the Voynich manuscript
If I were specifically studying or writing about cryptanalysis, $135 a year would be worth it, but I'm not, so it's a bit steep to pay that much to read one article.
(30-05-2019, 09:17 PM)Torsten Wrote:
(30-05-2019, 08:29 PM)ReneZ Wrote: There is nothing too concrete on how new words are introduced. Just *that* this happens.

A new word is generated like any other word. It is generated by copying and varying words already written.

For instance <chedy> is introduced on page f32r. The word before <chedy> is <tchey>. To introduce <chedy> it was only necessary to repeat <chey> and to add a <d> before <y>.
Just click on the link to page [link].

...

chedy occurs many times on [link], [link], and [link], and the tokens directly before it are dissimilar except for two that are somewhat similar. If it occurs enough times, it will eventually follow a similar token.
(02-06-2019, 01:46 AM)-JKP- Wrote:
chedy occurs many times on [link], [link], and [link], and the tokens directly before it are dissimilar except for two that are somewhat similar. If it occurs enough times, it will eventually follow a similar token.

@JKP Your response suggests that you don't know any of my papers. I have two open access papers published at arxiv.org (see [link]). Please read them.

1) The order of the pages we see today is not the original order (see [link], p. 23ff).

2) f26r, f26v, and f31r belong to the same bifolio. This bifolio belongs to Currier B whereas page [link] belongs to Currier A. Currier A was written before Currier B (see [link], p. 25). Therefore [link] was written before f26r, f26v, and f31r.

3) There are only two instances of <chedy> in Currier A. The second instance is on page [link]. f89r1 belongs to the pharmaceutical section. The herbal section in Currier A was written before the pharmaceutical section. Therefore we can be sure that <chedy> was introduced on page f32r.

4) "... similar glyph groups can be found above each other twice as often as they can be found side by side ..." ([link], p. 14). It is not important that the two words follow each other. For instance, it would also be necessary to find the source word for <tchey>. This word was probably <qotchy> or <qokchy> in line f32r.P.5 (see [link]). Please note that <qotchy> and <qokchy> are also similar to each other.

5) I argue that the writer generated new words by copying and modifying already written glyph sequences (see [link], p. 14ff). For my argument it is only important that <tchey> was written before <chedy>.
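To make point 5 concrete, here is a minimal sketch of "copy an already written word and modify it". It is only an illustration under loud assumptions, not the procedure from the papers: the glyph inventory `GLYPHS`, the `window` size, the uniform random choices and the function names `vary` and `generate` are all hypothetical.

```python
import random

# Hypothetical, simplified glyph inventory (plain EVA letters, no ligatures).
GLYPHS = "acdehklorsty"

def vary(word: str, rng: random.Random) -> str:
    """Return a copy of `word` with at most one glyph inserted, deleted, or replaced."""
    op = rng.choice(["insert", "delete", "substitute"])
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "substitute":
        i = rng.randrange(len(word))
        # May pick the same glyph again, which amounts to a plain copy.
        return word[:i] + rng.choice(GLYPHS) + word[i + 1:]
    # Insertion (also the fallback when a one-glyph word would be deleted).
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(GLYPHS) + word[i:]

def generate(seed_words, n_new, window=10, seed=0):
    """Extend a non-empty list of non-empty seed words by copying and varying recent words."""
    rng = random.Random(seed)
    text = list(seed_words)
    for _ in range(n_new):
        source = rng.choice(text[-window:])  # pick one of the most recent words
        text.append(vary(source, rng))
    return text

# The f32r example from the post, done by hand: <tchey> -> <chey> -> <chedy>
copied = "tchey"[1:]                          # repeat <chey>, dropping the initial <t>
introduced = copied[:-1] + "d" + copied[-1]   # add a <d> before <y>
```

Calling `generate(["tchey"], 20)` returns the seed followed by 20 derived words. Real glyph substitutions would presumably be far from uniform, so the output should not be expected to look like Voynich text.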
(31-05-2019, 10:50 AM)ReneZ Wrote: This is not what I meant. The network plots show the end result (or in fact the starting point), but I am interested in the process.

We don't know if the text of the Voynich MS was based on some source text or is meaningless, but we know for sure that it was 'generated' some 600 years ago. This applies either way. It may have been generated using some random process or it may have been generated by manipulating a text.

Your various papers suggest that we will learn what was the process, but while this is described vaguely (taking recent previous words, and modifying them), the evidence that this happened is not there.
The network graph does not show the process.

Sorry, but I don't know what you mean.

You ask about the introduction of new words? I explain some examples for the introduction of new words. Your response is "There is nothing too concrete on how new words are introduced." What is more concrete than examples?

You ask about key statistics with edit distance 1. I give you such key statistics for the whole VMS. Your response is "This is not what I meant."

What do you mean by process, if you can't accept instructions to generate the text as an answer? Are you asking for a deconstruction of some text to illustrate what the writer was thinking during writing? Do you mean that there must be some planning behind it? Do you mean something else?

(31-05-2019, 10:50 AM)ReneZ Wrote: ... The network plots show the end result (or in fact the starting point) ...

The network plots do not show the starting point. It is not necessary to plan a network of similar words if you only want to write similar words anyway. The VMS contains nothing but similar tokens. This is why, if we look at the three most frequent words on each page, for more than half of the pages two of the three differ in only one detail. This fact alone illustrates that the occurrence of similar tokens is typical for the whole VMS.

It simply doesn't matter whether you look at a single page, a bifolio, a quire in Currier A or B, or the whole VMS. The result is always the same. In fact, subnets can be constructed for [link], but, of course, they are more instructive for pages containing lots of text. For instance the core network for page [link] contains 189 out of 277 words (=68.2 %). They represent 402 out of 491 tokens (=81.8 %).
The core network for page [link] contains 239 out of 354 words (=67.5 %). They represent 485 out of 613 tokens (=79.1 %).
The core network for page [link] contains 172 out of 267 words (=66.9 %). They represent 305 out of 394 tokens (=77.4 %).
...
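For anyone who wants to reproduce figures of this kind, here is a rough sketch. It assumes that "core network" means the largest connected component of the graph in which two word types are linked when they are one insertion, deletion or substitution apart, and that plain string edit distance is used. The names `within_one_edit` and `core_network_coverage` and the toy token list are made up for illustration.

```python
from collections import Counter

def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equally long) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]   # one substitution at position i
    return a[i:] == b[i + 1:]           # one extra glyph in b at position i

def core_network_coverage(tokens):
    """Return (core types, all types, core tokens, all tokens) for a non-empty token list."""
    counts = Counter(tokens)
    types = list(counts)
    neighbours = {w: [] for w in types}
    for i, a in enumerate(types):
        for b in types[i + 1:]:
            if within_one_edit(a, b):
                neighbours[a].append(b)
                neighbours[b].append(a)
    seen, best = set(), []
    for start in types:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:  # depth-first walk over one connected component
            w = stack.pop()
            component.append(w)
            for n in neighbours[w]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        if len(component) > len(best):
            best = component
    core_tokens = sum(counts[w] for w in best)
    return len(best), len(types), core_tokens, len(tokens)

# Toy input, not a real page: three linked types, one linked pair, one isolated word.
toy = ["chey", "chedy", "chedy", "shedy", "qokchy", "qotchy", "daiin"]
result = core_network_coverage(toy)
```

The pairwise comparison is quadratic in the number of word types, which is fine for a single page but slow for the whole manuscript.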
(31-05-2019, 02:45 PM)nablator Wrote: Edit: BTW, I have a slightly different number of connected word types at edit distance 1 than the one on page 5 (6,796 out of 8,026 words). I removed all words with unclear glyphs and kept everything else in the [link] of the TT transcription. Did you use a different version or filter? I will check this week-end if the one downloaded from [link] is identical. Is there some special rule for edit distance such as EVA-ii counts as 1 glyph or is it just plain EVA string edit distance?
After checking and re-checking I found that of 8026 word types, 6930 are at Levenshtein edit distance 1 of another word type, 1096 are not.
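A count like this can be sketched without comparing every pair of word types. The version below assumes plain string edit distance (every character is one glyph, with no special rule for sequences such as EVA-ii) and is not the script actually used for the numbers above. The function name `connected_types` and the toy vocabulary are hypothetical.

```python
from collections import defaultdict

def connected_types(word_types):
    """Return the word types that have at least one other type at Levenshtein distance 1."""
    types = set(word_types)
    connected = set()

    # Substitutions: two different words of equal length that agree everywhere
    # except at one position share the same (prefix, suffix) pattern.
    buckets = defaultdict(set)
    for w in types:
        for i in range(len(w)):
            buckets[(w[:i], w[i + 1:])].add(w)
    for group in buckets.values():
        if len(group) > 1:
            connected |= group

    # Insertions/deletions: removing one glyph from w yields another word type.
    for w in types:
        for i in range(len(w)):
            shorter = w[:i] + w[i + 1:]
            if shorter in types:
                connected.add(w)
                connected.add(shorter)
    return connected

# Toy vocabulary for illustration only.
vocabulary = ["chey", "chedy", "shedy", "qokchy", "qotchy", "daiin", "otol"]
linked = connected_types(vocabulary)
isolated = set(vocabulary) - linked
```

With a real list of word types, `len(linked)` and `len(isolated)` would be the two numbers to compare. The result still depends on which transcription and which filter for unclear glyphs are used.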
(02-06-2019, 04:32 PM)Torsten Wrote: The core network for page [link] contains 172 out of 267 words (66.9 %). They represent 305 out of 394 tokens (=77.4 %).
394 word tokens, yes, because two unclear words are removed. But surely there are 257 word types, not 267?
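The arithmetic can be checked directly. This small sketch only recomputes the percentages from the counts quoted in the thread; the helper name `share` is made up here.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# (core word types, all word types, core tokens, all tokens) as quoted above
quoted = [(189, 277, 402, 491), (239, 354, 485, 613), (172, 267, 305, 394)]
for core_types, all_types, core_tokens, all_tokens in quoted:
    print(share(core_types, all_types), share(core_tokens, all_tokens))

# The same core count against 257 word types instead of 267
print(share(172, 257))
```

172 out of 267 comes to about 64.4 %, whereas 172 out of 257 comes to about 66.9 %, so the quoted percentage fits 257 word types. Differences in the last digit elsewhere may simply be rounding versus truncation.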
(02-06-2019, 06:15 AM)Torsten Wrote: ...


@JKP Your response suggests that you don't know any of my papers. I have two open access papers published at arxiv.org (see [link]). Please read them.


...

Okay, last night I read the most recent version of the most recent open-access paper.

I actually agree with much of it. These are patterns I noticed myself before I created a concordance (it's what prompted me to create one), but I still need to mull this over because there are other dynamics.
The problem is not with the statistics or the existence of patterns.

The problem is with the suggestion that these patterns are the result of a conscious effort by the author(s) to create these patterns, i.e. the auto-copying hypothesis ('intentional').

These patterns are not so pronounced that explaining them can explain the whole text. It just explains a fraction of the text. The possibility that they are a side effect of some other process is not really addressed ('unintentional'). I am not aware of any evidence that speaks in favour of the intentional option, over the unintentional option.

The network of words with their edit distances only shows the existing words.
One has to imagine that inside this network there is a much denser network of other words, also with edit distance one (1) to the existing words, that do not occur in the text.

Effectively, the network of existing words consists of very specific 'paths' through this denser network, and I find it hard to believe that this network of paths is just the result of chance.

There is far more planning and system in the MS text than this.
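The "denser network" can be illustrated with a short sketch: list every string that is one edit away from a given word, then check how many of those actually occur. The glyph inventory and the toy vocabulary below are assumptions for illustration only, and `one_edit_variants` is a made-up name.

```python
def one_edit_variants(word: str, glyphs: str) -> set:
    """All non-empty strings exactly one insertion, deletion, or substitution away from `word`."""
    variants = set()
    for i in range(len(word) + 1):
        for g in glyphs:
            variants.add(word[:i] + g + word[i:])        # insertion
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])            # deletion
        for g in glyphs:
            variants.add(word[:i] + g + word[i + 1:])    # substitution
    variants.discard(word)  # substituting a glyph by itself is not an edit
    variants.discard("")    # deleting the only glyph leaves no word
    return variants

# Hypothetical, simplified glyph inventory and toy vocabulary.
GLYPHS = "acdehklorsty"
vocabulary = {"chey", "chedy", "shey", "tchey", "daiin"}

possible = one_edit_variants("chey", GLYPHS)
attested = possible & vocabulary
unattested = possible - vocabulary
```

Comparing `len(attested)` with `len(possible)` for real word types would put a number on how sparse the existing network is inside the network of possible words.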
I agree with Rene. 

The conscious and manual effort of finding patterns does not fit the intuitive nature of the text in the MS.

Additionally, as I have stated before, I think the metrics in a proposal like Torsten's cannot be discussed properly on a forum, because you have to know and discuss the detailed words/groups and word edit distances. That becomes unreadable in an instant here.
(03-06-2019, 05:55 AM)ReneZ Wrote: The problem is not with the statistics or the existence of patterns.

In other words, you don't see any mistakes in the facts provided in the [link]?

(03-06-2019, 05:55 AM)ReneZ Wrote: The problem is with the suggestion that these patterns are the result of a conscious effort by the author(s) to create these patterns, i.e. the auto-copying hypothesis ('intentional').

It is simply not possible to write something unintentionally (without thinking). Keep in mind that the main goal was to write the text as efficiently as possible and that therefore only limited effort was taken to hide these patterns.

(03-06-2019, 05:55 AM)ReneZ Wrote: These patterns are not so pronounced that explaining them can explain the whole text. It just explains a fraction of the text.

The [link] and statistics demonstrate that there is nothing other than similar words. If you think otherwise, it should be easy to point to something that cannot be explained as locally repeated elements.

(03-06-2019, 05:55 AM)ReneZ Wrote: The possibility that they are a side effect of some other process is not really addressed ('unintentional'). I am not aware of any evidence that speaks in favour of the intentional option, over the unintentional option.

You are not only moving the goalposts (see [link]), you are also asking for an impossible proof. The VMS is characterized by very specific patterns and statistics. Keep in mind that a method must explain the deep correlation between frequency, similarity, and spatial vicinity of tokens within the VMS text. It is therefore reasonable to assume that only one method results in these specific patterns and statistics.

(To argue that there might be some other method also implies that you accept the self-citation method as possible. This would contradict your previous statement that self-citation only explains a fraction of the text.)

(03-06-2019, 05:55 AM)ReneZ Wrote: The network of words with their edit distances only shows the existing words.
One has to imagine that inside this network there is a much denser network of other words, also with edit distance one (1) to the existing words, that do not occur in the text. 

Effectively, the network of existing words consists of very specific 'paths' through this denser network

This is an argument in favor of the self-citation method. Only with a systematic approach would it be possible to use every thinkable way to modify the tokens. Keep in mind that the self-citation method was used because copying is the most effective way to generate text. There is simply no reason to assume that the scribe put any effort into searching for missing token variants.

(03-06-2019, 05:55 AM)ReneZ Wrote: I find it hard to believe that this network of paths is just the result of chance.

There is far more planning and system in the MS text than this.

"I find it hard to believe that your theory could be true; therefore your theory must be false" is not a valid argument (see [link]).

Anyway, nobody is arguing that the network is the result of chance. We are arguing that high-frequency tokens also tend to have high numbers of similar words and that "isolated" words (i.e., unconnected nodes in the graph) usually appear just once in the entire VMS. In other words, the network behaves as expected for the self-citation method.
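The two claims in the last paragraph can be checked on any transcription with a sketch like the following. It assumes plain string edit distance and counts, for each word type, its frequency and the number of other word types one edit away. The names `within_one_edit` and `frequency_and_degree` and the toy token list are made up for illustration.

```python
from collections import Counter

def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equally long) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]
    return a[i:] == b[i + 1:]

def frequency_and_degree(tokens):
    """Return (counts, degree): token frequency and number of one-edit neighbours per word type."""
    counts = Counter(tokens)
    types = list(counts)
    degree = dict.fromkeys(types, 0)
    for i, a in enumerate(types):
        for b in types[i + 1:]:
            if within_one_edit(a, b):
                degree[a] += 1
                degree[b] += 1
    return counts, degree

# Toy input for illustration only, not taken from a real page.
toy = ["chedy", "chedy", "chedy", "shedy", "chey", "qokedy", "daiin"]
counts, degree = frequency_and_degree(toy)
isolated = [w for w in counts if degree[w] == 0]
isolated_once = [w for w in isolated if counts[w] == 1]
```

On real data one would compare `degree` across frequency bands and look at the ratio `len(isolated_once) / len(isolated)`. The toy list is far too small to show any trend.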