The Voynich Ninja

Full Version: Discussion of "A possible generating algorithm of the Voynich manuscript"
(11-06-2019, 01:55 PM)Koen G Wrote: Nablator when I try to compile your code it says:

/tmp/java_uwIjbh/MDistBetweenLines.java:9: warning: [unchecked] unchecked conversion
    static ArrayList<ArrayList<String>> lineList = new ArrayList();
                                                   ^
  required: ArrayList<ArrayList<String>>
  found:    ArrayList
I never read warnings. Bad habit. Smile
It should be:
static ArrayList<ArrayList<String>> lineList = new ArrayList<ArrayList<String>>();
(11-06-2019, 02:42 PM)nablator Wrote: I never read warnings. Bad habit. Smile
It should be:
static ArrayList<ArrayList<String>> lineList = new ArrayList<ArrayList<String>>();

Okay that works. What do I need to type in the console?
(11-06-2019, 07:39 PM)Koen G Wrote: Okay that works. What do I need to type in the console?

java MDistBetweenLines yourtextfile.txt maximum_line_distance > yourtextfile_dist.txt

The edit distances between all pairs of words from lines at vertical distances between 1 and maximum_line_distance are averaged.
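
For reference, this is roughly what gets computed (a minimal sketch of my understanding, not the actual MDistBetweenLines source; I assume plain Levenshtein distance between words, averaged separately for each vertical distance d):

Code:
import java.util.*;

// A minimal sketch of the computation, not the actual MDistBetweenLines source.
class MDistSketch {

    // Standard Levenshtein edit distance between two words (two-row version).
    static int dist(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(subst, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Average edit distance over all pairs of words taken from lines exactly
    // d lines apart; "lines" holds the words of each transcription line in order.
    static double averageForLineDistance(List<List<String>> lines, int d) {
        long sum = 0, count = 0;
        for (int i = 0; i + d < lines.size(); i++)
            for (String a : lines.get(i))
                for (String b : lines.get(i + d)) {
                    sum += dist(a, b);
                    count++;
                }
        return count == 0 ? 0.0 : (double) sum / count;
    }
}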
(11-06-2019, 12:56 PM)ReneZ Wrote: The purpose of this exercise was not to create a model for the Voynich MS text. It was just to show two things:

- the same text can have very different behaviour depending on how words are defined
- it is possible for a meaningful text to show a correlation between edit distance and vertical distance in the text

You say yourself that the purpose of your exercise was not to create a model for the Voynich MS text. You are right, the correlation between edit distance and vertical distance is an interesting pattern. But this pattern only illustrates one feature of the Voynich MS text. Moreover, "in close vicinity" includes horizontal as well as vertical distance (see [link], p. 14 and [link], p. 3f). Figure 4 only considers the vertical distance and therefore still underrates the context dependency in the Voynich MS text.

Anyway, does your experiment mean that you accept the third result of our text analysis? "The closer two words are (with respect to their edit distance), the more likely these words also can be found written in close vicinity" ([link], p. 7f).

What about the first result? "The respective frequency counts confirm the general principle: high-frequency tokens also tend to have high numbers of similar words" ([link], p. 6). Or, in other words, "when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail" ([link], p. 3).

What about the second result? "A useful method to analyze the similarity relations between words of a VMS (sub-)section is their representation as nodes in a graph. ... The resulting network, connecting 6,796 out of 8,026 words (=84.67%). ... The longest path within this network has a length of 21 steps, substantiating its surprisingly high connectivity" ([link], p. 4f).
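
To make the construction concrete: such a graph could be built roughly as follows (a sketch only, assuming for illustration that edges connect word types at edit distance 1; it would sit in the same class as the earlier MDistSketch so it can reuse dist(), and the actual criterion and counts are those of the paper):

Code:
    // Sketch of a word graph: word types are nodes, and two types are connected
    // whenever their edit distance is 1 (illustrative assumption). Returns the
    // size of the largest connected component, found by breadth-first search.
    static int largestComponent(Set<String> wordTypes) {
        List<String> words = new ArrayList<>(wordTypes);
        Map<String, List<String>> adj = new HashMap<>();
        for (int i = 0; i < words.size(); i++)
            for (int j = i + 1; j < words.size(); j++)
                if (dist(words.get(i), words.get(j)) == 1) {
                    adj.computeIfAbsent(words.get(i), k -> new ArrayList<>()).add(words.get(j));
                    adj.computeIfAbsent(words.get(j), k -> new ArrayList<>()).add(words.get(i));
                }
        Set<String> seen = new HashSet<>();
        int best = 0;
        for (String start : words) {
            if (!seen.add(start)) continue; // already visited
            int size = 0;
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String w = queue.poll();
                size++;
                for (String n : adj.getOrDefault(w, Collections.emptyList()))
                    if (seen.add(n)) queue.add(n);
            }
            best = Math.max(best, size);
        }
        return best;
    }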

Your exercise also demonstrates that it was necessary to generate a number of code words during writing. Only this way was it possible to simulate the high level of context dependency of the VMS. Does this mean that you accept my conclusion that "the scribe was writing similarly spelled tokens near to each other because they depend in some way on each other" ([link], p. 14)?

(11-06-2019, 09:44 AM)ReneZ Wrote: The longest text that seems to have been analysed in this manner in the paper is the stars/recipes section in quire 20.

In chapter 2 "Context-dependent self-similarity" we analyze the whole text of the VMS (see [link], p. 2ff). It seems that you refer to the analysis of our "facsimile" text. This text was indeed generated to create a "facsimile" of the VMS “Recipes” section (see [link], p. 2).
(11-06-2019, 12:56 PM)ReneZ Wrote: The purpose of this exercise was not to create a model for the Voynich MS text. It was just to show two things:

- the same text can have very different behaviour depending on how words are defined
- it is possible for a meaningful text to show a correlation between edit distance and vertical distance in the text

Actually, this correlation that we have been busy replicating with meaningful texts needs more analysis. If I am not mistaken, the diagrams look very different when the vertical distances are restricted to the same page:

Herbal A:
[attachment=3025]

(Only [link] has enough lines for vertical distances as large as 24-25.)

Quire 20:
[attachment=3024]

(Only [link] and [link] have enough lines for vertical distances as large as 51-53.)

Global:
[attachment=3026]

(Labels removed in all calculations.)

So the strong correlation seen in Figure 4 (page 8), which includes long distances, does not actually extend to intermediate distances (15-50 lines), except across pages. Since large vertical distances are more likely than short ones to fall across two different pages, any difference between pages (a different setting/preference/whatever on each page) translates into a misleading gradual increase in Figure 4. In my diagrams the local increase is limited to a few lines (7 to 15), and then there is a large flat (or decreasing!) area until the end of the page, where a spike is observed only on the largest pages.

It's weird. I really hope I'm wrong. Smile

Edit: not so weird because of the very low number of samples (1 or 2) and shorter end-of-paragraph lines.
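
The only change needed with respect to the earlier sketch is a guard that skips cross-page line pairs (again a sketch; pageOf is my assumed bookkeeping of which page each line belongs to, not necessarily how the program stores it):

Code:
    // Same-page restriction: only pair lines that belong to the same page.
    static double averageWithinPages(List<List<String>> lines, List<String> pageOf, int d) {
        long sum = 0, count = 0;
        for (int i = 0; i + d < lines.size(); i++) {
            if (!pageOf.get(i).equals(pageOf.get(i + d))) continue; // skip cross-page pairs
            for (String a : lines.get(i))
                for (String b : lines.get(i + d)) {
                    sum += dist(a, b);
                    count++;
                }
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }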
Same computation with the TT transcription (the one in the previous post was my own, with capitalized benched gallows and Sh):

Herbal A:

[attachment=3029]


Quire 20:

[attachment=3028]

Again, the values on the far right of the graphs are not significant because of the reduced size of the sample.
(25-05-2019, 10:20 PM)Emma May Smith Wrote: Cryptologia will have given you a certain number of free-to-access papers you can share with your peers, and you also have the right to share preprint versions of the paper. Can you at least state that you will do this at some point in the future?

I have checked the author agreement. It is indeed possible to share a preprint or a postprint version on my own personal website. I have uploaded a version of the paper to my website: [link].
Hi Mr. Torsten,
This is good news. I was keen on reading it.

I’ve been working on the text for a while, and my personal research touches your work quite considerably, but from a totally different approach. I will soon post something.

Edited to add: Not sure about the hoax conclusion (which I don’t really mind), though.

regards, Alex

ps: Sorry guys to burst in. This is my first post after my introduction.
(11-06-2019, 12:35 PM)Koen G Wrote: Noob question: if your words are all very short, won't this increase the likelihood of lower values?

Looking at possible confounding variables is an essential part of the analysis of correlations.

The variation in average word lengths alone might explain some, if not all, of the variation in the average edit distance between words of different lines of the same page.

For example, TT Herbal A:

[attachment=3032]

Horizontal axis: vertical distance in lines (within pages, not across different pages)
Vertical axis: average absolute value of the difference between the lengths of all pairs of words from these lines

Again, the values on the far right of the graphs are not significant because of the reduced size of the sample.
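
The computation is the same pairing of lines as in the earlier sketches, with the edit distance replaced by the absolute difference of word lengths (a sketch, same assumed bookkeeping):

Code:
    // Word-length control: average |length(a) - length(b)| over all pairs of
    // words from lines d apart on the same page, instead of the edit distance.
    static double averageLengthDifference(List<List<String>> lines, List<String> pageOf, int d) {
        long sum = 0, count = 0;
        for (int i = 0; i + d < lines.size(); i++) {
            if (!pageOf.get(i).equals(pageOf.get(i + d))) continue; // same page only
            for (String a : lines.get(i))
                for (String b : lines.get(i + d)) {
                    sum += Math.abs(a.length() - b.length());
                    count++;
                }
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }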
Access to the paper makes following this conversation much easier: many thanks to Torsten for sharing it!

It seems that much of the exchange has focussed on Figure 4. Timm and Schinner present two Voynich samples and an English sample:

Quote:Black line (a): VMS “Recipes” section (f103r–f116v, Currier B);
red line (b): VMS “Herbal” section (f1r–f66v, Currier A);
blue line (c): the first 10000 tokens of “Alice in Wonderland”;
dashed lines: the respective asymptotic values of M̄.

Nablator has computed a similar graph for a text by Aristotle translated into Latin by Boethius.

If I understand correctly, when line distance increases from 0 to 60:
  • In the VMS Recipes, word-distance increases by about 0.20 (from 4.75 to 4.95)
  • In the VMS Herbal A, word-distance increases by about 0.28 (from 4.4 to 4.68)
  • In Alice, word-distance is nearly unaffected (it might increase by about 0.01)
  • In the Latin Aristotle, word-distance increases by about 0.17 (from 5.4 to 5.57)

Isn't the curve for the Latin Aristotle much more similar to that for the VMS Recipes than to that for Alice in Wonderland?

As Nablator points out, the text by Boethius is peculiar; yet I don't think it can feature the consecutive repetition of identical words (which is quite rare in Latin prose). Clearly, exact reduplication must contribute to the effect discussed by Timm and Schinner (even if, of course, other phenomena contribute as well). It would be interesting to see how texts in a reduplicating language behave, but in most examples I have seen, reduplicated words are joined into a single word: this of course would make them invisible to this method.
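
A quick way to check this on any tokenised text would be something along these lines (a small sketch of my own, not anything from the paper):

Code:
import java.util.List;

// Tiny sketch: count immediately repeated identical tokens in a token stream,
// as one rough measure of how much exact reduplication a text contains.
class RepeatCounter {
    static long countImmediateRepeats(List<String> tokens) {
        long repeats = 0;
        for (int i = 1; i < tokens.size(); i++)
            if (tokens.get(i).equals(tokens.get(i - 1))) repeats++;
        return repeats;
    }
}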