Discussion of "A possible generating algorithm of the Voynich manuscript"

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - nablator - 10-06-2019

(10-06-2019, 02:59 PM)ReneZ Wrote: This results in a situation where similar words appear near each other.

This works well the first time the words appear, but on later appearances they are less likely to be near each other, so I suspect it will be hard to maintain this good result on a much longer text. Spikes from the first, correlated appearances will not offset the overall uncorrelated appearances. It would be interesting to check whether words that appear only once (hapax legomena) show a stronger or weaker correlation between line distance and edit distance than other words, in the VMS as well.
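
A minimal Java sketch of the check suggested above, not code from the paper or from this thread. It assumes the text is already split into lines of words, uses plain Levenshtein distance as the edit distance, and compares every word of a line with every word of the line d lines further down. The class and method names (EditDistanceByLineGap, meanDistanceByGap, hapaxOnly) and the toy words in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: mean edit distance between words as a function of line distance. */
final class EditDistanceByLineGap {

    /** Levenshtein distance computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** mean[d] = mean edit distance over word pairs from lines d apart (NaN if no pair exists). */
    static double[] meanDistanceByGap(List<List<String>> lines, int maxGap) {
        double[] sum = new double[maxGap + 1];
        long[] count = new long[maxGap + 1];
        for (int i = 0; i < lines.size(); i++) {
            for (int d = 1; d <= maxGap && i + d < lines.size(); d++) {
                for (String w1 : lines.get(i)) {
                    for (String w2 : lines.get(i + d)) {
                        sum[d] += levenshtein(w1, w2);
                        count[d]++;
                    }
                }
            }
        }
        double[] mean = new double[maxGap + 1];
        for (int d = 0; d <= maxGap; d++) {
            mean[d] = count[d] == 0 ? Double.NaN : sum[d] / count[d];
        }
        return mean;
    }

    /** Keeps the line structure but drops every word that occurs more than once in the text. */
    static List<List<String>> hapaxOnly(List<List<String>> lines) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> line : lines) {
            for (String w : line) {
                freq.merge(w, 1, Integer::sum);
            }
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> line : lines) {
            List<String> kept = new ArrayList<>();
            for (String w : line) {
                if (freq.get(w) == 1) {
                    kept.add(w);
                }
            }
            result.add(kept);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy input, invented for the example; real input would be a transliteration file.
        List<List<String>> lines = Arrays.asList(
                Arrays.asList("daiin", "chedy", "daiin"),
                Arrays.asList("dain", "chey"),
                Arrays.asList("qokeedy", "shol"));
        double[] all = meanDistanceByGap(lines, 2);
        double[] hapax = meanDistanceByGap(hapaxOnly(lines), 2);
        for (int d = 1; d <= 2; d++) {
            System.out.println("gap " + d + ": all words " + all[d] + ", hapax only " + hapax[d]);
        }
    }
}
```

Running meanDistanceByGap once on the full text and once on the output of hapaxOnly gives the two curves to compare.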

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - ReneZ - 11-06-2019

@nablator,

possibly, let's see.

This is indeed the reason why it is more interesting to have a text with changing subject matter.
The longest text that seems to have been analysed in this manner in the paper is the stars/recipes section in quire 20.
This has about 9900 words (word tokens) and fewer than 3300 unique words (word types).

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - ReneZ - 11-06-2019

So here is one more attempt.

I used the first approx. 10,000 words of Pliny's Natural History.
It has more than 3999 word types, so I added the Roman numeral Q to represent 5000.
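
A rough Java sketch of what such an encoding could look like, with two assumptions that are not stated in this post: each word type gets a number in order of its first appearance, and Q behaves like a larger V (4000 = MQ, 5000 = Q, 6000 = QM). The class RomanNumeralText and its methods are hypothetical, and the actual files may have been produced differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: replace each word by the Roman numeral of its word-type number. */
final class RomanNumeralText {

    // Assumption: Q = 5000 and MQ = 4000, by analogy with V and IV.
    private static final int[] VALUES =
            {5000, 4000, 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
            {"Q", "MQ", "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    /** Greedy conversion; with the extra symbol Q the usable range is 1..8999. */
    static String toRoman(int n) {
        if (n < 1 || n > 8999) {
            throw new IllegalArgumentException("out of range 1..8999: " + n);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    /** Assumption: word types are numbered 1, 2, 3, ... in order of first appearance. */
    static List<String> encode(List<String> words) {
        Map<String, Integer> typeNumber = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String w : words) {
            Integer number = typeNumber.get(w);
            if (number == null) {
                number = typeNumber.size() + 1;
                typeNumber.put(w, number);
            }
            out.add(toRoman(number));
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy input, invented for the example.
        List<String> words = Arrays.asList("mundum", "et", "hoc", "et", "mundum", "caelum");
        System.out.println(String.join(" ", encode(words)));
    }
}
```

Under the first-appearance assumption, new word types that enter the text close together get consecutive numbers, and consecutive Roman numerals usually differ by only a few characters.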

I also introduced an alternative representation of the numbers (not Voynich-like).

Here are the files: [three file links, not available in this copy]

Second edit: problem should have been solved.

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - nablator - 11-06-2019

I tried reordering all bifolios in Q20 and got better results (on longer line distance and lower edit distance too) on the first try with [link not available] first, then each entire bifolio read before the next. To find the best order I compared individual pages using the average edit distance between pairs of words taken from pairs of pages: the only clear result is that f. 116r is closer to any page of Q20 than any other page of Q20 is!
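
A hedged Java sketch of this kind of page comparison, not the program actually used here: for two pages it averages the Levenshtein distance over all pairs made of one word from each page, and for a given page it picks the other page with the lowest average. The names PageSimilarity, meanCrossDistance and closestPage are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: compare pages by the average edit distance between their words. */
final class PageSimilarity {

    /** Levenshtein distance computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Average distance over all (word of pageA, word of pageB) pairs; NaN if a page is empty. */
    static double meanCrossDistance(List<String> pageA, List<String> pageB) {
        if (pageA.isEmpty() || pageB.isEmpty()) {
            return Double.NaN;
        }
        long sum = 0;
        for (String w1 : pageA) {
            for (String w2 : pageB) {
                sum += levenshtein(w1, w2);
            }
        }
        return (double) sum / ((long) pageA.size() * pageB.size());
    }

    /** Index of the page with the lowest average distance to pages.get(target), or -1 if none. */
    static int closestPage(List<List<String>> pages, int target) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int j = 0; j < pages.size(); j++) {
            if (j == target) {
                continue;
            }
            double d = meanCrossDistance(pages.get(target), pages.get(j));
            if (d < bestDistance) { // NaN never passes this test, so empty pages are skipped
                bestDistance = d;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy pages, invented for the example.
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("daiin", "chedy"),
                Arrays.asList("dain", "chey", "chedy"),
                Arrays.asList("qokeedy", "shol"));
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("page " + i + " is closest to page " + closestPage(pages, i));
        }
    }
}
```

The cost grows with the product of the two page sizes, which is still manageable for pages of a few hundred words.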

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - nablator - 11-06-2019

(11-06-2019, 10:30 AM)ReneZ Wrote: So here is one more attempt.

pliny_norm: [image not available]

pliny_mod1_80_cols: [image not available]

pliny_mod2_80_cols: [image not available]

Very good!

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - Koen G - 11-06-2019

Noob question: if your words are all very short, won't this increase the likelihood of lower values?

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - nablator - 11-06-2019

(11-06-2019, 12:35 PM)Koen G Wrote: Noob question: if your words are all very short, won't this increase the likelihood of lower values?

It will, but the absolute values don't matter; only the relative variation between near values and far values matters. A list of pseudo-Voynichese words generated from earlier words by modifying and adding glyphs within constraints should work as well as or better than Roman numerals.
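
To make "relative variation" concrete, here is a small Java sketch under my own assumptions: the curve is an array of mean edit distances where index 0 is line distance 1, "near" is the mean of the first few entries, "far" is the mean of the last few, and the measure is (far - near) / far. The class RelativeVariation and the numbers in main are invented for illustration and are not results from any text.

```java
/** Sketch: relative drop of a distance curve between its near end and its far end. */
final class RelativeVariation {

    /** Mean of values[from] .. values[toExclusive - 1]. */
    static double mean(double[] values, int from, int toExclusive) {
        double sum = 0.0;
        for (int i = from; i < toExclusive; i++) {
            sum += values[i];
        }
        return sum / (toExclusive - from);
    }

    /** (far - near) / far, using the first nearCount and the last farCount entries of the curve. */
    static double relativeDrop(double[] curve, int nearCount, int farCount) {
        if (nearCount < 1 || farCount < 1 || nearCount > curve.length || farCount > curve.length) {
            throw new IllegalArgumentException("counts must be between 1 and the curve length");
        }
        double near = mean(curve, 0, nearCount);
        double far = mean(curve, curve.length - farCount, curve.length);
        return (far - near) / far;
    }

    public static void main(String[] args) {
        // Invented curves: the second is the first scaled by 0.5, as if all words were shorter.
        double[] longWords = {2.0, 3.0, 4.0, 4.0};
        double[] shortWords = {1.0, 1.5, 2.0, 2.0};
        System.out.println(relativeDrop(longWords, 1, 2));
        System.out.println(relativeDrop(shortWords, 1, 2));
    }
}
```

A uniform scaling of all distances, such as shorter words might produce, cancels out in this ratio, which is the point being made about values versus variation.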

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - ReneZ - 11-06-2019

Koen: in principle, yes.

However, with longer words the values can also be small if these words follow a clear structure.
This is the case with the Voynich MS and also with the Roman numerals.
Note that the average word length for the 'Roman numerals' version of the text is considerably longer than the original, but the asymptotic value of the distance is lower.

The purpose of this exercise was not to create a model for the Voynich MS text. It was just to show two things:

- the same text can have very different behaviour depending on how words are defined
- it is possible for a meaningful text to show a correlation between edit distance and vertical distance in the text

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - Koen G - 11-06-2019

Right, I just looked at the graph in the paper and understand what we're aiming for; your Pliny does a good job.

Methodologically, comparing this aspect of Voynichese to one modern text in order to judge the normality of its behavior isn't exactly giving it a fair shot...

RE: Discussion of "A possible generating algorithm of the Voynich manuscript" - Koen G - 11-06-2019

Nablator, when I try to compile your code it says:

/tmp/java_uwIjbh/MDistBetweenLines.java:9: warning: [unchecked] unchecked conversion
static ArrayList<ArrayList<String>> lineList = new ArrayList();
^
required: ArrayList<ArrayList<String>>
found: ArrayList