The Voynich Ninja

Does network complexity help organize Babel’s library?
There is a paper from 2016 about the VMS.
The preprint version of the paper by Juan Pablo Cárdenas, Iván González, Gerardo Vidal and Miguel Angel Fuentes is available at arxiv.org.

The authors come to the conclusion that the Voynich manuscript contains "a written text which has been ciphered, possibly by a permutation process of its words".

The authors build networks of adjacent words for text samples of different kinds. One outcome for the network built from the Voynich manuscript is that it is the "only analyzed text that does not present a significant difference between assortativities before and after randomization."

They conclude that the "anomalous behaviors with respect to clustering and assortativity between its original and ciphered versions, suggest that the manuscript is a ciphered text, even with a lower disassortativity. We use the term ciphered because the manuscript has a word frequency distribution that follows Zipf’s law, and because the correlation between <C> and <l> also fits the power function of other texts."
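One quick way to check the Zipf claim on your own transcription is to fit a straight line to log(frequency) against log(rank); a slope near -1 is the classic Zipf signature. Below is a minimal sketch in Python. The function name `zipf_exponent` is my own, and plain least squares over all ranks is a simplification (the authors may have fitted Zipf's law differently).

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Least-squares slope of log(frequency) vs log(rank) over all word types.

    Rough sketch: no selection of a fitting range, no maximum-likelihood fit.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # Synthetic corpus in which the word of rank r occurs about 1200 / r times,
    # so the fitted slope should come out close to -1.
    toy = [f"w{r}" for r in range(1, 51) for _ in range(1200 // r)]
    print(round(zipf_exponent(toy), 2))
```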
I haven't finished reading the paper yet, but it does look interesting. They compute a number of measures, with the goal of distinguishing linguistic from non-linguistic texts. The measures are based on undirected graphs where words that appear next to each other are connected by an edge.
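For anyone who wants to experiment, the network described above could be built roughly like this. It is a sketch, not the authors' code: the function names are mine, repeated adjacencies collapse into a single edge, and a word followed by itself (a self-loop) is simply ignored, which is my assumption.

```python
def build_adjacency_graph(tokens):
    """Undirected word-adjacency network as {word: set of neighbouring words}.

    Two words share an edge if they appear next to each other at least once
    (W1 W2 or W2 W1). Self-loops are dropped (an assumption of this sketch).
    """
    graph = {word: set() for word in tokens}
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def edge_count(graph):
    """Number of undirected edges E (each edge is stored in both directions)."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2
```

For example, `build_adjacency_graph("the cat saw the dog".split())` connects "the" to "cat", "saw" and "dog", for a total of four edges.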

The plot on the left (Fig.5) is Mean Clustering, for the original text on the X axis and a word-scrambled version on the Y axis.

Meaning of the dots:
The Universal Declaration of Human Rights (black triangles), classic books (blue squares), computer codes (yellow squares) and “Voynich manuscript” (red square).

In all linguistic samples (black and blue), the original text has considerably higher C than the scrambled text. For the Voynich ms, this is only marginally true. Computer code is somewhat intermediate between the VMS and linguistic texts.
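The comparison in Fig.5 could be approximated as follows: compute mean clustering C for a text, shuffle its words, and compute C again. This is a minimal sketch assuming a single random shuffle and the convention that words with fewer than two neighbours contribute C = 0. Both are my choices, and the paper's exact procedure may differ. The pairwise neighbour check is quadratic in a word's degree, so it is slow but acceptable for book-length texts.

```python
import random


def build_adjacency_graph(tokens):
    """Undirected word-adjacency network; self-loops dropped (assumption)."""
    graph = {word: set() for word in tokens}
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def mean_clustering(graph):
    """Average local clustering coefficient; degree < 2 counts as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        nbrs = list(neighbours)
        # Count the edges among this word's neighbours (closed triangles).
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in graph[nbrs[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def clustering_original_vs_scrambled(tokens, seed=0):
    """Return (C of the original text, C of a word-shuffled copy)."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return (mean_clustering(build_adjacency_graph(tokens)),
            mean_clustering(build_adjacency_graph(shuffled)))
```

Plotting the first value on the X axis and the second on the Y axis for many texts should give a chart comparable in spirit to Fig.5.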

The graph on the right (Fig.8) plots the Z-score of assortativity against the number of network edges E. In this case too, the VMS appears to behave differently from the other samples.
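Assortativity measures whether well-connected words tend to be adjacent to other well-connected words. The sketch below uses Newman's degree assortativity (the Pearson correlation of the degrees at the two ends of each edge). It turns this into a Z-score by comparing the original text with a number of word-shuffled copies. The choice of word shuffling as the "randomization" and the number of shuffles are my assumptions. I have not checked them against the paper's Methods section, and the function names are hypothetical.

```python
import math
import random
import statistics


def build_adjacency_graph(tokens):
    """Undirected word-adjacency network; self-loops dropped (assumption)."""
    graph = {word: set() for word in tokens}
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def degree_assortativity(graph):
    """Pearson correlation of the degrees at both ends of every edge.

    Each undirected edge is counted in both directions, as in Newman's
    definition. Returns NaN when all degrees are equal (undefined).
    """
    xs, ys = [], []
    for u, nbrs in graph.items():
        for v in nbrs:
            xs.append(len(graph[u]))
            ys.append(len(graph[v]))
    if not xs:
        return float("nan")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def assortativity_z_score(tokens, n_shuffles=50, seed=0):
    """(r_original - mean r_shuffled) / stdev r_shuffled over word shuffles."""
    rng = random.Random(seed)
    observed = degree_assortativity(build_adjacency_graph(tokens))
    shuffled = list(tokens)
    samples = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        r = degree_assortativity(build_adjacency_graph(shuffled))
        if not math.isnan(r):
            samples.append(r)
    if len(samples) < 2:
        raise ValueError("not enough valid shuffles to estimate a spread")
    sigma = statistics.stdev(samples)
    if sigma == 0:
        raise ValueError("shuffled assortativities have zero spread")
    return (observed - statistics.fmean(samples)) / sigma
```

A |Z| close to 0 would correspond to the paper's remark that the VMS shows no significant difference before and after randomization.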

I am thinking of trying to replicate some of these experiments: in particular, the graph on the left shows such a coherent trend for all texts... but I am not sure I will find the time and energy to do so.
Thanks, Marco. I had not yet looked at the paper but the results do look interesting. I don't quite understand what makes the VM stand out like this though. What does "higher C" imply?

Maybe the texts I gathered for TTR can help? I have no idea how hard it is to code these things. You'd need one script to scramble the text files and write them to a different folder, and another to calculate C. Nablator? ;)
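The first of those two scripts is only a few lines. Below is a minimal sketch that assumes plain UTF-8 `.txt` files with whitespace-separated words. The folder names in the usage line and the function name are hypothetical. Punctuation is not handled specially and stays attached to its word, which matters given the punctuation question discussed later in this thread.

```python
import random
from pathlib import Path


def scramble_folder(src_dir, dst_dir, seed=0):
    """Write a word-shuffled copy of every .txt file in src_dir to dst_dir.

    Words are whitespace-separated tokens; the output is one long line.
    Returns the list of files written.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).glob("*.txt")):
        words = path.read_text(encoding="utf-8").split()
        rng.shuffle(words)
        out = dst / path.name
        out.write_text(" ".join(words) + "\n", encoding="utf-8")
        written.append(out)
    return written
```

Usage: `scramble_folder("ttr_texts", "ttr_texts_scrambled")`.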
(30-01-2020, 01:37 PM) Koen G wrote: I have no idea how hard it is to code these things.
Hello Koen,

It's easier when someone else does it. ;)

I did the (much simpler) WPPA calculations, BTW.
(30-01-2020, 01:37 PM) Koen G wrote: [...] What does "higher C" imply?

They say that the higher "mean clustering" C values of the non-scrambled texts denote "high transitivity of word connections".
If I understand correctly, two words are connected if they appear next to each other (W1.W2, or W2.W1). C measures how often, when W1 is connected with both W2 and W3, also W2 and W3 are connected (this would result in a closed triangle in the network).
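Written out, and assuming the paper uses the standard (Watts–Strogatz) local clustering coefficient, the quantity for a word $i$ with $k_i$ neighbours, among which $T_i$ pairs are themselves connected, and its average over all $N$ words would be:

```latex
C_i = \frac{2\,T_i}{k_i\,(k_i - 1)}, \qquad
\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C_i
```

For example, if W1 is connected to W2, W3 and W4 ($k_1 = 3$) and only W2 and W3 are connected to each other ($T_1 = 1$), then $C_1 = 2/(3 \cdot 2) = 1/3$. One common convention sets $C_i = 0$ for words with fewer than two neighbours.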
From Table 1, I also get the impression that the value of C depends on the length of the text, with longer texts resulting on average in higher C; it would probably be better to use texts of similar length. I wonder why their Voynich sample only has 1997 different word types...
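To control for text length, one simple option is to cut every sample down to the length of the shortest one before computing C, and to report how many tokens and word types remain. This is a sketch under my own assumptions. Taking the first n tokens is only one option; a random window or several averaged windows might be fairer. The function name is hypothetical.

```python
def equalize_lengths(corpora):
    """Truncate every token list to the length of the shortest one.

    `corpora` maps a label to a list of word tokens. Returns the trimmed
    corpora and a {label: (token count, word-type count)} summary.
    """
    n = min(len(tokens) for tokens in corpora.values())
    trimmed = {label: tokens[:n] for label, tokens in corpora.items()}
    summary = {label: (len(t), len(set(t))) for label, t in trimmed.items()}
    return trimmed, summary
```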

From the network in Fig.2, I understand that they build networks on the basis of punctuation. That network is based on this text:

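"en un lugar de la mancha, de cuyo nombre no quiero acordarme, no ha mucho tiempo que vivía un hidalgo de los de lanza en astillero"
(The opening of Don Quixote: "in a place in La Mancha, whose name I do not wish to recall, there lived not long ago a gentleman of the kind with a lance in the rack".)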

'acordarme' and 'mancha' are only connected with the preceding word: I guess this is because they are followed by a comma. If their method depends on punctuation, it is no wonder that the VMS behaves differently...
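If that guess is right, the construction would look something like the sketch below. The text is split at punctuation marks, and words are linked only to their neighbours within the same segment. The set of punctuation characters and the function name are my assumptions, not taken from the paper. Since a Voynich transcription has no such punctuation, every adjacent pair would be linked there, which is exactly the asymmetry worried about above.

```python
import re


def build_graph_with_punctuation_breaks(text):
    """Word-adjacency network in which punctuation interrupts adjacency.

    The text is lower-cased and split into segments at , ; : . ! ?
    Words are linked only to neighbours inside the same segment.
    """
    graph = {}
    for segment in re.split(r"[,;:.!?]", text.lower()):
        words = segment.split()
        for word in words:
            graph.setdefault(word, set())
        for left, right in zip(words, words[1:]):
            if left != right:
                graph[left].add(right)
                graph[right].add(left)
    return graph
```

On the Cervantes sentence, this gives 'acordarme' the single neighbour 'quiero' and 'mancha' the single neighbour 'la', which matches what Fig.2 seems to show.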
@Marco

I see you like doing statistics.
I think it is not enough to simply compare a language with the VM.
I see the reason for this in the encryption technique.

I would be interested to see how the results differ between an author who writes Latin north of the Alps (the Alemannic language area, southern Germany) and an author south of the Alps (northern Italy) who also writes Latin, and to compare both with the VM.
Here a visible difference might be interesting. Both variants are good candidates for Voynich.
Marco: so it looks like the results might be different for properly pre-processed and normalized texts. That's a bit disappointing; I thought the differences between Voynichese and real texts were quite striking.

Nablator: yeah, sorry, I have no idea whether these things take 10 minutes to code or a whole day, and hence whether it would be worth the effort. So maybe it isn't :)

Aga: with most statistics, looking at various dialects of the same language will make little difference, unless you really focus on the differences between the dialects and somehow find a statistical property that separates the two.
I'm not thinking of dialects here.
Italian is closer to Latin than the Germanic languages are, and this should also be reflected in word usage, even if in the end both authors write the same language. Here the person, not the language, is in the foreground.
(30-01-2020, 07:57 PM) Aga Tentakulus wrote:
[...]
Here a visible difference might be interesting. Both variants are good candidates for Voynich.

Is your opinion that Latin is a good candidate based on features of Voynichese? If yes, what are these Latin-like features?
I do not believe in Latin directly, and certainly not in Latin as we know it today.
But I do think in the direction of Latin, for several reasons:
The scientific aspect: what we see in the VM is certainly not something you just pick up. An education is clearly behind it, and education was acquired in Latin.
The clues in the VM about the origin of the manuscript (southern Alps, northern Italy), and the linguistic connection from the Italian dialects to Latin.
The use of suffixes, as seen in the VM.
Word-initial characters such as (o, 8, 9, 4).
And of course the presence of German. What else would he write in? Hebrew? That would assume converting text written from right to left into left to right, plus an additional encryption.

Basic question:
Would I go to such trouble just for myself if I didn't intend to show the book to anyone?
And what do I do if I want to show it to somebody? Does he first have to learn a language and master a complicated encryption system, for something I might find in a library (judging by the drawings)?
How much effort does the benefit justify?