The Voynich Ninja

Demystifying the Voynich manuscript using computational linguistic techniques
There is a new paper published about the VMS.

The author is Kevin Farrugia from the University of Malta.

The paper describes an experiment that takes the scribe classification proposed by Dr Lisa Fagin Davis as ground truth and puts it to the test. The text is split into an equal number of pages per scribe, and only three of the five scribes are considered, because scribes 4 and 5 have much less data.
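The setup described above (an equal number of pages per scribe, with character bigrams and trigrams as features) can be sketched in a few lines of Python. This is not Farrugia's code: the function names, the choice to count n-grams only inside words, and the random downsampling used to balance the scribes are all assumptions made for illustration.

```python
import random
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams inside each word.

    Assumption: n-grams do not cross word boundaries (spaces).
    """
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def balance_pages(pages_by_scribe, scribes, seed=0):
    """Randomly downsample so each selected scribe has the same number of pages."""
    rng = random.Random(seed)
    k = min(len(pages_by_scribe[s]) for s in scribes)
    return {s: rng.sample(pages_by_scribe[s], k) for s in scribes}


# Toy usage: hypothetical pages, not real transcription data.
pages = {
    1: ["daiin chol", "chor shol", "dain sho"],
    2: ["qokedy chedy", "shedy qokeedy"],
}
balanced = balance_pages(pages, [1, 2])
features = {s: [char_ngrams(p, 2) for p in ps] for s, ps in balanced.items()}
```

The per-page n-gram counts would then be fed to whatever classifiers one wants to compare; balancing first keeps a scribe with many pages from dominating the results.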

Farrugia describes his results as:
Quote: In both experiments, with trigrams and bigrams, all classifiers have very high F1 scores for scribe 1. ... This information is reflected in the confusion matrices in which, with the help of the heatmap, it is easy to see that pages labelled as scribe 1 in the ground truth are rarely classified as the other two scribes. The tendency among the classifiers is to classify pages labelled as scribe 2 to be scribe 3, and vice-versa.
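For anyone unfamiliar with the metrics in that quote: the per-scribe F1 score can be read straight off a confusion matrix. The counts below are invented for illustration and are not taken from the paper; rows are the ground-truth scribes, columns the predicted ones.

```python
def per_class_f1(matrix, labels):
    """Per-class F1 from a confusion matrix.

    matrix[i][j] = number of pages whose true label is labels[i]
    and whose predicted label is labels[j].
    """
    n = len(labels)
    scores = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(matrix[k]) - tp                      # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Invented counts for illustration only (not from the paper).
LABELS = ["scribe 1", "scribe 2", "scribe 3"]
EXAMPLE = [
    [20, 0, 0],
    [0, 12, 8],
    [1, 7, 12],
]

for label, f1 in per_class_f1(EXAMPLE, LABELS).items():
    print(f"{label}: F1 = {f1:.2f}")
```

With these made-up numbers the loop prints 0.98 for scribe 1 and 0.62 and 0.60 for scribes 2 and 3: a high score for the class that is rarely confused, and mediocre scores for two classes that are mostly mistaken for each other.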


Not surprisingly, the paper lists "the trigram 'edy' and the bigram 'ed'" as criteria to distinguish between scribe 1 and scribes 2/3 (or between Currier A and Currier B).
Does this mean they found differences between Currier A and B, but not between two hands in the same language?
I think it is very brave to write a bachelor's dissertation on the VMS. Farrugia enters "unknown territory" with this.
So, Scribe 1 wrote Currier A? And Scribes 2 and 3 wrote some other parts in Currier B, but 'they' are not in total agreement as to who is Scribe 2 or Scribe 3. No kidding.

If the 'ed' bigram is a prominent marker of distinction, and let's presume that it is, then the 'edy' trigram is surely the most frequent example of "ed*", where "*" is any VMS glyph. So the trigram is really not as much additional support as other, more diverse trigram candidates would provide.
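The point about 'edy' can be checked by counting which glyph follows 'ed' inside words: if nearly every 'ed' continues as 'edy', the trigram carries little information beyond the bigram. A minimal sketch, assuming an EVA transliteration with one character per unit; the word list is a toy sample, not counts from the manuscript.

```python
from collections import Counter


def continuations(words, prefix):
    """Count the character that immediately follows each occurrence of `prefix` in a word.

    Occurrences at the very end of a word (no following character) are skipped.
    """
    following = Counter()
    for word in words:
        start = word.find(prefix)
        while start != -1:
            nxt = start + len(prefix)
            if nxt < len(word):
                following[word[nxt]] += 1
            start = word.find(prefix, start + 1)
    return following


# Toy word list for illustration, not counts from the manuscript.
words = ["qokedy", "chedy", "shedy", "okedar", "qoked"]
after_ed = continuations(words, "ed")
share_edy = after_ed["y"] / sum(after_ed.values())
```

On a real transliteration, a `share_edy` close to 1 would mean that 'ed' and 'edy' are essentially the same feature counted twice.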
Hi Thorsten,
I have not read Kevin Farrugia’s article yet, but I found the same criteria to distinguish between scribe 1 and scribes 2/3. To understand the whole concept, you would have to read my article “The Voynich Cipher Disk”; here is just the explanation for the trigram 'edy' and the bigram 'ed'.
Words starting with _oX and _4oX (X = K, T, P or F) are built out of a group of only 34 tokens on the outer Cipher Disk. All tokens that end with ‘e’ (also ‘ch’ and ‘sh’) require a termination with ‘dy’ or ‘d’ if the word is not continued with another token. This termination is inserted at the scribe's choice and has no meaning (null). It is only there to hide the fact that ‘e’, ‘ch’ and ‘sh’ are not part of the green finals (normal endings). I named them placeholders in my article and colored them grey. There are other examples, like final ‘a_’ or ‘m_’, that give the scribe a choice without changing the meaning of the encoded text.
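The placeholder rule described above can be written as a small normalisation function. This is a hypothetical reading of the rule, not code from the Cipher Disk article: it assumes EVA transliteration, that the "4o" in the post corresponds to EVA "qo", and that X stands for EVA k, t, p or f.

```python
import re

# Hypothetical reading of the placeholder rule, in EVA transliteration.
# Assumptions: the post's "4o" is EVA "qo", and X is one of EVA k, t, p, f.
PREFIX = re.compile(r"^(?:qo|o)[ktpf]")
PLACEHOLDER_END = re.compile(r"(e|ch|sh)(?:dy|d)$")


def strip_null_termination(word):
    """Drop a word-final 'dy' or 'd' that follows e, ch or sh in o-/qo- words."""
    if not PREFIX.match(word):
        return word
    return PLACEHOLDER_END.sub(r"\1", word)


for w in ["qokedy", "okchdy", "qokeed", "chedy", "qokain"]:
    print(w, "->", strip_null_termination(w))
```

Under this reading, running such a function over the text before counting n-grams would show how much of the 'ed'/'edy' signal comes from these endings alone.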


If you have any questions, I will gladly answer them.
Thanks
Finding differences between Currier A and Currier B is not notable. It's like finding a difference between a white rose and a red rose. We have known that A and B are in a different "language" and a different hand since the early days.

The real question is: what about the B-scribes? We must assume that Lisa's assessment that the hands are different is correct. So if we cannot find a statistical difference between two scribes in the same Currier language, what does this mean?

- Is it the same person who changed his handwriting but not his method? This would mean that the method of converting or generating the text is very well-defined, because this one person held onto it despite the change in handwriting.

- Is it a different person? This would imply that the method is well-defined and shared. 

All in all, if there is no difference between two hands in the same dialect, this would be encouraging, because it would mean that they took the language seriously and that there is some system behind it.
I am somewhat reserved about the assessment that it is another "language".
A few experiments of my own have shown me that different writing styles alone already make a difference.
If someone simply uses a different vocabulary, the text changes visually, even though the content is the same.
Maybe one person is simply more experienced in writing.

It is also interesting that the translator ends up with almost the same result.

[Attachment: Word extra / German / English]

It should be remembered that this dissertation and its companions in "Dissertations - InsLin - 2021" are all data science / computational linguistics papers. Their main focus is on demonstrating data science skills and methods rather than on discovering something revelatory, whether that be about Maltese emojis, colour terms or the VMS.

Also, I think the sentence
"The tendency among the classifiers is to classify pages labelled as scribe 2 to be scribe 3, and vice-versa."
is not synonymous with
"cannot find a statistical difference between two scribes in the same Currier language".
However, I have not read the paper.
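One way to look for a statistical difference directly, rather than through classifier confusion, is a chi-square test of homogeneity on the two scribes' n-gram counts. The sketch below only computes Pearson's statistic and its degrees of freedom; the function name and the toy counts are hypothetical, not data from the paper.

```python
from collections import Counter


def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic comparing two n-gram count tables.

    Asks whether both samples plausibly come from one shared n-gram
    distribution. Returns (statistic, degrees_of_freedom).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples need at least one n-gram")
    grand = total_a + total_b
    categories = [c for c in set(counts_a) | set(counts_b)
                  if counts_a.get(c, 0) + counts_b.get(c, 0) > 0]
    stat = 0.0
    for c in categories:
        row = counts_a.get(c, 0) + counts_b.get(c, 0)
        for observed, total in ((counts_a.get(c, 0), total_a),
                                (counts_b.get(c, 0), total_b)):
            expected = row * total / grand  # count expected if both shared one distribution
            stat += (observed - expected) ** 2 / expected
    return stat, len(categories) - 1


# Toy bigram counts for two hypothetical scribes.
scribe_2 = Counter({"ed": 30, "ol": 10, "ai": 10})
scribe_3 = Counter({"ed": 28, "ol": 12, "ai": 10})
stat, df = chi_square_homogeneity(scribe_2, scribe_3)
```

If SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the p-value. The approximation is unreliable for n-grams with small expected counts (a common rule of thumb is below 5), and n-grams from the same page are not independent, so the result should be treated as a rough indication only.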
I have been unable to download the paper. Am I missing something, or can it only be downloaded by requesting access from the University of Malta?

The fact that scribes 2 and 3 are not easily distinguishable can also be seen in [link].

I would also be curious to see whether the paper mentions the existence of Pharmacese, i.e. the fact that the Pharma (aka Small-Plants) pages (yellow dots in the plots) differ from Herbal A, even though Currier classifies both as language "A" and Lisa classifies both as Scribe 1. The most prominent feature of Pharmacese is the high frequency of EVA:eol. This has been known for years, but its implications are still unclear to me.
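A simple way to put a number on the "high frequency of EVA:eol" is to compute its rate per 1000 characters for each section. A minimal sketch, assuming EVA transliteration and counting only inside words; the function name and the toy pages are hypothetical, to be replaced by real transcription lines grouped by section.

```python
def rate_per_thousand(pages, ngram):
    """Occurrences of `ngram` per 1000 EVA characters, counted inside words only."""
    hits = 0
    chars = 0
    for text in pages:
        for word in text.split():
            chars += len(word)
            hits += sum(1 for i in range(len(word) - len(ngram) + 1)
                        if word.startswith(ngram, i))
    return 1000 * hits / chars if chars else 0.0


# Hypothetical toy pages grouped by section; substitute real transcription lines.
sections = {
    "Pharma": ["qokeol cheol", "okeol daiin"],
    "Herbal A": ["daiin chol", "chor shol"],
}
rates = {name: rate_per_thousand(pages, "eol") for name, pages in sections.items()}
```

Normalising by character count rather than by page keeps short and long pages comparable.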