There are alternatives to the Levenshtein distance, such as the Damerau–Levenshtein distance. However, I don't know which would be the most suitable.
The Levenshtein edit distance measures the number of edits needed to get from one text string to another. However, this is really a measure of difference, not similarity. I think the problem with this is particularly noticeable with long text strings.
I wonder whether a measure that counts similarity instead of difference would be better. For example, high points are scored for identical letters in the same position, and fewer points for identical letters shifted by a number of positions (the smaller the shift, the higher the points). One could then loop through a string, scoring each letter by its similarity to the letters in the other string.
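To make the idea concrete, here is a rough Python sketch of such a score. The function name, the 1/(1 + shift) weighting and the max_shift cut-off are all my own assumptions for illustration; any weighting that falls as the shift grows would fit the description.

```python
def positional_similarity(a: str, b: str, max_shift: int = 3) -> float:
    """Score each letter of `a` by the nearest identical letter in `b`.

    A letter in the same position scores 1; a letter found `shift`
    positions away scores 1 / (1 + shift) (an assumed weighting).
    The total is divided by the longer length, giving a value in 0..1.
    """
    if not a or not b:
        return 0.0
    total = 0.0
    for i, ch in enumerate(a):
        best = 0.0
        for shift in range(max_shift + 1):
            # Look `shift` positions to the left and to the right.
            for j in (i - shift, i + shift):
                if 0 <= j < len(b) and b[j] == ch:
                    best = max(best, 1.0 / (1 + shift))
            if best:
                break  # larger shifts can only score lower
        total += best
    return total / max(len(a), len(b))


print(positional_similarity("abc", "abc"))  # 1.0
print(positional_similarity("ab", "ba"))    # 0.5
print(positional_similarity("abc", "xyz"))  # 0.0
```

Note that this sketch scores the letters of the first string against the second, so it is not guaranteed to be symmetric; averaging both directions would be one way round that.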
So there are other string metrics which we could use.
If we compare strings:
"abcde fgh"
with
"abcde ijk"
And then we compare strings:
"fgh"
and
"ijk"
then they appear equally different/similar, when in fact the strings starting "abcde" could be considered much more similar given their greater commonality. So here the Levenshtein distance doesn't seem to be giving me what I want.
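The point can be checked with a plain textbook Levenshtein implementation (a minimal sketch, not any particular library's version): both pairs come out at a distance of 3.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance
    (insertions, deletions and substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein("abcde fgh", "abcde ijk"))  # 3
print(levenshtein("fgh", "ijk"))              # 3
```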
I wonder if using the sum of the reciprocals is worth considering. Obviously, taking a reciprocal turns a big thing into a small thing and vice versa.
I suppose one thing that could be done is to crudely sort the words by hand in order of my sense of difference/similarity, and then work out what function broadly reflects that ordering.
It seems another alternative is to calculate something like Ratcliff-Obershelp similarity.
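Python's standard library has something close to this: difflib.SequenceMatcher, whose documentation describes its algorithm as related to (but not identical with) the Ratcliff and Obershelp "gestalt pattern matching" approach. As a sketch, and assuming that approximation is good enough, its ratio() separates the two pairs from the earlier example:

```python
from difflib import SequenceMatcher


def gestalt_ratio(a: str, b: str) -> float:
    """Similarity in 0..1: twice the number of matched characters
    divided by the total length of both strings."""
    return SequenceMatcher(None, a, b).ratio()


print(gestalt_ratio("abcde fgh", "abcde ijk"))  # about 0.667 (6 matched characters out of 18)
print(gestalt_ratio("fgh", "ijk"))              # 0.0
```

Here the shared "abcde " prefix counts in favour of the longer pair, which is the behaviour the plain Levenshtein distance lacks.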
I was wondering about the Levenshtein distance per letter. By this I mean taking the Levenshtein distance and dividing it by the length of the longer of the two words that you are comparing.
You would then calculate the mean Levenshtein distance per letter and sort accordingly. This would reduce the bias towards longer words. However, this doesn't seem like a completely adequate solution.
I was thinking about what makes me consider some words more distinctive than others. I think the following affect my thinking:
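As a rough sketch of the per-letter idea: the function names are my own, and I am assuming "mean" here means each word's average per-letter distance to every other word in the list.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance_per_letter(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def rank_by_mean_distance(words):
    """Sort words by their mean per-letter distance to all other words,
    most distinctive (largest mean) first."""
    scores = []
    for w in words:
        others = [o for o in words if o != w]
        mean = (sum(distance_per_letter(w, o) for o in others) / len(others)
                if others else 0.0)
        scores.append((w, mean))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(distance_per_letter("abcde fgh", "abcde ijk"))  # 3/9, about 0.333
print(distance_per_letter("fgh", "ijk"))              # 1.0
print(rank_by_mean_distance(["aaa", "aab", "zzz"])[0][0])  # zzz
```

With this normalisation the two pairs from the earlier example no longer look equally different.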
How unusual is the first letter of the word relative to other words?
How unusual is the second letter of the word relative to other words?
..
How unusual is the last letter of the word relative to other words?
(Are the letters at the beginning of the word more important than the end of the word?)
So many words start 'ok', so these should be flagged up.
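One way these questions might be turned into numbers is sketched below. The word list is an invented toy example, not real transcription data, and the rarity measure (one minus the share of words with that letter in that position) is only an assumption about what "unusual" should mean. Positions are counted from the start of the word, so a separate count from the end would be needed for the last-letter question.

```python
from collections import Counter


def position_counts(words):
    """Count how often each letter occurs at each position (from the start)."""
    counts = {}
    for w in words:
        for i, ch in enumerate(w):
            counts.setdefault(i, Counter())[ch] += 1
    return counts


def letter_rarity(word, words):
    """For each letter of `word`, 1 minus the share of words that have
    the same letter in the same position (higher means more unusual)."""
    counts = position_counts(words)
    rarities = []
    for i, ch in enumerate(word):
        at_pos = counts.get(i, Counter())
        n = sum(at_pos.values())
        rarities.append(1 - at_pos[ch] / n if n else 1.0)
    return rarities


def prefix_share(prefix, words):
    """Fraction of words starting with `prefix`, e.g. 'ok'."""
    return sum(w.startswith(prefix) for w in words) / len(words) if words else 0.0


toy_words = ["okal", "okar", "otal", "chol"]  # hypothetical examples
print(letter_rarity("chol", toy_words))  # [0.75, 0.75, 0.75, 0.25]
print(prefix_share("ok", toy_words))     # 0.5
```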
Is the common relative position or sequence of letters important when comparing words?
The Levenshtein distance allows deletion, insertion and substitution. What about shifting? What about common substrings? Maybe one common substring shift could count as a single shift?
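The nearest standard measure I know of to "shifting" is the restricted Damerau–Levenshtein distance (also called optimal string alignment), which adds the swap of two adjacent letters as a single edit. It does not count a shifted common substring as one edit, so treat this sketch only as a starting point:

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus the
    transposition of two adjacent letters, each costing 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            # Adjacent transposition, e.g. "ab" -> "ba".
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("ab", "ba"))    # 1 (plain Levenshtein gives 2)
print(osa_distance("fgh", "ijk"))  # 3
```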
(08-05-2023, 11:17 AM)Mark Knowles Wrote: (08-05-2023, 10:53 AM)ReneZ Wrote: If the Voynich MS were a regular cipher from the late middle ages, even a relatively complicated one, then these cribs would have been extremely helpful.
Obviously this depends what one means by 'regular'.
I’ll have to let ReneZ speak for himself, but my interpretation of his use of “regular” is “historically documented and precedented”. If the VMs text had many features that drew strong parallels to contemporary ciphers, that would be a strong argument in favor of it being a cipher. But instead, its similarities to contemporary ciphertexts are tenuous and tantalizing, not pervasive. So one cannot rule out the possibility that the VMs in fact isn’t a cipher, but just resembles one, perhaps by design. If it indeed is a late medieval cipher, it must be a highly innovative one in its form and function, the likes of which have so far eluded the historical record.
Mark Knowles, I understand you’ve been doing some in-depth historical research on late medieval ciphers, digging through European archives and other primary sources in person. If there is a “missing link” still in existence — a late medieval cipher with numerous features in common with both the VMs text and well documented specimens — you’re one of the most likely researchers here to have seen it. In any event, I’m curious to see what you’ve found and how it compares, and will definitely buy anything you publish on the topic.
I agree with David Jackson that the work of Ramon Llull, and his students and followers, is a potentially promising lead, when looking for novel ways a late medieval writer might have designed a newer and better cipher from scratch.
The frustrating part of all this is that comparison to well-documented contemporary ciphers has good positive predictive value, but not good negative predictive value. In other words, finding strong parallels would argue in favor of the VMs text being a ciphertext. But failure to find strong parallels does not rule out that possibility. Which possibility involves fewer additional assumptions for Occam’s Razor to shave away: A) That someone in 1405 designed a very novel and highly effective form of encryption that never took off after the VMs, or B) That someone in 1405 arranged a whole book full of written marks in a way that superficially resembles a cipher, but isn’t one?
A problem in all this is that different people have different understandings of the meaning of the word 'cipher'.
What some people would call a cipher, others would call a 'key'.
Each page in Tranchedino, for example, is a key.
That is just one aspect of the possible different understandings.
(11-05-2023, 12:38 PM)RenegadeHealer Wrote: (08-05-2023, 11:17 AM)Mark Knowles Wrote: (08-05-2023, 10:53 AM)ReneZ Wrote: If the Voynich MS were a regular cipher from the late middle ages, even a relatively complicated one, then these cribs would have been extremely helpful.
Obviously this depends what one means by 'regular'.
So one cannot rule out the possibility that the VMs in fact isn’t a cipher, but just resembles one, perhaps by design.
I would agree with that, although one then has the problem of finding a historical precedent for whatever alternative one hypothesises.
(11-05-2023, 12:38 PM)RenegadeHealer Wrote: I agree with David Jackson that the work of Ramon Llull, and his students and followers, is a potentially promising lead, when looking for novel ways a late medieval writer might have designed a newer and better cipher from scratch.
I have also said before, as have others, that the work of Ramon Llull is a very interesting avenue to explore, so I completely concur on this point.
What I can report from my research is that there were ciphers from the time to which the Voynich manuscript is carbon dated that are significantly more advanced than was previously known. I can say that these ciphers exhibit sophisticated features not seen in later 15th century ciphers. I can also say, on the basis of this evidence, that the period to which the Voynich manuscript is carbon dated was a time of much greater cryptographic innovation than the previous or subsequent 50 years (possibly with the exception of the work of Leon Battista Alberti in 1466). The ciphers that I refer to here are Milanese diplomatic ciphers. Whether, or to what extent, these ciphers have anything in common with Voynichese is not clear, though it is not obvious that there is something in common.
Something that I, personally, found very persuasive of a link is that my much earlier cartographic research on Rosettes folio 86v led me to believe that the author of the Voynich manuscript likely originated from the fairly isolated rural Abbey of Saints Nazzaro and Celso by the Sesia River. Assuming that this idea is correct, the Abbot of this abbey would logically seem by far the most likely candidate. It turns out that the Abbot was called Antonio Barbavara. Surprisingly, subsequent research, unknown to me before my identification, has shown that the individual most likely to be responsible for ciphers in the Milanese chancellery was his brother Francesco Barbavara, and that other family members wrote ciphers for the Milanese government. Now it could be a "coincidence", to use the often dismissed term, that the pre-identified author and Abbot of a minor rural Abbey turned out to be the brother of maybe the most advanced cryptographer of the time, who was inventing innovative ciphers. I am now keenly trying to locate Milanese ciphers from between the years 1425 and 1438 to help fill a gap in my knowledge of these ciphers.
I agree there is some lack of clarity in what people mean by the word cipher. I am open to any definition of this term as long as we are all using the same definition, as I don't want to get too bogged down in semantics.
On the subject of why innovative features of Milanese ciphers from the time of the Voynich manuscript don't survive into Milanese ciphers later in the century, my thinking is that practicality is key. The purpose of a cipher is not only to be difficult for someone without the key to break; it also needs to be as easy as possible for someone with the key to encrypt and decrypt messages, and easy for the cipher clerk to generate a new cipher key when needed. My hypothesis is that at some stage Milanese ciphers were simplified and standardised, as they had become too difficult for ordinary diplomats to use, and some of the more sophisticated features were dropped.
There is only this one image here.
(11-05-2023, 06:04 PM)R. Sale Wrote: There is only this one image here.
Yes, I have seen that. And there is a fresco in the Abbey of Saints Nazzaro and Celso which includes the Abbot.
Offhand, I don't recall seeing an abbot on a flying horse in any other book of hours. Perhaps you have also had the opportunity to see if the text has anything unusual?
(11-05-2023, 06:19 PM)R. Sale Wrote: Offhand, I don't recall seeing an abbot on a flying horse in any other book of hours. Perhaps you have also had the opportunity to see if the text has anything unusual?
I would guess that that is not the Abbot, although I could be wrong.