(01-08-2024, 07:25 PM) A.Wilmarth wrote: RFD calculations. Is it:
abs(Freq in VMS- Freq - OTHER) for each term / number of terms * 100?
Why are you (usually) choosing to count just the frequency in the herbal section vs the entire document and how do you choose which documents to "map" to?
Thank you for your interest in what I have called the {8am} strategy.
The average absolute frequency difference (AFD), as I call it, is a metric for ranking or prioritising various alternative transliterations of the Voynich manuscript (currently numbering about thirty-seven), all of my own devising and all based on Glen Claston’s v101. All of my transliterations differ from v101 in one significant respect: namely my assumption that Claston’s {4o} is a single glyph. I have used the Unicode symbol ④ to designate this glyph.
Most of my transliterations differ from v101 in one additional significant respect. For example, my v121 transliteration assumes that {2} and all its variants {3, 5, !, %, +, #} are equivalent to the glyph {1} with an accent or diacritic of unknown function. Some of my transliterations differ from v101 in two or more significant respects. For example, my v121.F transliteration not only replaces {2, 3, 5, !, %, +, #} by {1’}, but also replaces each of the major “bench glyphs” by a “gallows glyph” plus a {1}; e.g. {F} => {f1}, {G} => {g1} and so on.
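The substitutions above can be expressed as plain string rewriting. The following is a minimal sketch, not my actual tooling: it assumes the transliteration is held as an ordinary Python string in v101 notation, and it includes only the bench-glyph rules quoted above ({F} and {G}), since the remaining ones are not listed here. The function names are hypothetical.

```python
# Hypothetical sketch: deriving transliteration variants from a v101 string
# by simple substitution. Only the rules quoted in the post are included.

# All variants: treat Claston's {4o} as the single glyph ④ (giving v101④).
V101_CIRCLED = {"4o": "④"}

# v121: {2} and its variants {3, 5, !, %, +, #} become {1’} ({1} plus a diacritic).
V121_GLYPHS = "235!%+#"

# v121.F: bench glyph => gallows glyph + {1}. Only the two examples given in
# the post are listed; the other bench glyphs are deliberately left out.
V121F_BENCH = {"F": "f1", "G": "g1"}


def to_v101_circled(text: str) -> str:
    """Replace every {4o} pair with the single glyph ④."""
    for old, new in V101_CIRCLED.items():
        text = text.replace(old, new)
    return text


def to_v121(text: str) -> str:
    """v101④ plus: each of {2, 3, 5, !, %, +, #} becomes {1’}."""
    text = to_v101_circled(text)
    # str.maketrans with a dict allows one character to map to a longer string
    return text.translate(str.maketrans({g: "1\u2019" for g in V121_GLYPHS}))


def to_v121f(text: str) -> str:
    """v121 plus the bench-glyph splits listed in V121F_BENCH."""
    return to_v121(text).translate(str.maketrans(V121F_BENCH))
```

For example, `to_v121("4o2c9")` gives `④1’c9`. Because each variant is a pure function of the v101 text, all thirty-odd transliterations can be regenerated from the same source file whenever a rule changes.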
An example may serve to illustrate the AFD.
Suppose that I am planning to map some Voynich “words” to text strings in, say, medieval Italian as represented by the OVI corpus. I am hoping that some of these text strings will be real words. I need to try all of my transliterations. A priori, I have no way of knowing whether one transliteration is better than another.
For any given transliteration, the first step is to juxtapose the frequencies of the glyphs in that transliteration with the frequencies of the letters in the target language. In my forthcoming book Voynich Reconsidered (Schiffer books, August 2024), I have called this the “Gold Bug” approach. It was used by Edgar Allan Poe in his short story “The Gold Bug”, and later by Sir Arthur Conan Doyle in the Sherlock Holmes story “The Adventure of the Dancing Men”. For the v101 transliteration (with my modification, which I call v101④), the juxtaposition of the ten most common glyphs and letters begins as follows:
The most common letter in OVI is E with a frequency of 12.7%; the most common glyph in v101④ is {o} with a frequency of 15.3%; the absolute frequency difference is 2.6% (I ignore the sign).
The second most common letter in OVI is A with a frequency of 10.0%; the second most common glyph in v101④ is {9} with a frequency of 11.6%; the absolute frequency difference is 1.6%.
… and so on …
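The “Gold Bug” juxtaposition can be sketched as counting symbols, sorting them by frequency and pairing them rank by rank. This is an illustrative sketch only: it assumes each character of the input is one glyph (so multi-character glyphs such as {4o} must already be collapsed to ④) and that whitespace separates words and is not counted. The function names are hypothetical.

```python
from collections import Counter


def ranked_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (symbol, percentage) pairs, most frequent first.

    Whitespace is ignored; every other character counts as one glyph or letter.
    Ties keep the order in which symbols were first encountered.
    """
    symbols = [ch for ch in text if not ch.isspace()]
    total = len(symbols)
    counts = Counter(symbols)
    return [(sym, 100.0 * n / total) for sym, n in counts.most_common()]


def juxtapose(glyphs, letters):
    """Pair the n-th ranked glyph with the n-th ranked letter.

    Each row is (glyph, glyph %, letter, letter %, absolute difference).
    zip() stops at the shorter list, so only as many ranks as the smaller
    inventory are compared.
    """
    return [
        (g, gf, l, lf, abs(gf - lf))
        for (g, gf), (l, lf) in zip(glyphs, letters)
    ]
```

With the two rows quoted above, `juxtapose([("o", 15.3), ("9", 11.6)], [("E", 12.7), ("A", 10.0)])` yields differences of 2.6 and 1.6 percentage points (up to floating-point rounding).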
I calculate the absolute frequency differences for all the available letters or all the available glyphs (whichever are fewer). The average absolute frequency difference is simply the average of all the individual frequency differences.
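As a formula, the AFD is the mean of the per-rank absolute differences. The sketch below assumes both inputs are already sorted from most to least frequent and expressed in percent; if frequencies are kept as fractions instead, multiplying the result by 100 gives the same figure, which appears to be what the formula in the question expresses. The function name `afd` is hypothetical.

```python
def afd(glyph_pcts, letter_pcts):
    """Average absolute frequency difference, in percentage points.

    Both inputs are frequencies in percent, sorted from most to least frequent.
    Ranks are compared pairwise; zip() stops at the shorter list, so only as
    many ranks as the smaller inventory contribute.
    """
    diffs = [abs(g - l) for g, l in zip(glyph_pcts, letter_pcts)]
    if not diffs:
        raise ValueError("need at least one rank in each list")
    return sum(diffs) / len(diffs)
```

Using only the two ranks quoted above, `afd([15.3, 11.6], [12.7, 10.0])` is (2.6 + 1.6) / 2 = 2.1. That is not the full v101④ figure, which averages over all available ranks.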
The lower the AFD, the more closely the “shape” of the transliteration approximates to the “shape” of the medieval Italian language; and therefore (to my mind) the more encouragement for the hypothesis that the transliteration represents the medieval Italian language. I don't apply any cut-off for the AFD.
The OVI-v101④ juxtaposition has an AFD of 0.42%, which is not bad. For OVI, the transliteration with the lowest AFD is v170, which has an AFD of 0.35%. The v170 transliteration assumes that {m}, {M} and {n} are not single glyphs but strings: {m} => {iiń}, {M} => {iiiń}, {n} => {iń}.
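The v170 expansions can be applied in a single pass. This sketch assumes the input is already in v101④ form and covers only the three rules stated above; `str.translate` applies them simultaneously, so the {ń} produced by one rule is never re-expanded by another. The names are hypothetical.

```python
# The three v170 expansions quoted in the post, applied simultaneously.
V170_EXPANSIONS = str.maketrans({"m": "iiń", "M": "iiiń", "n": "iń"})


def to_v170(v101_circled_text: str) -> str:
    """Expand {m}, {M} and {n} into strings of {i} plus {ń}."""
    return v101_circled_text.translate(V170_EXPANSIONS)
```

For example, `to_v170("8am")` gives `8aiiń`. Since each expansion adds {i} glyphs, the glyph counts, and hence the frequency ranks and the AFD, change relative to v101④.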
After calculating the AFDs, I proceed with trial mappings from each one of the transliterations. In each case the trial mapping is a simple map from the nth-ranked glyph to the nth-ranked letter, as in “The Gold Bug”. So, in v101④, {o} maps to {E}, {9} maps to {A} and so on. In the v170 transliteration, the top four mappings are the same as in v101④. But as we go down the rankings and encounter less common glyphs, the mappings start to differ from one transliteration to another.
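The rank-for-rank mapping amounts to zipping two ranked lists into a lookup table. In this sketch only the two top ranks given above ({o} => E, {9} => A) are used as example data; a real run would use the full ranked lists for each transliteration. Marking unmapped glyphs with `?` is my own hypothetical convention.

```python
def rank_mapping(ranked_glyphs, ranked_letters):
    """Map the n-th most frequent glyph to the n-th most frequent letter."""
    return dict(zip(ranked_glyphs, ranked_letters))


def map_word(word, mapping):
    """Apply a glyph-to-letter mapping; glyphs with no mapping become '?'."""
    return "".join(mapping.get(g, "?") for g in word)
```

With `mapping = rank_mapping(["o", "9"], ["E", "A"])`, `map_word("o9o", mapping)` returns `EAE`.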
For example, in v101④, the “word” {8am} maps to NOC, while in v170, {8am} becomes {8aiiń} which maps to RONNC.
For each trial mapping, I look for the text string in the respective corpus (in this case, OVI). For example, the words NOC and RONNC do not occur in OVI. In such cases, I explore the possibilities of re-ordering the text string, or simply reversing the order of the letters (which would be a simple encipherment). Here we find that NOC can be re-ordered as CON, which was an extremely common word in medieval Italian, as it is also in modern Italian (equivalent to the English “with”). However, RONNC cannot be re-ordered to any word in OVI.
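The corpus check can be sketched as an exact lookup, a reversal lookup and an anagram lookup (any re-ordering of the same letters). This is an illustration only: it assumes the corpus has been reduced to a set of upper-case word forms, and the three-word vocabulary in the usage note is a hypothetical toy sample, not OVI. The function name is hypothetical.

```python
def corpus_matches(candidate, vocabulary):
    """Return (kind, word) pairs for `candidate` found in `vocabulary`.

    kind is "exact", "reversed" (letters in reverse order) or
    "anagram" (any other re-ordering of the same letters).
    """
    found = []
    if candidate in vocabulary:
        found.append(("exact", candidate))
    rev = candidate[::-1]
    if rev != candidate and rev in vocabulary:
        found.append(("reversed", rev))
    key = sorted(candidate)  # letter multiset used as an anagram signature
    for word in sorted(vocabulary):  # sorted for a deterministic order
        if word not in (candidate, rev) and sorted(word) == key:
            found.append(("anagram", word))
    return found
```

With the toy vocabulary `{"CON", "NON", "COSA"}`, `corpus_matches("NOC", vocab)` returns `[("reversed", "CON")]`, while `corpus_matches("RONNC", vocab)` returns an empty list.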
In this instance, twenty different transliterations yielded a possible mapping of {8am} to the Italian word CON. Those with the lowest AFD were v226, v101④, v120 and v151. This result encouraged me to prioritise those transliterations in trial mappings of other Voynich “words”, such as {1oe} and {2c9}.
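The prioritisation step amounts to filtering the transliterations whose trial mapping produced the target word and sorting them by AFD, lowest first (with no cut-off). The data layout is an assumption, and the names and AFD values in the usage note are invented placeholders, not results.

```python
def prioritise(results, target):
    """Names of transliterations whose mapping yields `target`, lowest AFD first.

    `results` maps a transliteration name to (afd_percent, matched_words).
    No AFD cut-off is applied.
    """
    hits = [
        (score, name)
        for name, (score, words) in results.items()
        if target in words
    ]
    return [name for score, name in sorted(hits)]
```

For hypothetical inputs `{"tA": (0.50, {"CON"}), "tB": (0.40, {"CON"}), "tC": (0.30, set())}`, `prioritise(results, "CON")` returns `["tB", "tA"]`: tC has the lowest AFD but did not yield CON.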
With languages other than medieval Italian, the AFDs would be different. At present my team is testing about fifteen medieval languages, including period-specific variants (for example, medieval Italian can also be represented by Dante’s La Divina Commedia, which has slightly different letter frequencies from those of OVI).
In summary, the AFD is not in itself a mapping process. It is a metric for prioritising alternative transliterations.
Regarding my use of the “herbal” section (specifically, the pages written by Scribe 1): this is simply an attempt to isolate a homogeneous chunk of the Voynich manuscript, on the hypothesis that such a chunk probably has a single precursor language.
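Restricting the count to one homogeneous chunk just means filtering pages before counting glyphs. The record format below (a dict per page with `section`, `scribe` and `text` keys) is a hypothetical assumption, as are the page identifiers in the usage note; the glyph frequencies would then be computed on the returned text only, rather than on the whole manuscript.

```python
def homogeneous_chunk(pages, section="herbal", scribe=1):
    """Join the text of all pages from one section written by one scribe."""
    return " ".join(
        p["text"]
        for p in pages
        if p["section"] == section and p["scribe"] == scribe
    )
```

For hypothetical pages `[{"id": "p1", "section": "herbal", "scribe": 1, "text": "8am"}, {"id": "p2", "section": "herbal", "scribe": 2, "text": "2c9"}, {"id": "p3", "section": "herbal", "scribe": 1, "text": "1oe"}]`, the function returns `"8am 1oe"`.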