The Voynich Ninja

Full Version: The {8am} strategy
Thoughts on the {8am} strategy: meaning a focus on the v101 "word" {8am} as a test case for mapping from the Voynich manuscript to natural languages.

[attachment=8458]
A series of mappings of {8am} from various transliterations of the Voynich manuscript to medieval Italian, as represented by the OVI corpus. Author's analysis. H1 = “herbal” section, parts A and B. D1 = Scribe 1 (per Dr Lisa Fagin Davis). RSQ = correlation coefficient between glyph and letter frequencies. AFD = average absolute difference between glyph and letter frequencies.
"The next step would be to map some other ubiquitous “words” in the Voynich manuscript, such as {oe} and {am}; and see whether we can make some more real words in the target language. If several common Voynich “words” can be mapped to real words in some language, we might venture onwards, to mappings of whole lines. Those mappings would have to make sense."


I think this is a very good plan, but the subject matter of each section should be taken into account. A distinction should be made between high-count words on plant pages and high-count words in Quire 13, which, judging from the illustrations, seems to have little, if anything, to do with plants.
If {8am} maps to NOC, then you have three mapped Voynichese glyphs.

Then perhaps a next step would be to find other Voynichese words that contain those one to three mapped glyphs plus one unmapped glyph,
and then use the one to three known letters as clues (cues?) to help find a plausible candidate letter for the unmapped glyph.
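
A minimal sketch of that filtering step, under stated assumptions: each glyph is one character, the function name is hypothetical, the five glyph-to-letter pairs are the ones quoted for v101④ later in this thread ({8am} to NOC, {o} to E, {9} to A), and the word list contains only "words" mentioned in this thread rather than a real word-frequency list.

```python
def one_unknown_candidates(words, mapping):
    """Return (word, unmapped glyph, partial reading) for words that contain
    at least one mapped glyph and exactly one distinct unmapped glyph."""
    hits = []
    for word in words:
        unknown = [g for g in word if g not in mapping]
        if len(set(unknown)) == 1 and len(unknown) < len(word):
            # "?" marks the position of the glyph that still needs a letter.
            partial = "".join(mapping.get(g, "?") for g in word)
            hits.append((word, unknown[0], partial))
    return hits


# Pairs quoted in this thread for v101④ (not a complete mapping).
mapping = {"8": "N", "a": "O", "m": "C", "o": "E", "9": "A"}
# "Words" mentioned in this thread; a real run would use a full word list.
words = ["8am", "oe", "am", "1oe", "2c9"]

candidates = one_unknown_candidates(words, mapping)
print(candidates)  # [('oe', 'e', 'E?')]
```

Each partial reading such as E? could then be checked against the target-language corpus to see which letters in the open position yield real words.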
Further thoughts on what I call the {8am} strategy, for identifying the precursor languages of the Voynich manuscript. 

[attachment=8861]
Selected mappings of the "word" {8am} to words in some medieval European languages. Author's analysis.
(16-07-2024, 02:17 PM)dfs346 Wrote: Further thoughts on what I call the {8am} strategy, for identifying the precursor languages of the Voynich manuscript.

Hi, there,
In the enclosed document, I found 5 (8dam) words, which I read as 'dam'. The document was written between 1428 and 1440.
[attachment=8859]
Further thoughts on the {8am} strategy: mappings of {1oe} and {2c9}

[attachment=8960]
Selected mappings of the "words" {8am}, {1oe} and {2c9} to words in selected medieval languages. Author's analysis.
I wanted to ask a couple of questions about your results, if that is okay. Please don't take these questions as criticisms; I definitely do not have the authority to criticize anything. Rather, I am coming from a place of asking myself similar questions.

I read the 3 related blog posts but I'm still not clear on RFD calculations. Is it:
Code:
sum(abs(Freq in VMS - Freq in OTHER) for each term) / number of terms * 100?
Related to this, what is the cut-off for the maximum absolute difference allowed between a VMS character frequency and the other document's letter frequency, and how did you decide on it? Put another way: judging from your reported RFDs, and presuming you multiply by 100 to get a percentage, you are starting from raw absolute differences of .0005 or less. So how do you decide, hypothetically, that VMS c maps to N at a difference of .0004 rather than to S at, say, .0006? Are the frequency differences much larger than I am thinking, so that they generally lead to only one choice, or are you always choosing the smallest?

Stats are a weakness of mine so I could be wrong here but because you are picking letters based on related frequency, shouldn't each anagram have a near perfect positive relationship at the get go? Is there another factor I am missing or is there a specific value your correlation coefficient covers which isn't already covered by the RFD?

Why are you (usually) choosing to count just the frequency in the herbal section vs the entire document and how do you choose which documents to "map" to?


Thank you!
Regarding the text image in post #5

Looks like you missed two. First is two lines up from the third one. The second is halfway between the third and fourth. The same line-initial four-word sequence appears to be used in most cases.
(01-08-2024, 07:25 PM)A.Wilmarth Wrote: [...] RFD calculations. Is it:
Code:
sum(abs(Freq in VMS - Freq in OTHER) for each term) / number of terms * 100?
[...] Why are you (usually) choosing to count just the frequency in the herbal section vs the entire document and how do you choose which documents to "map" to?

Thank you for your interest in what I have called the {8am} strategy.

The average absolute frequency difference (AFD), as I call it, is a metric for ranking or prioritising various alternative transliterations of the Voynich manuscript (currently numbering about thirty-seven), all of my own devising and all based on Glen Claston’s v101. All of my transliterations differ from v101 in one significant respect: namely my assumption that Claston’s {4o} is a single glyph. I have used the Unicode symbol ④ to designate this glyph. 

Most of my transliterations differ from v101 in one additional significant respect. For example, my v121 transliteration makes the assumption that {2} and all its variants {3, 5, !, %, +, #} are equivalent to the glyph {1} with an accent or diacritic of unknown function. Some of my transliterations differ from v101 in two or more significant respects. For example, my v121.F transliteration not only replaces {2, 3, 5, !, %, +, #} by {1’}, but also replaces each of the major “bench glyphs” by a "gallows glyph" plus a {1}; e.g. {F} => {f1}, {G} => {g1} and so on.
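
To make the substitution rules concrete, here is a minimal sketch of how such a derived transliteration could be generated from a v101 string. The function and constant names are hypothetical, and the sketch assumes that every v101 glyph other than {4o} is a single character. Only the bench-glyph rules spelled out above ({F} and {G}) are included; the rest are not filled in.

```python
# Hypothetical names; only the substitution rules themselves come from the post.
SINGLE_4O = "④"

# v121: {2} and its variants become {1} plus a diacritic, written {1’}.
V121_RULES = {glyph: "1’" for glyph in "235!%+#"}

# v121.F: additionally split bench glyphs into a gallows glyph plus {1}.
# Only {F} and {G} are given explicitly, so only those two appear here.
V121F_RULES = {**V121_RULES, "F": "f1", "G": "g1"}


def retransliterate(v101_text, rules):
    """Merge {4o} into the single glyph ④, then apply per-glyph substitutions."""
    merged = v101_text.replace("4o", SINGLE_4O)
    return "".join(rules.get(glyph, glyph) for glyph in merged)


word_v121 = retransliterate("2c9", V121_RULES)
print(word_v121)  # 1’c9
```

Keeping each transliteration as a small rule table makes it straightforward to generate all the variants from one v101 source file.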

An example may serve to illustrate the AFD.

Suppose that I am planning to map some Voynich “words” to text strings in, say, medieval Italian as represented by the OVI corpus. I am hoping that some of these text strings will be real words. I need to try all of my transliterations. A priori, I have no way of knowing whether one transliteration is better than another.

For any given transliteration, the first step is to juxtapose the frequencies of the glyphs in that transliteration with the frequencies of the letters in the target language. This is what, in my forthcoming book Voynich Reconsidered (Schiffer books, August 2024), I have called the “Gold Bug” approach. It was used by Edgar Allan Poe in his short story “The Gold Bug”, and later by Sir Arthur Conan Doyle in the Sherlock Holmes story “The Adventure of the Dancing Men”. For the v101 transliteration (with my modification which I call v101④), the juxtaposition of the first ten glyphs and letters looks like this:

[attachment=8968]

The most common letter in OVI is E with a frequency of 12.7%; the most common glyph in v101④ is {o} with a frequency of 15.3%; the absolute frequency difference is 2.6% (I ignore plus or minus).

The second most common letter in OVI is A with a frequency of 10.0%; the second most common glyph in v101④ is {9} with a frequency of 11.6%; the absolute frequency difference is 1.6%.

… and so on …

I calculate the absolute frequency differences for all the available letters, or all the available glyphs (whichever is less). The average absolute frequency difference is simply the average of all the individual frequency differences.
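
The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the author's own code: the function name is hypothetical, frequencies are given in percent, and the example uses only the two glyph and letter frequencies quoted above, so its result is not the AFD of the full v101④ and OVI tables.

```python
def average_abs_freq_difference(glyph_freqs, letter_freqs):
    """Average absolute difference between rank-ordered frequencies.

    Both arguments map a symbol to its relative frequency in percent.
    The n-th most frequent glyph is paired with the n-th most frequent
    letter, up to the length of the shorter list.
    """
    glyphs = sorted(glyph_freqs.values(), reverse=True)
    letters = sorted(letter_freqs.values(), reverse=True)
    n = min(len(glyphs), len(letters))  # "whichever is less"
    if n == 0:
        raise ValueError("need at least one glyph and one letter")
    return sum(abs(g - l) for g, l in zip(glyphs[:n], letters[:n])) / n


# Only the top two entries are quoted in the post: {o}, {9} and E, A.
glyph_freqs = {"o": 15.3, "9": 11.6}
letter_freqs = {"E": 12.7, "A": 10.0}

afd = average_abs_freq_difference(glyph_freqs, letter_freqs)
print(round(afd, 2))  # 2.1, i.e. the mean of 2.6 and 1.6
```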

The lower the AFD, the more closely the “shape” of the transliteration approximates to the “shape” of the medieval Italian language; and therefore (to my mind) the more encouragement for the hypothesis that the transliteration represents the medieval Italian language. I don't apply any cut-off for the AFD.

The OVI-v101④ juxtaposition has an AFD of 0.42%, which is not bad. For OVI, the transliteration with the lowest AFD is v170, which has an AFD of 0.35%. The v170 transliteration makes the assumption that {m}, {M} and {n} are not single glyphs but strings: {m} => {iiń}, {M} => {iiiń}, {n} => {iń}.

After calculating the AFDs, I proceed with trial mappings from each one of the transliterations. In each case the trial mapping process is a simple map from the n-ranked glyph to the n-ranked letter, as in “The Gold Bug”. So, in v101④, {o} maps to {E}, {9} maps to {A} and so on. In the v170 transliteration, the top four mappings are the same as in v101④. But as we go down the rankings and encounter less common glyphs, the mappings start to differ from one transliteration to another.

For example, in v101④, the “word” {8am} maps to NOC, while in v170, {8am} becomes {8aiiń} which maps to RONNC.
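
The rank-to-rank mapping can be sketched as below. The function names are hypothetical and each glyph is assumed to be one character. The frequency tables contain only the top two entries quoted above. The three pairs for {8am} are read off the stated result (NOC) rather than derived from full frequency tables, which are not reproduced in this thread.

```python
def rank_mapping(glyph_freqs, letter_freqs):
    """Pair the n-th most frequent glyph with the n-th most frequent letter."""
    glyphs = sorted(glyph_freqs, key=glyph_freqs.get, reverse=True)
    letters = sorted(letter_freqs, key=letter_freqs.get, reverse=True)
    return dict(zip(glyphs, letters))


def map_word(word, mapping):
    """Map a Voynich "word" glyph by glyph; unmapped glyphs become '?'."""
    return "".join(mapping.get(glyph, "?") for glyph in word)


# Top two ranks only, using the percentages quoted in the post.
top_two = rank_mapping({"o": 15.3, "9": 11.6}, {"E": 12.7, "A": 10.0})
print(top_two)  # {'o': 'E', '9': 'A'}

# Pairs implied by the stated v101④ result for {8am}.
stated_pairs = {"8": "N", "a": "O", "m": "C"}
mapped = map_word("8am", stated_pairs)
print(mapped)  # NOC
```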

For each trial mapping, I look for the text string in the respective corpus (in this case, OVI). For example, the words NOC and RONNC do not occur in OVI. In such cases, I explore the possibilities of re-ordering the text string, or simply reversing the order of the letters (which would be a simple encipherment). Here we find that NOC can be re-ordered as CON, which was an extremely common word in medieval Italian, as it is also in modern Italian (equivalent to the English “with”). However, RONNC cannot be re-ordered to any word in OVI.
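
A minimal sketch of the lookup step, assuming the corpus vocabulary is available as a set of upper-case words. The function names are hypothetical, and the one-word vocabulary stands in for the OVI word list, since CON is the only OVI word named here.

```python
def find_reorderings(text, vocabulary):
    """Return all vocabulary words that use exactly the same letters as text."""
    key = sorted(text)
    return sorted(word for word in vocabulary if sorted(word) == key)


def is_reversal(text, vocabulary):
    """True if text read backwards is a vocabulary word."""
    return text[::-1] in vocabulary


# Stand-in for the OVI vocabulary; a real run would load the full word list.
vocabulary = {"CON"}

print(find_reorderings("NOC", vocabulary))    # ['CON']
print(is_reversal("NOC", vocabulary))         # True: NOC reversed is CON
print(find_reorderings("RONNC", vocabulary))  # []
```

With a full vocabulary, a short string can match several words, so the number of matches per string is worth recording alongside the matches themselves.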

In this instance, twenty different transliterations yielded a possible mapping of {8am} to the Italian word CON. Those with the lowest AFD were v226, v101④, v120 and v151. This result encouraged me to prioritise those transliterations in trial mappings of other Voynich “words”, such as {1oe} and {2c9}.

With other languages than medieval Italian, the AFDs would be different. At present my team is testing about fifteen medieval languages including period-specific variants (for example, medieval Italian can also be represented by Dante’s La Divina Commedia, which has slightly different letter frequencies from those of OVI).

In summary, the AFD is not in itself a mapping process. It is a metric for prioritising alternative transliterations.

Regarding my use of the “herbal” section, pages written by Scribe 1: this is simply an attempt to isolate a homogeneous chunk of the Voynich manuscript, on the hypothesis that in such a chunk, there is probably one precursor language.
(03-08-2024, 08:35 PM)dfs346 Wrote:
(01-08-2024, 07:25 PM)A.Wilmarth Wrote: [...] RFD calculations. Is it:
Code:
sum(abs(Freq in VMS - Freq in OTHER) for each term) / number of terms * 100?

[...] Why are you (usually) choosing to count just the frequency in the herbal section vs the entire document and how do you choose which documents to "map" to?

Thank you for your interest in what I have called the {8am} strategy.

What I have called the average absolute frequency difference (AFD) is a metric for ranking or prioritising various alternative transliterations of the Voynich manuscript (currently numbering about thirty-seven), all of my own devising and all based on Glen Claston’s v101. All of my transliterations differ from v101 in one significant respect: namely my assumption that Claston’s {4o} is a single glyph. I have used the Unicode symbol ④ to designate this glyph. 


[...]



I calculate the absolute frequency differences for all the available letters, or all the available glyphs (whichever is less). The average absolute frequency difference is simply the average of all the individual frequency differences.



The lower the AFD, the more closely the “shape” of the transliteration approximates to the “shape” of the medieval Italian language; and therefore (to my mind) the more encouragement for the hypothesis that the transliteration represents the medieval Italian language. I don't apply any cut-off for the AFD.

[...]

So you have a discrete probability distribution and you want to measure its similarity to N other discrete probability distributions, and the similarity measure you've chosen is the AFD. What other metrics did you consider for this, and why did you pick the AFD over those alternatives?