The Voynich Ninja

Full Version: Arabic as precursor language
Further to my recent post in another thread, on Arabic as a possible precursor language of the Voynich manuscript, I tested a range of alternative transliterations of the Voynich text, all based on Glen Claston's v101 but differing from v101 in one or more respects. I numbered these transliterations v101④ through v202. (The ④ signifies that in all the transliterations, I treated the v101 glyph pair {4o} as a single glyph, to which I assigned the Unicode symbol ④.)

For comparison of the Voynich text with the Arabic language, I used Arabic letter frequencies derived from the works of Ibn Kathir (1300-1373).

In order to test my Voynich transliterations, I started by calculating the statistical correlations between the glyph frequencies and the Arabic letter frequencies. However, with two short descending sequences such as ibn Kathir's Arabic alphabet (which has 43 letters), and the 43 most frequent glyphs in the v101 transliteration (which account for 98.6 percent of the text), it is relatively easy to obtain correlations well in excess of 90 percent. Substantial differences between transliterations (for example combining the {2} group of glyphs) result in quite small changes in the frequency correlations.
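As a minimal sketch of why this happens, the Python below correlates two frequency lists after sorting each in descending order. The function names and the sample frequencies are illustrative assumptions, not the actual v101 or Ibn Kathir figures.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranked_correlation(freqs_a, freqs_b):
    """Sort both lists in descending order, then correlate rank by rank."""
    a = sorted(freqs_a, reverse=True)
    b = sorted(freqs_b, reverse=True)
    n = min(len(a), len(b))  # compare only the ranks present in both lists
    return pearson(a[:n], b[:n])

# Made-up frequencies in percent, for illustration only.
glyph_freqs = [14.0, 11.0, 9.5, 8.0, 6.0, 4.0, 2.5, 1.0]
letter_freqs = [13.5, 12.0, 8.0, 7.0, 6.5, 5.0, 3.0, 2.0]

r = ranked_correlation(glyph_freqs, letter_freqs)
print(f"correlation between ranked frequencies: {r:.3f}")
```

Because both lists are forced into descending order before comparison, a high correlation is almost built in, which is why the statistic discriminates poorly between transliterations.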

I therefore adopted an alternative metric, namely the average frequency difference. Mathematically, this is the average of the absolute differences between the frequency of a precursor letter and the frequency of the equally ranked Voynich glyph. My idea was that the lowest average frequency difference should represent the best fit between a transliteration and a presumed precursor language.
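The metric as described can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name, the transliteration labels "A" and "B", and all the frequencies are hypothetical, and the truncation to the shorter list is my own choice for handling lists of unequal length.

```python
def average_frequency_difference(glyph_freqs, letter_freqs):
    """Mean absolute difference between equally ranked frequencies.

    Both inputs are frequencies in percent; each list is sorted in
    descending order so that rank 1 is compared with rank 1, and so on.
    """
    glyphs = sorted(glyph_freqs, reverse=True)
    letters = sorted(letter_freqs, reverse=True)
    n = min(len(glyphs), len(letters))  # assumption: ignore unmatched ranks
    return sum(abs(g - l) for g, l in zip(glyphs[:n], letters[:n])) / n

# Hypothetical transliterations and a hypothetical precursor language.
candidates = {
    "A": [12.0, 9.0, 4.0],
    "B": [13.0, 10.0, 6.0],
}
precursor = [13.5, 10.5, 6.5]

scores = {
    name: average_frequency_difference(freqs, precursor)
    for name, freqs in candidates.items()
}
best = min(scores, key=scores.get)  # lowest average difference = best fit
print(best, scores[best])
```

In this toy example the transliteration labelled "B" wins because each of its ranked frequencies lies closer to the precursor's than those of "A".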

On this metric, I found that the transliteration which I had numbered v171 was the best fit for ibn Kathir's Arabic alphabet. Apart from the treatment of {4o}, the v171 transliteration has the following differences from v101:
  • m=IN
  • M=iIN
  • n=iN.
Below is a juxtaposition of the frequencies of the top 43 glyphs in the v171 transliteration, and the 43 Arabic letter frequencies. The average frequency difference between v171 and Ibn Kathir's Arabic is 0.64 percent. 

[attachment=8273]

The next step is to explore the potential of these juxtapositions as correspondences or mappings. For example, the Voynich {o} could map to and from the Arabic ا (alef). Thereby, we could map some of the most common Voynich "words", such as {8am}, {oe} and {1oe}, to text strings in Arabic. We could then search appropriate corpora of the Arabic language, for example ibn Kathir's The Beginning and the End, to determine whether these strings are real words.
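A minimal sketch of that procedure in Python follows. It pairs glyphs and letters by frequency rank, maps a Voynich "word" glyph by glyph, and counts the result in a word list. All counts and the small corpus are made up for illustration, each glyph is assumed to be a single character, and the function names are hypothetical.

```python
from collections import Counter

ALEF, LAM, YEH = "\u0627", "\u0644", "\u064a"  # Arabic alef, lam, yeh

def rank_mapping(glyph_counts, letter_counts):
    """Pair the n-th most frequent glyph with the n-th most frequent letter."""
    glyphs = sorted(glyph_counts, key=glyph_counts.get, reverse=True)
    letters = sorted(letter_counts, key=letter_counts.get, reverse=True)
    return dict(zip(glyphs, letters))

def map_word(word, mapping):
    """Map a word glyph by glyph; return None if any glyph is unmapped."""
    if any(glyph not in mapping for glyph in word):
        return None
    return "".join(mapping[glyph] for glyph in word)

# Made-up counts, chosen only so that {o} ranks first alongside alef.
glyph_counts = {"o": 300, "e": 200, "1": 100}
letter_counts = {ALEF: 900, LAM: 600, YEH: 500}
mapping = rank_mapping(glyph_counts, letter_counts)

# Hypothetical tokenised corpus; a real test would load an Arabic text.
corpus = Counter([ALEF + LAM, ALEF + LAM, YEH + ALEF + LAM])

for word in ["oe", "1oe"]:
    mapped = map_word(word, mapping)
    print(word, mapped, corpus[mapped])
```

A real run would need the corpus stripped of diacritics and tokenised consistently, since either choice changes which strings count as words.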

Since Arabic uses an abjad script, in which the short vowels are not written, chances are that most of the Voynich "words" up to three glyphs will map to real words in Arabic. However, as with Persian, the mapping may well break down with "words" of four glyphs or longer. Even if we are able to construct real words of four letters or more, when arranged in sequence they may or may not make sense. I will do some tests. More later.
Further thoughts on medieval Arabic as a precursor language of the Voynich manuscript.

[attachment=8764]
The frequencies of the ten most common glyphs in the Voynich manuscript, v101④ transliteration, "herbal" section, and the ten most common letters in Dr Dilworth Parkinson’s corpora of 8th to 15th century Arabic, and Mohsen Madi’s corpus of (mainly) fourteenth-century Arabic. Author's analysis.
Did you check your results against Fletcher Crowe's take on the Voynich manuscript as Arabic?

[Image: Equivalencies-of-Voynich-Manuscript-Char...etters.png]
Dear Scarecrow, 

While writing Voynich Reconsidered, I did indeed come across Mr Crowe’s proposed mapping between Arabic letters and Voynich glyphs. I have copied his mappings into an Excel spreadsheet and added counts and frequencies from the following sources:

• by courtesy of Dr Dilworth Parkinson of Brigham Young University: counts of Arabic letters in three corpora of premodern Arabic, as follows:
  * the Grammarians corpus, dating from the 8th through 13th centuries, with 2,537,462 letters;
  * the Medieval Philosophy and Science corpus, dating from the 9th through 15th centuries, with 4,554,954 letters;
  * the Thousand and One Nights (أَلْفُ لَيْلَةٍ وَلَيْلَةٌ), first referenced in Arabic in the 12th century, with 2,326,696 letters.
• counts of Voynich glyphs from Glen Claston’s v101 transliteration. In some cases I included variants of the glyphs specified by Mr Crowe: for example, I assumed that he intended the v101 {8} to include the visually similar {6}, {7} and {&}.

My first observation was that in most cases, the frequencies of the Arabic letters were greatly different from those of their proposed Voynich equivalents. Just two examples:

• The Arabic letter alef, in its four variants ا إ ٲ ٱ, is the most common letter in the premodern corpora, with a frequency of 13.5 percent. The three Voynich glyphs proposed by Mr Crowe as equivalent, namely the v101 {e}, {s} (if I read it correctly), and {N}, have a combined frequency of 8.5 percent.
• The Arabic letter ha ه has a frequency of 4.9 percent in the premodern corpora. The proposed Voynich equivalents, {&} (if I read it correctly) and {o}, have a combined frequency of 15.8 percent.

Below is an extract from my comparisons of frequencies.

[attachment=8763]
The ten most common Arabic letters in three premodern corpora hosted by Brigham Young University; and the equivalent Voynich glyphs as proposed by Fletcher Crowe. Author’s analysis.

To my mind, the proposed mapping implies an underlying text or texts with profoundly different letter frequencies from those in the major premodern corpora. It is as if, in the English language, we were to encounter a text with a noticeable shortage of the letters e, t, a and o, and a proliferation of letters such as q, w, k and x.

I noted also that the Arabic letter ghain غ was proposed to map to the v101 glyph {4}. In the Voynich manuscript, {4} is followed in 96 percent of its occurrences by {o}. In the proposed mapping, the Arabic ha ه and hah ح are both proposed to correspond to {o}. The implication is that غ must almost always be followed by ه or ح. But this is not so. In the premodern corpora, the letter غ occurs 135,949 times; in only 1,246 cases is it followed by ه, and in only one case by ح.
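The follower test can be sketched in Python as below. The function name and the sample string are hypothetical; only the three corpus counts are taken from the figures quoted above. A real run would also need the Arabic text stripped of diacritics, which is an assumption here.

```python
def follower_share(symbols, first, followers):
    """Share of occurrences of `first` immediately followed by one of `followers`."""
    total = 0
    followed = 0
    for i, symbol in enumerate(symbols):
        if symbol == first:
            total += 1
            # A space or the end of the text does not count as a follower.
            if i + 1 < len(symbols) and symbols[i + 1] in followers:
                followed += 1
    return followed / total if total else 0.0

# Made-up transliterated sample: {4} occurs three times, twice before {o}.
sample = "4oe 4o1 4e"
sample_share = follower_share(sample, "4", {"o"})
print(f"sample share: {sample_share:.2f}")

# Counts quoted above for ghain in the premodern corpora.
ghain_total = 135_949
ghain_then_ha = 1_246
ghain_then_hah = 1
corpus_share = (ghain_then_ha + ghain_then_hah) / ghain_total
print(f"ghain followed by ha or hah: {corpus_share:.1%}")
```

On the quoted counts, ghain is followed by ha or hah in about 0.9 percent of its occurrences, against the 96 percent observed for {4} followed by {o}.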

Finally, I attempted to apply Mr Crowe’s mapping to the v101 “word” {8am}. This is the most frequent “word” in the Voynich manuscript, with 739 occurrences in the v101 transliteration (1.82 percent of the total "word" count). As far as I could determine, Mr Crowe did not recognise {m} as a distinct glyph, and I assumed that he read it as {IN}. This assumption yielded the following interpretations of {8am}:
  • {8} or its variants + {a} + {I} + {N}, 
  • or {8a} or its variants + {I} + {N}.

The proposed mappings of these interpretations of {8am} were as follows, with their counts and frequencies as words in the premodern corpora (here I enlarged the word search to include three additional corpora, namely the Holy Quran, the Hadith and the Adab literature):
  • دسما 83 occurrences (0.0009 percent)
  • دشما no occurrences
  • دصما no occurrences
  • ضما 26 occurrences (0.0004 percent).

In summary, none of the proposed mappings of {8am} yielded a word approaching the expected frequency in medieval Arabic.