The Voynich Ninja

As members of this forum know, last year I spent a considerable amount of effort arguing for a Judaeo-Greek interpretation of the Voynich ms, according to which a Greek-speaking Jewish person wrote the text in much the same way that Greek Jews wrote their language in the Hebrew alphabet. I argued that the unusual features and character patterns of the ms could be connected to the similarly unusual way in which Greek was expressed in Hebrew script. The fundamental problem with the hypothesis was the ambiguity of the phonetic values of many characters, which in turn made the readings of many words ambiguous. It was this obstacle that made it difficult to convince most people on this forum of the validity of my theory.

I always come back to the same problem: When one looks at the restrictions on character patterns, the small number of truly distinct characters occurring with any significant frequency in the ms text, and the well-known extremely low entropy of the text (the character patterns and sequences are too predictable), it is difficult to identify a language that could satisfy all of these restrictions in its writing system. I believe it was Rene who identified Hawaiian, with its extremely small and simple letter and sound inventory, as the natural language with entropy statistics closest to those of the Voynich ms. Very few languages spoken and known in Europe in the late medieval period would have resembled Hawaiian in this way.

Recently I stumbled quite by accident (in the course of researching a completely separate linguistic project) upon a description of the sound system of ancient "Pre-Basque", from which modern Basque dialects are descended. (Please note: I feel it necessary to state upfront and for the record here that I do NOT believe Basque is related to any other known language. It is a true language isolate. My other project was NOT about relating Basque to any other language or family.) I was struck by the simplicity of the Pre-Basque sound system:

Only 6 consonants may occur in word-initial position: *b, *g, *z, *s, *l, *n
*m did not exist, and *p was marginal
*t, *k, *tz, *ts contrasted with *d, *g, *z, *s only in intervocalic medial position
*r occurred only in medial and final position
The vowel system was a simple *a, *e, *i, *o, *u system

Well, this is not quite Hawaiian, but it is a lot simpler than most European or Middle Eastern languages spoken and known in late medieval Europe. I would argue it is even simpler than the system I analyzed and proposed for written Judaeo-Greek. One could express the sounds and words more or less distinctively with a rather small and restricted character set, perhaps as small and restricted as that of the Voynich ms.

I should note that one Basque scholar I asked has already expressed skepticism, noting that by the late medieval period Basque dialects already had a consonant inventory as large as 18-20 phonemes, quite different from that of Pre-Basque described above. He also found a provisional trial rendering of a few lines of Voynich ms text as Basque to be unintelligible. This skepticism should be duly noted.

However, I observe that the first published Basque text, Bernard Etxepare's Linguae Vasconum Primitiae (1545), in its original spelling used the Latin letter "c" to express both Basque "k" and some instances of modern Basque "z" (just as English uses "c" for the sounds "k" and "s"), and it only needed to use 10 consonant letters with any significant frequency:
r, c, n, d, g, t, s, b, l, z
(m, p less frequently, f, q very rarely)
Moreover, in one excerpt of this text in its original spelling that I examined, "n" appears almost exclusively either in word-final position or in the coda position of a syllable, before a consonant that begins the following syllable. This would be a striking correspondence with the restricted distribution of the Voynich ms character that is likewise transcribed as [n] in the EVA transcription.
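A positional claim like this one about "n" can be checked by a simple count. The following is a minimal Python sketch, not the procedure actually used for the excerpt: the function name `letter_positions`, the five-vowel set, and whitespace tokenization are all assumptions, and the sample string is just the modern Basque words quoted elsewhere in this thread, not Etxepare's original text.

```python
from collections import Counter

# Assumption: a plain five-vowel set; every other character counts as a consonant.
VOWELS = set("aeiou")

def letter_positions(text, letter):
    """Count where `letter` occurs: word-final, before a consonant, or before a vowel."""
    counts = Counter()
    for word in text.lower().split():
        for i, ch in enumerate(word):
            if ch != letter:
                continue
            if i == len(word) - 1:
                counts["word-final"] += 1
            elif word[i + 1] not in VOWELS:
                counts["before consonant"] += 1  # coda position inside the word
            else:
                counts["before vowel"] += 1      # onset position
    return dict(counts)

# Illustrative sample only: frequent modern Basque words mentioned in this thread.
sample = "zen ziren zuen zituen izan eta da hik"
print(letter_positions(sample, "n"))  # {'word-final': 5}
```

Run over a real excerpt in the 1545 spelling (with punctuation stripped first), a distribution dominated by the first two categories would support the observation, while a large "before vowel" count would undercut it.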

I am curious to know about any previous theories of the Voynich ms as a text written in Basque. I would have expected this to be a relatively popular hypothesis as far as these things go: Many researchers (read: many well-respected analysts including some on this forum who disagree strenuously with any Latin, Romance, or Greek theory including mine) argue that the linguistic structure of the Voynich ms text is unlikely to represent an Indo-European language; it is fairly clear that the ms is of European provenance; and Basque is one of the very few non-Indo-European languages that was spoken indigenously in Europe in the late medieval period.

However, Internet searches turn up surprisingly few references to any previous Basque theories of the Voynich ms, as far as I can tell. It is true that we apparently have no extant manuscript-length Basque texts from before the mid-16th century publication mentioned above. But some Basque was written in the late medieval period, so we know there were literate Basque speakers. In a certain way the lack of a standard literary form of written Basque in this period could be an argument in favor of the theory: It would have given the author of the ms more license to develop the writing system freely in an idiosyncratic style, without the constraint or influence of a literary standard. Surely there were Basque speakers in Europe in that period who were literate in Latin, and who would have had the ability to compose such a ms in Basque.

I suspect the main reason there are few Basque theories of the Voynich ms is that few non-Basques can speak or read Basque! It seems to me that even among Europeans with an interest in unusual or less commonly studied languages, the idea of studying Basque may be more popular than the actual studying of Basque. It is naturally quite difficult to learn after all, and it cannot be related to any other known languages, so there is no natural linguistic link or gateway into studying Basque. It is a linguistic field of study sui generis. Few people would be well qualified to investigate, analyze, and evaluate such a theory.

For the record, here is a very rough example of how one may formulate a very provisional proposal to represent many of the most frequent and distinct sounds of Basque with the Voynich ms characters. I caution that at this stage such a proposal is very unlikely to be completely correct, and I further caution that as noted above, one Basque scholar has already rejected a possible interpretation of a few lines of the ms according to this schema. But it is a starting point and an example of one possible such system.

EVA [a] = Basque "a"
EVA [ai] = Basque "e"
EVA [e] = Basque "i"
EVA [o] = Basque "u"/"o"
EVA [n] = Basque "n"
EVA [k]/[f] = Basque "d"/"t" (always "d" as initial)
EVA [t]/[p] = Basque "b" (Basque "p" is rare/marginal in any case)
EVA [d] = Basque "c" in Etxepare 1545 (either "z" or "k"/"g" in modern Basque - see above)
EVA [ch] = Basque "h" (This phoneme is not even considered for alphabetization in Basque etymological dictionaries)
EVA [ckh]/[cfh] = Basque "t"/"d" (in principle "t", [ch] representing aspiration, but likely not so clear in the actual ms)
EVA [cth]/[cph] = Basque "p"/"b" (in principle "p", [ch] representing aspiration, but likely not so clear in the actual ms)
EVA [s] = Basque "z"
EVA [sh] = Basque "s"/"z" (in principle "s", [ch] representing voiceless articulation, but likely not so clear in the actual ms)
EVA [l] = Basque "l"
EVA [r] = Basque "r"
EVA [y] = abbreviation for some word endings and prefixes, in the style of the similar-looking medieval Latin ms abbreviation
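To make the table concrete, here is a minimal Python sketch of applying it mechanically to EVA words. The names `SCHEMA`, `tokenize`, and `render`, the greedy longest-match splitting of glyph groups, and the `<abbr>` and `?` markers are all assumptions made for illustration. They are not part of the proposal itself, which does not specify how ambiguous sequences should be segmented.

```python
# Provisional EVA -> Basque values copied from the table above.
# "c" stands for Etxepare's 1545 letter (modern "z" or "k"/"g").
SCHEMA = {
    "ckh": "t/d", "cfh": "t/d",
    "cth": "p/b", "cph": "p/b",
    "ai": "e", "ch": "h", "sh": "s/z",
    "a": "a", "e": "i", "o": "u/o", "n": "n",
    "k": "d/t", "f": "d/t", "t": "b", "p": "b",
    "d": "c", "s": "z", "l": "l", "r": "r",
    "y": "<abbr>",  # abbreviation for some ending or prefix
}

# Assumption: try longer glyph groups first (greedy longest match).
_KEYS = sorted(SCHEMA, key=len, reverse=True)

def tokenize(word):
    """Split an EVA word into glyph groups; unknown glyphs pass through as-is."""
    tokens, i = [], 0
    while i < len(word):
        for key in _KEYS:
            if word.startswith(key, i):
                tokens.append(key)
                i += len(key)
                break
        else:
            tokens.append(word[i])  # glyph not covered by the table
            i += 1
    return tokens

def render(word):
    """Map each glyph group to its candidate value, or '?' if the table has none."""
    return [SCHEMA.get(tok, "?") for tok in tokenize(word)]

print(render("chedy"))  # ['h', 'i', 'c', '<abbr>']
print(render("daiin"))  # ['c', 'e', '?', 'n']
```

The second result shows one gap a mechanical reading exposes: the table gives no value for a bare [i], so the second [i] in [daiin] comes out as "?".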

As any experienced Voynich ms researcher might guess, this schema will work out much more messily in practice, as a reading of the actual ms, than it appears in this more or less neat correspondence table. I had to shoehorn Basque "g" into EVA [d] along with the "k" and "z" that Etxepare represented with "c" in 1545. [y] is a big fudge factor in the above schema, as it can represent an abbreviation for any ending or prefix as it stands now. Finally, to make any sense of many words, it would be necessary to add a rule that many final vowels are simply unwritten in this system. This is possible in rudimentary writing systems, but it is another fudge factor.

Of course I have my reasons for making the educated guesses about the letter/sound values of the characters above, but obviously everything must be considered provisional at this stage (understatement, I know).

I have chosen EVA [d] to represent Basque "c" and thus often "z" because among the very most frequent Basque words are the 3rd person past tense 'to be'/'to have'/auxiliary verb forms "zen" ('was'), "ziren" ('were'), "zuen" ('had'), and "zituen" ('had', with a plural object), and the participial form "izan" ('being', 'having') is very frequent as well. It should be noted that such an auxiliary verb form must be used in almost any Basque clause with a finite verb, since most lexical verbs themselves do not conjugate as Indo-European verbs do.

If EVA [d] is "z" and EVA [n] is "n", then this means that the most frequent Voynich ms word [daiin] could possibly represent the Basque word "zen" ('was'), the 3rd most frequent word in Basque corpora. ("zuen" is 4th, and "izan" is 5th.)

The most frequent Basque word "eta" ('and') could possibly be accounted for by the well-known interpretation of the Voynich ms character EVA [q] as the equivalent of an ampersand.

It is more difficult to explain the 2nd most frequent Basque word "da" ('is'), the 3rd person singular present tense 'to be'/auxiliary verb form. Perhaps this implies that the Voynich ms is written primarily in the past tense rather than in the present tense. This may strike us as rather implausible at first, at least for the herbal, pharmaceutical, and recipes sections. The verb forms in the recipes section, however, may be primarily imperative forms rather than indicative present tense forms.

I have often thought that the repetitive and line-based patterns of much of the Voynich ms text may be more plausibly interpreted as poetry rather than prose. (I note for the record that Etxepare's Basque publication in 1545 was a collection of poems.) If the "herbal" section is actually poetry inspired by plants, rather than an encyclopedic type of herbal manual, then perhaps a large part of this poetry is written in the past tense. Alternatively, perhaps the Basque participial form "izan" ('being', 'having') features more prominently in the herbal and pharmaceutical sections than do present tense forms.

The other very frequent Voynich ms word [chedy], found most frequently in the balneological and recipes sections, might possibly represent the Basque word "hik", the ergative form of the familiar 2nd person singular pronoun 'you'. It would be logical if this word were frequent in these sections -- in dialogue in the balneological section, and in instructions in the recipes section -- but much less frequent in the herbal section. Common variations of this word could possibly represent similar Basque words such as "hi" (absolutive 2.sg. 'you'), "heuk" and "heu" (emphatic 2.sg. ergative and absolutive 'you'). Likewise [shedy] and related words could possibly represent the formal, neuter, and plural variations of this pronoun in Basque that begin with "z" rather than "h".

Let me conclude by repeating and emphasizing the tentative nature of this hypothesis and all of these ideas and comments. I have been wrong about such things before, and I am well aware that I may be wrong again now. But I find the theory intriguing, and I welcome comments and feedback from members of this forum. Naturally, if we have any Basque speakers on this forum, their feedback and reaction to my ideas would be the most valuable of all.

Geoffrey

Can I ask what evidence within the manuscript itself led you to believe the text is written in Basque?

(13-09-2020, 01:44 AM)Emma May Smith Wrote: Can I ask what evidence within the manuscript itself led you to believe the text is written in Basque?

The primary motivation for the hypothesis is the possibility of representing the Basque language in a relatively unambiguous manner with a relatively small and restricted number of characters in a script. In my view this is and always has been the primary obstacle to identifying any even remotely plausible hypotheses about the possible language of the Voynich ms, so it is the first issue that needs to be addressed in any hypothesis as far as I am concerned.

Secondarily, I can point to the plausible potential identification of certain extremely frequent Voynich ms words, such as [daiin] and [chedy], as certain likewise very common and basic Basque words, in this case "zen" ('was') and "hik" ('you' singular familiar ergative) respectively.

May I ask what other types of evidence within the ms itself you would look for to generate a hypothesis about the possible language in which the text is written?

Geoffrey

Some Basque-Voynich stuff.
JKP's blog: [link]

Gavin Güldenpfennig: [link]

René Zandbergen: [link]

Geoffrey, looking quickly at a text in Basque, I don't see an agreement with Voynich.
[Image: basque.jpg]

Don't do it man. Our problem right now is not "which language", it is "how is whichever language it is transformed into Voynichese". Counterintuitively, the latter needs to be answered first. But this needs more than simple substitution...

This ^^^

(13-09-2020, 02:08 AM)geoffreycaveney Wrote: May I ask what other types of evidence within the ms itself you would look for to generate a hypothesis about the possible language in which the text is written?

I would like you to demonstrate:

Why a particular glyph should be identified as a vowel.
Why a particular glyph should be a plosive/fricative/liquid/nasal/semivowel.
Why particular words or affixes should be considered specific parts of the grammar.

The idea is that even without a candidate language, you should be able to demonstrate the principles of the underlying language. Starting from internal evidence only I want you to argue how you got to the idea that [r] = /r/ or [l] = /l/. Can you demonstrate that they fit into a class like liquids? That clusters such as [lk] = /lt/ make sense? That [chedy] = /hik/ "you (erg)" makes sense? What is "you" doing in the text? What verbs does it relate to?

Re-interpreting the anomalous character conditional entropy as "small alphabet size" is certainly incorrect, but indeed the concept of conditional entropy is not that easy to grasp (I am still not sure I fully understand it).
Alphabet size has an impact on h1 (character entropy), not much on h2 (conditional character entropy). This is nicely illustrated in [link]'s figure 4: h2 only shows a minimal hint of positive correlation with alphabet size.

Also, the size of the alphabet of Voynichese is not known and different transliteration systems give different answers. But conditional entropy has been shown to be relatively robust and not much above 2, when using different transliterations (see Rene's Table 2 [link]).

On the other hand, European languages (Basque included) have a much higher conditional entropy, close to 3. In the attached graph, I manually added a green star for 4000 words from [link] to one of the plots from [link].
The values I get for LVP are:
h1: 3.89
h2: 3.01

The red triangles closer to Voynichese correspond to Hawaiian and Asian languages like Tagalog and Minjiang.
Rene's original plot already included a Basque sample (one of the yellow diamonds).
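For readers who want to reproduce figures of this kind, here is a minimal Python sketch of the two quantities, under stated assumptions: h1 is taken as the Shannon entropy of single characters, and h2 as the conditional entropy of a character given the one before it, computed as H(pairs) minus H(first character of each pair). The function names `entropy` and `h1_h2` are hypothetical, and whether spaces and punctuation are counted as characters is a choice the sketch leaves to the caller. It is not necessarily the exact procedure behind the values quoted above.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency table (a Counter or dict of counts)."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def h1_h2(text):
    """Return (h1, h2) for a string of at least two characters.

    h1: entropy of single characters.
    h2: entropy of a character given its predecessor, H(X1, X2) - H(X1).
    """
    h1 = entropy(Counter(text))
    pairs = Counter(zip(text, text[1:]))  # adjacent character pairs
    firsts = Counter(text[:-1])           # first member of each pair
    h2 = entropy(pairs) - entropy(firsts)
    return h1, h2

# A perfectly alternating text: one bit per character, but fully predictable
# from the previous character, so h2 is zero.
print(h1_h2("abababab"))
```

The alternating example illustrates the point made above: h1 depends on how many symbols there are and how evenly they are used, while h2 measures how predictable each symbol is from its predecessor, which is largely independent of alphabet size.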

(13-09-2020, 12:19 PM)MarcoP Wrote: Re-interpreting the anomalous character conditional entropy as "small alphabet size" is certainly incorrect [...]

Marco, thank you for this analysis.

Your source for Linguae Vasconum Primitiae appears to be written in a more modern Basque spelling, not the author's original spelling in 1545. A short sample of the original with a comparison to modern spelling is presented at [link]. I am not saying the original spelling will drive down the conditional entropy from 3 to 2, but it is worth noting the actual spelling of the original text.

geoffreycaveney

Emma May Smith

geoffreycaveney

RobGea

Ruby Novacna

Koen G

DONJCH

Emma May Smith

MarcoP

geoffreycaveney