06-03-2019, 06:12 AM
I am posting this idea in the spirit of Emma May Smith's language- and content-agnostic approach to analysis, but under the assumption that the text is linguistic and written in the plain.
I briefly outlined this idea in the "devil's advocate for glossolalia" thread, but I suspect it got lost among the 100,000 or so other words surrounding it. So I would like to put it forward separately, and more clearly, here:
At first glance it seems absurd to propose that each Voynich character *bigram* could represent a single letter or phoneme in any language: surely with 15-25 distinct characters there must be hundreds of possible bigrams (15 × 15 = 225 up to 25 × 25 = 625), and no language has that many phonemes. On closer inspection, however, so long as the bigrams are paired off naturally and certain obvious non-bigrams are excluded (initial [q-], many final [-y]'s, initial [d-] in some cases, etc.), it becomes apparent that nowhere near hundreds of bigrams occur with more than rare or accidental frequency; there are only a couple dozen or so.
As a simple example, in the vord [otchy], clearly one must not consider the pairs [tc] and [hy] as bigrams! Obviously the bigrams are [ot] and [ch], and [-y] is a single character at the end. In this case it is obvious because we all know [ch] is one unit, not two separate letters or phonemes. Likewise with the notorious [sh], however many different forms of it may occur in the ms text: in any case, we know the [h] cannot be separated from the [s]!
In this spirit, I propose that the following inventory of bigrams accounts for a substantial majority of the ms text:
[ch], [dy], [ai], [ok], [in], [ol], [ee], [sh], [da], [ey], [ot], [eo], [ar], [al], [or], [od], [yk], [sa], [yt], [os], [do], [so], [ky], [ty], [oy]
Naturally the apparent ligatures [cth] and [ckh] must be accounted for here as well.
As I noted above, certain obvious and frequent non-bigram single characters must be accounted for separately:
many [y]'s, many [d]'s, [q], many [s]'s, an occasional initial [k] and [t], and the odd extraneous [o], [i], or [e].
I stress, though, that the occasional and extraneous characters in that list are the exception in the ms text, not the rule. It also remains to deal with [p] and [f], not to mention [m], [g], and a few others! But these are rare enough that they hardly affect the reading of the vast majority of the ms text.
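The "substantial majority" claim can in principle be checked mechanically. Below is a minimal sketch, assuming vords are given as lowercase EVA transliteration strings. The post does not define how bigrams are "paired off naturally", so greedy left-to-right matching is an assumption here, as is treating [cth] and [ckh] as single units; the function names `segment` and `coverage` are hypothetical.

```python
# Proposed bigram inventory, copied from the list above.
BIGRAMS = {
    "ch", "dy", "ai", "ok", "in", "ol", "ee", "sh", "da", "ey", "ot", "eo",
    "ar", "al", "or", "od", "yk", "sa", "yt", "os", "do", "so", "ky", "ty", "oy",
}
# Assumption: the apparent ligatures [cth] and [ckh] count as single units.
LIGATURES = {"cth", "ckh"}


def segment(vord: str) -> list[str]:
    """Greedy left-to-right split into ligatures, inventory bigrams, and leftover single glyphs."""
    units = []
    i = 0
    while i < len(vord):
        if vord[i:i + 3] in LIGATURES:
            units.append(vord[i:i + 3])
            i += 3
        elif vord[i:i + 2] in BIGRAMS:
            units.append(vord[i:i + 2])
            i += 2
        else:
            # Non-bigram single glyph, e.g. an initial [q] or a final [y].
            units.append(vord[i])
            i += 1
    return units


def coverage(vords: list[str]) -> float:
    """Share of all glyphs that fall inside a bigram or ligature unit."""
    covered = total = 0
    for vord in vords:
        for unit in segment(vord):
            total += len(unit)
            if len(unit) > 1:
                covered += len(unit)
    return covered / total if total else 0.0
```

For example, `segment("otchy")` returns `['ot', 'ch', 'y']`, matching the worked example above, and `segment("qokeedy")` returns `['q', 'ok', 'ee', 'dy']`. Running `coverage` over a full transliteration would put a number on the claim. Greedy matching is only a first approximation of "natural" pairing: `segment("daiin")` yields `['da', 'i', 'in']` rather than d + ai + in, so a rule that, say, maximizes coverage over all possible splits may be closer to the intent.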
=====
Further, we can make even more sense of this bigram inventory as a phoneme inventory if we regard certain pairs of bigrams as *the initial and final forms* of the same phoneme. Positional variation in letter forms (initial and/or medial vs. final) is the general rule in Arabic script, occurs for a number of letters in Syriac and Hebrew, is familiar to many from the Greek sigma (σ vs. final ς), and survived in English until modern times in the long "s" (ſ), the funny-looking "f" without the bar used in initial and medial position.
So, for example, perhaps [ok] is the initial form and [ky] the final form of the same letter/phoneme; likewise [ot] and [ty]. I note on Emma May Smith's blog the suggestion that [a] and [y] may be equivalent: perhaps then [da] and [dy] are the initial and final forms of a very frequent letter/phoneme? More speculatively, but perhaps usefully, might [ch] and [ey] be the initial and final forms of the same letter/phoneme? Further candidate pairs are [sa] and [ar], and [so] and [or].
Such an inventory would go at least some way toward explaining the thorny issue of initial vs. final glyphs and sequences, while still leaving a reasonably sized inventory of distinct letters/phonemes: neither too large nor too small.
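To see how far the proposed initial/final pairings shrink the inventory, here is a small sketch that folds each final form into its initial form. The pairs are the ones suggested above, including the more speculative [ch]/[ey]; using the initial form as the label of the merged letter is an arbitrary choice, and `merged_inventory` is a hypothetical helper. Ligatures and the leftover single glyphs are not counted.

```python
# Bigram inventory, copied from the list above.
BIGRAMS = [
    "ch", "dy", "ai", "ok", "in", "ol", "ee", "sh", "da", "ey", "ot", "eo",
    "ar", "al", "or", "od", "yk", "sa", "yt", "os", "do", "so", "ky", "ty", "oy",
]
# (initial form, final form) pairs as suggested in the text.
INITIAL_FINAL_PAIRS = [
    ("ok", "ky"), ("ot", "ty"), ("da", "dy"),
    ("ch", "ey"), ("sa", "ar"), ("so", "or"),
]


def merged_inventory(bigrams: list[str], pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map each bigram to a letter label, folding final forms into their initial forms."""
    final_to_initial = {final: initial for initial, final in pairs}
    return {b: final_to_initial.get(b, b) for b in bigrams}


letters = merged_inventory(BIGRAMS, INITIAL_FINAL_PAIRS)
print(len(set(letters.values())))  # → 19
```

Since all twelve bigrams in the six pairs are distinct members of the inventory, each pair removes exactly one entry, taking 25 bigrams down to 19 letters/phonemes. Dropping the speculative [ch]/[ey] pair would leave 20.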
=====
I recall that somewhere on René Zandbergen's voynich.nu website there is the observation that the 3rd character in each vord is much less predictable, and thus carries much more information, than either the 1st or 2nd character. If the text is indeed composed of bigrams, and the initial bigram/letter/phoneme in the language happens to be rather predictable (cf. the Hebrew article prefix h-), then it would make sense that variability, and with it information content, does not rise until the 3rd character, which is exactly where the second bigram begins.
The bigram theory does raise the problem of extremely short vords: a 4-glyph vord, for instance, would represent only two letters. This would be less of an issue in a Semitic abjad, in which vowels are not written. We may also consider the idea that each vord is not a complete word but only part of one, however we define "word".
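One plausible way to quantify "predictability by position" is the conditional entropy of the glyph at a given position, given all glyphs before it in the same vord. The sketch below is an assumption: I do not know that this is the exact measure behind the voynich.nu observation, and `positional_conditional_entropy` is a hypothetical name. Positions are 0-indexed, so the 3rd character is position 2, and only vords long enough to have a glyph at that position are counted.

```python
import math
from collections import Counter, defaultdict


def positional_conditional_entropy(vords: list[str], position: int) -> float:
    """H(glyph at `position` | all preceding glyphs in the vord), in bits."""
    # Count which glyph follows each distinct prefix of length `position`.
    by_prefix: dict[str, Counter] = defaultdict(Counter)
    for vord in vords:
        if len(vord) > position:
            by_prefix[vord[:position]][vord[position]] += 1
    total = sum(sum(counts.values()) for counts in by_prefix.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for counts in by_prefix.values():
        n = sum(counts.values())
        for k in counts.values():
            p = k / n
            # Weight each prefix's entropy by how often that prefix occurs.
            entropy -= (n / total) * p * math.log2(p)
    return entropy
```

Under the bigram hypothesis, values for positions 0 and 1 over a real transliteration should stay comparatively low and jump at position 2. One caveat: conditioning on the full prefix fragments the counts at later positions, which biases the estimate downward on small samples.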
It is just one theory, in any case. I hope some folks here may find it worth considering and discussing, if not accepting.
-Geoffrey Caveney