16-03-2024, 04:28 PM
I've been continuing to play around with the idea that some of the strange properties of Voynichese could be explained in terms of the workings of a syllabic encoding scheme. I decided to try writing up my latest version of this "syllabic hypothesis," not proposing any specific solution (I don't have one), but just outlining the general kind of mechanism I can imagine having been in play. So here goes, with no claim that it's anything more than the usual stab in the dark:
A “vord” ordinarily corresponds to a syllable, and breaks between “vords” correspond to boundaries between syllables. These breaks can help the reader parse the text into pronounceable chunks, but otherwise they’re redundant and expendable. We can draw an analogy with numbers represented with Arabic numerals, insofar as “21734” and “21,734” (or “21.734”) mean the same thing.
Breaks are most practically useful in running “paragraphic” text, where many syllables appear consecutively, for the same reason that punctuation is more useful in longer Arabic numerals: “1934” or “21734” are easy to read without punctuation, but “3478923478923” isn’t. On the other hand, it’s less crucial to introduce breaks into shorter “labels,” which tend therefore to have their syllables less carefully separated and to yield longer “vords” on average.
A “vord” can represent a syllable as V, CV, VC, or CVC, where C can be a consonant cluster and V can be a diphthong. Most multisyllabic plaintext words can accordingly be divided into syllables in multiple ways. Moreover, a single syllabic “vord” can span two plaintext words, or even three (for example, if just a plan were encoded as [jus] [tap] [lan]). To avoid confusion, breaks between plaintext words can be marked explicitly within a syllabic “vord,” for example as [t·a·p], but this practice is optional and inconsistent, just as it was in other writings of the fifteenth century. The mechanism for encoding [·] could also overlap with a mechanism for showing emphasis, comparable to the use of majuscules.
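To make the combinatorics concrete, here's a rough Python sketch. It lists every way a run of letters can be cut into V, CV, VC or CVC chunks, then marks interior word breaks with [·]. It assumes plain Latin letters with a, e, i, o, u as the only vowels, and it ignores y and consonantal i/u. The function names are just my own labels, not anything from an existing tool:

```python
import re

# Assumption: a, e, i, o, u are the only vowels. A legal chunk is C*V+C*:
# an optional consonant cluster, one vowel run (single vowel or diphthong),
# and another optional consonant cluster. This covers V, CV, VC and CVC.
SYLLABLE = re.compile(r"[^aeiou]*[aeiou]+[^aeiou]*")


def syllabifications(letters):
    """Return every way to cut a letter string into legal syllable chunks."""
    results = []

    def extend(start, chunks):
        if start == len(letters):
            results.append(chunks)
            return
        for end in range(start + 1, len(letters) + 1):
            chunk = letters[start:end]
            if SYLLABLE.fullmatch(chunk):
                extend(end, chunks + [chunk])

    extend(0, [])
    return results


def mark_word_breaks(words, chunks):
    """Insert '·' wherever a plaintext word boundary falls inside a chunk."""
    boundaries, pos = set(), 0
    for word in words[:-1]:
        pos += len(word)
        boundaries.add(pos)
    marked, start = [], 0
    for chunk in chunks:
        out = ""
        for offset, letter in enumerate(chunk):
            # A boundary at offset 0 coincides with the chunk break itself.
            if offset > 0 and start + offset in boundaries:
                out += "·"
            out += letter
        marked.append(out)
        start += len(chunk)
    return marked


words = ["just", "a", "plan"]
print(len(syllabifications("".join(words))))  # 9
print(mark_word_breaks(words, ["jus", "tap", "lan"]))  # ['jus', 't·a·p', 'lan']
```

For just a plan this gives nine possible cuts, and [jus] [tap] [lan] is one of them. Calling `syllabifications("ia")` returns both `["i", "a"]` and `["ia"]`, because a vowel run can be read as one nucleus or two. The brute-force search grows exponentially in principle, but it's fine for single words or short phrases.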
A single plaintext word is never allowed to extend across a line break, which has implications for the forms of syllabic "vords" we'll find at the beginnings and ends of lines.
Some plaintext words end in such a way that their final syllables will almost always end up shared in a single "vord" with the beginning of the following word in running text. It’s only when one of these words appears at the end of a line that we’ll encounter a "vord" that represents this type of word-ending syllable in isolation.
Similarly, some plaintext words begin in such a way that their opening syllables will almost always end up shared in a single "vord" with the end of the preceding word in running text. It’s only when one of these words appears at the beginning of a line that we’ll encounter a "vord" that represents this type of word-opening syllable in isolation.
Consonant clusters that occur only at the intersections between words in running text will never be found at the beginnings of lines. Take for example a mechanism for encoding double letters, as in est tua = [es] [t·tu] [a]. Syllables of the form [t·tu] can appear within lines but never line-initially.
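The line-edge predictions in the last few paragraphs can be turned into a mechanical check. Here's a minimal sketch. It assumes the vords are already transcribed in the hypothetical plaintext-plus-[·] notation I'm using here (not EVA). The function names are my own, and the toy lines are invented purely to exercise the check:

```python
# Hypothetical notation: each vord is a string of plaintext Latin letters,
# with '·' marking a plaintext word boundary inside the vord.
VOWELS = set("aeiou")


def opens_with_word_tail(vord):
    """True if a word boundary falls before the vord's first vowel, i.e. the
    vord begins with the end of a preceding plaintext word ('t·tu', 't·a·p')."""
    first = next((i for i, ch in enumerate(vord) if ch in VOWELS), len(vord))
    return "·" in vord[:first]


def closes_with_word_head(vord):
    """True if a word boundary falls after the vord's last vowel, i.e. the
    vord ends with the start of a following plaintext word ('e·c')."""
    last = max((i for i, ch in enumerate(vord) if ch in VOWELS), default=-1)
    return "·" in vord[last + 1:]


def line_edge_violations(lines):
    """List (line index, vord) pairs the hypothesis rules out, since no
    plaintext word may continue across a line break."""
    problems = []
    for n, line in enumerate(lines):
        if not line:
            continue
        if opens_with_word_tail(line[0]):
            problems.append((n, line[0]))
        if closes_with_word_head(line[-1]):
            problems.append((n, line[-1]))
    return problems


toy_lines = [["es", "t·tu", "a"], ["t·tu", "a"], ["ta", "e·c"]]
print(line_edge_violations(toy_lines))  # [(1, 't·tu'), (2, 'e·c')]
```

Under the hypothesis this list should come back empty for any genuine text. A vord like [t·a·p] fails both tests, so it could appear neither at the start nor at the end of a line.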
A “vord” can represent a syllable of the form V, CV, VC, or CVC, but the mapping of characters to phonemes within it isn’t necessarily straightforward. An empty slot might be marked, e.g., [0V0], [CV0], [0VC], to help with parsing. Different glyphs might be used for the same consonant depending on whether it opens a syllable (CV) or closes one (VC). Some consonants or consonant clusters might be encodable in only one of those positions; for instance, maybe [x] can only be encoded at the end of a syllable. And encoding might be verbose in any number of unintuitive ways, making a "vord" that represents a single syllable look superficially multisyllabic.
When a plaintext word is broken into syllables for encoding, there may be a loose tendency for successive syllables to display the same structure, e.g., e civitatis = [e] [ci] [vi] [ta] [tis], consistently favoring CV, or [e·c] [iv] [it] [at] [is], consistently favoring VC. Combined with the marking of empty slots, this would result in a strong tendency towards repetition of similar-looking forms, e.g., [0e0] [ci0] [vi0] [ta0] [tis] or [0e·c] [0iv] [0it] [0at] [0is].
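Here's a minimal sketch of the consistent-structure idea combined with empty-slot marking. It assumes that a maximal vowel run is always one nucleus. It also assumes that all consonants between two nuclei go either entirely forward (CV) or entirely backward (VC). That all-or-nothing rule is a simplification I'm introducing for illustration, and a real scheme would presumably split some clusters differently. `split_uniform` and `mark_empty_slots` are hypothetical names:

```python
import re

VOWELS = set("aeiou")


def split_uniform(words, mode):
    """Split running text into syllables that all favor one shape.

    mode "CV": consonants between vowels open the following syllable.
    mode "VC": consonants between vowels close the preceding syllable.
    '·' marks a plaintext word boundary that falls inside a syllable.
    """
    text = "".join(words)
    boundaries, pos = set(), 0
    for word in words[:-1]:
        pos += len(word)
        boundaries.add(pos)
    nuclei = [m.span() for m in re.finditer(r"[aeiou]+", text)]
    cuts = [0]
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        cuts.append(prev_end if mode == "CV" else next_start)
    cuts.append(len(text))
    return [
        "".join(("·" if i > a and i in boundaries else "") + text[i] for i in range(a, b))
        for a, b in zip(cuts, cuts[1:])
    ]


def mark_empty_slots(vord):
    """Write '0' for an empty onset or coda, e.g. e -> 0e0, ci -> ci0."""
    onset = "0" if vord[0] in VOWELS else ""
    coda = "0" if vord[-1] in VOWELS else ""
    return onset + vord + coda


for mode in ("CV", "VC"):
    vords = split_uniform(["e", "civitatis"], mode)
    print(mode, vords, [mark_empty_slots(v) for v in vords])
```

Run on e civitatis, the two modes reproduce the CV and VC encodings given above, both with and without the [0] markers.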
But in some cases, it's legitimately ambiguous what "counts" as a syllable. For example, is ia one syllable or two? This type of situation may have been handled inconsistently or in a deliberately ambiguous way, and could partially scramble some of the foregoing pattern.
Even if a given plaintext word is unlikely to be written the same way twice, it is made up of consistent syllables. So if the same plaintext word recurs repeatedly in a passage, a “vord” that can represent one of its syllables is likely to recur there as well, as are similar-looking “vords” that represent that syllable combined with adjacent parts of the same word or with neighboring plaintext words.
A writer might have favored one syllable structure, such as CV, at the start of each line and switched to another, such as VC, only when an uncooperative word forced the change, but then stuck with the new structure for the rest of the line. The dispreferred form would then turn up more often in the latter part of lines, only slightly but consistently.
A syllabic “vord” will tend to be followed preferentially by syllabic “vords” that start in phonetically compatible ways. Thus, a syllabic “vord” ending in [m] is more likely to be followed by another syllabic “vord” beginning with [b] than by one beginning with [d] or [g] if mb occurs more often within plaintext words than md or mg. (To be clear, I'm using plaintext Latin characters to represent themselves here, and not EVA!)
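One rough way to test this prediction, sketched in Python under the assumption that vords are already transcribed as plaintext letters: count consonant pairs inside plaintext words, then count the letter pairs that straddle consecutive vords, and compare the two. Both helper names are hypothetical, and the tiny word lists are only there to show the mechanics:

```python
from collections import Counter

# Assumed consonant set for plaintext Latin letters (y left out).
CONSONANTS = set("bcdfghjklmnpqrstvwxz")


def internal_pairs(words):
    """Count adjacent consonant pairs inside plaintext words (e.g. mb in umbra)."""
    pairs = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            if a in CONSONANTS and b in CONSONANTS:
                pairs[a + b] += 1
    return pairs


def junction_pairs(vords):
    """Count last-letter + first-letter pairs across consecutive vords,
    ignoring the '·' word-break mark and the '0' empty-slot mark."""
    cleaned = [v.replace("·", "").replace("0", "") for v in vords]
    pairs = Counter()
    for left, right in zip(cleaned, cleaned[1:]):
        if left and right:
            pairs[left[-1] + right[0]] += 1
    return pairs


print(internal_pairs(["ambo", "umbra", "admodum"]))
print(junction_pairs(["am", "bo", "am", "bi", "om", "da"]))
```

If the hypothesis held, the junction counts following [m] should roughly track the word-internal counts, with mb outnumbering md and mg.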
Over time, a system like this would probably have been called upon to handle unanticipated situations. For example, it may at first have made no provision for encoding consonants without vowels, since that violates its basic syllabic logic. But then maybe a need arose to encode Roman numerals or unusually complex consonant clusters—or maybe the original approach to encoding consonant clusters just turned out to be too clunky. The problem could have been solved by permitting the vowel slot to be marked as empty [0] – which would have required introducing some new and distinctive glyph or glyph combination to serve this purpose, and would incidentally also have offered a lot of new options for encoding consonant clusters. The result might have ended up looking like a different “language” entirely. In the absence of any content, encoding could then also have defaulted to [000] [000] [000] if needed purely to fill space.
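Here's a minimal sketch of the three-slot template with an explicit empty-vowel marker, assuming the same a/e/i/o/u vowel set and [0] for any empty slot. `to_slots` is a hypothetical helper. It shows how a vowelless cluster (say, a numeral written xl) or a completely empty filler unit would still fit the frame:

```python
import re

# Assumed template: onset, nucleus, coda; each written as '0' when empty.
TEMPLATE = re.compile(r"([^aeiou]*)([aeiou]*)([^aeiou]*)")


def to_slots(chunk):
    """Map a chunk to (onset, nucleus, coda), writing '0' for any empty slot.

    A chunk with no vowel keeps all of its consonants in the onset, so a
    bare consonant cluster still fits; an empty chunk becomes ('0', '0', '0').
    """
    match = TEMPLATE.fullmatch(chunk)
    if match is None:
        raise ValueError(f"{chunk!r} has more than one vowel run")
    return tuple(part or "0" for part in match.groups())


for chunk in ["tis", "e", "xl", ""]:
    print(repr(chunk), "->", "".join(to_slots(chunk)))
```

Putting every consonant of a vowelless chunk into the onset is an arbitrary choice on my part. Splitting xl as onset, empty nucleus, coda would fit the template just as well, and that freedom is the kind of extra flexibility for encoding consonant clusters mentioned above.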