The Voynich Ninja

I've been continuing to play around with the idea that some of the strange properties of Voynichese could be explained in terms of the workings of a syllabic encoding scheme. I decided to try writing up my latest version of this "syllabic hypothesis," not proposing any specific solution (I don't have one), but just outlining the general kind of mechanism I can imagine having been in play. So here goes, with no claim that it's anything more than the usual stab in the dark:

A “vord” ordinarily corresponds to a syllable, and breaks between “vords” correspond to boundaries between syllables. These breaks can help the reader parse the text into pronounceable chunks, but otherwise they’re redundant and expendable. We can draw an analogy with numbers represented with Arabic numerals, insofar as “21734” and “21,734” (or “21.734”) mean the same thing.

Breaks are most practically useful in running “paragraphic” text, where many syllables appear consecutively, for the same reason that punctuation is more useful in longer Arabic numerals: “1934” or “21734” are easy to read without punctuation, but “3478923478923” isn’t. On the other hand, it’s less crucial to introduce breaks into shorter “labels,” which tend therefore to have their syllables less carefully separated and to yield longer “vords” on average.

A “vord” can represent a syllable as V, CV, VC, or CVC, where C can be a consonant cluster and V can be a diphthong. Most multisyllabic plaintext words can accordingly be divided into syllables in multiple ways. Moreover, a single syllabic “vord” can span two plaintext words, or even three (for example, if just a plan were encoded as [jus] [tap] [lan]). To avoid confusion, breaks between plaintext words can be marked explicitly within a syllabic “vord,” for example as [t·a·p], but this practice is optional and inconsistent, just as it was in other writings of the fifteenth century. The mechanism for encoding [·] could also overlap with a mechanism for showing emphasis, comparable to the use of majuscules.
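To make the "multiple ways" point concrete, here is a minimal Python sketch that enumerates every division of a letter string into chunks containing exactly one vowel run. The five-letter vowel set, the function name, and the treatment of any vowel run as a possible single nucleus are simplifying assumptions for illustration only, not part of the hypothesis itself.

```python
import re

# Assumption: a, e, i, o, u are the only vowels; any consonant run counts as
# a permissible cluster and any vowel run as a permissible nucleus.
SYLLABLE = re.compile(r"[^aeiou]*[aeiou]+[^aeiou]*")

def syllabifications(letters):
    """Return every way to split `letters` into V, CV, VC, or CVC chunks."""
    if not letters:
        return [[]]
    results = []
    for k in range(1, len(letters) + 1):
        head = letters[:k]
        if SYLLABLE.fullmatch(head):
            for rest in syllabifications(letters[k:]):
                results.append([head] + rest)
    return results

# "just a plan" with the word breaks removed
for option in syllabifications("justaplan"):
    print(" ".join(f"[{s}]" for s in option))
```

Under these assumptions "justaplan" can be divided nine ways, one of which is [jus] [tap] [lan].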

A single plaintext word is never allowed to extend across a line break, which has implications for the forms of syllabic "vords" we'll find at the beginnings and ends of lines.

Some plaintext words end in such a way that their final syllables will almost always end up shared in a single "vord" with the beginning of the following word in running text. It’s only when one of these words appears at the end of a line that we’ll encounter a "vord" that represents this type of word-ending syllable in isolation.

Similarly, some plaintext words begin in such a way that their opening syllables will almost always end up shared in a single "vord" with the end of the preceding word in running text. It’s only when one of these words appears at the beginning of a line that we’ll encounter a "vord" that represents this type of word-opening syllable in isolation.

Consonant clusters that occur only at the intersections between words in running text will never be found at the beginnings of lines. Take for example a mechanism for encoding double letters, as in est tua = [es] [t·tu] [a]. Syllables of the form [t·tu] can appear within lines but never line-initially.

A “vord” can represent a syllable of the form V, CV, VC, or CVC, but the mapping of characters to phonemes within it isn’t necessarily straightforward. An empty slot might be marked, e.g., [0V0], [CV0], [0VC], to help with parsing. Different glyphs might be used to encode the same consonant initially as CV and terminally as VC. There might be some consonants or consonant clusters that can only be encoded as CV or VC – for instance, maybe [x] can only be encoded as such at the end of a syllable. And encoding might be verbose in any number of unintuitive ways, leading a "vord" that represents a single syllable to look superficially multisyllabic.

When a plaintext word is broken into syllables for encoding, there may be a loose tendency for successive syllables to display the same structure, e.g., e civitatis = [e] [ci] [vi] [ta] [tis], consistently favoring CV, or [e·c] [iv] [it] [at] [is], consistently favoring VC. Combined with the marking of empty slots, this would result in a strong tendency towards repetition of similar-looking forms, e.g., [0e0] [ci0] [vi0] [ta0] [tis] or [0e·c] [0iv] [0it] [0at] [0is].
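The two consistent strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, every vowel letter is treated as its own nucleus (no diphthongs), "0" marks an empty onset or coda slot, and "·" marks a plaintext word break that falls inside a "vord".

```python
VOWELS = set("aeiou")  # assumption: simple five-vowel inventory

def tokens(phrase):
    """One token per letter; word-initial letters after the first word carry a '·'."""
    out = []
    for w, word in enumerate(phrase.split()):
        for k, ch in enumerate(word):
            out.append(("·" if w > 0 and k == 0 else "") + ch)
    return out

def syllabify(phrase, prefer):
    """Split into vords, attaching consonants forward ('CV') or backward ('VC')."""
    toks = tokens(phrase)
    nuclei = [i for i, t in enumerate(toks) if t[-1] in VOWELS]
    vords = []
    for j, v in enumerate(nuclei):
        last = j + 1 == len(nuclei)
        if prefer == "CV":
            start = nuclei[j - 1] + 1 if j > 0 else 0
            end = len(toks) if last else v + 1
        else:  # "VC"
            start = v if j > 0 else 0
            end = len(toks) if last else nuclei[j + 1]
        body = "".join(toks[start:end]).lstrip("·")  # a break at a vord edge is implicit
        vords.append(("0" if start == v else "") + body + ("0" if end == v + 1 else ""))
    return vords

print(syllabify("e civitatis", "CV"))
print(syllabify("e civitatis", "VC"))
```

With these assumptions the first call yields 0e0, ci0, vi0, ta0, tis and the second yields 0e·c, 0iv, 0it, 0at, 0is, matching the two patterns above.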

But in some cases, it's legitimately ambiguous what "counts" as a syllable. For example, is ia one syllable or two? This type of situation may have been handled inconsistently or in a deliberately ambiguous way, and could partially scramble some of the foregoing pattern.

Even if a given plaintext word is unlikely to be written twice in exactly the same way, plaintext words are made up of consistent syllables, such that if the same plaintext word recurs repeatedly in a passage, a “vord” that can be used to represent one of its syllables is likely to recur there as well—as are similar-looking “vords” that represent its combination with adjacent parts of the same word or with other adjacent plaintext words.

A writer might have favored some particular syllable structure, such as CV, when starting a line, and only switched to a different syllable structure, such as VC, when forced by an uncooperative word to do so, but then stuck with it, leading the dispreferred form to favor the latter part of lines, only slightly but consistently.

A syllabic “vord” will tend to be followed preferentially by syllabic “vords” that start in phonetically compatible ways. Thus, a syllabic “vord” ending in [m] is more likely to be followed by another syllabic “vord” beginning with [b] than by one beginning with [d] or [g] if mb occurs more often within plaintext words than md or mg. (To be clear, I'm using plaintext Latin characters to represent themselves here, and not EVA!)
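One way to check for this kind of preference in a transcription would be to tally the last character of each "vord" against the first character of the next. The sketch below is hypothetical: the function name and the toy input are illustrative, and a real test would also need to decide how to treat line breaks and multi-glyph units.

```python
from collections import Counter

def boundary_pairs(lines):
    """Count (final character, next initial character) pairs within each line."""
    counts = Counter()
    for line in lines:
        vords = line.split()
        for left, right in zip(vords, vords[1:]):
            counts[(left[-1], right[0])] += 1
    return counts

# Toy input in plaintext Latin characters, not EVA
sample = ["cam bi um bra"]
for pair, n in boundary_pairs(sample).most_common():
    print(pair, n)
```

Comparing these counts with what independent ordering would predict would show whether an ending such as [m] really favors a following [b].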

Over time, a system like this would probably have been called upon to handle unanticipated situations. For example, it may at first have made no provision for encoding consonants without vowels, since that violates its basic syllabic logic. But then maybe a need arose to encode Roman numerals or unusually complex consonant clusters—or maybe the original approach to encoding consonant clusters just turned out to be too clunky. The problem could have been solved by permitting the vowel slot to be marked as empty [0] – which would have required introducing some new and distinctive glyph or glyph combination to serve this purpose, and would incidentally also have offered a lot of new options for encoding consonant clusters. The result might have ended up looking like a different “language” entirely. In the absence of any content, encoding could then also have defaulted to [000] [000] [000] if needed purely to fill space.

A pro of this theory is that it would account for the quite rigid word structure, with Voynich words having a seemingly small inventory of "parts" as well as highly positional elements. This is reasonably similar to what we would see in a phonemic inventory and as the output of phonotactic rules.

However, there are several cons:

Every word (or most) must contain a vowel, which means that there will be a single set of glyphs which occurs in most words. This is actually true, but these are the single glyphs [o, y] rather than a set of glyph groups. Moreover, these same glyphs are also those which can appear 1) multiple times in a word, and 2) discontinuously. A syllable cannot have more than one nucleus and the whole nucleus needs to be continuous.
Although glyphs representing vowels could occur at the start, middle, or end of words, the longer the word, the more likely it is that the vowel is represented by the middle glyphs (there is a strong restriction on the maximum complexity of consonant clusters in the onset and coda). These middle glyphs are exactly where we see the most variety of glyphs in Voynich words, with lower variety to either side. Think of a word such as [qokeedy]: both [qo] and [dy] are common in those positions across many words, yet [kee] can be replaced with a much wider range of potential glyph groups. Yet phonemic inventories typically have 3 to 5 times more consonants than vowels.
This is especially true for consonants before vowels rather than after. Languages tend to show a greater range of consonants before a vowel than after, and allow more complexity in that position. This is opposite in the Voynich text where the range of glyphs which can come after gallows (which are likely to be in the middle of syllables) has more variety than those which come before.
Solving this problem by treating gallows as part of the consonant in the onset rather than the vowel would run into the problem in the first point. You would run into words like [qokody], [qokol], [qokor] and many others where the vowel would need to be [o], yet this has to be ruled out.

I hope these points are helpful. I'm sorry I don't have a counter-theory to offer. It's clear that the structure of Voynich words is a problem (well, parts of the structure, not all), but I cannot think what might be behind it.

(16-03-2024, 06:30 PM)Emma May Smith Wrote:
Every word (or most) must contain a vowel, which means that there will be a single set of glyphs which occurs in most words. This is actually true, but these are the single glyphs [o, y] rather than a set of glyph groups. Moreover, these same glyphs are also those which can appear 1) multiple times in a word, and 2) discontinuously. A syllable cannot have more than one nucleus and the whole nucleus needs to be continuous.

Thanks for this and for your other good points. You've laid out an objection we might call the "vowel problem" very well and very usefully. I'm not sure it's entirely insoluble, but it's certainly important to address.

I've suggested that each "vord" could represent a V, CV, VC, or CVC structure, but it doesn't necessarily follow that its component glyphs can be neatly divided up into those parts. If we're looking at a cryptographic solution, I suppose the connection between the form of a "vord" and its value could even be wholly arbitrary. But that seems unlikely: the existence of line-start, line-end, and word-break patterns seems to point towards the component parts of a "vord" being individually significant and meaningfully ordered. The hypothesis I've laid out assumes that the leftmost part of a "vord" represents the onset of a syllable and that the rightmost part represents the coda, and if that isn't the case, there wouldn't be much left of the hypothesis itself.

That said, it seems to me that the encoding of the nucleus could conceivably take many different forms.

One advantage of a syllabic encoding system with different encodings for onset and coda is that the structure of a syllable would be easy to determine even if its vowel weren't explicitly written. Whether anyone in the fifteenth century would have recognized this advantage or not is an open question, but a number of nineteenth-century shorthand systems tried to make use of it to a greater or lesser extent. Here, for example, is a passage from the introduction to Alexander Melville Bell's Steno-Phonography (1852):

Quote: It is confidently asserted that the principle of articulate notation, which forms a distinctive feature in this Steno-phonography, is a source of such absolute accuracy, simplicity, and brevity, as along place this system above any that has yet appeared. ARTICULATIONS (or Consonants) ARE WRITTEN FULL SIZE ONLY WHEN A VOWEL PRECEDES THEM. This principle is perfect in its analogy to speech. And as it informs the eye with exactitude where vowels do, and where they do not occur, it renders the writing of vowels, except when final or double, altogether unnecessary in ordinary Short-hand. It gives such a distinctiveness of outline to almost every word, that without vowel marks, and independently of context, the mind is enabled to fix at once on the precise word intended, without the memory being burdened with lists of arbitrary logograms.

To give a concrete example, [snt] could be read any number of ways, but if we represent the position of a nucleus by *, it's more easily disambiguated. If the language is English, we might have [s*nt] = sent, [sn*t] = snit or snot, [*s*nt] = isn't, [s*nt*] = Santa, Sinti, etc. According to the syllabic hypothesis I described, Voynichese would have enjoyed this same advantage. That's not to say it would necessarily have omitted vowels. But it might have recognized an implicit default vowel, like Brahmic scripts, in which case the majority of "vords" might not contain any glyph representing a vowel, even if some do. Alternatively, CV and/or VC combinations might have been encoded together as units in some way that makes distinct vowel graphemes difficult or impossible to isolate. Some such arrangement could perhaps free up those middle glyphs to modify the onset and/or coda of a syllable rather than requiring them to specify a nucleus.
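To show how much the nucleus marker narrows things down, this small Python sketch lists every way of placing one or more * markers around a consonant skeleton. The function name is hypothetical, and the sketch assumes at most one nucleus between any two consonants.

```python
from itertools import product

def nucleus_patterns(consonants, mark="*"):
    """All skeletons with at least one nucleus marker in the gaps around the consonants."""
    patterns = []
    for choice in product([False, True], repeat=len(consonants) + 1):
        if not any(choice):
            continue  # a word needs at least one nucleus
        pieces = []
        for use_mark, consonant in zip(choice, consonants):
            pieces.append((mark if use_mark else "") + consonant)
        pieces.append(mark if choice[-1] else "")
        patterns.append("".join(pieces))
    return patterns

print(nucleus_patterns("snt"))
```

For [snt] this gives fifteen distinct skeletons, including s*nt, sn*t, *s*nt, and s*nt*, so a reader who knows the skeleton only has to choose among the words that fit that one shape.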

This would be an added complication, and I'll admit that any hypothesis grows less plausible with each added complication. But if the possibilities I've suggested don't seem too far-fetched, they might help salvage what I consider the more attractive implications of the hypothesis -- the potential explanations for line-start and line-end anomalies, word-break patterns, repetitions of similar-looking words, and so on.

Besides a few one-letter words (y, k, o, s), the VM words are composed of syllables, each containing one vowel and a varying number of consonants on one or both sides of the vowel.
There are several reasons why linguists cannot see it:
- The seemingly abjad-like writing, caused by dropped semi-vowels and unstressed short vowels. This was standard writing practice when Slavic languages began using Latin letters, which did not have equivalents for those Glagolitic letters. This is why ch occurs as a free-standing word or can be followed by another consonant, while in fact it represents CHE (if, but - as a word, and a frequently used syllable). When CHE is spelled out in the text, it indicates a different sound of e. Often what seems to be the same word can be spelled with or without e, such as RCHY and RECHY. In contemporary Slovenian both words are spelled REČI (RECHY). The difference is in the way the words are pronounced: R-CHY (things, words), RECHY (to say). In this particular word, the R also assumes the role of a vowel (in the VM it is often followed by a space to indicate a pause in pronunciation). In medieval Slovenian, the letters R and L were considered full vowels.
- The gallows letters (EVA p, f), transcribed as SV, CV, ZV, can also form a syllable, because they contain an inherent semi-vowel. When certain suffixes are added, the sound of the semi-vowel changes to a full vowel e.
- Diphthongs were still used in medieval Slovenian - DEIL (part).
- Double ii was used for the sound ji or ij.
- Strings of vowels are still noticeable in Slovenian dialectal speech, and since the VM is written phonetically, there are many of those in the text. Example: OOAM. OA is a diphthong, but the first o stands for v. POUEDAL - (he told) - PO-UE-DAL. The diphthongs were resolved in various ways: usually one vowel is dropped or one vowel becomes a consonant: PO-UE-DAL to PO-VE-DAL, DEIL - DEL, BUOM - BOM.

All medieval Slovenian grammar books included a list of syllables for readers to practice.

First few pages of the text were often divided into syllables for easier pronunciation.
Understanding the syllables is most important for understanding the VM grammatical patterns, because they can help identify the prefixes and suffixes, the sound changes, the proper spaces, spelling, and etymology.
Slovenian word-building is called ZLAGANJE, which means 'putting syllables together'. While most basic words are made of one syllable, adding another syllable can form a derivative of that root word, and adding another can indicate grammatical form, adding a syllable in front (a prefix) can also change the root slightly.
There are still some Slovenian words comprised of a syllable where r serves as a vowel, such as SMRT (death), VRT (garden).
This short explanation only scratches the surface. After compiling grammatical rules, I am now in the process of adjusting the VM transcription for proper syllables and spaces. If anybody is willing to explore the VM text from this perspective, I would gladly share my material.

(16-03-2024, 04:28 PM)pfeaster Wrote:
I've been continuing to play around with the idea that some of the strange properties of Voynichese could be explained in terms of the workings of a syllabic encoding scheme. I decided to try writing up my latest version of this "syllabic hypothesis," not proposing any specific solution (I don't have one), but just outlining the general kind of mechanism I can imagine having been in play.

I've been long attracted to the same idea, and I haven't yet given up on it.
This resulted in an informal 'paper' which is at academia.edu, and which I believe I have also summarised here.

I 'cheated' in a sense, in that I used a plain text in Italian rather than Latin. Italian has a much cleaner split of the text in vowels and consonants, and it was relatively easy to come up with a mapping of Voynichese 'clusters' into vowels and consonants.

This had a few very interesting results:
1) It became possible to map Italian to realistic-looking Voynichese and back
2) The same transformation made it possible (with a few minor tweaks) to turn real Voynichese into a string of characters with a reasonable vowel-consonant alternation
3) The transformation in step 2 converted the bigram entropy of Voynichese to a completely normal value.

As expected, however, this text was a completely meaningless string.

The selected mapping between Voynichese and Italian was rather arbitrary, and I am still interested in improving on this.
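For readers who want to try the entropy comparison themselves, here is a minimal Python sketch of one common measure, the conditional entropy of a character given the character before it. Which exact statistic was used in the experiment above is not stated here, so treat this definition and the function name as assumptions.

```python
import math
from collections import Counter

def conditional_entropy(text):
    """H(next char | current char) in bits, estimated from adjacent character pairs."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = sum(pairs.values())
    entropy = 0.0
    for (a, _b), n in pairs.items():
        # weight by the pair's probability, score by the conditional probability
        entropy -= (n / total) * math.log2(n / firsts[a])
    return entropy

print(conditional_entropy("abababab"))  # fully predictable alternation
print(conditional_entropy("aabb"))
```

A perfectly alternating string scores zero because each character fully determines the next. The same function could be run on a transcription before and after a cluster-to-letter mapping to see how the value shifts.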

(18-03-2024, 12:52 AM)ReneZ Wrote:
I 'cheated' in a sense, in that I used a plain text in Italian rather than Latin. Italian has a much cleaner split of the text in vowels and consonants, and it was relatively easy to come up with a mapping of Voynichese 'clusters' into vowels and consonants.

I've been studying Latin syllabification. It's not easy to find definitive rules, especially for medieval Latin as opposed to classical or modern ecclesiastical Latin, and there are many contradictory statements in books. From my experiments running an imperfect script (it does not take etymology into account, and I'm not sure about a few cases), there seem to be only about a thousand different syllables in long medieval Latin texts.

There is an interesting passage about the properties of Latin syllables in Leon Battista Alberti, De componendis cyfris, written ca. 1466. If he had a specific idea in mind for exploiting them in cryptography, he didn't say.

(18-03-2024, 12:52 AM)ReneZ Wrote:
I've been long attracted to the same idea, and I haven't yet given up on it.

This resulted in an informal 'paper' which is at academia.edu, and which I believe I have also summarised here.

This looks like it must be the paper you're describing:

https://www.academia.edu/51151093/NOT_the_solution_to_the_Voynich_MS

I wasn't able to find a summary here, but your posts in [link hidden] look as though they might involve similar experiments.

In your experimental example, you've broken the text into syllables that always end with a vowel. That's different from what I was suggesting, but it might be more consistent with vord morphology, particularly in light of Emma's remarks.

In a scheme like this, it seems to me that L, M, N, and R would likely have been handled as a continuation of a vowel segment -- AL, AM, AN, AR, etc. -- rather than attached to a following consonant. These (and/or the second vowels in diphthongs) might be good candidates for the slot in vord morphology which Emma used to call a "tail."

If we break up the same text you used with that modification to the rules, we get something like this, where = marks an extension of a vowel segment and + marks a clustering of consonants:

NE=L ME Z+ZO DE=L CA=M MI=N DI NO S+T+RA VI TA
MI RI T+RO VA=I PE=R=U NA SE=L VA=O S+CU RA
CHE LA DI RI T+TA VI=A=E RA S+MA=R RI TA
A HI QuA=N TO=A DI=R QuA=L=E RA=E CO SA DU RA
E S+TA SE=L VA SE=L VA G+GI=A=E=A S+P+RA=E FO=R TE
CHE NE=L PE=N SI=E=R RI=N=O VA LA PA=U RA
TA=N TE=A MA RA CHE PO CO=E PI=U MO=R TE
MA PE=R T+RA T+TA=R DE=L BE=N CHI VI T+RO VA=I
DI RO DE LA=L T+RE CO SE CHI V+HO S+CO=R TE
I=O NO=N SO BE=N RI DI=R CO MI VI=N T+RA=I
TA=N TE RA PI=E=N DI SO=N NO=A QuE=L PU=N TO
CHE LA VE RA CE VI=A=A B+BA=N DO NA=I
MA PO=I CHI FU=I=A=L PI=E DU=N CO=L LE GI U=N TO
LA DO VE TE=R MI NA VA QuE=L LA VA=L LE
CHE MA VE=A DI PA=U RA I=L CO=R CO=M PU=N TO

The distribution of doubled consonants T+T, Z+Z, G+G, B+B, mirrors that of initial benched gallows in Voynichese -- found only in mid-line, never line-start.
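The basic segmentation rule can be sketched in Python as follows. This is a simplified illustration, not the procedure used to produce the example above, which was divided by hand and is freer in places: here word breaks are ignored entirely, L, M, N, and R extend a vowel segment only when the next letter is not a vowel, and digraphs such as CH and Qu get no special treatment. The function name is hypothetical.

```python
VOWELS = set("AEIOU")
SONORANTS = set("LMNR")  # letters allowed to extend a vowel segment

def segment(text):
    """Split text into units: '+' joins onset consonants, '=' extends the vowel segment."""
    letters = [c for c in text.upper() if c.isalpha()]
    units, i, n = [], 0, len(letters)
    while i < n:
        onset = []
        while i < n and letters[i] not in VOWELS:
            onset.append(letters[i])
            i += 1
        if i == n:  # trailing consonants with no vowel left to attach to
            units.append("+".join(onset))
            break
        parts = ["+".join(onset) + letters[i]]
        i += 1
        while i < n:
            c = letters[i]
            next_is_vowel = i + 1 < n and letters[i + 1] in VOWELS
            if c in VOWELS or (c in SONORANTS and not next_is_vowel):
                parts.append(c)
                i += 1
            else:
                break
        units.append("=".join(parts))
    return units

print(" ".join(segment("NEL MEZZO DEL CAMMIN DI NOSTRA VITA")))
```

Under these assumptions the sketch reproduces the first line of the example. Later lines would differ wherever the hand division takes word breaks or optional groupings into account.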

There are a number of cases in which we'd have multiple options for encoding the same plaintext string, e.g.,

MA RA -- R as onset consonant
MA=R A -- R attached to preceding vowel
MA=R=A -- R attached to preceding vowel, and the following A treated as a further continuation of the vowel segment

Whether these options would be enough to make identical repetitions of longer sequences of vords unlikely, I'm not sure.

The only place we find stray single vowels in this example is at the beginning of lines. There might be an impulse to connect these to the syllables that next follow, giving us some distinctively structured line-start words:

A=HI QuA=N TO=A DI=R QuA=L=E RA=E CO SA DU RA
E=S+TA SE=L VA SE=L VA G+GI=A=E=A S+P+RA=E FO=R TE

Within a line, vowel segments might sometimes grow awkwardly long.

CHE LA VE RA CE VI=A=A B+BA=N DO NA=I
MA PO=I CHI FU=I=A=L PI=E DU=N CO=L LE GI U=N TO

In these cases, there might be an impulse to break them up and connect the later pieces to the following syllable:

CHE LA VE RA CE VI=A A=B+BA=N DO NA=I
MA PO=I CHI FU=I A=L=PI=E DU=N CO=L LE GI U=N TO

Or the parts might end up s p r e a d o u t.

[attachment=8288]

Just a few other ways I could see an arrangement like this fitting some of the peculiarities of Voynichese, in addition to the vord structure / entropy issue.

(18-03-2024, 11:29 PM)pfeaster Wrote:
This looks like it must be the paper you're describing:

https://www.academia.edu/51151093/NOT_the_solution_to_the_Voynich_MS

I wasn't able to find a summary here, but your posts in [link hidden] look as though they might involve similar experiments.

In your experimental example, you've broken the text into syllables that always end with a vowel. That's different from what I was suggesting, but it might be more consistent with vord morphology, particularly in light of Emma's remarks.

Yes, indeed.
Also, I was not so much interested in definitively identifying syllables as in finding manageable units.

An underlying observation was that a majority of Voynich words end with:
- y
- [i]in sequence
- l
- r
which could be relatively easily mapped to vowels.

pfeaster

Emma May Smith

pfeaster

cvetkakocj@rogers.com

ReneZ

nablator

pfeaster

ReneZ