In another thread a discussion was started about the phonotactics of Voynichese: basically, where characters occur within words and syllables, and how this might reflect an underlying language. There was particular reference to the fact that [k, t] seldom occur at the end of words (the same can be said for [ch, sh]) and whether that might be a realistic linguistic feature. I've reposted my response below, and am keen to discuss the matter broadly and fully.
Ok, let's get a little technical.
You can think of a syllable as having three parts:
1) onset: this is everything which comes before a vowel;
2) nucleus: this is the vowel or any sound which acts as a vowel;
3) coda: this is everything which comes after a vowel.
Now, in terms of consonants, these will typically appear in the onset or coda (though they can form the nucleus). Most languages restrict both a) how many consonants can be in an onset or coda, and b) the order in which they appear. Some languages completely forbid clusters (that is, more than one consonant in either onset or coda position), but those which allow more than one tend to order them in the same way. Basically, certain sounds must be nearer the vowel than others. It's based on a quality known as sonority, but we shan't bother with explaining that except to acknowledge it exists.
There is a tendency to allow onsets to be more complex than codas. Typically all or most consonants can appear there and some clusters are allowed. Codas are more often either empty or have one of a restricted set of consonants, with clusters forbidden. Of course, many languages which flout these rules do exist, for example English, which allows clusters of three consonants in the onset and four in the coda, but it is not typical. (Indeed, Indo-European languages as a whole are typically more complex in their syllable structure than the average language.)
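To make the three-part division concrete, here is a minimal sketch in Python. It assumes a syllable is already given as a list of phoneme symbols and that the vowel symbols are known in advance; the name `split_syllable` and the small `VOWELS` set are illustrative choices, not part of the original argument.

```python
# Assumption: a small, illustrative set of vowel symbols. A real analysis
# would use the full vowel inventory of the language in question.
VOWELS = {"a", "e", "i", "o", "u", "ɛ", "ɪ"}

def split_syllable(phonemes):
    """Split one syllable (a list of phoneme symbols) into onset, nucleus, coda."""
    # Positions of all vowel symbols in the syllable.
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not vowel_positions:
        raise ValueError("no vowel found; syllabic consonants are not handled here")
    first, last = vowel_positions[0], vowel_positions[-1]
    onset = phonemes[:first]            # everything before the vowel
    nucleus = phonemes[first:last + 1]  # the vowel (or vowel sequence)
    coda = phonemes[last + 1:]          # everything after the vowel
    return onset, nucleus, coda

# English 'strengths' in one common pronunciation:
# three consonants in the onset and four in the coda.
onset, nucleus, coda = split_syllable(["s", "t", "r", "ɛ", "ŋ", "k", "θ", "s"])
print(len(onset), len(nucleus), len(coda))
```

The sketch deliberately ignores syllabic consonants, so it only covers the simple case where the nucleus is a vowel.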
So, when we look at the structure of Voynich words and see that [r, l] are very commonly found at the end but [k, t] are not, what we are observing (ASSUMING the surface patterns are linguistic) is a fact about what the underlying language permits in syllable codas. We can explain this by saying that [r, l] must have some phonological difference to [k, t]. Were you to suggest that [r, l] were nasals and [k, t] plosives, then you would have a similar situation to a number of languages which forbid obstruents but permit sonorants in codas.
Likewise, the observation that [k, t] often appear at or near the beginning of words can be explained in a similar way. If you believe that [k, t] are plosives, then they have low sonority and typically appear at the beginning of an onset, before sounds with a higher sonority (sibilants are sometimes exceptional, so /s/ can appear in places like in English 'skip' and 'stone'). Those strings which appear before [k, t] in Voynich words, [o, qo, cho, che, cheo, etc.], can be explained as separate syllables. The task of researchers is then not one of explaining how [k, t] work, but why the syllables within a word are structured as they are.
Sorry if this answer is a bit long-winded, but hopefully it is helpful to thinking about the possible linguistic features of Voynich words. It is my belief that a linguistic analysis, ignoring the origins of the script, the illustrations, and even the potential meaning of the text, could well solve the Voynich manuscript. At the very least it provides us with a framework for assessing both the text and potential solutions.
Oops, I made a similar thread while you were making this one - let's see if I can remove it
Edit: okay, it's gone.
One weakness I personally see in many proposed solutions is that the distribution of gallows is ignored: they are assigned a sound that is common at the end of syllables. Gallows are relatively rare at the end of longer Voynichese words, and the total of words ending in T or K is just a little over a hundred.
I am sure that similar observations can be made about other glyphs, such as 4 and 9. Assigning to these, and gallows, a one-to-one value puts some serious constraints on the way syllables can be constructed in the proposed language, which is something that is almost never taken into account.
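For anyone who wants to check counts like these against a transcription, here is a rough Python sketch. The word list is a toy sample typed in EVA purely for illustration, not an excerpt from any real transcription, and `final_rates` is a hypothetical helper name. It also assumes each glyph of interest is a single EVA letter, which holds for the gallows but not for ch or sh.

```python
from collections import Counter

# Toy sample in EVA; NOT a real transcription excerpt, only an illustration.
SAMPLE_WORDS = ["daiin", "chol", "qokal", "otar", "okeey",
                "shor", "qotedy", "dal", "kor", "olk"]

def final_rates(words, glyphs):
    """For each glyph return (final_count, total_count, share of occurrences that are word-final)."""
    total = Counter()
    final = Counter()
    for word in words:
        # Count every occurrence of the glyphs we care about.
        total.update(ch for ch in word if ch in glyphs)
        # Count the occurrences that sit in word-final position.
        if word and word[-1] in glyphs:
            final[word[-1]] += 1
    return {g: (final[g], total[g], final[g] / total[g] if total[g] else 0.0)
            for g in glyphs}

for glyph, (n_final, n_total, rate) in final_rates(SAMPLE_WORDS, "ktlr").items():
    print(glyph, n_final, n_total, round(rate, 2))
```

The numbers this prints only describe the toy sample; the point is the shape of the test, which any proposed one-to-one solution could be run through.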
Thank you, Emma!
I only partially followed your argument, since I am not much into phonetics.
It seems to me that there are at least two different possible explanations for certain characters being rare at the end of words:
- the corresponding sounds are not allowed at the end of syllables (phonetic explanation)
- the corresponding sounds are not part of common suffixes (morphological/lexical explanation)
In the other thread, I made the example of n and m in Latin. They have similar phonetic behavior, but very different word ending statistics (because noun and verb suffixes often end with -m and very rarely with -n).
How can we understand which would be the better explanation for what we observe in Voynichese?
Marco,
I sympathise. I dumped linguistics after 9 months in my first year of higher study.
But for exactly that reason, it's great to have two accomplished linguists discussing these topics in a way that even non-linguists can understand, even if I can't add much.
- hope the compliment to Emma and Koen doesn't rate as O.T. -
(22-09-2016, 02:43 PM) MarcoP wrote: It seems to me that there are at least two different possible explanations for certain characters being rare at the end of words:
- the corresponding sounds are not allowed at the end of syllables (phonetic explanation)
- the corresponding sounds are not part of common suffixes (morphological/lexical explanation)
In the other thread, I made the example of n and m in Latin. They have similar phonetic behavior, but very different word ending statistics (because noun and verb suffixes often end with -m and very rarely with -n).
How can we understand which would be the better explanation for what we observe in Voynichese?
You are right, Marco. Any observation we make could have more than one explanation. The answer is to make more observations! An analysis would need to gather as many observations as possible and build a theory which explains them all without contradicting itself.
A useful distinction can be made between frequency and validity. We can suppose that anything which happens frequently is valid, but that not all valid things are frequent and not all infrequent things are invalid. So some observable features may not happen often but they are well within the normal rules of the language. We can accept that Latin words do not often end with /n/, but some which do, such as 'in', 'non', 'nomen', and 'semen', are all core words.
My initial argument for the point in question would be that all gallows characters are somehow related and none appear often at the end of words. There are no clear examples of this kind of character validly appearing in that position: in all cases the numbers are low and so are the rates (I think f is the highest at 3.5% final). Also, there are ways of syllabifying Voynich words which put gallows into the onset rather than the coda of syllables, even if the gallows is in a middle position. This would suggest that there is a general rule forbidding certain sounds at the end of syllables. The exceptions may stem from reading/writing errors and loanwords, though it is possible that some rare but valid words might exist.
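One such way of syllabifying can be sketched in Python. This is only an illustration under loud assumptions: o, a and y are treated as the syllable nuclei, every EVA letter is one unit, and every consonant between two vowels is pushed into the following onset (a deliberately extreme "maximal onset" rule). The names `syllabify` and `gallows_in_coda` are hypothetical.

```python
VOWELS = set("oay")    # assumption: these glyphs act as syllable nuclei
GALLOWS = set("ktpf")  # the four plain gallows in EVA

def syllabify(word):
    """Return (onset, nucleus, coda) tuples using a maximal-onset rule."""
    syllables = []
    pending = ""  # consonants seen since the last vowel
    for ch in word:
        if ch in VOWELS:
            # Everything waiting before this vowel becomes its onset.
            syllables.append([pending, ch, ""])
            pending = ""
        else:
            pending += ch
    if not syllables:
        raise ValueError("no assumed vowel in word")
    # Only consonants after the last vowel are left to form a coda.
    syllables[-1][2] = pending
    return [tuple(s) for s in syllables]

def gallows_in_coda(word):
    """True if any gallows glyph ends up in a coda under this rule."""
    return any(set(coda) & GALLOWS for _, _, coda in syllabify(word))

print(syllabify("qokal"))
print(gallows_in_coda("qokal"), gallows_in_coda("olk"))
```

Under this rule a medial gallows always lands in an onset, so the only gallows that can reach a coda are the word-final ones. That is exactly why the low word-final rates matter for the argument.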
(22-09-2016, 02:29 PM) Emma May Smith wrote: (Indeed, Indo-European languages as a whole are typically more complex in their syllable structure than the average language.)
I think that this is a very important piece of information. One or two years ago, I was reading a discussion of Mandarin vs. Czech syllable structure and as expected the difference was vast.
The Czech syllable structure was quoted as:
(C) (C) (C) (C) (V/VV/C) (C) (C) (C)
where any of the (C)-only groups could be empty.
Similar structures can easily be found online for other (modern) Indo-European languages.
This is not sufficient information for the Voynich problem of course, since it says nothing about 'which' consonants are more or less likely to appear in which position.
It will be hard to match any of these structures to the Voynich word structure, but this is a purely qualitative statement, and it should be of great interest to see someone going into this more deeply.
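A syllable template of this kind can be written as a regular expression over a C/V skeleton. The Python sketch below is one possible reading, with assumptions: up to four optional onset consonants, a nucleus that is V, VV, or a single syllabic C, and up to three optional coda consonants. The name `fits_template` is hypothetical, and the skeleton strings have to be produced by some separate step that decides which letters are vowels.

```python
import re

# Up to four onset consonants, one nucleus (VV, V, or a syllabic C),
# then up to three coda consonants. This is an interpretation of the
# quoted template, not an authoritative statement of Czech phonology.
TEMPLATE = re.compile(r"C{0,4}(?:VV|V|C)C{0,3}")

def fits_template(skeleton):
    """True if a C/V skeleton string matches the whole template as one syllable."""
    return TEMPLATE.fullmatch(skeleton) is not None

for skeleton in ["CVC", "CCCCVVCCC", "CCC", "CCCCCV", "CVCCCC"]:
    print(skeleton, fits_template(skeleton))
```

An all-consonant skeleton such as CCC matches because the nucleus may be a consonant, as in Czech 'krk'. As the post says, the template alone is purely structural and says nothing about which consonants may fill which slot.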
The reason why I somewhat insisted on this in the other thread is exactly this. One-to-one substitution solutions just beg to be checked against what Voynichese syllable structure allows - or at least very strongly prefers. Apparent syllable structure is one of the few footholds we have, but it is often ignored. It also allows us to poke huge holes through most proposed solutions, but the matter often remains so vague that it is again ignored. It would indeed be nice to compose an overview of these preferences, which could be used as an immediate test for solutions, and might provide some new insights.
Emma? :-)
(23-09-2016, 07:55 AM) Koen Gh. wrote: The reason why I somewhat insisted on this in the other thread is exactly this. One-to-one substitution solutions just beg to be checked against what Voynichese syllable structure allows - or at least very strongly prefers. Apparent syllable structure is one of the few footholds we have, but it is often ignored. It also allows us to poke huge holes through most proposed solutions, but the matter often remains so vague that it is again ignored. It would indeed be nice to compose an overview of these preferences, which could be used as an immediate test for solutions, and might provide some new insights.
I'm sure you've read my website thoroughly, haven't you?
Seriously, though, while I'm willing to talk about how I think the structure of Voynich words works, I'm wary that this is my interpretation and not a proven fact. A lot depends on a core assumption (in addition to assuming that the surface patterns are linguistic) and a chain of argument.
The main assumption is that the script is more-or-less alphabetic. This means that one character represents a single sound, or a narrow range of similar sounds which the reader can understand dependent on context. The size of the script is suggestive of an alphabet because, even differentiating as many characters as possible, we can only count fewer than 30 characters. This is too few for any reasonable proposal based on a different kind of script. Of course, what counts as a single character is contestable. The status of e and i sequences is, even now, unsettled (I would suggest that strings such as eee and iin are single characters, but it is unproven).
There is also the possibility of digraphs, so that a single sound is represented by two characters. I think lk is a possible digraph (again, unproven) but others could exist. These problems stack up, so that a string such as lkcheey could be read as having as many as 6 sounds or as few as 4. I would suggest the lower number is better but the upper number is feasible.
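The two counts for lkcheey can be reproduced with a small greedy tokenizer in Python. This is a sketch under assumptions: ch is taken as one character in both readings (which is how I get 6 rather than 7), the lower reading additionally merges lk and ee, and the inventories below cover only this one word. The names `tokenize`, `BASE` and `MERGED` are hypothetical.

```python
def tokenize(word, inventory):
    """Greedy longest-match split of an EVA string into units from `inventory`."""
    longest = max(len(unit) for unit in inventory)
    units = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in inventory:
                units.append(piece)
                i += size
                break
        else:
            raise ValueError(f"cannot tokenize {word!r} at position {i}")
    return units

BASE = {"l", "k", "ch", "e", "y"}  # ch is one character, everything else separate
MERGED = BASE | {"lk", "ee"}       # also treat lk as a digraph and ee as one character

print(tokenize("lkcheey", BASE))    # ['l', 'k', 'ch', 'e', 'e', 'y']
print(tokenize("lkcheey", MERGED))  # ['lk', 'ch', 'ee', 'y']
```

Swapping inventories is the whole point of the design: each hypothesis about what counts as a single character becomes a different set, and the same words can be re-counted under each.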
These questions aside, how we work out syllable structure will depend on what characters we assign to be vowels (or capable of forming a syllable nucleus). Once we make the first argument for those sound values, both the structure of the syllable and possible sounds for other characters soon follow. I would argue that o, a, and y are vowel sounds, but am interested in hearing counter-proposals (and arguments).
Emma
I've read a couple of your posts but only after I had made up my mind about what could be known about the language. I was especially fond of the one comparing a and 9, because I personally believe they are the same sound written differently depending on position. In fact Diane recommended your blog when I was talking about these things.
I studied linguistics at university but never learned much about comparative linguistics, which is unfortunately the most relevant discipline at this stage of Voynich studies. So I'm glad you're back on the forum.
About vowels, I agree with the ones you name. However, I don't think Voynichese is a one-to-one alphabet. For example I think the glyph that looks like c can be a vowel but also an unrelated consonant.
(24-09-2016, 06:21 PM) Koen Gh. wrote: Emma
I've read a couple of your posts but only after I had made up my mind about what could be known about the language. I was especially fond of the one comparing a and 9, because I personally believe they are the same sound written differently depending on position. In fact Diane recommended your blog when I was talking about these things.
I studied linguistics at university but never learned much about comparative linguistics, which is unfortunately the most relevant discipline at this stage of Voynich studies. So I'm glad you're back on the forum.
In the VMS, the "a" glyphs are positioned somewhat where you would expect them to be if the "a" and "o" are vowels and the letters around them that look more like consonants are actually consonants.
The EVA-y (the number 9) does not behave this way, however. It behaves exactly as you would expect it to behave if it were the Latin 9-abbreviation glyph. In Latin (and other languages that use Latin abbreviations), it is found frequently at the ends of words, sometimes at the beginning of words, and occasionally (not often) in the middle of words. In other words, it doesn't behave like a typical vowel and is found at the ends of words much too frequently for most natural languages.
The VMS "alphabet" (if characters are interpreted as approximately one glyph to one letter) is rather lean. Many pages use only about 15 or 16 glyphs. Some glyphs are almost never used. It's difficult to find languages with such a restrained alphabet. Most have at least 20 characters, many have 30 or more characters. Even Old Italic had 17 characters. If both the "a" and the "9" represent the same letter, it reduces the number of characters even further.
The scripts that tend to have very small character sets tend to be abjads, but if this were an abjad, then one has to consider the possibility that "a" and "o" (and the other glyphs) do not represent vowels. Or that only a few vowels are represented.
The other problem is that the letters don't move around enough. If the spaces are accepted as real, this is a big problem. In most natural languages, the position of letters is much more flexible than it is in the VMS. If the spaces are not real, it partly solves this problem, but then one has to consider the possibility of null characters or, once again, the positional behavior of the glyphs is not typical for natural languages.
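How much a glyph "moves around" can be put into numbers with a positional profile. The Python sketch below is only an illustration: it treats each EVA letter as one glyph, takes the spaces as real word boundaries, and runs on a toy word list that is not a real transcription excerpt. The name `position_profile` is hypothetical.

```python
from collections import defaultdict

def position_profile(words):
    """Map each character to the share of its occurrences in initial, medial and final position."""
    counts = defaultdict(lambda: {"initial": 0, "medial": 0, "final": 0})
    for word in words:
        for i, ch in enumerate(word):
            if i == 0:
                slot = "initial"  # a one-letter word is counted as initial
            elif i == len(word) - 1:
                slot = "final"
            else:
                slot = "medial"
            counts[ch][slot] += 1
    profile = {}
    for ch, c in counts.items():
        n = sum(c.values())
        profile[ch] = {slot: c[slot] / n for slot in c}
    return profile

# Toy sample in EVA, for illustration only.
profile = position_profile(["qokal", "otar", "dary", "oly"])
for glyph in "qoy":
    print(glyph, profile[glyph])
```

A glyph whose share is close to 1.0 in a single slot is positionally rigid. Running the same function on a comparable text in a known language would show whether the rigidity seen in the VMS is really unusual.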
Koen wrote: "About vowels, I agree with the ones you name. However, I don't think Voynichese is a one-to-one alphabet. For example I think the glyph that looks like c can be a vowel but also an unrelated consonant."
If the "a" and "o" are vowels, then the glyph that resembles a "c" also behaves mostly like a vowel (in terms of position) and it's hard to cast it as a consonant, as well, based on position and frequency. I'm not saying glyphs can't do "double duty" and stand for more than one thing, but I don't think it's the "c" glyph that is the most likely contender, I think it's the "o". The "o" not only is unusually frequent, but positionally it sometimes behaves like a vowel and sometimes like something else (possibly a marker or grammatical modifier).