The Voynich Ninja
Syllabification - Printable Version



Syllabification - Emma May Smith - 09-02-2016

I'm currently looking at working to break up the Voynich words into syllables (yes, I believe in a linguistic solution) and I wonder what methods have been tried previously, and by whom, to achieve this.

I only know of Stolfi's work, where he considers each word a syllable in itself, but what others are out there?


RE: Syllabification - ReneZ - 09-02-2016

You may want to look at Stolfi's "fine structure", or the even earlier work of Robert Firth.

Neither of them calls the parts syllables, but this may be going in your direction.


RE: Syllabification - Torsten - 09-02-2016

Hi Emma,

Parsing Voynich words is an interesting topic, regardless of what you do or do not believe.

I would parse the following EVA-sequences as ligatures.
{
 ol or om og
 al ar am ag
 
 ee  ch sh
 eee 
 
 eke  ete
 ckh  cth  cph  cfh
 ikh  ith  iph  ifh

 n     r     s    l     m 
 in    ir    is    il    im
 iin   iir    iis   iil   iim
 iiin  iiir          iiil  iiim

 qo

 dy
}

This way I would parse 'daiin' as d/a/iin, 'chedy' as ch/e/dy, 'cthol' as cth/ol and 'qokeedy' as qo/k/ee/dy.
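
The parse above can be reproduced with a small sketch. The list of units is the one given in this post; the greedy, left-to-right, longest-match strategy and the names `LIGATURES` and `parse_word` are assumptions made for illustration, since the post does not say how competing matches should be resolved.

```python
# Units from the list above, treated as single "ligatures".
LIGATURES = {
    "ol", "or", "om", "og",
    "al", "ar", "am", "ag",
    "ee", "ch", "sh", "eee",
    "eke", "ete",
    "ckh", "cth", "cph", "cfh",
    "ikh", "ith", "iph", "ifh",
    "n", "r", "s", "l", "m",
    "in", "ir", "is", "il", "im",
    "iin", "iir", "iis", "iil", "iim",
    "iiin", "iiir", "iiil", "iiim",
    "qo", "dy",
}
MAX_LEN = max(len(unit) for unit in LIGATURES)


def parse_word(word):
    """Split an EVA word into units, longest match first (assumed strategy)."""
    units = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(MAX_LEN, len(word) - pos), 0, -1):
            chunk = word[pos:pos + size]
            if size == 1 or chunk in LIGATURES:
                units.append(chunk)
                pos += size
                break
    return units


for example in ("daiin", "chedy", "cthol", "qokeedy"):
    print(example, "->", "/".join(parse_word(example)))
```

Under that assumption the four example words come out as d/a/iin, ch/e/dy, cth/ol and qo/k/ee/dy. A different matching order could split other words differently.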


RE: Syllabification - Sam G - 12-02-2016

It seems to me that:

- <a>, <e>, and <o> are vowels
- <i> is a modifier of the previous vowel
- <ee> and <eee> are also probably single vowels, analogous to <ai> and <aii>
- <y> is either vowel or semivowel (also <l>, maybe)
- all other characters are consonants

So, it's pretty simple to determine how many syllables a given word has - just count the number of vowels or groups of vowels separated by consonants. So <chol> is one syllable, <chedy> is two, <qokeedy> is three, etc.

This leaves a few things ambiguous - it's not clear whether <cheody> should be two syllables or three, for instance, but I suspect it's only two since <eo> seems to substitute for single <o> or <e>.

It's also not clear exactly where syllable boundaries occur, for instance if <okchy> should be o-kchy, ok-chy, or okch-y.
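
A minimal sketch of the counting rule described above, under some stated assumptions: <y> is counted as a vowel, <l> is counted as a consonant, and any unbroken run of vowels (with trailing <i> modifiers) counts as one group, so <eo> is a single nucleus. The names `VOWEL_GROUP` and `count_syllables` are made up for the example.

```python
import re

# A group starts with a, e, o or y and may continue with more vowels
# or with <i>, which is treated only as a modifier of the previous vowel.
VOWEL_GROUP = re.compile(r"[aeoy][aeoyi]*")


def count_syllables(word):
    """Count vowel groups separated by consonants in an EVA word."""
    return len(VOWEL_GROUP.findall(word))


for example in ("chol", "chedy", "qokeedy", "cheody", "daiin"):
    print(example, count_syllables(example))
```

This only counts syllables. It says nothing about where the boundaries fall, so the o-kchy / ok-chy / okch-y question stays open.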


RE: Syllabification - ReneZ - 12-02-2016

I share some of these considerations, especially that strings of i's are to be grouped with the preceding character (almost always a). Currier's alphabet has grouped them with the following character, which probably affected many people's way of mentally parsing them.


To talk about vowels and consonants is tempting, but there is the serious problem that the most common technique to identify them, i.e. with the aid of a hidden Markov model, fails for the Voynich MS text. Reddy and Knight describe this briefly, but it was already shown by others before.

Syllables are also tricky, because they are extremely language dependent. In some languages they are clearly identifiable and follow clear rules, and in others they barely play a role. English strikes me as a particularly difficult case for parsing syllables from the written text. (Going by vowels certainly won't work...).
Some languages are happy to have syllables without vowels.
Dutch has to have vowels in each syllable, but can still have long consonant clusters  (angstschreeuw).
Some Asian languages can have only very limited combinations of consonants (i.e. without a vowel in between).


RE: Syllabification - Sam G - 12-02-2016

Well, discerning consonant from vowel is not entirely trivial, but I don't think mathematical or statistical methods are the only way (or even a very good way) to go about understanding the VMS word structure.

For one thing, it seems pretty clear that the VMS script is based on the Roman alphabet and symbols used in medieval Latin abbreviation.  So when we see that the VMS letters <a>, <e>, and <o> are nearly identical to the "a", "e", and "o" of the Roman alphabet respectively, then I think this fact alone already suggests that these letters are intended to correspond to vowels, and probably even to similar vowels as these letters represent in European languages.

This similarity might be "overruled" if there were evidence that these letters do not represent vowels, but from a purely structural point of view they seem like good candidates for vowels just from simple considerations such as that every word must contain at least one vowel, and that groups of 1-3 consonants should be separated by vowels (which might not be true of all languages, but is at least often true and works well enough here).  Of course, we need to include <y> as a vowel or semivowel in order to make this work. (There are other reasons to think of <y> as a vowel or semivowel that I won't get into now.)

Then there's the fact that we have one set of letters, containing a straight stroke, which may follow <a> but not <e>, and a corresponding set of letters containing a curved stroke which may follow <e> but not <a>, yet both sets of letters may follow <o>.  While I suppose there may be other ways that this rule could be interpreted, it certainly looks like a phonotactic rule determining which consonants may follow which vowels, and at the very least demonstrates that <a>, <e>, and <o> form a related and important class of glyphs, central to the formation of nearly all words.
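
One way to check a rule like this against a transliteration is to tabulate which glyph follows each of <a>, <e>, and <o>. The sketch below is only an illustration of that tabulation: the function name `follower_counts` is made up, and the word list is just the handful of words quoted in this thread, which is far too small to show anything about the manuscript itself.

```python
from collections import Counter, defaultdict


def follower_counts(words, targets="aeo"):
    """For each target glyph, count the glyphs that immediately follow it."""
    table = defaultdict(Counter)
    for word in words:
        # Walk over adjacent pairs of characters within the word.
        for current, following in zip(word, word[1:]):
            if current in targets:
                table[current][following] += 1
    return table


# Words taken from examples in this thread, not a real corpus sample.
sample = ["daiin", "chedy", "cthol", "qokeedy", "chol", "cheody", "okchy"]
table = follower_counts(sample)
for glyph in "aeo":
    print(glyph, dict(table[glyph]))
```

Run over a full transliteration file, the same table would show whether the straight-stroke and curved-stroke sets really are in complementary distribution after <a> and <e>.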

Really, the fact that EVA transliteration makes the text basically "pronounceable", as would likely any other transliteration scheme that mapped <a>, <e>, <o>, and <y> to vowels and the other letters to consonants (and considered <i> as a modifier), is by itself strong evidence that its implicit assignment of consonant and vowel status is basically correct.  I doubt any other system would work nearly as well, though I don't know if it has actually been tried.

There are a few other points that I could raise here, but the general point is that VMS words are highly structured, and whoever created the script did so in such a way that the word structure is reflected in the shapes of the individual letters.  I think this kind of evidence is more important than the results of any "one size fits all" algorithms applied to the text without any consideration of the text's known properties.


RE: Syllabification - Emma May Smith - 12-02-2016

Sam, you're a man after my own heart.

I more or less agree with your thoughts on vowels. My opinion is that [o] is a vowel and that [a] and [y] together make up some kind of vowel. You can think of [o] and [a/y] as the "cardinal" vowels which [e] and [i] sequences modify. However, both [e] sequences and sometimes [ch, sh] appear in the place of vowels because of [y] deletion. Once we have those basic identifications as vowels (whatever their specific values) we can syllabify.

I have personally taken the syllable of each vowel to be 1) the vowel character, 2) everything leftward of that vowel until another vowel or beginning of the word is reached, and 3) everything rightward of the vowel if there are no more vowels in the word. I've found that the structure of such syllables is pretty regular and relatively simple.
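
The three-part rule above can be sketched roughly as follows. This is a simplification and not necessarily the exact procedure used: it assumes the vowel set is a, o, y, e, i, treats an unbroken run of those characters as a single vowel, and ignores the [y]-deletion cases where [ch, sh] stand in for a vowel. The names `VOWELS` and `syllabify` are invented for the example.

```python
import re

VOWELS = "aoyei"  # assumed vowel set; adjust to test other hypotheses

# One syllable = any leading consonants followed by a run of vowels.
SYLLABLE = re.compile(rf"[^{VOWELS}]*[{VOWELS}]+")


def syllabify(word):
    """Split a word so each vowel run takes the consonants to its left;
    consonants after the last vowel run join the final syllable."""
    pieces = SYLLABLE.findall(word)
    if not pieces:
        # No vowel at all: leave the word as a single unit.
        return [word] if word else []
    consumed = sum(len(piece) for piece in pieces)
    pieces[-1] += word[consumed:]  # trailing consonants go rightward
    return pieces


for example in ("qokeedy", "daiin", "chedy", "okchy"):
    print(example, "->", "-".join(syllabify(example)))
```

With these assumptions the split always falls immediately after a vowel run, so a word like okchy comes out as o-kchy.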

I think further work will show that the structure of syllables within words is actually rather regular too. I don't know where it will all lead, but I hope that we will be able to identify the most meaningful syllable (or even parts of a syllable) within each word, and that the text structure will be made more obvious. So often people say that phrases "look" alike as a basis for judgement, but now we have the chance of something more solid.


RE: Syllabification - ReneZ - 12-02-2016

I'm afraid that I am of a rather different opinion. I hope that's OK :)

My main point is about realising what are deductions and what are assumptions.
That the MS has a text in some plain language is an assumption. One may (tentatively) justify this of course, but it remains an assumption.

That each 'word' in the MS represents a plain text word is a second assumption.

Even if both turn out to be correct.....

We still don't know if the written text represents a 'written' or 'spoken' version of a language.
Some languages (that are great candidates to be represented in the MS) don't write the majority of vowels.

Some of the statements are certainly not correct.
Not every word needs to have a vowel, even if it is spoken language but especially if it is written language.
It is not necessary for groups of 3 consonants to have a vowel somewhere in between (esp. if written language).

Having said all that.....
I don't want to say that there is anything wrong with following the line of thought above, but one has to really be aware of the many assumptions already made before starting it.


RE: Syllabification - Emma May Smith - 12-02-2016

Rene, you are right. But I hope I am aware of at least the basic assumptions I make. Nearly a year ago I wrote out my own approach to studying the Voynich manuscript, and stated my two core assumptions as, "the text is linguistic and written in the plain".

I know they might not be right, but I feel that most things can flow from them. For example, if the writer was writing a language, and intended it to be written in the plain, then the assumption that spaces delineate words is fairly safe given a 1400s European context. It might actually be wrong, but it is fine within the limits I've set myself: it's the best explanation available.

On the other hand, I've reserved my judgement in some areas. You mention the split between written and spoken languages, and I do feel this could be very important. I've been particularly curious as to whether it could explain the statistics of line-end and line-beginning words.

I suppose we will only know whether our assumptions were right if and when a particular approach shows results.


RE: Syllabification - Sam G - 12-02-2016

(12-02-2016, 06:38 PM)Emma May Smith Wrote: I more or less agree with your thoughts on vowels. My opinion is that [o] is a vowel and that [a] and [y] together make up some kind of vowel. You can think of [o] and [a/y] as the "cardinal" vowels which [e] and [i] sequences modify. However, both [e] sequences and sometimes [ch, sh] appear in the place of vowels because of [y] deletion. Once we have those basic identifications as vowels (whatever their specific values) we can syllabify.

It has occurred to me before that <y> could be some kind of variant of <a>, <e>, or perhaps both, since it does not occur in the same environment.  Maybe it is a sort of "reduced vowel", like schwa or something similar, which <a> and <e> become in word-initial or word-final position.  I will try to read over your specific ideas on this subject (I just saw your blog yesterday) and reply to them soon.

(12-02-2016, 07:01 PM)ReneZ Wrote: I'm afraid that I am of a rather different opinion. I hope that's OK :)

My main point is about realising what are deductions and what are assumptions.
That the MS has a text in some plain language is an assumption. One may (tentatively) justify this of course, but it remains an assumption.

That each 'word' in the MS represents a plain text word is a second assumption.

Well, okay, but arguably any idea about anything could be called an "assumption".  At some point we consider the evidence for something strong enough that we treat it as a fact, even though we really never know anything with 100% certainty.

Basically, I see no evidence whatsoever indicating that the VMS is written in cipher, and about a million reasons to think it isn't written in cipher, so I don't have a big problem assuming that it's not written in cipher, just as I have no problem "assuming" that the VMS was not made by monkeys randomly flinging ink and paint at blank pieces of parchment, or some other such scenario.

By contrast, the VMS text clearly has many properties of unencrypted natural language text, and I don't see any compelling reason to think it can't be unencrypted natural language text.

Treating language and cipher as equally likely possibilities when the evidence clearly and overwhelmingly favors one over the other does not make any sense to me.

I also don't think it can be written in any known natural language, because VMS words have a clearly defined structure which does not seem to precisely match that of any known natural language.  It certainly does not match any "major" language of Eurasia, as these have all been tried.

Realistically that leaves unknown natural language, artificial language, or language-like gibberish as the main contenders.  I think other evidence strongly favors the first of these options, so I pursue that option, though realistically most of what I wrote above remains valid under any scenario that acknowledges that the text does in fact have at least some language-like properties.

To me, all of this is really just a matter of following the evidence where it leads, rather than ignoring the evidence or assuming that all the evidence is somehow just misleading us.