The Voynich Ninja
Number of syllables distribution - Printable Version




RE: Number of syllables distribution - Koen G - 18-04-2019

Now I tried the system using the suggested vowels from Guy's paper discussed by Marco here: [link]

I include the graphs for types and tokens for Latin, the slightly modified Emma's vowels and Guy's vowels. 
Detecting syllables based on Guy's vowels involved adapting the alphabet first since, for example, he treats any [i] that is not part of "in" or "iin" as a vowel. As Guy notes in his paper though, the transcription of such glyph clusters is inherently problematic. These are the results: blue for Latin, red for Emma and yellow for Guy.

   

We might note that in Guy's system, there would be almost 30% one-syllable tokens, which is a lot. 
Also, in almost all cases, Emma's vowels (plus all benches) result in counts closer to Latin than Guy's.
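
For anyone who wants to reproduce the type and token percentages without a spreadsheet, here is a minimal sketch. The names `syllable_distribution` and `count_ayo` are made up for illustration, and `count_ayo` is only a stand-in counter (it counts [a, y, o] and nothing else), not any of the vowel systems compared here. Words with five or more syllables are pooled into one bucket.

```python
from collections import Counter

def count_ayo(word):
    # Stand-in syllable counter (assumption): one syllable per a, y or o.
    return sum(ch in "ayo" for ch in word)

def syllable_distribution(tokens, count_syllables, cap=5):
    """Share of words per syllable count, for tokens and for types.

    tokens: every word occurrence in the text, in order.
    types:  the set of distinct words, each counted once.
    Counts of `cap` or more are pooled into the `cap` bucket.
    """
    def shares(words):
        counts = Counter(min(count_syllables(w), cap) for w in words)
        total = sum(counts.values())
        return {n: counts[n] / total for n in sorted(counts)}

    return {"tokens": shares(tokens), "types": shares(set(tokens))}

# Toy input: four tokens, three types.
dist = syllable_distribution(["daiin", "daiin", "qokeedy", "ol"], count_ayo)
print(dist["tokens"])  # {1: 0.75, 2: 0.25}
print(dist["types"])   # one-syllable share is 2/3, two-syllable share is 1/3
```

Swapping in a different counting function is enough to compare vowel systems on the same word list.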


RE: Number of syllables distribution - Emma May Smith - 18-04-2019

Very interesting. Thank you for doing this Koen, and including my system.

I find the difference between 1 and 2 syllable types very striking as the other two systems don't look anything like that.


RE: Number of syllables distribution - Koen G - 19-04-2019

Emma, I now tried to "automatically" apply your complete method of vowel selection to the entire word list, as you described:

Quote:[a, y, o] are vowels and every instance of those indicates a syllable; [e] sequences are vowels if not immediately followed by [a, y, o]; and [ch, sh] count as vowels if not immediately followed by an [e] sequence or [a, y, o];

The method isn't flawless (but neither is manual parsing), but this is how I did it, all with find-and-replace:

1) All [a, y, o] become 1.
2) The sequence [e1] now means there was an [e] followed by [a, y, o]. Since you don't want the [e] to be a vowel in this case, I replaced [eeee1], [eee1] etc. by 1, effectively eliminating the [e]-sequences from vowel counts only when followed by [a, y, o].
3) Remaining [e]-sequences are replaced by 1. I wasn't sure whether you take e-sequences to be single vowels or rather vowel sequences. I took each sequence to be one vowel, not each [e].
4) Benches followed by 1 are eliminated by replacing the whole sequence with 1.
5) Remaining benches are replaced by 1.
6) Everything else is eliminated.
7) Paste in excel and use COUNTIF function.
8) Share on forum in overly complicated way.
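
The find-and-replace steps above can be sketched with regular expressions, roughly as follows. This is only a sketch under assumptions: the function name `koen_syllables` is made up, the input is taken to be EVA transcription, and "benches" is read as just [ch] and [sh] (as in the quoted rule), so bench gallows are not treated as benches here.

```python
import re

def koen_syllables(word):
    """Count syllables by mimicking the find-and-replace steps 1 to 7."""
    s = re.sub(r"[ayo]", "1", word)   # 1) every a, y, o becomes 1
    s = re.sub(r"e+1", "1", s)        # 2) e-sequence before a vowel is absorbed
    s = re.sub(r"e+", "1", s)         # 3) each remaining e-sequence is one vowel
    s = re.sub(r"[cs]h1", "1", s)     # 4) bench (ch/sh) before a vowel is absorbed
    s = re.sub(r"[cs]h", "1", s)      # 5) each remaining bench is one vowel
    s = re.sub(r"[^1]", "", s)        # 6) everything else is eliminated
    return len(s)                     # 7) the number of 1s is the syllable count

for w in ["daiin", "shol", "cheol", "chedy", "chdy", "qokeedy"]:
    print(w, koen_syllables(w))
# daiin 1, shol 1, cheol 1, chedy 2, chdy 2, qokeedy 3
```

The order of the substitutions matters: the [e]-sequences have to be resolved before the benches, so that a bench followed by an [e]-sequence sees a 1 and is absorbed in step 4.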


Additionally, I made a calculation of what I think could be "maximal" vowel usage: [a, y, o] are vowels, anything that is benched or a bench is a vowel, any "e"-sequence is one vowel.
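
A rough sketch of this "maximal" count, for comparison. The function name `max_vowel_syllables` is made up, and one assumption is doing real work: in EVA every bench and bench gallows contains exactly one [h], so counting each [h] is used here as a shortcut for "anything that is benched or a bench".

```python
import re

# One match per a, y, o; per whole e-sequence; per h (assumed: one h per bench).
MAX_VOWEL = re.compile(r"[ayo]|e+|h")

def max_vowel_syllables(word):
    """Unconditional count: benches and e-sequences always form a syllable."""
    return len(MAX_VOWEL.findall(word))

for w in ["daiin", "shol", "cheol", "chedy", "qokeedy"]:
    print(w, max_vowel_syllables(w))
# daiin 1, shol 2, cheol 3, chedy 3, qokeedy 3
```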

   

So in the image above, the blue is Latin for comparison. Yellow is my "max vowel" calculation. Red is my attempt to automatically count syllables following Emma's rules.

What stands out is that Emma's system results in a relatively large number of one- and two-syllable types, with a huge drop after three syllables. This is in line with what Emma describes in her post. 

Removing the conditional status of benches and e-clusters tends to bring the results much closer to Latin. Of course, we don't know if this is desirable or not, it's just an observation.


RE: Number of syllables distribution - Emma May Smith - 19-04-2019

Hi Koen,

I don't know if what you did was right or not, it seems too complicated for me to follow. I would have simply said to count a syllable in a word for every instance of:

1) [a, y, o]
2) every [e] not followed by [e, a, y, o]
3) every [h] not followed by [h, e, a, y, o]

This process would also include bench gallows, which I guess we should if we're including benches.
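
These three rules map quite directly onto a single regular expression with negative lookaheads. A minimal sketch, assuming EVA transcription (the function name `emma_syllables` is made up for illustration):

```python
import re

EMMA_RULES = re.compile(
    r"[ayo]"            # 1) every a, y, o
    r"|e(?![eayo])"     # 2) every e not followed by e, a, y, o
    r"|h(?![heayo])"    # 3) every h not followed by h, e, a, y, o
)

def emma_syllables(word):
    """Count one syllable per match of the three rules."""
    return len(EMMA_RULES.findall(word))

for w in ["daiin", "cheol", "chedy", "chdy", "cthdy", "qokeedy"]:
    print(w, emma_syllables(w))
# daiin 1, cheol 1, chedy 2, chdy 2, cthdy 2, qokeedy 3
```

Because rule 3 looks at [h] rather than at [ch, sh], a bench gallows such as [cth] is counted whenever it is not followed by a vowel, as in the "cthdy" example.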


RE: Number of syllables distribution - Koen G - 19-04-2019

Yeah, what I did was a more complicated but faster way to reach the same results.

What surprised me is that the more inclusive count (always counting benches and e-clusters) comes close enough to Latin. But at the same time, we know it's completely different from Latin because of positional restrictions, things like word-initial o and the complementary distribution of a and y. 

So it kind of works for vowels as a category but not for individual vowels.


RE: Number of syllables distribution - geoffreycaveney - 20-04-2019

Koen et al., thank you for all of the statistical analysis and discussion presented in this thread. It is very interesting.

As an idea for a shortcut to calculate proportions of word types by number of syllables for a broader variety of natural languages, I note that Wiktionary presents a convenient numerical breakdown of [link], [link], etc.

Geoffrey


RE: Number of syllables distribution - Koen G - 20-04-2019

I'm now taking a look at Middle Dutch, as another point of reference. I took 4000 words from Van den vos Reynaerde.

What surprised me was the shortage of words over 4 syllables. The only real 5-syllable word I got was verbolghenlike, "angrily". This is a three-syllable root word with a common two-syllable affix. Another is Babylonien (the "ie" is two syllables in this case), which is essentially a foreign word.

Moving to four-syllable words, again these are only about 1% of all tokens, which is a stunningly low amount. Almost all of them are composite, or with a common affix.

The graph below shows the 1-5+ syllable distribution for types in Latin, my VM count, Emma's VM count and Middle Dutch. I present them in this order for a reason, as you will see:

   

My count often sticks close to Latin. I always counted benches and eee-clusters as syllable-forming, so there are fewer one-syllable types. 
Emma's original system on the other hand, is closer to Middle Dutch than to Latin: a large proportion of two-syllable types, with fewer three and over.

If you take the Latin and Dutch (blue and green) as extremes, then you see that our Voynichese counts tend to stay in between. 
It's also worth pointing out that Jacques Guy's results (which I did not include for readability of the graph) sit right in between Emma's and mine, so also between Latin and Dutch.

So at this point I would tentatively suggest a conclusion: if we rely on vowels pointed out by previous research, then Voynichese does not display abnormal syllable distribution.


RE: Number of syllables distribution - DONJCH - 21-04-2019

Congratulations! We have a result, well reasoned and clearly stated!

How long will this stand unchallenged? 5 4 3 2 1.... :D


RE: Number of syllables distribution - ReneZ - 21-04-2019

Well, given that for two representative known plain texts, one in Latin and one in Middle Dutch, the syllable count is quite different, there is a rather wide range of syllable counts that could be considered 'normal'.

We don't know what are the vowels in Voynichese, but it is clear that we can choose them such that the resulting syllable count is within this normal range. In fact, different definitions lead to quite different distributions.

From what I read, Emma did not choose her vowels/consonants in order to achieve a 'normal' syllable count, but the process in this thread seems to have involved some tuning, thereby ending up close to Latin.


RE: Number of syllables distribution - Koen G - 21-04-2019

For me it's not (yet) about which language it resembles most. I wasn't hunting for a match with Latin; if anything, I de-tuned Emma's system by merely removing the conditional factors. That this ends up looking like Latin might indeed be a result of the way vowels were detected. But I don't think so, since the vowels were detected using general linguistic properties.

But remember Schmeh's interview with the linguist who thought Voynichese is not linguistic because there are no short words. Myself, I was rather worried that there wouldn't be enough long words.

But what is coming into focus here is that based on possible syllables, word length is of no concern even within European languages. Of course that does leave plenty of other concerns...