The Voynich Ninja
Number of syllables distribution - Printable Version



RE: Number of syllables distribution - Emma May Smith - 15-04-2019

Let me share the actual spreadsheet I did some years ago. This should give you some idea of how I broke down the most common words. I might change one or two now, but it shows the way I'm thinking.


Attachment: Full Syllabification.ods (57.81 KB)


RE: Number of syllables distribution - Koen G - 16-04-2019

So I took a random chunk of Latin (Plinius, 9000 words) and this is how it compares to the complete VM in terms of syllables (with benches, a, y, o forming syllables and e conditionally). 

   

(Note that the top end of the VM values is likely to include doubtful transcription choices.)

It's clear that the VM is skewed towards the lower syllable counts, with more 2-syllable words and fewer words of more than three syllables. But still the difference is less pronounced than I expected.

Of course this would be different if benches were processed conditionally like Emma does. Quite a number of 4-syllable words would shift to 3.
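
To make the counting rule above concrete, here is a minimal sketch in Python. It is not the script behind the graph. It assumes that "benches" means only EVA ch and sh, and that the unstated condition on e is "a run of e counts unless it is directly followed by a, o or y". Both are placeholders chosen for illustration, and `count_syllables` is a hypothetical name.

```python
import re

# Assumption: only EVA "ch" and "sh" are treated as benches here.
BENCH = re.compile(r"ch|sh")

# Assumption: a run of "e" is a nucleus only when it is NOT directly
# followed by a, o or y. The post does not state the real condition.
E_RUN = re.compile(r"e+(?![eaoy])")


def count_syllables(word: str) -> int:
    """Count assumed syllable nuclei in an EVA-transcribed word."""
    benches = len(BENCH.findall(word))            # each bench is one nucleus
    vowels = sum(word.count(g) for g in "ayo")    # a, y, o always count
    e_runs = len(E_RUN.findall(word))             # e counts conditionally
    return benches + vowels + e_runs


if __name__ == "__main__":
    for w in ["daiin", "chol", "qokeedy", "okeey"]:
        print(w, count_syllables(w))
```

Making the benches conditional as well would only require replacing the `BENCH` pattern with a stricter one, which is where some 4-syllable words would drop to 3.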


RE: Number of syllables distribution - Koen G - 16-04-2019

I must add that there were also 0-syllable words in the Latin text: Roman numerals. :)
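
A rough sketch of how numerals could end up with zero syllables in a Latin count. It assumes numerals are written in capitals and that syllables can be approximated by counting vowel groups, with ae, oe and au as diphthongs and the u of "qu" ignored. `latin_syllables` is a hypothetical helper, not the method used for the graph.

```python
import re

# Standard-form Roman numerals up to 3999.
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
# Assumption: one syllable per vowel or per diphthong ae/oe/au.
VOWEL_GROUP = re.compile(r"ae|oe|au|[aeiouy]")


def latin_syllables(word: str) -> int:
    """Approximate syllable count; all-capital Roman numerals count as 0."""
    if word and word.isupper() and ROMAN.fullmatch(word):
        return 0
    # Drop the u of "qu" so that it is not counted as a vowel.
    simplified = word.lower().replace("qu", "q")
    return len(VOWEL_GROUP.findall(simplified))


if __name__ == "__main__":
    for w in ["ad", "caelum", "quinque", "XIV"]:
        print(w, latin_syllables(w))
```

A capitalised word that happens to be a valid numeral (for example "MIX") would be misclassified, so a real count would need a manual check.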


RE: Number of syllables distribution - -JKP- - 16-04-2019

That's an interesting wrinkle.

If aiin is considered to be linguistic, it gets one kind of syllable count.

If aiin is considered to be numbers (e.g., Roman numerals), you end up with quite a different syllable count.


RE: Number of syllables distribution - ReneZ - 16-04-2019

This is one of those statistics where one knows that there will be errors, but they remain interesting due to the fact that they are based on a large amount of data.

The bigger problem is with the word boundaries. If one hasn't done any transcription oneself, one will not appreciate how difficult it is to judge word spaces.


By looking at transcription files and the source images, one sees that the decision is quite often made on the basis that one 'expects' certain combinations to be words. This is highly dangerous, and can very well bias the statistics, even for a long text.

Just to give an example, where would one put the spaces in the last line of fol. 104r?


RE: Number of syllables distribution - -JKP- - 16-04-2019

Quote:ReneZ: Just to give an example, where would one put the spaces in the last line of fol. 104r:

In that line, EVA-l and EVA-s are self-standing. The rest are as they appear.

I'm reasonably confident about that.


I completely agree that some spaces are difficult to judge, but this line doesn't seem as difficult as some.


Ifitw erea strai gh tsubs titu ti on cip herthes pac eswo uldno tmatt erunfo rtu nat ely orpe rhaps fortun atel y fort hose wh olov evms i ts no tastrai ght subs titu ti onci pher.


RE: Number of syllables distribution - Emma May Smith - 16-04-2019

(16-04-2019, 12:18 PM) Koen G wrote: Of course this would be different if benches were processed conditionally like Emma does. Quite a number of 4-syllable words would shift to 3.

I also only did about two thirds of the text. Less common words are likely to be longer and so could shift the distribution toward higher syllable counts.


RE: Number of syllables distribution - ReneZ - 16-04-2019

(16-04-2019, 03:44 PM) -JKP- wrote: In that line, EVA-l and EVA-s are self-standing. The rest are as they appear.

I'm reasonably confident about that.


I completely agree that some spaces are difficult to judge, but this line doesn't seem as difficult as some.

I didn't look very hard. The Eva-l for certain, the Eva-s: very likely.

However, how about the otaiin following that?

And more interestingly, what about the second 'word'?

The characters of olcheeo are not at the same height, in particular the first two. This would not be expected to happen if this were one word. But we are necessarily guessing.

As a general 'problem', all of the last 10 lines show vertical shifts all over the place. It is as if these lines were not written line by line, but in blocks. This happens in many places, especially in quire 20.

Another interesting example is line 18 from the bottom, the one that has aror Sheey.
Towards the end, one really has to guess where the 'word' boundaries are.


RE: Number of syllables distribution - -JKP- - 16-04-2019

(16-04-2019, 07:41 PM) ReneZ wrote:
(16-04-2019, 03:44 PM) -JKP- wrote: In that line, EVA-l and EVA-s are self-standing. The rest are as they appear.

I'm reasonably confident about that.


I completely agree that some spaces are difficult to judge, but this line doesn't seem as difficult as some.

I didn't look very hard. The Eva-l for certain, the Eva-s: very likely.

However, how about the otaiin following that?

Ah, are you referring to the gap between the "o" and EVA-t? Yes, this happens quite frequently, where there appears to be a half-space. I gave up trying to figure out whether instances like this are one unit or two and created a version of the transcript that records half-spaces.
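
To show why recording half-spaces matters for the statistics, here is a small sketch. It assumes a transcription in which "." marks a full space and "," marks a half-space (a hypothetical convention for this example, not necessarily the one used in the transcript mentioned above). The sample line is made up.

```python
import re


def tokenize(line: str, split_half_spaces: bool) -> list[str]:
    """Split a transcribed line into tokens.

    Assumption: "." is a full word space and "," is a half-space.
    """
    if split_half_spaces:
        parts = re.split(r"[.,]", line)          # half-space breaks a word
    else:
        parts = line.replace(",", "").split(".")  # half-space is ignored
    return [p for p in parts if p]


if __name__ == "__main__":
    line = "l.s.o,taiin"  # made-up sample, not a real transcription line
    print(tokenize(line, split_half_spaces=True))
    print(tokenize(line, split_half_spaces=False))
```

Running any word-length or syllable count under both settings gives a quick bound on how much the uncertain spaces can move the result.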


Quote: And more interestingly, what about the second 'word'?

The characters of olcheeo are not at the same height, in particular the first two. This would not be expected to happen if this were one word. But we are necessarily guessing.

I agree that there's something different about this one. I looked at it numerous times.

But, there might be a couple of explanations for it. The vellum is unlined and unpricked, so if the scribe were copying from something and had to look away frequently, this might account for the inconsistency of the baseline.

Another possible explanation, one that I've frequently wondered about, is whether it's one of those instances where the text might have been laid down in more than one pass.



Quote: As a general 'problem', all of the last 10 lines show vertical shifts all over the place. It is as if these lines were not written line by line, but in blocks. This happens in many places, especially in quire 20.

Another interesting example is line 18 from the bottom, the one that has aror Sheey.
Towards the end, one really has to guess where the 'word' boundaries are.

I'll agree that one is tough. I don't like to assume that every glyph-sequence is predictable, but here I am inclined to interpret it based on patterns that are extremely prevalent throughout the manuscript (in other words, I'm reluctant to believe that the gap between o and l in cheeol is a full space, but will always keep in the back of my mind that it might be...), so...

to me it looks like .... olcheear chedar or aror sheey olkeechy or char cheeol s or or aiin otam (with slight uncertainty about the "o" in otam, since it is "a"-like).


RE: Number of syllables distribution - Koen G - 17-04-2019

After some trial and error I managed to compile a type list of the Latin text I used (first ~9000 words of Pliny on astronomy).

The following graph compares the syllable counts in Latin and VM tokens and types as a percentage of their respective totals.

   

Focus on the Latin for a second, the blue bars. You can see that there's a very steep drop from tokens to types in the one syllable range. This is logical: words like "ad" are used often, resulting in many tokens for few types. For two syllables there is a smaller decrease. For three syllables there is a slight increase in the relative prominence of types: there are many different three-syllable words that aren't used that often. This effect is much greater still in four and five syllable words.

What surprised me is that the VM (yellow bars) follows the same pattern overall. Token-heavy for one and two syllables, type-heavy for more.
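
The token/type comparison described above can be sketched as follows. This is an illustration only: the word list is a toy sample, and the vowel-group counter stands in for whatever syllable rule is actually applied. `syllable_shares` and `toy_count` are hypothetical names.

```python
import re
from collections import Counter


def toy_count(word: str) -> int:
    # Stand-in syllable counter: vowel groups, with ae/oe/au as diphthongs.
    return len(re.findall(r"ae|oe|au|[aeiouy]", word.lower()))


def syllable_shares(words, count=toy_count) -> dict[int, float]:
    """Percentage of the given words at each syllable count."""
    counts = Counter(count(w) for w in words)
    total = sum(counts.values())
    return {n: 100 * c / total for n, c in sorted(counts.items())}


# Toy sample: short words repeat often, long words occur once.
tokens = ["ad", "ad", "ad", "in", "in", "caelum", "stellarum"]

token_shares = syllable_shares(tokens)       # every occurrence counts
type_shares = syllable_shares(set(tokens))   # each distinct word counts once

if __name__ == "__main__":
    print("tokens:", token_shares)
    print("types: ", type_shares)
```

Even in this tiny sample the one-syllable share is higher for tokens than for types, which is the same direction as the drop described for the Latin bars.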