The Voynich Ninja
Number of syllables distribution - Printable Version


Number of syllables distribution - Koen G - 14-04-2019

Quick question, has anyone ever attempted to count the number of "syllables" in Voynichese "words" and compare them to those in other languages?

For example, 10% one syllable, 30% two syllable etc.

I know we don't know for certain whether there are vowels and syllables like there would be in Latin. But various people have identified glyphs that are likely to be vowels. So if you build on that assumption, what would the result be?

(I vaguely remember that something like this has been discussed, but since we talk a lot about syllables the search function resulted in an overload).


RE: Number of syllables distribution - Emma May Smith - 14-04-2019

I've done it. I wrote a blog post about it [link]. Even though I might tweak that a little, it's still my basic thinking.

I'm happy to discuss the results and word structure in general, as I think it's something which is not discussed enough.


RE: Number of syllables distribution - Koen G - 14-04-2019

(14-04-2019, 02:58 PM)Emma May Smith Wrote: I've done it. I wrote a blog post about it [link].

Ah, I should have known :)
My question would have been what to do with e- and i-clusters. I wouldn't have come up with your solution for conditional treatment of "e", but it's certainly a good option.
I can follow the conditional status of [ch], but I think I would have gambled on [sh] always constituting a syllable, assuming that the diacritic indicates contraction.

Still, I think your list is very illuminating. It's not often that I wake up with a deep VM-related question and an answer is presented so well ;)

I also agree that the apparent presence of a relatively small number of 0-syllable words is nothing to be worried about. The shortage of words over three syllables long is much stranger. I came to this question while thinking about how I would turn Latin into Voynichese. Confronted with words like "Maximilianus", the only options would be to split words or to drop vowels. And that's only to get the number of syllables right, because then you have to deal with limited word structure.


RE: Number of syllables distribution - Emma May Smith - 14-04-2019

Yes, the restricted length of Voynich words is rather interesting. It may simply be that the usual language of the text doesn't have very long words, but it may also be that longer words have been split or shortened. I note that longer words do exist, and these could be proper names or words from a different language, and therefore great targets for investigation.

I suppose if we could program something to automatically split all the words (I only did about the one thousand most common) we could quickly spot the very long and unusual. There's also the potential for automatic syllable identification and creating statistics about what syllables occur with others and in what positions. Details from my syllabification show that [i] sequences are much rarer in words with three syllables while [e] sequences seem to be roughly equally spread across words regardless of number of syllables.

(I've had in mind for some time that [i] sequences may relate to some kind of prosodic variation, and that the differences between [y, an, ain, aiin, aiiin] might be much more subjective to the speaker than being strictly different words. This would be why we tend to see them at the end of words rather than internally, and why they prefer words of a certain length. But it's a difficult idea to form and test and could be totally wrong.)

As for [ch, sh], this is where I'm most unsure. I do think that they are acting like vowels (or at least like [a, y, o]) but I know that having discussed this with Marco in the past the evidence is not conclusive. But the possibility of the length of [e] sequences showing that [ch, sh] might incorporate an [e] means that the whole "[y] deletion" hypothesis is the same for [e] and [ch, sh]. We see the same thing with [ckh, cth, cfh, cph], in that they both shorten [e] sequences and are proportionately more likely to be followed immediately by [d].


RE: Number of syllables distribution - MarcoP - 14-04-2019

(14-04-2019, 04:23 PM)Emma May Smith Wrote: I suppose if we could program something to automatically split all the words (I only did about the one thousand most common) we could quickly spot the very long and unusual.

Hi Emma,
do you mean finding all the long words that cannot be "explained" as the concatenation of two shorter words?


About benches and e-sequences:
A possible experiment could be considering bench+e sequences as single glyphs, and see the effect on Sukhotin's algorithm (for instance). More than 90% of e-sequences follow a gallows, bench, or benched-gallows glyph. A difficulty would be deciding what to do with benched-gallows: gallows appear to be clearly distinct; we can experiment with assimilating benches and 'e'; but benched-gallows?
The characters following e-sequences are almost equally split among possible vowels [oay] and other glyphs, so the outcome of the experiment is not easy to predict....
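
A minimal sketch of Sukhotin's vowel-finding algorithm, for anyone who wants to try the experiment above. It accepts each word either as a plain string or as a list of glyph tokens, so bench+e sequences can be merged into single tokens before the run. The function name `sukhotin_vowels` and the token format are assumptions made for illustration, not something taken from an existing tool.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return glyphs classified as vowels, in the order they are picked.

    `words` is an iterable of glyph sequences: plain strings (one character
    per glyph) or lists of tokens (e.g. ["ch", "o", "l"]).
    """
    # Symmetric adjacency counts; pairs of identical glyphs are ignored,
    # which is the usual zero-diagonal convention of the algorithm.
    adjacency = defaultdict(lambda: defaultdict(int))
    for word in words:
        glyphs = list(word)
        for left, right in zip(glyphs, glyphs[1:]):
            if left != right:
                adjacency[left][right] += 1
                adjacency[right][left] += 1

    # Every glyph starts out as a consonant with its row sum as score.
    sums = {g: sum(neigh.values()) for g, neigh in adjacency.items()}
    consonants = set(sums)
    vowels = []

    while consonants:
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(consonants), key=lambda g: sums[g])
        if sums[best] <= 0:
            break
        vowels.append(best)
        consonants.remove(best)
        # Contacts with the new vowel now count against the remaining glyphs.
        for g in consonants:
            sums[g] -= 2 * adjacency[g].get(best, 0)

    return vowels


print(sukhotin_vowels(["baba", "dada", "bada"]))  # ['a']
```

Glyphs that never appear next to a different glyph are left unclassified by this sketch. Running it once on the raw transliteration and once with bench+e merged into single tokens would give the comparison described above.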


RE: Number of syllables distribution - Emma May Smith - 14-04-2019

(14-04-2019, 05:20 PM)MarcoP Wrote: Hi Emma,
do you mean finding all the long words that cannot be "explained" as the concatenation of two shorter words?

Not necessarily. If a longer word was sufficiently nativized it would be pronounceable by a hypothetical speaker. Therefore it could be broken down into syllables and each of those syllables has a strong likelihood of having appeared elsewhere in the text. There would also be the possibility of discovering unusual syllable combinations even though those syllables are quite normal and the word isn't overlong.

Thus a word like [teolkedain] would be made up of the syllables [teo] + [lkey] + [dain]. It's actually not too long, and all the syllables are perfectly well known. But the combination of [teo] and [lkey] in a single word is unusual, as is the word ending [dain] for a three-syllable word. But we might note that a word like [teolkechey] is made up of [teo] + [lkey] + [chey], thus containing the same unusual combination of [teo] and [lkey]. We would also note that the word again unusually ends with [chey] as the third syllable, something only really seen in four-syllable words.

In this way we can construct two sets of words: Voynich AB (which is all the words normally constructed in Currier A & B, however rare or common) and Voynich X, words which for some reason contradict the way Voynich AB words are constructed. There would be no reason for this Voynich X to be coherent, as it could come from many sources. But as Rene has pointed out elsewhere, if the main text is encoded in some way, there are possibly some words that are unencoded because they lie outside the system for some reason. This is the weak point of the whole text, maybe.
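
A rough sketch of how the syllable statistics could be gathered once words have been split. It assumes the syllabification already exists (each word given as a list of syllables) and only does the counting: which syllable sits in which position of a word of which length, and which syllables occur next to each other. The function name `syllable_stats` is hypothetical, and the two sample words are the ones discussed above.

```python
from collections import Counter


def syllable_stats(syllabified_words):
    """Count syllable positions and adjacent syllable pairs.

    `syllabified_words` is an iterable of lists of syllables.
    Returns (positions, pairs) where
      positions[(syllable, position, word_length)] is an occurrence count
      (position is 1-based), and
      pairs[(left, right)] counts adjacent syllables within a word.
    """
    positions = Counter()
    pairs = Counter()
    for syllables in syllabified_words:
        length = len(syllables)
        for index, syllable in enumerate(syllables, start=1):
            positions[(syllable, index, length)] += 1
        for left, right in zip(syllables, syllables[1:]):
            pairs[(left, right)] += 1
    return positions, pairs


sample = [
    ["teo", "lkey", "dain"],  # [teolkedain]
    ["teo", "lkey", "chey"],  # [teolkechey]
]
positions, pairs = syllable_stats(sample)
print(pairs[("teo", "lkey")])       # 2
print(positions[("chey", 3, 3)])    # 1
```

Run over the whole text, pairs or positions with very low counts would be the candidates for Voynich X. Where to put the cut-off is a judgment call that this sketch leaves open.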

Quote:About benches and e-sequences:
A possible experiment could be considering bench+e sequences as single glyphs, and see the effect on Sukhotin's algorithm (for instance). More than 90% of e-sequences follow a gallows, bench, or benched-gallows glyph. A difficulty would be deciding what to do with benched-gallows: gallows appear to be clearly distinct; we can experiment with assimilating benches and 'e'; but benched-gallows?
The characters following e-sequences are almost equally split among possible vowels [oay] and other glyphs, so the outcome of the experiment is not easy to predict....

I think the easiest way to do it would be to consider the [h] as part of the [e] sequences or equal to an [e]. Then [C] and [S] and [ck, ct, cf, cp] would be counted as glyphs in their own right. Maybe also insert a null after an [e] sequence or [h] not followed by [a, y, o] and that null would be a proxy which we might hope the algorithm would discover as a vowel?

(I seem to recall that one of the vowel finding experiments did actually find [h] as a vowel!)


RE: Number of syllables distribution - MarcoP - 14-04-2019

Hi Emma,
I think the VoynichX is still not clear to me. I will try to re-read the thread in the next days and see if it gets any better :)

I guess that [link] would have an overlap with VoynichX, but I don't understand enough of either of the two systems to have a clear perception of its extent.

(14-04-2019, 05:56 PM)Emma May Smith Wrote:
Quote:About benches and e-sequences:
A possible experiment could be considering bench+e sequences as single glyphs, and see the effect on Sukhotin's algorithm (for instance). More than 90% of e-sequences follow a gallows, bench, or benched-gallows glyph. A difficulty would be deciding what to do with benched-gallows: gallows appear to be clearly distinct; we can experiment with assimilating benches and 'e'; but benched-gallows?
The characters following e-sequences are almost equally split among possible vowels [oay] and other glyphs, so the outcome of the experiment is not easy to predict....

I think the easiest way to do it would be to consider the [h] as part of the [e] sequences or equal to an [e]. Then [C] and [S] and [ck, ct, cf, cp] would be counted as glyphs in their own right. Maybe also insert a null after an [e] sequence or [h] not followed by [a, y, o] and that null would be a proxy which we might hope the algorithm would discover as a vowel?

(I seem to recall that one of the vowel finding experiments did actually find [h] as a vowel!)

This experiment seems much more approachable, but also here a few examples could be useful.
How would chey, cheeey, pchey, chol, ckhedy, chckhedy be encoded?

Of course, experimenting with and without the 'null' should be fairly easy.


RE: Number of syllables distribution - Emma May Smith - 14-04-2019

Hello Marco

I appreciate that some of the things I've written are a little dense. I suppose Voynich X would be quite like Stolfi's Abnormal Words, though with two differences: a) some of his words definitely would be removed from Voynich X as my syllabification has no problem with them, and b) many more unusual syllable combinations would be captured, as both the high and low level structure would be scrutinised. I suppose in all cases, for any kind of structural test or syllabification, good results rely on both how much of the text can be shown to have a regular structure and what it reveals about those words with irregular structure.

I have to say I'm a huge fan of Stolfi, but you already know that. I wish he were still around.

Quote:This experiment seems much more approachable, but also here a few examples could be useful.
How would chey, cheeey, pchey, chol, ckhedy, chckhedy be encoded?

Of course, experimenting with and without the 'null' should be fairly easy.

I think it's best, if we're willing to use nulls, just to insert them where we think [y] is missing. We don't have to worry about [h] or [e] sequences. So only two words in your list would be altered to include [N] (null):
[ckhedy] would become [ckheNdy]
[chckhedy] would become [chNckheNdy]

To me it seems obvious that [N] would be picked out as a vowel in such cases, but the test is to compare it with the version of the text without the [N]. If we can see that a) [a, y, o, N] were much more strongly identified as vowels, and b) everything else was much less strongly identified as vowels, then I think it would be a success.

Basically: if we can insert [N] in a regular and predictable way does it make the result of vowel-finding algorithms clearer?

There is, of course, the problem of whether we count [ch, sh, ckh, cth, cfh, cph, e, ee, eee] as one glyph or multiple, but that's true regardless.
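
One way the null insertion could be automated, as a hedged sketch. It uses the rule suggested earlier in the thread (a null after an [e] sequence or an [h] that is not followed by [a, y, o]), which reproduces both examples above and leaves the other four words from Marco's list unchanged. Treating the end of a word as "not followed by a vowel" is an assumption of this sketch, and the name `insert_nulls` is made up for illustration.

```python
import re

# An [e] sequence or an [h] that is not followed by [a, o, y].
# "e" is also excluded in the lookahead so that an [h] directly before an
# [e] sequence, or an [e] inside a longer sequence, gets no null of its own.
_MISSING_Y = re.compile(r"(e+|h)(?![aoye])")


def insert_nulls(word, null="N"):
    """Insert `null` where a [y] is assumed to be missing."""
    return _MISSING_Y.sub(lambda match: match.group(1) + null, word)


for word in ["chey", "cheeey", "pchey", "chol", "ckhedy", "chckhedy"]:
    print(word, "->", insert_nulls(word))
# chey -> chey
# cheeey -> cheeey
# pchey -> pchey
# chol -> chol
# ckhedy -> ckheNdy
# chckhedy -> chNckheNdy
```

The output can then be fed to a vowel-finding algorithm twice, once with and once without the [N], to make the comparison described above.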


RE: Number of syllables distribution - Koen G - 15-04-2019

I'm thinking there should be a way to automatically count syllables without having to do any coding (i.e. within my abilities). Basically you take a word list with each word on a line, paste it in Word or whatever, use "find and replace" to delete all "consonants" and to replace all "vowels" by 1. If you do this in the correct order, you can even implement conditional cases. Then copy-paste the resulting strings of ones back into Excel (next to the original word list), make a sum of the "1"s and that will be your number of syllables.

(probably the programming-savvy are making fun of me now, but that's how I would do it ;) )

Does anyone have a Voynichese word list?
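
For comparison, the find-and-replace procedure described above can be sketched in a few lines of Python. The vowel set is purely an assumption for illustration: [a, o, y, e, i] are treated as vowel glyphs and each unbroken run of them counts as one syllable. It does not implement the conditional treatment of [e], [ch] or [sh] from Emma's blog post, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: these glyphs act as vowels, and a run of them is one nucleus.
ASSUMED_VOWEL_RUN = re.compile(r"[aoyei]+")


def count_syllables(word):
    """Mirror the find-and-replace steps: vowels -> 1, drop the rest, sum."""
    marked = ASSUMED_VOWEL_RUN.sub("1", word)  # replace each vowel run by 1
    ones = re.sub(r"[^1]", "", marked)         # delete all "consonants"
    return len(ones)                           # the sum of the 1s


def syllable_distribution(words):
    """Return {syllable_count: share of words}, e.g. {1: 0.5, 2: 0.25}."""
    counts = Counter(count_syllables(word) for word in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in sorted(counts)}


print(count_syllables("teolkedain"))  # 3
print(syllable_distribution(["daiin", "chedy", "qokeedy", "chol"]))
# {1: 0.5, 2: 0.25, 3: 0.25}
```

Whether the input is a list of word types or of every token in the text changes the percentages, so the choice of word list matters for the comparison with other languages.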


RE: Number of syllables distribution - -JKP- - 15-04-2019

(15-04-2019, 07:30 PM)Koen G Wrote: I'm thinking there should be a way to automatically count syllables without having to do any coding (i.e. within my abilities). Basically you take a word list with each word on a line, paste it in Word or whatever, use "find and replace" to delete all "consonants" and to replace all "vowels" by 1. If you do this in the correct order, you can even implement conditional cases. Then copy-paste the resulting strings of ones back into Excel (next to the original word list), make a sum of the "1"s and that will be your number of syllables.

(probably the programming-savvy are making fun of me now, but that's how I would do it ;) )

Does anyone have a Voynichese word list?

By "word list", do you mean one instance of each token found in the VMS?