The Voynich Ninja
Number of syllables distribution


RE: Number of syllables distribution - Emma May Smith - 21-04-2019

(21-04-2019, 06:24 AM)ReneZ Wrote: Well, given that for two representative known plain texts, one in Latin and one in middle Dutch, the syllable count is quite different, there is a rather wide range of syllable counts that could be considered 'normal'.

We don't know what the vowels are in Voynichese, but it is clear that we can choose them such that the resulting syllable count is within this normal range. In fact, different definitions lead to quite different distributions.

From what I read, Emma did not choose her vowels/consonants in order to achieve a 'normal' syllable count, but the process in this thread seems to have involved some tuning, thereby ending up close to Latin.

This is clearly a basic but worthwhile outcome: a) a fairly simple definition of "syllable" is achievable which results in b) a naturalistic syllable-per-word distribution. It doesn't prove much about linguistic theories, but it shows that there's no real problem in this regard. Nor is there a lack of "short" words.
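
As a rough illustration of the kind of count discussed here, a syllable-per-word distribution can be sketched by treating each run of consecutive "vowels" as one syllable nucleus. The vowel set below is a placeholder for an ordinary Latin-alphabet text, not the definition used in this thread, and the function names are invented for the example.

```python
from collections import Counter

# Placeholder vowel set for a Latin-alphabet text (an assumption; the
# vowel choice for Voynichese is exactly what is in question).
VOWELS = set("aeiouy")

def count_syllables(word, vowels=VOWELS):
    """Count maximal runs of vowel characters, one run per syllable nucleus."""
    count = 0
    in_run = False
    for ch in word.lower():
        if ch in vowels:
            if not in_run:
                count += 1
            in_run = True
        else:
            in_run = False
    return count

def syllable_distribution(words, vowels=VOWELS):
    """Return {syllable count: fraction of word tokens}."""
    counts = Counter(count_syllables(w, vowels) for w in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: v / total for k, v in sorted(counts.items())}

# Toy Latin sample, only to show the call.
sample = "arma virumque cano troiae qui primus ab oris".split()
print(syllable_distribution(sample))
```

Swapping in a different `vowels` set is the "tuning" step: the same word list can give quite different distributions depending on that choice.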

I think there's a lot more work we could do with these potential "syllables", however, to turn them into something upon which deeper analysis can be built.


RE: Number of syllables distribution - ReneZ - 21-04-2019

Indeed, there is no lack of short words, but rather of long words, although that second point is not too severe.

I believe I already put a link to Stolfi's page on the word length distribution, but it surely does no harm to repeat it.

[link]

In the very first plot, we already see that there is a slight difference between normal text and labels.


RE: Number of syllables distribution - Antonio García Jiménez - 21-04-2019

It seems that Stolfi believed that the 'words' were "numbers rather than linguistic entities", and the script a kind of nomenclator.


RE: Number of syllables distribution - Koen G - 27-04-2019

(15-04-2019, 08:01 PM)bi3mw Wrote: Made quickly, but should fit.

[link]

Another request, could someone generate a list like this but with Currier A and B separated?
Edit: ideally I'd need them from a standardized sample size. Is it possible to generate a word list of say the first 10k words of each "language"?


RE: Number of syllables distribution - ReneZ - 27-04-2019

10k is quite a bit too much.

All words (word types) that occur only once in the MS necessarily appear in only one language, so their assignment is just chance.
Similarly, all words that occur only twice may end up in the same language just by chance, at a 50% probability.
(This is simplified, based on a model that half the text is language A and the other half is language B. Therefore, the numbers are qualitative, but not far off).

For all words that occur only three times in the MS, the probability that they are all in the same language just by chance is 25%.

One may decide where to put the limit, but one clearly should look only at words that occur at least 3 or 4, perhaps even more times. That greatly reduces the number of word types that should be used for such an analysis.
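
The percentages above follow from the simplified half-and-half model: if each occurrence of a word independently lands in language A with probability 1/2, the chance that all n occurrences land in the same language is 2 × (1/2)^n. A minimal sketch, with a hypothetical function name and an optional `p_a` parameter for an uneven split (both are assumptions, not part of the post):

```python
def same_language_by_chance(n, p_a=0.5):
    """Probability that all n occurrences of a word fall in one language,
    assuming each occurrence independently falls in A with probability p_a."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Either all n occurrences are in A, or all n are in B.
    return p_a ** n + (1 - p_a) ** n

for n in range(1, 6):
    print(n, same_language_by_chance(n))
```

For n = 1, 2, 3 this gives 1, 0.5 and 0.25, matching the figures quoted in the post; each additional occurrence halves the chance of a purely accidental single-language word.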


RE: Number of syllables distribution - Koen G - 28-04-2019

Rene: I'm also considering the more general problem of type-to-token ratio (TTR). In this case, it is necessary to also include unique words. But the corpus size must be standardized. I don't know yet what the ideal size would be. I reckon there should be a sweet spot somewhere. You think rather less than 10k words?
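
For reference, a standardized TTR comparison could be sketched as below: take the first N word tokens of each sample and divide the number of distinct types by N. The function name and the toy word lists are made up for illustration and are not taken from any transcription file.

```python
def type_token_ratio(tokens, sample_size):
    """TTR over the first sample_size tokens (types / tokens)."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if len(tokens) < sample_size:
        raise ValueError("not enough tokens for the requested sample size")
    sample = tokens[:sample_size]
    return len(set(sample)) / sample_size

# Toy word lists standing in for an A sample and a B sample.
toy_a = "daiin chol chor daiin shol chol dy daiin".split()
toy_b = "chedy qokeedy shedy chedy qokain ol chedy qokeedy".split()

# Same sample size for both, so the ratios are comparable.
for size in (4, 8):
    print(size, type_token_ratio(toy_a, size), type_token_ratio(toy_b, size))
```

Because TTR generally falls as the sample grows, running this over several sizes is one way to look for the "sweet spot" mentioned above.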


RE: Number of syllables distribution - MarcoP - 28-04-2019

Hi Koen,
the attached files are based on the ZL transcription. The "with comma" file considers uncertain spaces between words, while "no comma" ignores them. Only the first 10K words from each of A and B were processed. The "no comma" version appears to be closer to the results (based on Currier's transcription) that Rene discussed [link] ("Clustering" paragraph).

It is possible I have made errors, so please check if the numbers appear to make sense.
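
The difference between the two files can be pictured with a small sketch. It assumes a transcription line in which "." marks a certain word break and "," an uncertain one, and it ignores any other markup a real transcription file contains; the function name and the sample line are hypothetical, and this is not the script used to produce the attachments.

```python
import re

def split_words(line, use_uncertain_spaces):
    """Split one transcription line into words.

    use_uncertain_spaces=True  -> "," counts as a word break ("with comma")
    use_uncertain_spaces=False -> "," is dropped, joining the parts ("no comma")
    """
    if use_uncertain_spaces:
        parts = re.split(r"[.,]", line)
    else:
        parts = line.replace(",", "").split(".")
    # Drop empty strings left by leading, trailing or doubled separators.
    return [p for p in parts if p]

line = "daiin.chol,dy.shedy"  # invented example line
print(split_words(line, use_uncertain_spaces=True))
print(split_words(line, use_uncertain_spaces=False))
```

The "with comma" reading yields more and shorter words, so the choice affects both word-length and word-count statistics.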


RE: Number of syllables distribution - Koen G - 28-04-2019

Perfect, thanks!


RE: Number of syllables distribution - ReneZ - 29-04-2019

My last post in this thread may be a bit confusing.

One has to distinguish between, on the one hand:
- words that appear on pages that are classified as A or B

and on the other hand:
- A language words vs. B language words.

This is because there are many words that are common to both languages. This means: words that appear both on A language pages and on B language pages. In addition, there are more words that are 'typically B language' than words that are 'typically A language'.
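
One way to make this distinction concrete is to label each word type by the share of its occurrences that fall on A pages, skipping rare words. The sketch below is only an illustration: the function name, the minimum count and the 90% threshold are arbitrary assumptions, and it presumes A and B samples of equal size so that raw counts are comparable.

```python
from collections import Counter

def typical_language(a_tokens, b_tokens, min_count=4, threshold=0.9):
    """Label word types as 'typically A', 'typically B' or 'common'.

    a_tokens / b_tokens: words from pages classified as A / B.
    Words with fewer than min_count occurrences overall are skipped,
    since their distribution may be due to chance.
    """
    a_counts, b_counts = Counter(a_tokens), Counter(b_tokens)
    labels = {}
    for word in set(a_counts) | set(b_counts):
        a, b = a_counts[word], b_counts[word]
        total = a + b
        if total < min_count:
            continue
        share_a = a / total
        if share_a >= threshold:
            labels[word] = "typically A"
        elif share_a <= 1 - threshold:
            labels[word] = "typically B"
        else:
            labels[word] = "common"
    return labels

# Toy input: placeholder words, not manuscript data.
a_page_words = ["x"] * 5 + ["z"] * 2
b_page_words = ["y"] * 4 + ["z"] * 2 + ["q"]
print(typical_language(a_page_words, b_page_words))
```

A word can thus appear on pages of both classes and still count as typical of one language if its occurrences are heavily skewed.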


RE: Number of syllables distribution - Emma May Smith - 29-04-2019

I wonder if there are such things as A and B syllables? Or A and B combinations of syllables?

We could really open a new area here.