The Voynich Ninja

Full Version: Repetition of words
Pages: 1 2 3 4 5 6 7
(15-03-2026, 12:22 PM)Grove Wrote: Is there much difference between the Chinese theory and a syllabic European theory whether the VMS words or the glyphs represent syllables?

That would be an interesting hypothesis to explore.  (If you don't believe the Chinese theory, that is.) 

One possible problem is the number of different "syllables".  I *suspect* that Chinese languages have more distinct syllables than European languages, because of tones. But maybe not.

European languages also inflect many words by appending suffixes that are generally one syllable or half a syllable.  Think of "-ed" and "-ing" in English or "-us" and "-um" in Latin, "-re" in Italian, etc.  Maybe that feature can be detected and used to distinguish from "Chinese".

All the best, --stolfi
(15-03-2026, 12:43 PM)Jorge_Stolfi Wrote: I *suspect* that Chinese languages have more distinct syllables than European languages, because of tones.

All 4 or 5 tones (there is a neutral tone in Mandarin, not in Cantonese) don't exist for each syllable, so it's less than ~400*5.


Quote:Yoon Mi Oh's 2015 thesis (pages 44-45) provides estimates of the number of syllables for various languages, gathered by taking the 20,000 most frequent words in a corpus of each language and counting the different syllables that show up. Ordering them by increasing number of syllables:

Japanese: 643
Korean: 1104
Mandarin: 1274
Cantonese: 1298
Basque: 2082
Thai: 2438
Italian: 2729
Spanish: 2778
French: 2949
Turkish: 3260
Catalan: 3600
Serbian: 3831
Finnish: 3844
Hungarian: 4325
German: 5100
Vietnamese: 5156
English: 6949
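
The counting procedure described in the quote can be sketched roughly as below. This is only an illustration, not the method used in the thesis: `syllabify` is a hypothetical placeholder that splits pre-hyphenated toy words, and the toy token list is invented for the example.

```python
from collections import Counter

def syllabify(word):
    # Hypothetical stand-in for a real syllabifier: the toy words below
    # are already divided into syllables with hyphens.
    return word.split("-")

def count_distinct_syllables(tokens, top_n=20000):
    # Keep the top_n most frequent word types, then collect the set of
    # distinct syllables that occur in those words.
    freq = Counter(tokens)
    top_words = [w for w, _ in freq.most_common(top_n)]
    syllables = set()
    for w in top_words:
        syllables.update(syllabify(w))
    return len(syllables)

# Invented toy corpus (not data from the thesis).
toy_tokens = ["ma-ma", "pa-pa", "ma-pa", "ma-ma", "pa-pa", "ta"]
n_all = count_distinct_syllables(toy_tokens)
n_top2 = count_distinct_syllables(toy_tokens, top_n=2)
print(n_all, n_top2)
```

The result depends both on the cutoff `top_n` and on the syllabification rule, so counts obtained with different rules are not directly comparable.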
Yeah, I’d expect that Chinese would have 5 times the number of syllables because of the four tones plus neutral that can change the entire meaning of a monosyllabic word like ma.
Thanks for the numbers! You write:

(15-03-2026, 01:30 PM)nablator Wrote: All 4 or 5 tones (there is a neutral tone in Mandarin, not in Cantonese) don't exist for each syllable, so it's less than ~400*5.

IIUC Mandarin's "neutral" tone is not a fifth tone.  Rather, syllables with a neutral tone will have their tone determined by the context.  Thus in a precise phonetic rendering any such syllables would be rendered in two or more different ways, but with one of the four main tones.  

That said, it is true that not all possible letter and tone combinations have meaning in Mandarin, and only some of the meaningful combinations will show up in a 20'000 word corpus.  

But that can be said of any language.  

I found some of those numbers rather surprising.  I will have to check his thesis.

Offhand I would say that 20'000 words is a rather small corpus.  If the corpus includes only one or two texts of each language, it could easily have an unusually small vocabulary, and therefore an unusually small number of syllables. 

I recall that two translations of the Pentateuch into Chinese (or Vietnamese, not sure now) that I got had very different vocabulary sizes, apparently because one was created by a European or American missionary, the other by a native priest.

The number for Japanese seems too high.  If I counted correctly, Japanese has only ~15 consonant sounds (including the voiced/unvoiced variants) and 5 vowel sounds plus three glides.  With consonant doubling and vowel lengthening, that would give less than 500 possible syllables.  Finding 643 in that small corpus seems surprising.   Did I miss something?  Maybe he counted the final 'n' as part of the previous syllable?  

On the other hand, the numbers for Mandarin and Cantonese seem way too low.  Maybe because his corpus of "20'000 words" for those languages was actually 20'000 syllables?

The fact that he got almost exactly the same number for Mandarin and Cantonese suggests that he used the same text in ideographic characters but rendered phonetically with Mandarin or Cantonese readings.  The difference of 20 would then be due to homophones (different characters that have the same pronunciation).

And I would guess that his very high number for English must be the result of counting syllables of the written language, according to the traditional hyphenation rules, rather than of the spoken one with some phonetic definition of syllable.  So that 'to', 'too', 'two' would count as three distinct syllables, and 'squirrelled' would be a single distinct syllable...

All the best, --stolfi
(15-03-2026, 05:33 PM)Jorge_Stolfi Wrote: The number for Japanese seems too high.  If I counted correctly, Japanese has only ~15 consonant sounds (including the voiced/unvoiced variants) and 5 vowel sounds plus three glides.  With consonant doubling and vowel lengthening, that would give less than 500 possible syllables.  Finding 643 in that small corpus seems surprising.   Did I miss something?  Maybe he counted the final 'n' as part of the previous syllable?

It's debatable:

Quote:... , there are at least several hundred possible syllables in Japanese, conservatively put at around 400. It’s easy to see how I reached the number 400. First, 100 of the mora in Japanese are allowed as syllable onsets, and these can easily be extended by the addition of a long vowel. So きゅ becomes きゅう, に becomes にい, etc. This alone brings us up to 200 possible syllables. Next, each of these 200 syllables can further be used to create another syllable by the addition of the mora ん. So きゅう would become きゅうん、にい would become にいん, etc. This doubles our list of possible syllables from 200 to 400. I cap my count here at 400, because it is debatable whether other sequences like わいん where the vowels in the middle are not identical constitutes one syllable or two (わ and いん).
The Mandarin neutral tone is a toneless utterance. The ‘word’ ma has five meanings dependent on tone. Mother, hemp, horse, curse, question-mark are the five meanings.

Ni3 hao3 means hello but Ni3 hao3 ma means How are you?


And I believe all words in Japanese ‘could’ be written with the 46 syllables of katakana.
nablator Wrote:
Quote:First, 100 of the mora in Japanese are allowed as syllable onsets, and these can easily be extended by the addition of a long vowel. So きゅ becomes きゅう, に becomes にい, etc. This alone brings us up to 200 possible syllables. Next, each of these 200 syllables can further be used to create another syllable by the addition of the mora ん.

OK, so he did include the ん in the previous syllable. (The Japanese are used to thinking of it as a separate syllable, not just because of the writing system but also for metric purposes in songs. The latter can stretch the んs like European songs will stretch vowels.)

Then with consonant doubling we would get 800 potential syllables.
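
The doubling steps in that estimate can be written out as a quick check. The figures are the ones from the quoted text; the sketch assumes every feature combines freely with every syllable, so the result is an upper bound on possible syllables rather than a count of attested ones.

```python
# Figures from the quoted estimate; each optional feature doubles the inventory.
base_onsets = 100                    # mora allowed as syllable onsets
with_long_vowel = base_onsets * 2    # short or long vowel
with_final_n = with_long_vowel * 2   # with or without a final ん
with_gemination = with_final_n * 2   # with or without a doubled consonant

print(with_long_vowel, with_final_n, with_gemination)  # 200 400 800
```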

All the best, --stolfi
(15-03-2026, 06:20 PM)Grove Wrote: The Mandarin neutral tone is a toneless utterance. The ‘word’ ma has five meanings dependent on tone. Mother, hemp, horse, curse, question-mark are the five meanings.

Ni3 hao3 means hello but Ni3 hao3 ma means How are you?

Indeed, my guess was quite wrong: it seems that the Mandarin language uses only ~1300 distinct syllables, which are split roughly evenly among the four main tones (from ~250 of tone 2 to ~350 of tone 3).  IIUC, the neutral tone syllables with distinct meaning add a couple hundred to that total.

So the ~1270 count in that sample actually is surprisingly high. 
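
As a rough check of that remark, using only numbers already given in this thread (the 1274 from the table, the ~1300 total, and the earlier ~400*5 bound), and treating all of them as approximate:

```python
observed = 1274        # Mandarin count from the quoted table
inventory = 1300       # approximate number of distinct tonal syllables
naive_bound = 400 * 5  # ~400 toneless syllables times 5 tones

coverage = observed / inventory   # share of the inventory seen in the sample
fill = inventory / naive_bound    # share of the naive bound that actually exists
print(f"{coverage:.0%} of the inventory attested, {fill:.0%} of the naive bound in use")
```

Under these assumptions the sample would cover nearly the whole inventory, which is what makes the count look high.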

Quote:And I believe all words in Japanese ‘could’ be written with the 46 syllables of katakana.

Or hiragana. Yes, but the basic syllables can be modified by a "voicing" diacritic that turns e.g. "ta" into "da", and a specific diacritic that makes "pa" out of "ha"; and by replacing the vowel to make e.g. "fe" from "fu"+small "e"; or by replacing the vowel with a glide, e.g. "ki"+small "ya" = "kya".  And then by lengthening the vowel, e.g. "to"+"u" = "tō".  And then doubling the consonant, e.g. small "tsu" + "ka" = "kka".

All the best, --stolfi
The neutral tone is barely used in Mandarin: it appears in words like 'ma' and 'le', or when an inherent tone is not uttered.
It is common in Thai and there are other common SE Asian languages with 5 or 6 tones.
Furthermore, in Thai, vowels can be short or long, leading to change of meaning.

Example: short khao (falling tone) is the verb 'to enter'; long khao (falling tone) is the noun 'rice'.
Short syllables almost exclusively have the high or low tone, so this leads to seven combinations, but also here not all seven will exist for each syllable. In fact this would be rare.
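
The seven combinations can be enumerated as follows. This is a sketch of the simplified rule stated above (long syllables take any of the five Thai tones, short ones only high or low), not a full account of Thai tone rules.

```python
TONES = ["mid", "low", "falling", "high", "rising"]  # the five Thai tones
SHORT_TONES = ["high", "low"]  # simplification taken from the post above

# Every (vowel length, tone) pair allowed under that simplified rule.
combos = [("long", t) for t in TONES] + [("short", t) for t in SHORT_TONES]
for length, tone in combos:
    print(length, tone)
print(len(combos))  # 7
```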

I do not know how all this is in Cantonese or Vietnamese.
All this just to say that the math is not so straightforward.