Anton > 27-06-2020, 11:54 PM
Anton > 28-06-2020, 01:43 AM
Alin_J > 28-06-2020, 07:21 AM
-JKP- > 28-06-2020, 07:40 AM
(28-06-2020, 07:21 AM)Alin_J Wrote:
Koen G > 28-06-2020, 09:55 AM
-JKP- > 28-06-2020, 10:16 AM
MarcoP > 28-06-2020, 10:18 AM
(28-06-2020, 07:21 AM)Alin_J Wrote: Interesting. The idea that words might be mapped and encoded into some kind of numerical representation in the VM has also come up before, hasn't it?
(28-06-2020, 07:40 AM)-JKP- Wrote: The reason I have given for the possibility of numbers is the positional characteristics of the glyphs. This is not typical of natural language, but is absolutely essential for numbers.
(27-06-2020, 11:54 PM)Anton Wrote: What would that mean?
Stolfi Wrote:...the East Asian monosyllabic languages [Chinese, Vietnamese, and Tibetan] *do* have symmetric, binomial-like word length distributions, just like Roman numerals...
Why binomial?
Why should those languages have a binomial-like syllable-length distribution? Well, as observed in the previous page, if you add many random variables with arbitrary distributions, you get a random variable with a binomial-like, bell-shaped distribution, which approaches a Gaussian as you add more and more terms. (Technically, the histogram of the sum of two independent variables is the convolution of their histograms; and the convolution of N arbitrary histograms, as N increases, generally becomes more and more like a Gaussian distribution.)
Now, unlike a polysyllabic word, a single syllable has only a fixed number N of phonetic "slots" (attributes), corresponding to separate muscular controls; and each slot can have a finite number of possible values. In the Chinese syllable, for instance, the initial consonant is one slot, which can have some 20 values including "silent". Another slot would be the glide before the main vowel ("i", "u", or "none", as in "lian", "luan", or "lan"). The main vowel, the secondary glide, the final consonant, and the syllable tone would be the other slots.
In principle, then, a syllable could be written as a sequence of N symbols, each corresponding to one phonetic slot. However, that would be a rather inefficient encoding, because the values of each slot have highly different frequencies in common use. (In particular, the most frequently used words will tend to use slot values that can be articulated with less time or effort.)
For that reason, almost all scripts follow the model of Roman numerals, where one value for each slot is assigned as "default" and not written, while the other values are mapped to distinctive symbols. Thus the "silent" consonant and "none" glides are omitted in pin-yin; the "a" vowel is omitted in Hindu scripts; and the mid level tone of Vietnamese is not marked in Quo^'c Ngu+~. Moreover, if a slot has many possible values, some of them are often encoded by sequences of two or more symbols, such as "ch" in Chinese or "u+" in Vietnamese.
Thus, in all those scripts, the written syllable is the concatenation of N variable-length strings. Assuming that the value of a slot is to some extent independent of the other slots, the word-length histogram is the convolution of N slot-length histograms, and is therefore expected to resemble a binomial distribution.
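The convolution argument in the quoted passage can be sketched in a few lines of Python. This is only an illustration: the five slots and all of their probabilities are invented for the example, not measured from Chinese, Vietnamese, or any transcription, and the names `convolve` and `word_length_histogram` are hypothetical helpers. Each slot is given a histogram over the number of symbols it contributes (0 when the default value is left unwritten), and the slots are assumed independent.

```python
def convolve(p, q):
    """Histogram of the sum of two independent variables with histograms p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def word_length_histogram(slot_histograms):
    """Convolve all slot-length histograms into one word-length histogram."""
    hist = [1.0]  # a word with no slots has length 0 with certainty
    for h in slot_histograms:
        hist = convolve(hist, h)
    return hist


# Invented slot-length histograms: index = symbols written, value = probability.
slots = [
    [0.10, 0.70, 0.20],  # initial consonant: unwritten, one letter, or a digraph
    [0.60, 0.40],        # glide before the main vowel: unwritten or one letter
    [0.00, 0.85, 0.15],  # main vowel: always written, sometimes two symbols
    [0.70, 0.30],        # secondary glide: unwritten or one letter
    [0.50, 0.35, 0.15],  # final consonant: unwritten, one letter, or two
]

hist = word_length_histogram(slots)

# Crude text plot of the resulting word-length distribution.
for length, prob in enumerate(hist):
    print(f"{length:2d}  {prob:.3f}  {'#' * round(prob * 60)}")
```

Changing the invented probabilities, or adding more slots, shows how far the shape of the result depends on the individual slot histograms and how much comes from the convolution itself.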
-JKP- > 28-06-2020, 10:33 AM
MarcoP Wrote:Obviously positional characteristics of glyphs have little to do with languages or numbers and everything to do with the writing systems used to represent them.
MarcoP > 28-06-2020, 10:42 AM
(28-06-2020, 10:33 AM)-JKP- Wrote: MarcoP Wrote: Obviously positional characteristics of glyphs have little to do with languages or numbers and everything to do with the writing systems used to represent them.
Alin_J > 28-06-2020, 10:45 AM
(28-06-2020, 09:55 AM)Koen G Wrote: This is an interesting line of investigation. But I see one big problem, which you mention already yourself: it depends on the transcription.