ReneZ > 01-06-2018, 01:58 PM
DonaldFisk > 01-06-2018, 06:02 PM
(01-06-2018, 01:58 PM)ReneZ Wrote: I had a look at the UDHR text in Abkhaz.
Abkhaz text appears to be much more sensitive to file size than text in other languages (probably because of the size of its phoneme inventory), and the UDHR is significantly smaller than the files I've been using. When I plot it, it looks quite different from Figures 1 and 2 (and consequently from the VMS plot, Figure 3), but it also has a truncated vowel branch. I've been treating digraphs (e.g. кь and кә) as single glyphs. When I don't, it looks more like the VMS plot.
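For anyone who wants to try both variants, here is a minimal sketch of how digraphs could be merged into single glyphs before plotting. The function name `tokenise` is hypothetical, and the default digraph list contains only the two examples mentioned above; a full Abkhaz list would have to be supplied by the user.

```python
def tokenise(text, digraphs=("кь", "кә")):
    """Split text into glyph tokens, treating each listed digraph as one glyph.

    The default digraph list is only illustrative (the two examples from the
    post); pass digraphs=() to treat every character separately.
    """
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in digraphs:
            tokens.append(pair)   # two characters, one glyph
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens
```

Calling `tokenise(text, digraphs=())` gives the per-character view, so the two plots can be produced from the same input file.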
This is written in the Cyrillic script, with some additional characters. When converted to lower case, and eliminating numbers and punctuation, I am left with 44 characters, of which one is space.
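A minimal sketch of this kind of clean-up, for reference. It assumes that "eliminating numbers and punctuation" means keeping only letters (plus combining marks) and single spaces; the exact rules behind the 44-character count may differ, and the function names are hypothetical.

```python
import unicodedata

def normalise(text):
    """Lower-case the text, drop digits and punctuation, keep single spaces."""
    kept = []
    for ch in text.lower():
        category = unicodedata.category(ch)
        if category.startswith("L") or category.startswith("M"):
            kept.append(ch)       # letters and combining marks
        elif ch.isspace():
            kept.append(" ")      # any whitespace becomes a plain space
        # everything else (digits, punctuation, symbols) is dropped
    # collapse runs of spaces and trim the ends
    return " ".join("".join(kept).split())

def alphabet(text):
    """Sorted list of distinct characters left after normalisation."""
    return sorted(set(normalise(text)))
```

`len(alphabet(text))` then gives the character count, with space included as one of the characters.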
I recognise at least 5 vowels.
When I ran an HMM analysis on it, it classified 16 characters as vowels, some of which are clearly consonants. This is not a very good result.
The character entropy of this text is completely standard. I compare it with other UDHR versions written in Cyrillic:
Language      H1      H2 (cond)
------------------------------------------
Abkhaz        4.263   3.110
Belarusian    4.531   3.083
Bulgarian     4.221   3.155
Mongolian     4.425   3.201
Macedonian    4.162   3.080
Russian       4.439   3.136
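For reference, a minimal sketch of how two such values can be computed from a cleaned text. It assumes H1 is the single-character entropy in bits and "H2 (cond)" is the conditional entropy of a character given the preceding one, computed as pair entropy minus the entropy of the first characters of the pairs; the function names are hypothetical, and this is not necessarily the exact procedure behind the numbers above.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())

def char_entropies(text):
    """Return (H1, conditional H2) for a string of characters."""
    h1 = entropy(Counter(text))
    pairs = Counter(zip(text, text[1:]))   # adjacent character pairs
    firsts = Counter(text[:-1])            # first character of each pair
    h2_cond = entropy(pairs) - entropy(firsts)
    return h1, h2_cond
```

For a short text such as the UDHR the pair counts are sparse, so the conditional value will tend to come out lower than it would on a larger sample of the same language.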
The only unusual thing I see is that the frequency plot of the character pairs is rather asymmetric.
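One way to put a number on that asymmetry is sketched below. The measure is my own assumption for illustration, not a standard statistic: for every unordered pair of distinct characters it compares the count of "ab" with the count of "ba", giving 0 for a perfectly symmetric pair table and 1 when every pair occurs in one order only.

```python
from collections import Counter

def pair_asymmetry(text):
    """Fraction of pair occurrences not balanced by the reversed pair (0..1)."""
    pairs = Counter(zip(text, text[1:]))
    seen = set()
    diff = 0
    total = 0
    for (a, b), n in pairs.items():
        if a == b or (b, a) in seen:
            continue              # skip doubled letters and already-counted pairs
        seen.add((a, b))
        m = pairs.get((b, a), 0)  # count of the reversed pair
        diff += abs(n - m)
        total += n + m
    return diff / total if total else 0.0
```

Running it on the other Cyrillic UDHR texts as well would show whether the Abkhaz value really stands out.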
The unusual feature seen in the PCA plot does not show up in these statistics.
ReneZ > 01-06-2018, 09:39 PM