01-06-2018, 01:58 PM
I had a look at the UDHR text in Abkhaz.
This is written in the Cyrillic script, with some additional characters. After converting it to lower case and removing numbers and punctuation, I am left with 44 distinct characters, one of which is the space.
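The cleaning step described above can be sketched in Python as follows. This is a minimal sketch under assumptions. The function names `normalise` and `alphabet` are hypothetical. Mapping every non-letter to a space, rather than deleting it outright, is a guess, because the post does not say exactly how numbers and punctuation were removed. The sample sentence is a short made-up Russian string, not the Abkhaz UDHR text.

```python
import unicodedata

def normalise(text: str) -> str:
    """Lower-case, keep only letters (and combining marks), map everything else to a space.

    Assumption: digits, punctuation and symbols act as separators rather than being deleted.
    """
    text = unicodedata.normalize("NFC", text).lower()
    out = []
    for ch in text:
        # Unicode general categories starting with "L" are letters, "M" are combining marks
        if unicodedata.category(ch)[0] in ("L", "M"):
            out.append(ch)
        else:
            out.append(" ")  # digits, punctuation, symbols and whitespace become a separator
    # collapse runs of separators into a single space and trim the ends
    return " ".join("".join(out).split())

def alphabet(text: str) -> list[str]:
    """Sorted list of distinct characters in the normalised text, space included."""
    return sorted(set(normalise(text)))

if __name__ == "__main__":
    sample = "Статья 1. Все люди рождаются свободными!"
    print(normalise(sample))
    print(len(alphabet(sample)), "distinct characters")
```

Handling apostrophes or combining marks differently could make the count for the real text differ slightly from the 44 quoted above.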
I recognise at least 5 vowels.
When I ran an HMM analysis on it, it classified 16 characters as vowels, some of which are clearly consonants. This is not a very good result.
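The post does not describe the HMM setup. A common approach, and the one assumed in this sketch, is a two-state hidden Markov model fitted to the character sequence with the Baum-Welch (EM) algorithm. Each character is then put in the group of the state that accounts for most of its occurrences. The function name `hmm_two_state_split`, the iteration count, the random seed and the assignment rule are all hypothetical choices. They are not necessarily what produced the 16 vowels mentioned above.

```python
import random

def hmm_two_state_split(text: str, n_iter: int = 50, seed: int = 1):
    """Fit a 2-state discrete HMM by Baum-Welch and split the alphabet by state.

    Each character goes to the hidden state with the larger summed posterior
    over its occurrences. Which group is the 'vowel' group must be judged by eye.
    Assumes len(text) >= 1 and n_iter >= 1.
    """
    symbols = sorted(set(text))
    index = {s: k for k, s in enumerate(symbols)}
    obs = [index[ch] for ch in text]
    T, M, N = len(obs), len(symbols), 2
    rng = random.Random(seed)

    pi = [0.5, 0.5]
    A = [[0.5, 0.5], [0.5, 0.5]]
    B = []
    for _ in range(N):
        row = [1.0 + rng.random() for _ in range(M)]  # random start breaks the state symmetry
        total = sum(row)
        B.append([x / total for x in row])

    for _ in range(n_iter):
        # E-step, forward pass: alpha is rescaled at each t so it cannot underflow
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                a = [pi[i] * B[i][obs[0]] for i in range(N)]
            else:
                a = [sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                     for j in range(N)]
            c = sum(a)
            scale.append(c)
            alpha.append([x / c for x in a])
        # Backward pass, reusing the forward scale factors
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                       / scale[t + 1] for i in range(N)]
        # Posterior state probabilities and expected transition counts
        gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
        trans = [[1e-9] * N for _ in range(N)]  # tiny floor keeps every row of A positive
        for t in range(T - 1):
            for i in range(N):
                for j in range(N):
                    trans[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                    * beta[t + 1][j] / scale[t + 1])
        # M-step: re-estimate pi, A and B from the expected counts
        pi = gamma[0][:]
        A = []
        for row in trans:
            total = sum(row)
            A.append([x / total for x in row])
        new_B = []
        for i in range(N):
            emit = [1e-9] * M  # tiny floor keeps every emission probability positive
            for t in range(T):
                emit[obs[t]] += gamma[t][i]
            total = sum(emit)
            new_B.append([e / total for e in emit])
        B = new_B

    # Assign each symbol to the state with the larger summed posterior (last E-step)
    occupancy = [[0.0] * N for _ in range(M)]
    for t in range(T):
        for i in range(N):
            occupancy[obs[t]][i] += gamma[t][i]
    groups = ([], [])
    for k, s in enumerate(symbols):
        groups[0 if occupancy[k][0] >= occupancy[k][1] else 1].append(s)
    return groups
```

EM only finds a local optimum, so different seeds can give different splits. The two states also need not correspond to vowels and consonants at all, since the model may pick up some other contrast in the text.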
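The character entropy of this text is completely normal. Below I compare it with other UDHR versions written in Cyrillic (H1: single-character entropy; H2: conditional entropy of a character given the preceding one; both in bits per character):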
Language      H1      H2 (cond)
------------------------------------------
Abkhaz        4.263   3.110
Belarusian    4.531   3.083
Bulgarian     4.221   3.155
Mongolian     4.425   3.201
Macedonian    4.162   3.080
Russian       4.439   3.136
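The two columns can be computed as sketched below. The sketch assumes that H1 is the entropy of the single-character distribution and that "H2 (cond)" is the conditional entropy of a character given the one before it, both in bits. The function names are hypothetical, and small differences from the table, for example in how the start and end of the text are treated, are to be expected.

```python
import math
from collections import Counter

def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the distribution given by the counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def h1_h2(text: str) -> tuple[float, float]:
    """Return (H1, H2): single-character entropy and H(next char | current char)."""
    h1 = entropy(Counter(text))
    pairs = Counter(zip(text, text[1:]))  # overlapping character pairs
    firsts = Counter(text[:-1])           # marginal of the first character of each pair
    h2 = entropy(pairs) - entropy(firsts)  # H(X, Y) - H(X) = H(Y | X)
    return h1, h2
```

Applied to each cleaned UDHR text, this gives one row of the table.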
The only unusual thing I see is that the frequency plot of the character pairs is rather asymmetric.
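If "asymmetric" means that a pair such as "ab" is much more frequent than its reverse "ba", the effect can be summarised in a single number. The measure below is hypothetical and chosen only for illustration. It gives the fraction of pair tokens not balanced by the reversed pair, ranging from 0.0 (perfectly symmetric pair counts) to 1.0 (no pair ever occurs in both orders).

```python
from collections import Counter

def pair_asymmetry(text: str) -> float:
    """Fraction of character-pair tokens not balanced by the reversed pair (hypothetical measure)."""
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    imbalance = 0
    for (a, b), n in pairs.items():
        if a < b:
            # each unordered pair is handled once, from its "smaller" orientation
            imbalance += abs(n - pairs.get((b, a), 0))
        elif a > b and (b, a) not in pairs:
            # the reverse pair never occurs, so all of these tokens are unbalanced
            imbalance += n
        # doubled characters (a == b) are symmetric by definition and add nothing
    return imbalance / total if total else 0.0
```

Computing this for each of the six languages in the table would show whether Abkhaz really stands out in this respect.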
The unusual aspect observed in the PCA plot is not reflected in these statistics.