After a few failed attempts yesterday, today I managed to replicate Rene's experiments. I could then also run the same code on reversed files. These are the results (the top 3 rows of each table are comparable with Rene's tables):
The results for reversed files are considerably different in all cases. For all three files, the last character is less uncertain than the first one (basically, suffixes are more constrained than prefixes). In particular, the reversed results for Italian show that the last character has a much lower entropy than the first one: I guess this is due to many Italian words ending in one of 'a','e','i','o'.
The very low value for the second-last CUVA character (1.7) is likely due to 'D', which almost invariably appears when the last character is 'Y' (this is Currier B, with the well-known abundance of words ending in EVA:dy). The value for the third-last character increases again because -DY can be preceded by several options: EDY, UDY (EVA:eedy) and ODY are all among the top 10 most frequent suffixes.
(03-06-2022, 09:44 AM)Koen G Wrote: Why does the entropy per character decrease in Latin from the first to the second and so on? Do I understand it correctly that the first character could be any character (low predictability), but then the second character must be able to combine with the first, so its predictability is higher?
This is my understanding of these figures as well.
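To make that reading concrete, here is a minimal Python sketch of one way such per-position figures could be computed. It is not Rene's code or the code I ran: the function name `positional_entropy` and its parameters are made up for illustration, and it assumes the figures are conditional entropies of the k-th character given the k-1 characters before it in the same word, counted over word tokens. Reversing each word stands in for running on a reversed file.

```python
import math
from collections import Counter

def positional_entropy(words, max_pos=3, reverse=False):
    """Return [H(c1), H(c2 | c1), H(c3 | c1 c2), ...] in bits.

    Hypothetical helper: assumes the per-position figures are
    conditional entropies within a word, counted over word tokens.
    With reverse=True each word is reversed first, so position 1
    is the last character of the original word.
    """
    if reverse:
        words = [w[::-1] for w in words]
    result = []
    for k in range(1, max_pos + 1):
        # Only words long enough to have a k-th character contribute.
        eligible = [w for w in words if len(w) >= k]
        total = len(eligible)
        prefix_counts = Counter(w[:k] for w in eligible)
        context_counts = Counter(w[:k - 1] for w in eligible)
        h = 0.0
        for prefix, n in prefix_counts.items():
            # p(prefix) * log2 p(k-th character | preceding characters)
            h -= (n / total) * math.log2(n / context_counts[prefix[:-1]])
        result.append(h)
    return result

# Toy data: the first character varies, the second is fully determined.
toy = ["ab", "ab", "cb", "cb"]
forward = positional_entropy(toy, max_pos=2)
backward = positional_entropy(toy, max_pos=2, reverse=True)
print(forward, backward)
```

In the toy list the first character is a coin flip between 'a' and 'c' (1 bit), while the second is always 'b' once the first is known (0 bits). Read from the end, the pattern flips: the last character is certain, and the one before it carries the whole bit.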