Koen G > 14-09-2019, 11:41 AM
Anton > 14-09-2019, 12:38 PM
(14-09-2019, 07:23 AM)ReneZ Wrote: For word entropy, the number of different words keeps increasing. The number of possible word pairs actually grows faster than the text length, so in a way the sampling gets worse as the text length increases.
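The sampling point can be put in numbers: with V distinct words there are V*V possible ordered word pairs, while a text of N words only supplies N-1 pair observations. A minimal sketch, assuming the text is already split into a list of words; the function name `pair_sampling` is made up for illustration and is not from any tool mentioned in this thread.

```python
def pair_sampling(words):
    """Compare the number of possible word pairs with what the text supplies."""
    types = len(set(words))                           # distinct words (V)
    possible_pairs = types * types                    # all ordered pairs (V * V)
    observed_pairs = max(len(words) - 1, 0)           # adjacent pairs in the text (N - 1)
    distinct_pairs = len(set(zip(words, words[1:])))  # different pairs actually seen
    return {
        "types": types,
        "possible_pairs": possible_pairs,
        "observed_pairs": observed_pairs,
        "distinct_pairs": distinct_pairs,
        # fraction of the pair space that could be covered at best
        "max_coverage": observed_pairs / possible_pairs if possible_pairs else 0.0,
    }


def coverage_by_length(words, step):
    """Recompute the statistics on growing prefixes of the text."""
    return [(n, pair_sampling(words[:n])) for n in range(step, len(words) + 1, step)]
```

Running `coverage_by_length` on a real text with a step of a few thousand words shows whether `max_coverage` shrinks as the sample grows, which is what the quoted remark predicts whenever new word types keep appearing.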
Anton > 14-09-2019, 12:49 PM
(14-09-2019, 08:50 AM)MarcoP Wrote: Again, I believe that h2 here is conditional entropy (not second order, i.e. Bi-Word, entropy).
Quote: I hope that Rene or Anton will provide a better explanation, but I guess that the striking correlation of h1-h2 with m500 is due to the fact that it removes from word entropy the component due to unpredictable word combinations (BWE): what is left is almost identical to the variability of the lexicon (TTR). The left and right outliers in the h1-h2 graphs (mostly English texts, it seems) could then be samples with exceptionally strong or weak repetition of two-word sequences.
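For reference, the two readings of h2 discussed here differ as follows. This is the standard textbook relation, written under the assumption that h1 is the single-word entropy and that "Bi-Word entropy" (BWE) means the joint entropy of adjacent word pairs; the symbols W, W_1 and W_2 are introduced only for this note.

```latex
\begin{align}
h_1 &= H(W) = -\sum_{w} p(w)\,\log_2 p(w) \\
\mathrm{BWE} &= H(W_1, W_2) = -\sum_{w_1, w_2} p(w_1, w_2)\,\log_2 p(w_1, w_2) \\
h_2 &= H(W_2 \mid W_1) = H(W_1, W_2) - H(W_1) \\
h_1 - h_2 &= H(W_2) - H(W_2 \mid W_1) = I(W_1; W_2)
\end{align}
```

The last line assumes that the first and second word of a pair have the same distribution (edge effects ignored), so that H(W_1) = H(W_2) = h1. Under the conditional reading, h1-h2 is then the mutual information between neighbouring words, equivalently 2*h1 minus BWE.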
Anton > 14-09-2019, 01:01 PM
(14-09-2019, 08:51 AM)nablator Wrote: (Values for entire TT with unclear "?" vords removed):
h0 h1 h2
12.97 10.45 4.36
When vords are randomly shuffled h2 increases:
12.97 10.45 4.52
There are enough repetitions of 2-vord patterns for this and, more generally, correlations between a vord and the next. But some reordering could still take place. For example, if every even-indexed vord is moved elsewhere (different line, different page) then for each repeated 3- or 4-vord pattern a repeated 2-vord pattern would remain.
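The shuffle test described above can be tried on any tokenised text with a short script. This is a minimal sketch, not nablator's actual code: it assumes h0 is log2 of the number of distinct vords, h1 the single-vord entropy, and h2 the conditional entropy of a vord given the preceding one. The function names and the toy text are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def word_entropies(words):
    """Return (h0, h1, h2) for a list of words (assumed definitions, see above)."""
    h0 = math.log2(len(set(words)))            # log2 of the number of word types
    h1 = entropy(Counter(words))               # single-word entropy
    pairs = Counter(zip(words, words[1:]))     # adjacent word pairs
    firsts = Counter(words[:-1])               # first word of each pair
    h2 = entropy(pairs) - entropy(firsts)      # conditional entropy H(next | current)
    return h0, h1, h2


def shuffled(words, seed=0):
    """Return a randomly reordered copy; h0 and h1 are unaffected by this."""
    copy = list(words)
    random.Random(seed).shuffle(copy)
    return copy


if __name__ == "__main__":
    # Toy text with a perfectly predictable word order, for illustration only.
    text = ["a", "b", "c"] * 100
    print("original:", word_entropies(text))
    print("shuffled:", word_entropies(shuffled(text)))
```

Shuffling destroys any dependence between a vord and the next, so h2 can only stay level or rise, while h0 and h1 stay the same. On a short text the rise is capped, because the pair entropy (h1 + h2 under these definitions) cannot exceed log2 of the number of pairs in the sample.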
Koen G > 14-09-2019, 01:23 PM
Anton > 14-09-2019, 01:35 PM
ReneZ > 14-09-2019, 02:16 PM
RenegadeHealer > 14-09-2019, 02:22 PM
(14-09-2019, 01:01 PM)Anton Wrote: Another point of interest with h2 is whether it could provide hints as to inflections. My idea is that different degrees of inflection would influence the word h2 (and of course also h0 and h1) values. For example, in Russian you have six cases for nouns: nominative, genitive, dative, accusative, instrumental and prepositional. And, generally, the word ending would be different for each. In Latin, if I'm not mistaken, you have only four. In English you have none, or even if they are distinguished, the word endings do not change, that's the point. Same thing for verbs. So it's of interest to see how the different inflectional behaviour of languages is (or is not) reflected in word entropy.
Koen G > 14-09-2019, 02:36 PM
ReneZ > 14-09-2019, 03:04 PM