quimqu > 9 hours ago
(Yesterday, 11:44 PM)ReneZ Wrote: That is perhaps unexpected, and raises the question: what is the reason that the variant word appeared at that point?
Was it because it is a variant of a previous core word?
Or was it because it is a relatively frequent word that is therefore likely to appear anyway?
The latter has to be the preferred case.
(or the text is backwards...)
nablator > 9 hours ago
(10 hours ago)quimqu Wrote: About the bursts: before this post, the burst detection followed a fairly natural idea: the burst starts with the first word and the following are considered variations of it. This is simple and works well for segmenting the text, but it has one major problem: it assumes that the first word is the origin of the others. And this does not have to be true.
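For readers following along, here is a minimal sketch of the "first word as anchor" burst detection described in the quote. The thread does not say how "variation" is measured, so the Levenshtein distance and the threshold `max_dist=1` are assumptions, as are the function names and the sample words.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def first_word_bursts(words, max_dist=1):
    """Segment `words` into bursts anchored on the first word of each burst.

    A word joins the current burst only if it is within `max_dist` edits
    of the burst's FIRST word (assumed similarity criterion).
    """
    bursts, current = [], []
    for w in words:
        if current and levenshtein(current[0], w) <= max_dist:
            current.append(w)
        else:
            if current:
                bursts.append(current)
            current = [w]
    if current:
        bursts.append(current)
    return bursts


# Illustrative word list only, not an actual line of the manuscript.
sample = ["daiin", "dain", "aiin", "chedy", "shedy", "chey", "qokeedy"]
result = first_word_bursts(sample)
```

Every comparison goes through `current[0]`, which is exactly the built-in assumption the quote criticises: the first word is treated as the origin, even when a later word in the burst would be a better centre.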
quimqu > 9 hours ago
(9 hours ago)nablator Wrote: This is an important fact to take into account: lines were not always written sequentially from top to bottom as any normal text would be. There are many instances of gallows intrusions where the text visibly curves upward to avoid a big gallows glyph on the next line. (This is a big indication of something fishy going on by the way.) So the earlier written words on the same page don't have to be on a line above or to the left on the same line.
quimqu > 9 hours ago
nablator > 9 hours ago
Rafal > 7 hours ago
DLXXXI CLIII XXV MCCCXXX MCCCXXV CLI CXII CXXX MCCCVII I CCCXXV CLXXVII XXXVII XXVIIII LXXVIII CCCCLXXXXIII I II DCLVI MLXXXXIII MCCCXXXI CCCXXXV MCCCXXXII XVII CCCXII MCCXXXXVIII L LXXVIII CLXXXVI XVII CCCLI MCCCXXXIII XXXVII XXVIIII LXXVIII CCCCLXXXXIII I CCCLI DCLVI VII XXXVII XXVIIII LXXVIII MCCCXXXI CCCXXXV CCCLI MCCCXXXII MCCCXXXIIII MCCCXXXV XXV MCCCXXX VII CCLXXXXIIII DCXXXXVI CLIII XXV I CCCXXV CCLXXXIIII LXXXV CCLXV DCCLXXXVIIII MCCCXXXVI VII II MCCCXXXVII CXXX MCCCVII LXXVII CCLXXXIIII XVII LXXV MCCCXXVIII LXXVIII CLXXXVI MCCCXXX XXVIIII MXXXXII CLIII XXV DCCLXXX DCLXVII XXXIII LXXV DCCCCLVIIII XXXVII CIIII DCLXXVII CCLXXXXIIII MCCCXXXVI VII IIII XXII LVII MCCCXXIII CCCXXVIII LXXXV DCCCCLXXXVIIII CCCLI CCCII CCLI CCLVI CLXXXVI CCLXXIII CCLXXXVIIII CCXXVII DCCCCLXXXVIIII CLXXXXI MCCCXXXVIII CLIII CCLXXXVIIII CCXXVII MCCXXVII VII CXXXXVIIII CCLXIIII MVII CCLXXXVIIII VII LXXXX CCCLVI XXXXIIII CCCCLXXXXI CCXVIII XVII CCLXXXVIIII CCCXIII CXXXXVIIII CCLXIIII MVII CCLXXXVIIII VII CCLXXXXVI CLIII XXV XXXXIIII CCC XVII DCCCCXXXVI MCXXXXIII XVII DCCCCLVIIII CLIII XXV XVII MCCCXXXVIIII CCCXXIIII MCCCXXIII MCLXVIIII XV LXXV XVI VII MCCCXXXX VII XXII I LXXV DC CLIII XXXXIIII MCCCVII XXV CCCCLXXXXIII LVII CXXXXII XXVIIII LXXVIII CCLXV DXXXXIIII DXXXXVI MCCCXXXXI VII CLIII MCCCXXXVIII XXVIIII LXXVIII DLVIII DXXXXVI DLXXXI MCCCXXXXII VII MCCCXXIII XXII LVII IIII MCCCXXXXIII XXVIIII MCCCVIII MLXV CCCCII CLXXXIIII MCCCXXXXIIII VII IIII XXII MCCCXXXVIII CCCLI CCCII CLIII MLXXX CCCLVI XXXXIIII CCCCLXXXXI MCCCXXXXV VII CCLI CCLVI CCLXXIII LXXV CCXXVII MCCCXXXXVI VII CXXXXVIIII CCLXIIII DCXXXXV CCLXXXXIIII DCXXXXVI CCCXXXV CXXXXII LXXXV CCLXV DCCLXXXVIIII DCXXXXVI VII CCCXXXV LXXV LXXI LXXIIII CCLXVIII
quimqu > 6 hours ago
Model | Log-loss | Perplexity
---------------------------------------------
Independent | 1.19 | 3.29
Markov (order 1) | 1.07 | 2.91
Markov (order 2) | 1.01 | 2.73

Feature | Importance
---------------------------
line_ord | 0.287
prev_type | 0.283
prev2_type | 0.255
pos_norm | 0.097
pos_in_line | 0.073
is_line_start | 0.004

Metric | Value
----------------
Accuracy | 0.556
ROC AUC | 0.574
Log-loss | 0.684
quimqu > 6 hours ago
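As a rough illustration of how a model comparison like the one in the table above can be computed, here is a sketch of an order-k Markov model scored by mean log-loss, with perplexity taken as exp(log-loss). The label sequence, the add-alpha smoothing, the use of natural logarithms and the function names are all assumptions; the thread does not show the actual code or data.

```python
import math
from collections import Counter, defaultdict


def markov_log_loss(train, test, order, alpha=1.0):
    """Mean negative log-likelihood (nats) of `test` under an order-`order`
    Markov model fitted on `train`, with add-alpha smoothing (assumed).
    order=0 gives the independent (unigram) model."""
    vocab_size = len(set(train) | set(test))
    counts = defaultdict(Counter)
    for i in range(order, len(train)):
        counts[tuple(train[i - order:i])][train[i]] += 1

    total, n = 0.0, 0
    for i in range(order, len(test)):
        ctx = counts[tuple(test[i - order:i])]
        p = (ctx[test[i]] + alpha) / (sum(ctx.values()) + alpha * vocab_size)
        total -= math.log(p)
        n += 1
    return total / n


def perplexity(log_loss):
    """Perplexity is the exponential of the mean log-loss."""
    return math.exp(log_loss)


# Toy label sequence (hypothetical), strongly dependent on the previous label.
train = list("AB" * 10)
test = list("AB" * 4)
loss0 = markov_log_loss(train, test, order=0)
loss1 = markov_log_loss(train, test, order=1)
```

On this toy sequence the order-1 model scores a lower loss than the independent one, which is the same direction as in the table. The table's perplexity column is consistent with exp(log-loss) up to rounding: exp(1.19) is about 3.29.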
(6 hours ago)quimqu Wrote: On the other hand, the beginning of the line does not carry much weight.
quimqu > 5 hours ago
Pattern class | Mean share
---------------------------
same | 0.514
up_jump | 0.181
down_jump | 0.181
up_1 | 0.087
down_1 | 0.084

Pattern class | Mean share
---------------------------
flat_3 | 0.315
nondecreasing | 0.201
nonincreasing | 0.200
aba | 0.201
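The class names in the two tables above are not defined in the thread, so the following is only one plausible reading, sketched under stated assumptions: each item carries an integer level, the first table classifies the difference between consecutive levels, and the second classifies consecutive triples. The thresholds, the `other` catch-all class and all function names are hypothetical.

```python
from collections import Counter


def step_class(d):
    """Classify a difference between two consecutive levels (assumed rule)."""
    if d == 0:
        return "same"
    if d == 1:
        return "up_1"
    if d == -1:
        return "down_1"
    return "up_jump" if d > 1 else "down_jump"


def triple_class(a, b, c):
    """Classify three consecutive levels (assumed rule)."""
    if a == b == c:
        return "flat_3"
    if a == c:
        return "aba"            # returns to the starting level
    if a <= b <= c:
        return "nondecreasing"
    if a >= b >= c:
        return "nonincreasing"
    return "other"              # catch-all not listed in the tables


def shares(labels):
    """Relative frequency of each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def pattern_shares(seq):
    """Return (step shares, triple shares) for one sequence of levels."""
    steps = [step_class(y - x) for x, y in zip(seq, seq[1:])]
    triples = [triple_class(*t) for t in zip(seq, seq[1:], seq[2:])]
    return shares(steps), shares(triples)


# Hypothetical level sequence for a single line.
step_share, triple_share = pattern_shares([1, 1, 1, 2, 4, 3, 3, 1, 3])
```

A "mean share" would then be the average of these per-sequence shares over all lines or pages, again assuming that is how the figures were aggregated.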
(I will reply if you have comments but for today the coding is finished)