quimqu > Yesterday, 05:59 PM
(Yesterday, 05:51 PM)nablator Wrote: (Yesterday, 05:19 PM)quimqu Wrote: But doing line by line as I did, its mean is negative (about -0.07).
I get 0.169 for word bigrams that are all within the same line, computed as (positive-negative)/(positive+negative). I used all the lines (including labels) of the old Takahashi transliteration (ivtff_v0a).
nablator > Yesterday, 06:43 PM
(Yesterday, 05:59 PM)quimqu Wrote: No, what I did is entire words.
quimqu > Yesterday, 06:53 PM
(Yesterday, 06:43 PM)nablator Wrote: I don't understand, sorry.
nablator > Yesterday, 07:03 PM
(Yesterday, 06:53 PM)quimqu Wrote: (Yesterday, 06:43 PM)nablator Wrote: I don't understand, sorry.
Hi nablator,
To clarify, I did not calculate bigrams across words. My goal was to compute autocorrelation, but only within each line. That is, every line is treated independently, and the autocorrelation measures patterns of word lengths inside that line, never spanning to the next line.
This way, the result reflects line-level structure, not cross-line sequences. In every case I work with entire words, not n-grams.
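For readers who want to try the per-line idea described above, here is a minimal sketch. It assumes a lag-1 autocorrelation of word lengths, computed separately for each line and then averaged with equal weight per line; the post does not state the exact estimator or weighting, so both are assumptions. The function names are hypothetical, and lines are assumed to be already split into words by whitespace.

```python
from statistics import mean


def lag1_autocorr(values):
    """Lag-1 autocorrelation of one sequence; None when it is undefined."""
    n = len(values)
    if n < 3:
        return None  # too few words for a meaningful estimate
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        return None  # all words have the same length
    num = sum((values[i] - m) * (values[i + 1] - m) for i in range(n - 1))
    return num / denom


def mean_within_line_autocorr(lines):
    """Average the per-line lag-1 autocorrelation of word lengths.

    Each line is treated independently, so no word pair spans a line break.
    """
    scores = []
    for line in lines:
        lengths = [len(word) for word in line.split()]
        r = lag1_autocorr(lengths)
        if r is not None:
            scores.append(r)
    return mean(scores) if scores else None
```

Lines that are too short, or whose words all have the same length, are skipped here; a different policy for those lines would change the mean.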
Jorge_Stolfi > Yesterday, 07:06 PM
quimqu > Yesterday, 07:37 PM
(Yesterday, 07:03 PM)nablator Wrote: I think we mean the same but use different words.
By bigram I mean word bigram, not character bigram. The autocorrelation (positive or negative) is a property of a word bigram. So I am counting how many of these bigrams (sequences of two words on the same line, no word bigram across lines) are positively autocorrelated (p) and how many are negatively autocorrelated (n). The result is (p-n)/(p+n) = 0.169.
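One possible reading of the (p-n)/(p+n) count above, as a rough sketch: a word bigram is taken as "positive" when both word lengths fall on the same side of the corpus-wide mean word length, and "negative" when they fall on opposite sides. That sign rule, the use of a global mean, and the function name are assumptions, not something stated in the post.

```python
def bigram_sign_score(lines):
    """Return (p - n) / (p + n) over same-line word bigrams, or None."""
    per_line = [[len(word) for word in line.split()] for line in lines]
    all_lengths = [length for lengths in per_line for length in lengths]
    if not all_lengths:
        return None
    mu = sum(all_lengths) / len(all_lengths)  # assumed: global mean length

    pos = neg = 0
    for lengths in per_line:
        # zip pairs each word with its successor on the same line only
        for a, b in zip(lengths, lengths[1:]):
            product = (a - mu) * (b - mu)
            if product > 0:
                pos += 1
            elif product < 0:
                neg += 1
            # bigrams with a word exactly at the mean count for neither side
    if pos + neg == 0:
        return None
    return (pos - neg) / (pos + neg)
```

With this definition the score depends only on the signs of the deviations, not their sizes, which is one reason it can differ from a correlation coefficient computed on the same lines.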
(Yesterday, 07:06 PM)Jorge_Stolfi Wrote: One complication for this sort of analysis is that in the available transcriptions there are many uncertain spaces (EVA comma) that may or may not be word breaks. And there are many spaces in the text that are wider than normal inter-glyph spaces but narrower than normal inter-word spaces, which transcribers may have either entered as word breaks (EVA period) or just ignored.
Many of these dubious word breaks are after a short prefix like y or ol or before a short suffix like dy or ar. If those dubious spaces were treated as word breaks, the length correlation would probably drop, possibly even becoming negative.
All the best, --jorge
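The effect of the dubious spaces can be tested by tokenizing the same line twice, once treating the EVA comma as a word break and once ignoring it. The sketch below assumes the line has already been stripped of locus tags, comments, and other inline markup, so that only glyphs, periods, and commas remain. The function name and the sample string are made up for illustration.

```python
import re


def tokenize_eva(text, comma_is_break):
    """Split an EVA line into words.

    A period is always a word break. A comma (uncertain space) is a word
    break only when comma_is_break is True; otherwise it is dropped and
    the neighbouring pieces are joined.
    """
    if comma_is_break:
        parts = re.split(r"[.,]", text)
    else:
        parts = text.replace(",", "").split(".")
    return [part for part in parts if part]


sample = "qokeedy.y,kaiin.ol,dy"  # hypothetical line, not from a real folio
strict = tokenize_eva(sample, comma_is_break=True)
loose = tokenize_eva(sample, comma_is_break=False)
```

Running any word-length statistic on both tokenizations of the full transliteration would show how sensitive the result is to those uncertain breaks.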
RobGea > Yesterday, 08:06 PM
quimqu > Yesterday, 10:07 PM
Text | Global (within+cross) | Within-line | Cross-line | Weighted mean (per line) |
---|---|---|---|---|
Voynich (EVA) | +0.16 | –0.07 | +0.10 | –0.03 |
Timm (generated) | +0.02 | +0.03 | +0.02 | –0.07 |
Platonis Apologia (Latin) | –0.09 | –0.09 | –0.08 | –0.17 |
Unfortunate Traveller (English) | –0.11 | –0.11 | –0.12 | –0.21 |
Lazarillo de Tormes (Spanish) | –0.19 | –0.19 | –0.19 | –0.27 |
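A rough sketch of how the within-line and cross-line columns could be separated, assuming each figure is a Pearson correlation between the lengths of adjacent words: "within" pairs are neighbours on the same line, "cross" pairs are the last word of one line and the first word of the next. The post does not say which estimator was used, so this is only one plausible reconstruction, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs; None if undefined."""
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def split_pairs(lines):
    """Return (within, cross) lists of adjacent word-length pairs."""
    within, cross = [], []
    prev_last = None
    for line in lines:
        lengths = [len(word) for word in line.split()]
        if not lengths:
            continue  # skip empty lines
        if prev_last is not None:
            cross.append((prev_last, lengths[0]))
        within.extend(zip(lengths, lengths[1:]))
        prev_last = lengths[-1]
    return within, cross
```

Under these assumptions, `pearson(within)`, `pearson(cross)`, and `pearson(within + cross)` would correspond to the within-line, cross-line, and global columns respectively.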
quimqu > Yesterday, 10:19 PM
(Yesterday, 08:06 PM)RobGea Wrote: We have line-level (negative) and whole-corpus (positive) autocorrelation, so what about paragraph level?
Jorge_Stolfi > Yesterday, 11:43 PM
(Yesterday, 10:19 PM)quimqu Wrote: in natural languages