dashstofsk > 10-05-2026, 08:52 AM
(09-05-2026, 12:44 AM)tavie Wrote: initial d word types are more common the closer you get to Line End
MarcoP > 10-05-2026, 09:39 AM
(09-05-2026, 12:44 AM)tavie Wrote: the line start word appears to be impacted in many cases by the word above it
JoJo_Jost > 10-05-2026, 10:16 AM
JoJo_Jost > 10-05-2026, 10:27 AM
Stefan Wirtz_2 > 10-05-2026, 01:04 PM
(10-05-2026, 06:14 AM)DG97EEB Wrote: Diane O'Donovan of course
MarcoP > 11-05-2026, 07:37 AM
Patrick Feaster Wrote:When it comes to explaining distribution patterns, there are various possibilities we might entertain. One is that each line of text corresponds to some unit of meaningfully patterned content, such as a grammatical sentence, a line of poetry, or an entry in a list. Exploratory studies of a few well-known works of poetry show that similar patterns can be detected in them, presumably due to a complex interplay of grammatical, metrical, and stylistic factors. In Virgil’s Aeneid, for example, if we compare homologous word pair sets ending [es] and [ibus], the [es] set has a midline average rightwardness score of 0.399 with 343 tokens, while the [ibus] set scores 0.708 with 390 tokens—a difference as stark as any presented above. Alternatively, we might hypothesize that distribution patterns arose as a byproduct of some method of encoding meaningful content rather than from the content itself. Here I’ll cite just one representative scenario. Fifteenth-century ciphers often sought to increase security by providing multiple options for encoding each plaintext character, and for this ploy to work as intended, a writer needed to alternate repeatedly among those options. One strategy for ensuring that happened would have been to favor different options in different areas of the page. Thus, there’s more than one angle from which we could try to explain distribution patterns, but the methods outlined above for identifying such patterns should be equally applicable to any and all prospective interpretations of them.
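For anyone who wants to try a comparison like the Aeneid one on another text, here is a minimal sketch of a positional score. It is not Feaster's published definition. It assumes (hypothetically) that a word at 0-based index i in a line of n words has position i / (n - 1), so 0.0 is line start and 1.0 is line end, and that "midline" means dropping the first and last word of each line. The function name `midline_rightwardness` is made up for illustration. It also simplifies "homologous word pair sets" to plain suffix filters, and the sample lines are invented toy data.

```python
def midline_rightwardness(lines, matches):
    """Average relative line position of midline tokens accepted by `matches`.

    Assumed definition (not taken from the quoted paper): the word at
    0-based index i in a line of n words scores i / (n - 1). Only midline
    tokens (neither first nor last in the line) are counted.
    Returns (average_score or None, token_count).
    """
    total = 0.0
    count = 0
    for line in lines:
        words = line.split()
        n = len(words)
        if n < 3:
            continue  # lines this short have no midline tokens
        for i in range(1, n - 1):
            if matches(words[i]):
                total += i / (n - 1)
                count += 1
    return (total / count if count else None), count


# Invented toy lines; suffix filters stand in for real homologous pair sets.
sample = [
    "fortes urbibus viri",
    "reges et duces bellis omnibus cedunt",
]
print(midline_rightwardness(sample, lambda w: w.endswith("es")))
print(midline_rightwardness(sample, lambda w: w.endswith("ibus")))
```

On the toy lines, "fortes" and "reges" are line-initial and therefore ignored, so the [es] set contains only "duces" (score 0.4), while the [ibus] set averages "urbibus" (0.5) and "omnibus" (0.8). Swapping in a predicate such as `lambda w: w.startswith("d")` would give a rough way to check positional claims like the one about d-initial words.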
JoJo_Jost > 11-05-2026, 07:57 AM
Jorge_Stolfi > 12-05-2026, 08:02 AM
JoJo_Jost > 12-05-2026, 08:38 AM
(12-05-2026, 08:02 AM)Jorge_Stolfi Wrote: In conclusion: if you must do statistical analysis of the text, don't focus on characters, focus on words. And if you need to merge words into word classes (say, to reduce the volume of the results or the sampling noise), try to use classes that are defined by some semantic criterion (like co-occurrence), not by the characters that occur in them (like "qo-words" or "words with gallows").
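To make the co-occurrence idea concrete, here is one possible sketch, not Jorge's method. It describes each word by counts of its immediate neighbours on the same line (a hypothetical window of one word each side) and groups words whose neighbour profiles have high cosine similarity, using a simple greedy pass. The function names, the `threshold` and `window` parameters, and the EVA-style toy lines are all invented for illustration. Real use would need a proper clustering method and much more text.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(lines, window=1):
    """For each word, count the words within `window` positions of it on the same line."""
    vectors = defaultdict(Counter)
    for line in lines:
        words = line.split()
        for i, w in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cooccurrence_classes(lines, threshold=0.5, window=1):
    """Greedy grouping: each word joins the first class whose seed word has a
    similar enough context vector; otherwise it starts a new class."""
    vectors = context_vectors(lines, window)
    classes = []  # list of (seed_word, members)
    for w in sorted(vectors):
        for seed, members in classes:
            if cosine(vectors[w], vectors[seed]) >= threshold:
                members.append(w)
                break
        else:
            classes.append((w, [w]))
    return [members for _, members in classes]


# Invented EVA-style toy lines, not real transcription data.
toy = [
    "qokeedy daiin chol",
    "qokeedy aiin chol",
    "shedy okal dar",
    "shedy otal dar",
]
print(cooccurrence_classes(toy))
```

On the toy lines this yields `[['aiin', 'daiin'], ['chol', 'qokeedy'], ['dar', 'shedy'], ['okal', 'otal']]`. "chol" and "qokeedy" end up together even though they share almost no glyphs, because they flank the same words. That is the point of defining classes by context rather than by spelling.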
ReneZ > 12-05-2026, 09:12 AM