RenegadeHealer > 05-08-2021, 06:32 PM
nickpelling > 05-08-2021, 07:56 PM
MarcoP > 16-08-2021, 07:24 AM
Patrick Feaster Wrote: Measurement of average similarity between words n and n+x for x in the range 1-20. Blue: comma breaks disregarded (= fewer but longer words). Orange: comma breaks treated as real word breaks (= more but shorter words).
...
There’s a sharp peak at n+2, which shows that horizontal pairs of words separated by one intervening word are actually more similar on average than words that are immediately next to each other. This is followed by a periodic rise and fall at an interval close to the average length of a line. Such a period would be consistent with vertically patterned word similarity within lines
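For anyone who wants to try this kind of measurement themselves, here is a minimal sketch of averaging the similarity between word n and word n+x over a flat word list. The quoted post does not say which similarity measure was used, so the `difflib.SequenceMatcher` ratio is only an assumed stand-in, and the function names and the toy word list are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Assumed stand-in measure: ratio of matching characters, from 0.0 to 1.0.
    # The measure used for the quoted chart is not stated in the thread.
    return SequenceMatcher(None, a, b).ratio()


def mean_similarity_by_offset(words, max_offset=20):
    """Average similarity between words[n] and words[n + x] for x = 1..max_offset."""
    result = {}
    for x in range(1, max_offset + 1):
        pairs = [(words[i], words[i + x]) for i in range(len(words) - x)]
        if pairs:  # offsets longer than the text yield no pairs and are skipped
            result[x] = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    return result


if __name__ == "__main__":
    # Toy input only, not a real transcription.
    toy_words = ["aa", "bb", "aa", "bb", "aa", "bb"]
    for offset, score in mean_similarity_by_offset(toy_words, max_offset=3).items():
        print(offset, round(score, 3))
```

Feeding it the words of a transcription in reading order and plotting the result against x should give a curve of the same general kind as the one described, though the exact values will depend on the similarity measure chosen.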
Emma May Smith > 16-08-2021, 05:01 PM
MarcoP > 17-08-2021, 01:28 PM
(16-08-2021, 05:01 PM)Emma May Smith Wrote: On your purple and light blue lines: is it that there are two patterns? Line-initial words are similar to one another, which causes the 9/18 and 10/19 rises, while line-internal words are similar to each other but dissimilar to line-initial words? Thus, by removing the line-initial and line-final words, the longer repeat disappears and the main N+2 is strengthened because they make up a greater ratio of all words?
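The filtering step the question refers to, dropping line-initial and line-final words before measuring, might look roughly like this. It assumes the text is already available as a list of lines, each a list of words; the function name and the sample input are hypothetical.

```python
def strip_line_edges(lines):
    """Return a flat list of line-internal words only.

    `lines` is a list of lines, each a list of words. The first and last
    word of every line are dropped; lines with fewer than three words
    therefore contribute nothing.
    """
    words = []
    for line in lines:
        words.extend(line[1:-1])
    return words


if __name__ == "__main__":
    # Placeholder tokens, not real transcription data.
    sample = [["a", "b", "c", "d"], ["e", "f"], ["g", "h", "i"]]
    print(strip_line_edges(sample))
```

The resulting flat list can then go through the same n versus n+x comparison as the unfiltered text, so the two curves can be compared directly.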
nablator > 17-08-2021, 04:13 PM
MarcoP > 17-08-2021, 05:36 PM
(17-08-2021, 04:13 PM)nablator Wrote: Hello MarcoP,
I'm wondering how much of the downward slope can be ascribed to the inconsistencies in vocabulary between pages; most of it, I guess. Without it, the peak at +2 would be only a bias against "stuttering", which is normal in natural languages.
RenegadeHealer > 18-08-2021, 03:37 PM
pfeaster > 19-08-2021, 12:54 PM
(18-08-2021, 03:37 PM)RenegadeHealer Wrote: Some researcher, I can't recall who, had a blog post comparing the statistical occurrence of both vords and the glyphs comprising them, in line-initial, line-final, mid-line, and label positions. The statistical patterns strongly supported the idea that these are four distinct populations of data, each with its own unique set of preferences and disinclinations that's not much like any of the other three.
I'm not sure whether it's what you have in mind, but I made an argument something like this in section four of the "Ruminations" post to which Emma linked, including a comparative chart of beginning and ending glyphs: