Somehow I got sucked into this question. After a 'quick' shadow-calculation of results from the thread, I had the impression that the distribution of sh-initial words in paragraphs is qualitatively different from that of chdy-not-preceded-by-p. Namely, the former favors the literal first line of the paragraph (like p itself), while the latter shows a general upward trend. Now I am not so sure, but here is another set of plots.
The issue can be muddled by the preponderance of short paragraphs. Of all paragraphs tagged as such in IT2a-n.txt, about half are fewer than 5 lines long. Let us consider only those that are 5 lines or longer.
In the maps below, the coordinates are line number (rows) and ordinal EVA character (columns); 355 paragraphs have been aligned and stacked together. In the left panel, each cell represents the number of sh-initial words beginning at that point. This absolute measure fades to the right and down as the population of longer lines and paragraphs dwindles. Dividing by the total number of words that begin at each cell gives the fractional map on the right. This density is flat-ish across the paragraph, but necessarily shows greater statistical scatter at the margins:
The greater concentration of sh-initial words in the first line specifically is visible.
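For anyone wanting to reproduce maps of this kind, here is a minimal Python sketch of the count map and the fractional map described above. It is not the code behind the plots. The function name `position_maps`, the nested-list input format, and the rule that each word separator advances the column by one character are all assumptions made for illustration; the toy words are made up.

```python
from collections import defaultdict

def position_maps(paragraphs, is_target, min_lines=5):
    """Build count and fraction maps keyed by (line index, start column).

    paragraphs: list of paragraphs; each paragraph is a list of lines,
                each line a list of EVA words (plain strings).
    is_target:  predicate selecting the words of interest.
    min_lines:  paragraphs shorter than this are skipped.
    Assumption: one EVA letter = one column, one separator = one column.
    """
    counts = defaultdict(int)   # target words starting at each cell
    totals = defaultdict(int)   # all words starting at each cell
    for para in paragraphs:
        if len(para) < min_lines:
            continue
        for row, words in enumerate(para):
            col = 0
            for word in words:
                totals[(row, col)] += 1
                if is_target(word):
                    counts[(row, col)] += 1
                col += len(word) + 1  # assumed: separator counts as one column
    # Fraction is only defined where at least one word starts.
    fractions = {cell: counts.get(cell, 0) / n for cell, n in totals.items()}
    return dict(counts), dict(totals), fractions

# Toy usage with made-up words (hypothetical data, not from IT2a-n.txt).
toy = [[["shol", "daiin"], ["chdy", "shedy"]]]
counts, totals, fractions = position_maps(
    toy, lambda w: w.startswith("sh"), min_lines=2
)
```

Cells near the right and bottom margins have small `totals`, which is why the fractional map gets noisier there.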
A discrete break is not visible in the (noisier) density of chdy-not-preceded-by-p:
In the upper portion, where all paragraphs are represented, there is a plausible concentration toward the middle of the line, as seen in earlier replies.
Comparing cumulative fractions in successive lines is suggestive...
...and deceptive, thanks to the much smaller population of chdy. Here, again, a statistical test might be necessary.
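As one possible shape for such a test (a sketch only, and not something computed for the plots here), a one-sided Fisher-style exact test asks how likely it is to see at least the observed number of target words in first lines if target words were scattered at random among all words. The function name and argument names are hypothetical, and the test assumes words are independent, which is doubtful for this text.

```python
from math import comb

def first_line_enrichment_p(k_first, n_target, K_first, N_total):
    """One-sided hypergeometric tail probability P(X >= k_first).

    k_first:  target words observed in first lines
    n_target: target words in all lines
    K_first:  all words in first lines
    N_total:  all words in all lines
    Assumption: every word is an independent draw without replacement.
    """
    upper = min(n_target, K_first)
    denom = comb(N_total, n_target)
    tail = sum(
        comb(K_first, x) * comb(N_total - K_first, n_target - x)
        for x in range(k_first, upper + 1)
    )
    return tail / denom
```

A small p-value would support a genuine first-line excess; with a population as small as that of chdy, the same observed fraction gives a much weaker result than it would for sh-initial words.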
Finally, the same data in yet another implementation of R/D coordinates:
Altogether, a fair illustration of how R/D coordinates can reveal some patterns but literally blur line-specific ones.
Hopefully any remaining errors and inconsistencies are negligible.