After looking at the line-start markers, I examined the line-end markers.
(3,852 lines with at least 5 glyphs, excluding foldout and circular text folios)
At first, the picture was monotonous but familiar: 5 end glyphs (y, n, m, l, r) account for 89% of all line ends. At the single-glyph level, there is practically no coupling to the next line.
It only got interesting at the token level.
When considering the LAST TOKEN of a line as a whole (i.e., everything after the last space), three structurally distinct classes emerge.
Class A:
Full tokens at the end of a line
Tokens that occur mostly line-internally (>50% internal) but, when they do appear at the end of a line, show a p-onset effect at a factor >= 2x the global rate:
10 tokens with factor >= 2x. Aggregated across all 10: 156 transitions, of which 38 are p-onsets (24.36%) compared to 7.35% globally. Factor 3.31x.
These tokens are actual words in the text that sometimes happen to fall at the end of a line. When this happens, the next line begins with a p-onset disproportionately often.
Class B:
End-heavy tokens (>35% of their occurrences at line end)
Aggregated across all 11 tokens: 320 transitions, of which 9 are p-onsets (2.81%) compared to 7.35% globally. Factor 0.38x. Not a single f-onset across all 320 transitions.
Class C:
Short end tokens (2-3 glyphs) with p-onset = 0
Aggregated across all 4 tokens: 141 transitions, of which 0 (!!!) are p-onsets. With a global rate of 7.35%, approximately 10 p-onsets would be expected. Observed: 0.
The statistical probability that this pattern occurs by chance is practically zero.
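This can be quantified with a quick binomial calculation. A minimal sketch, using the counts above and assuming the 141 transitions were independent draws at the global rate (the k = 0 term of the binomial distribution is simply (1 - p)^n):

```python
import math

# Global p-onset rate and number of Class C transitions, from the counts above.
p_global = 0.0735
n = 141

# Expected number of p-onsets if line transitions were independent of class.
expected = n * p_global

# Probability of observing zero p-onsets in n independent transitions.
p_zero = (1 - p_global) ** n

print(f"expected p-onsets: {expected:.1f}")
print(f"P(0 out of {n}): {p_zero:.2e}")
```

Under this independence assumption the chance of seeing 0 out of 141 is on the order of 2 x 10^-5, so "practically zero" is justified even before modeling any token-level dependence.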
---
What follows from these three classes:
There are three structurally distinct functions at the end of a line, with measurably different effects on the start of the next line.
Class A (full words): triggers p strongly (factor 3.31x aggregated, individual tokens up to 7.4x)
Class B (end-heavy tokens): strongly avoids p (factor 0.38x), no f
Class C (short end tokens): completely blocks p (factor 0x, 0 out of 141)
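The three factors follow directly from the aggregated counts; a minimal sketch recomputing them (all numbers taken from the class summaries above):

```python
# Aggregated (transitions, p-onsets) per class, from the counts above.
classes = {
    "A (full words)":       (156, 38),
    "B (end-heavy)":        (320, 9),
    "C (short end tokens)": (141, 0),
}

P_GLOBAL = 0.0735  # global p-onset rate

factors = {}
for name, (n, k) in classes.items():
    rate = k / n
    factors[name] = rate / P_GLOBAL
    print(f"Class {name}: {rate:.2%} -> factor {factors[name]:.2f}x")
```

This reproduces the 3.31x, 0.38x, and 0x figures cited above.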
---
A point that is methodologically important in connection with my assumption that spaces are not word boundaries:
This separation is visible ONLY at the token level, i.e., taking spaces into account. At the pure glyph level (without spaces), the findings become blurred. The last 4-5 glyphs of a line look similar, regardless of whether the last token is 2 or 5 glyphs long.
It follows that spaces carry structural information. Their position is correlated with the sequence properties of the line; it is not random.
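To make the token-vs-glyph distinction concrete, here is a minimal sketch of the two ways of reading a line end (the EVA-style example lines are invented for illustration):

```python
def last_token(line: str) -> str:
    """Token-level view: everything after the last space."""
    return line.split()[-1]

def last_glyphs(line: str, k: int = 4) -> str:
    """Glyph-level view: the last k glyphs, ignoring spaces."""
    return line.replace(" ", "")[-k:]

# Two invented lines with the SAME glyph sequence once spaces are removed;
# only the space position distinguishes a 2-glyph from a 7-glyph final token.
a = "qokeedy shedy ol"
b = "qokeedy shedyol"

print(last_token(a), last_token(b))    # 'ol' vs 'shedyol' -- clearly distinct
print(last_glyphs(a), last_glyphs(b))  # 'dyol' vs 'dyol' -- indistinguishable
```

At the glyph level the two line ends collapse into the same string; the classes above only separate once the space position is part of the data.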
---
To clarify:
The VMS has a strict rule across line boundaries.
0% p-onset in 141 transitions is not a "trend"; it is a prohibition.
Such rules are not found in random pseudotext.
Spaces are not random layout markers.
If the token boundary (position of the space) triggers a hard rule, then spaces are structurally relevant.
Exactly what they are (true word boundaries or rule-based positions) remains open; what matters is: they are not without function.
The system operates at the token level, not at the glyph level.
The rule recognizes "ol as a token at the end of a line" - it does not recognize "ol as a bigram." Tokens exist as units in the VMS system.
The concentration of end glyphs into 5 values (89%) fits this pattern.
The system reduces the permissible end positions to a small number.
If spaces were word boundaries, the distribution of end glyphs would have to be broader - in Middle High German, endings are distributed across 10-11 letters.
---
Three hypotheses regarding what the pure end markers (Classes B and C) might be:
Hypothesis 1: Sentence or paragraph end markers
They conclude a block of content. What follows is something different or shorter, not a new long block. This would be consistent with the observed avoidance of p and f.
Hypothesis 2: Line fillers
If there was still space at the end of the line, the author resorted to a standard formula. The actual content continues normally on the next line.
Hypothesis 3: Position-dependent token selection
Certain tokens are systematically selected for line-end positions, independent of preceding content. The blocking of p-onset reflects a structural rule of the writing system, not a semantic constraint.
Do you have any ideas?