15-09-2025, 04:38 PM
When looking at the Voynich text it helps to separate different kinds of lines instead of treating the whole corpus as one block. A line at the start of a paragraph does not behave like a line in the middle, and a label line does not look like a body line. By checking the very first and last characters of each line, and comparing them with what is normal for the overall corpus, some clear patterns appear. Paragraph beginnings have their own favorite starters, body lines are more balanced, and lines outside of paragraphs follow yet another rule. What follows is a summary of these differences, line type by line type.
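As a rough illustration of the tallying described above, here is a minimal Python sketch. It is not the procedure used to produce the figures in this post: the `(line_type, text)` input format, the line-type names, and the sample lines are all assumptions invented for the example.

```python
from collections import Counter, defaultdict

def edge_profiles(lines):
    """Tally first and last characters per line type.

    lines: iterable of (line_type, text) pairs (hypothetical input format).
    Returns {line_type: (first_char_shares, last_char_shares)}.
    """
    firsts = defaultdict(Counter)
    lasts = defaultdict(Counter)
    for line_type, text in lines:
        text = text.strip()
        if not text:
            continue  # skip empty lines
        firsts[line_type][text[0]] += 1
        lasts[line_type][text[-1]] += 1

    def shares(counter):
        # Convert raw counts into relative frequencies.
        total = sum(counter.values())
        return {ch: n / total for ch, n in counter.items()}

    return {t: (shares(firsts[t]), shares(lasts[t])) for t in firsts}

# Made-up lines for illustration only, not real transcription data.
toy = [
    ("para_initial", "pchedy.qokeey.dam"),
    ("body", "daiin.shedy.qokal"),
    ("body", "sheol.chedy.okaiin"),
    ("label", "otedy"),
]
profiles = edge_profiles(toy)
body_starts, body_ends = profiles["body"]
```

Comparing each line type's shares against the same shares computed over the whole corpus gives the kind of contrast summarized below.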
Labels
These short label lines behave differently from running text.
- Starts: They very often begin with o (about 50%). Against their own local word mix that is only mildly unusual, but against the full corpus it is a strong tilt toward o.
- Ends: They tend to end in y, with smaller bumps for m and d.
- Summary: Labels have an o- opening habit and a fairly y-heavy ending, but otherwise don’t diverge wildly from their own local vocabulary.
Paragraph-initial lines
This is where the strongest cue lives.
- Starts: Paragraph-initial lines strongly favor p, t, k, and f. Those openings are over-represented both against their own local word stock and against the whole corpus. By contrast, starts like o, q, c, s are under-represented here.
- Ends: Endings don’t separate these lines much, with the clear exception that m is noticeably more common at the very end.
- Distance: The divergence at the first character is high (JS_init ≈ 0.38 vs. local, 0.52 vs. global). In other words, the first character of these lines is distributed very differently from both baselines.
- Summary: If you see a line starting with p, t, k, or f, odds are it’s the first line of a paragraph. This entry code is the sharpest positional signal in the manuscript.
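For readers unfamiliar with the JS figures quoted in the Distance bullets, the sketch below shows one common way to compute a Jensen-Shannon divergence between two character distributions. It assumes base-2 logarithms (so the value lies between 0 and 1) and that the divergence itself, not its square root, is reported; the post does not state either choice. The two example distributions are invented.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions given as
    {symbol: probability} dicts. Base-2 logs are an assumption here."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m); zero-probability terms contribute 0.
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Invented distributions, for illustration only.
line_initial = {"p": 0.5, "t": 0.3, "o": 0.2}
corpus_wide = {"o": 0.5, "q": 0.3, "p": 0.2}
js_init = js_divergence(line_initial, corpus_wide)
```

Under these assumptions, identical distributions give 0 and distributions with no symbols in common give 1, which helps place values such as 0.12 or 0.52 on a scale.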
Body lines (within paragraphs)
These are ordinary running lines within the paragraph (not the first line, nor the last line).
- Starts: A mixed, stable recipe: d, s, y, q, o dominate, with c notably lower than in the general word stock.
- Ends: y is slightly lower than global norms, while m at line end is clearly higher than we’d expect from the local body vocabulary.
- Distance: Starts diverge modestly (JS_init ≈ 0.12); divergence at the line end is low (≈ 0.066).
- Summary: This is the baseline flow of the text: predictable starts and the familiar y, n, l, r, m mix at the end, with m a recurring tail.
Last lines of paragraphs
The last lines of paragraphs form a group that is broadly similar to the body lines.
- Starts: Again y, s, d, o, q; c is much lower than its local/global baselines.
- Ends: y and n are higher; l and r are lower. Log-odds confirm that m, y, and g are over-represented at the very end, while l, r, and o are under-represented.
- Distance: Modest at the start (JS_init ≈ 0.14), low at the end (≈ 0.048).
- Summary: Another "normal text" profile; very close to the body lines within paragraphs, with slightly stronger y at both ends.
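The log-odds mentioned in the Ends bullet can be computed in several ways; the post does not say which was used. A minimal sketch, assuming a plain log odds ratio of "line ends in this character" against a reference sample, with 0.5 added to each cell (the Haldane-Anscombe correction) to avoid division by zero. The function name and the counts are hypothetical.

```python
import math

def log_odds(count_obs, total_obs, count_ref, total_ref, alpha=0.5):
    """Natural-log odds ratio of a character in an observed sample
    versus a reference sample, with additive smoothing alpha per cell."""
    a = count_obs + alpha               # observed: has the character
    b = total_obs - count_obs + alpha   # observed: does not
    c = count_ref + alpha               # reference: has the character
    d = total_ref - count_ref + alpha   # reference: does not
    return math.log((a / b) / (c / d))

# Invented counts: a character ending 30 of 100 lines of one type,
# against 10 of 100 in the reference sample.
over = log_odds(30, 100, 10, 100)
under = log_odds(5, 100, 10, 100)
```

Positive values mean the character is over-represented in the observed sample, negative values mean it is under-represented, and zero means no difference.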
Non-paragraph lines (outside any paragraph)
Standalone or detached lines show a different entry behavior.
- Starts: A very strong bias to o- (about 66% of line starts). Against the whole corpus this is striking, though against their own local word mix it’s milder.
- Ends: y is high (≈ 47%), and m is also above baseline.
- Distance: Start vs global is notable (JS_init ≈ 0.21), while start vs. local and ends are relatively low.
- Summary: These lines have an o-anchor at the very start; the second character contributes far less than the first.
The body of the text (lines within and ending paragraphs), taken together, makes up about 72% of all words. Its behaviour is fairly steady: lines usually begin with d, s, y, q, or o, and they usually end with y, n, m, l, or r. Some small rules repeat across the text, such as d being followed by a, c, s, or o at the start of words, and l being followed by a or o at the very end of a line. These habits stay in place regardless of which type of body line you look at.
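Small rules such as "d is followed by a, c, s, or o at the start of words" can be checked by counting which character follows a given word-initial character. The sketch below is one hedged way to do that; the function name and the word list are invented for illustration.

```python
from collections import Counter

def initial_followers(words, first):
    """Relative frequencies of the second character among words
    that begin with `first`. Returns {} if no such word exists."""
    counts = Counter(w[1] for w in words if len(w) > 1 and w[0] == first)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}

# Made-up word list, not real transcription data.
toy_words = ["daiin", "dchedy", "dal", "okaiin", "dol"]
after_d = initial_followers(toy_words, "d")
```

Running the same count separately for each type of body line is one way to test whether such habits really hold across all of them.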
Looking at the manuscript as a whole, the sharpest divide appears right at the first character of each line. Lines that begin a paragraph usually open with a small set of markers such as p, t, k, or f. Lines that stand outside any paragraph, by contrast, start with o about two-thirds of the time. Once the line is underway and you move inside the word, the contrasts between line types are still there, but they are far less pronounced than the jolt that comes at the very first step.