Identifying paragraphs in the Starred Parags section - Jorge_Stolfi - 27-06-2025
I am trying to figure out the paragraph breaks in the Starred Parags (aka Recipes) section.
I will use these terms:
- parag: short for paragraph.
- head of a parag: its first line.
- tail of a parag: its last line.
- puff: a one-legged gallows, either {p} or {f}, with or without the platform slash.
- margin: the mostly text-free space between an edge of the page and the text.
- left rail: the ideal mostly vertical and straight line that runs just to the left of the majority of lines of a page, separating the left margin from the text.
- right rail: the ideal mostly vertical and possibly wavy but fairly smooth line that runs just to the right of the ends of most lines of a page, separating the text from the right margin.
- long line: a text line that starts at the left rail and ends at or beyond the right rail.
- short line: a text line that starts at the left rail but ends well before the right rail.
- baseline: the ideal usually smooth curved line that runs just below the glyphs of a text line, excluding the tails of {y}, {m}, {l}, etc.
- linegap: the vertical distance between the baselines of successive lines, which often varies over the width of the text.
- wider linegap: a linegap that is wider than normal, at least in some part of the lines (e.g. left side, right side, or middle).
- topline: an ideal line parallel to the baseline, such that the distance between the two is the height of an EVA {o} in the line's handwriting.
- midline: an ideal line parallel to the baseline and the topline, equidistant from the two.
- starlet: a star in the margin that has been assigned to a unique line, like a bullet in an item list.
The positions and even the count of stars in each page are not reliable, since they sometimes do not match the obvious paragraph breaks. Thus the assignment of starlets to lines is to be determined as part of identifying the parag breaks. However, I will assume that every starlet should be assigned to a different line.
That said, a paragraph should ideally be a bunch of consecutive lines with all of the following properties:
- P1. The first of these lines follows a short line (or is the first line in the Starred Parags section, SPS for short, or follows a "title").
- P2. The last of these lines is short (or is the last line of the SPS, or precedes a "title").
- P3. All lines other than the last one are long lines.
- P4. There are no puffs in any of these lines except possibly in the first of them.
- P5. The first of those lines has an assigned starlet.
- P6. None of these lines, except the first one, has an assigned starlet.
I will call a set of lines with all these properties a perfect parag. I will assume that they are indeed paragraphs as intended by the Author.
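For anyone who wants to try the criteria mechanically, here is a minimal Python sketch of P1 to P6. The `Line` record and its three flags are my own hypothetical encoding, not part of any transcription file; "titles" are ignored, and the input is assumed to be one stretch of lines treated in isolation (so a trailing run of long lines is taken to end the stretch).

```python
from dataclasses import dataclass

@dataclass
class Line:
    short: bool    # ends well before the right rail
    puff: bool     # contains a {p} or {f}
    starlet: bool  # has a starlet assigned to it

def split_candidates(lines):
    """Cut after every short line, so P1-P3 hold for each block by construction."""
    blocks, current = [], []
    for line in lines:
        current.append(line)
        if line.short:
            blocks.append(current)
            current = []
    if current:  # trailing long lines: only acceptable at the end of the stretch (P2)
        blocks.append(current)
    return blocks

def is_perfect(block):
    """Check P4-P6 on a block that already satisfies P1-P3."""
    head, rest = block[0], block[1:]
    p4 = not any(l.puff for l in rest)     # no puffs after the head
    p5 = head.starlet                      # head has a starlet
    p6 = not any(l.starlet for l in rest)  # no starlets after the head
    return p4 and p5 and p6

def count_perfect(lines):
    return sum(is_perfect(b) for b in split_candidates(lines))

# Made-up example: 7 lines forming 4 candidate blocks, 2 of them perfect.
example = [
    Line(False, True, True), Line(True, False, False),   # perfect
    Line(False, False, True), Line(False, True, False),
    Line(True, False, False),                            # puff in the body: fails P4
    Line(True, True, False),                             # no starlet on head: fails P5
    Line(True, False, True),                             # one-line parag: perfect
]
print(len(split_candidates(example)), count_perfect(example))
```

Cutting at short lines first keeps the sketch simple, but it also means a block that fails P4 to P6 is reported as imperfect without any attempt to repair it by moving the breaks.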
The following table gives some relevant statistics per page, with a tentative assignment of starlets:
- Stars: Number of stars in the page.
- ShLns: Number of short lines in the page.
- Puffd: Number of lines that contain puffs (one-leg gallows).
- PerfP: Number of perfect parags in the page.
page  | Stars | ShLns | Puffd | PerfP
------+-------+-------+-------+-------
f103r |    19 |    18 |    14 |    15
f103v |    14 |    12 |    14 |     9
f104r |    13 |    13 |    13 |    13
f104v |    13 |    13 |     8 |    11
f105r |    10 |    11 |    15 |     6
f105v |    10 |    14 |    20 |     3
f106r |    16 |    15 |    17 |    13
f106v |    14 |    16 |    16 |    14
f107r |    15 |    15 |    13 |    10
f107v |    15 |    15 |    13 |    14
f108r |    16 |    17 |    13 |     8
f108v |    16 |     5 |     8 |     1
f111r |    17 |    10 |     7 |     4
f111v |    19 |     8 |    11 |     6
f112r |    12 |    11 |    13 |     8
f112v |    13 |    15 |    14 |    12
f113r |    16 |    16 |    17 |    12
f113v |    15 |    15 |    16 |    15
f114r |    13 |    11 |    13 |    11
f114v |    12 |    11 |    12 |     9
f115r |    13 |    13 |    12 |    12
f115v |    13 |    13 |    12 |    12
f116r |    10 |     8 |    10 |     5
------+-------+-------+-------+-------
TOTAL |   324 |   295 |   301 |   223
As can be seen, on page [link] the counts of stars, short lines, and puffed lines match and the whole text consists of perfect parags. On other pages there are lines that cannot be placed in perfect parags. I will have to compromise on one or more of the criteria above. Stay tuned...
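The table can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from the table above, and the names `TABLE`, `rows`, `totals` and `all_match` are mine. It recomputes the column totals and lists the pages where all four counts agree, which is a necessary condition for a page to consist entirely of perfect parags (not by itself a proof).

```python
# page Stars ShLns Puffd PerfP, copied from the table above
TABLE = """
f103r 19 18 14 15
f103v 14 12 14 9
f104r 13 13 13 13
f104v 13 13 8 11
f105r 10 11 15 6
f105v 10 14 20 3
f106r 16 15 17 13
f106v 14 16 16 14
f107r 15 15 13 10
f107v 15 15 13 14
f108r 16 17 13 8
f108v 16 5 8 1
f111r 17 10 7 4
f111v 19 8 11 6
f112r 12 11 13 8
f112v 13 15 14 12
f113r 16 16 17 12
f113v 15 15 16 15
f114r 13 11 13 11
f114v 12 11 12 9
f115r 13 13 12 12
f115v 13 13 12 12
f116r 10 8 10 5
"""

# Parse each non-empty record into page -> (Stars, ShLns, Puffd, PerfP).
rows = {}
for rec in TABLE.splitlines():
    if rec.strip():
        page, *nums = rec.split()
        rows[page] = tuple(int(n) for n in nums)

# Column sums, to compare with the TOTAL row.
totals = tuple(sum(col) for col in zip(*rows.values()))

# Pages where stars, short lines, puffed lines and perfect parags all agree.
all_match = [page for page, counts in rows.items() if len(set(counts)) == 1]

print(totals)
print(all_match)
```

With the table as given here, the totals agree with the TOTAL row and the only page that passes the all-equal test is f104r.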
RE: Identifying paragraphs in the Starred Parags section - ReneZ - 28-06-2025
The correlation between short lines and, errm, puffs, is generally quite good, which is both reassuring and helpful.
I recently updated my counts on these pages here: [link]
but that should just be for reference. I am not too convinced of those myself.
RE: Identifying paragraphs in the Starred Parags section - Jorge_Stolfi - 29-06-2025
I noticed this curious detail on page f103v:
I take this as another bit of evidence that the puffs (the one-legged gallows {p} and {f}) on most lines were created by the Scribe as transmutations of other letters, in order to mark those lines as parag heads.
As I see it, that was the case for most puffs, at least in this section, but not all. There are a handful of lines with puffs, scattered over several pages, that cannot plausibly be considered parag heads, because they would break several of the criteria above -- in particular, if they are considered heads, there will be more parags than stars on those pages.
This would be similar to our present practice of capitalizing most words in article and section titles: it is a major use of capital letters, but not the only one.
By the way, IIRC in French, when words are capitalized in titles and such, it was (is?) customary to omit all diacritics, for aesthetic or typesetting reasons. So "école" would become "Ecole", not "École". Note that this practice would cause the digram statistics of "E" to be different from those of both "é" and "e"...
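To make the digram point concrete, here is a toy Python sketch. The six-word list is made up for illustration and is not a corpus; `title_cap` and `followers` are hypothetical helper names. It applies the capitalization convention described above and counts which letter follows a word-initial "é", "e", or "E".

```python
import unicodedata
from collections import Counter

def title_cap(word):
    """Capitalize the first letter and drop its diacritic, if any."""
    word = unicodedata.normalize("NFC", word)
    base = unicodedata.normalize("NFD", word[0])[0]  # base letter without the accent
    return base.upper() + word[1:]

def followers(words, letter):
    """Count the letters that directly follow a word-initial `letter`."""
    return Counter(w[1] for w in words if len(w) > 1 and w[0] == letter)

# Toy sample: three words starting with "é", three with "e".
words = [unicodedata.normalize("NFC", w)
         for w in ["école", "état", "été", "elle", "encore", "entre"]]
titled = [title_cap(w) for w in words]

print(followers(words, "é"))   # what follows "é" in running text
print(followers(words, "e"))   # what follows "e" in running text
print(followers(titled, "E"))  # what follows "E" after capitalization
```

In this sample the followers of "E" are the union of those of "é" and "e", so the "E" distribution matches neither of the two. If puffs are transmutations of more than one other letter, their digram statistics could be mixed in the same way.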
All the best, --jorge