quimqu > 15-04-2026, 08:25 AM
| Section | Real AUC mean | Real AUC sd | Null AUC mean | Null AUC sd |
|---|---|---|---|---|
| Biological (balneological) | 0.891 | 0.057 | 0.459 | 0.119 |
| Herbal | 0.905 | 0.036 | 0.461 | 0.131 |

| Section | End-quality AUC | Start-quality AUC | Fit-quality AUC |
|---|---|---|---|
| Biological (balneological) | 0.708 | 0.763 | 0.797 |
| Herbal | 0.752 | 0.763 | 0.780 |

| Section | Boundary end quality | Interior end quality | Boundary start quality | Interior start quality | Boundary fit quality | Interior fit quality |
|---|---|---|---|---|---|---|
| Biological (balneological) | -2.019 | -2.564 | -1.774 | -2.508 | -1.531 | -2.531 |
| Herbal | -1.788 | -2.349 | -1.798 | -2.444 | -1.453 | -2.663 |

| Training | Test section | Boundary AUC | End AUC | Start AUC | Fit AUC |
|---|---|---|---|---|---|
| Herbal | Biological (balneological) | 0.804 | 0.667 | 0.690 | 0.753 |
| Biological (balneological) | Herbal | 0.809 | 0.627 | 0.659 | 0.785 |

| Training | Test section | Boundary AUC |
|---|---|---|
| Herbal + Biological | Marginal stars only | 0.958 |
| Herbal + Biological | Text-only | 0.859 |
| Herbal + Biological | Pharmaceutical | 0.782 |
| Herbal + Biological | Zodiac | 0.746 |
| Herbal + Biological | Astronomical | 0.739 |
| Herbal + Biological | Cosmological | 0.727 |

| Section | Good line endings tend to look like | Good line openings tend to look like |
|---|---|---|
| Biological | ol..ly, ol..dy, or..ry, ol..ol, ok..ky, some qo... tails | so..dy, so..ey, so..or, so..ol, sa..ar, sa..in, ds..dy, dc..dy, tc..dy |
| Herbal | ..am, ..dy, da..an, da..am, ok..am, ot..am, ch..ry, some da..in tails | yc..or, yc..ol, yc..ey, so..in, so..or, so..ol, dc..ey, dc..dy, ds..dy, yt..dy, tc..in |
Jorge_Stolfi > 15-04-2026, 12:33 PM
(15-04-2026, 08:25 AM)quimqu Wrote: The null control shows that this does not survive random relabelling. So the effect looks real. I think the safest formulation is this: in Herbal and Biological, line breaks are statistically structured rather than arbitrary.
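For readers unfamiliar with this kind of null control, here is a minimal sketch of a label-shuffling check. The thread does not say how the null was actually computed, so everything here is an assumption: the function names `auc` and `null_auc` are hypothetical, the AUC is a plain rank-based (Mann-Whitney) estimate, and the classifier scores are held fixed while the boundary labels are shuffled. A fuller version would retrain the model on each shuffled labelling.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a win. labels are 1 (line boundary) or 0 (interior)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def null_auc(scores, labels, n_shuffles=200, seed=0):
    """AUC values obtained after randomly permuting the labels (assumed null)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any link between score and label
        results.append(auc(scores, shuffled))
    return results
```

If the structure is real, the AUC on the true labels should sit well above the spread of the shuffled values, which should centre near 0.5.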
quimqu > 15-04-2026, 03:02 PM
(15-04-2026, 12:33 PM)Jorge_Stolfi Wrote: So, before we can take your results above as evidence of LAAFU, you need to repeat the analysis on text that has been re-justified as discussed before. Namely, discard the first line of each parag, join the remaining lines in a single token stream, and feed that to the trivial line-breaking algorithm, with maximum line length set to about 62% or 162% of the original average line length -- always counting characters, not words.
The set of these re-justified parags should be the "null control". I am quite sure that you will see on this control text the same kind of anomalies that you see on the VMS. The question is only whether they will be just as strong, or significantly weaker.
And even if the anomalies in this control text are weaker than on the original parags, that is still not yet evidence of LAAFU. Because the algorithm used by the scribe is a bit more complicated than the trivial one, and the additional complications add to the line-break anomalies. So we will have to try to simulate these complications too.
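The re-justification procedure described in the quote can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the results in this thread: the names `line_len`, `greedy_break` and `rejustify` are hypothetical, words are assumed to be separated by a single space when counting characters, the average line length is taken over all retained lines of all paragraphs, and the "trivial" algorithm is assumed to be a greedy fill that never splits a word.

```python
def line_len(words):
    """Length of a line in characters, with one space between words."""
    return sum(len(w) for w in words) + max(len(words) - 1, 0)

def greedy_break(tokens, max_chars):
    """Trivial line breaking: put each word on the current line if it fits,
    otherwise start a new line. An oversized word gets a line of its own."""
    lines, current, cur_len = [], [], 0
    for tok in tokens:
        needed = len(tok) if not current else cur_len + 1 + len(tok)
        if current and needed > max_chars:
            lines.append(current)
            current, cur_len = [tok], len(tok)
        else:
            current.append(tok)
            cur_len = needed
    if current:
        lines.append(current)
    return lines

def rejustify(paragraphs, factor):
    """paragraphs: list of paragraphs; each paragraph is a list of lines;
    each line is a list of word tokens. factor scales the average line
    length, e.g. 0.62, 1.00 or 1.62."""
    # Discard the first line of each paragraph.
    kept = [p[1:] for p in paragraphs if len(p) > 1]
    all_lines = [ln for p in kept for ln in p if ln]
    if not all_lines:
        return []
    avg = sum(line_len(ln) for ln in all_lines) / len(all_lines)
    max_chars = factor * avg
    # Join each paragraph's remaining lines into one token stream and re-break.
    return [greedy_break([t for ln in p for t in ln], max_chars) for p in kept]
```

The same boundary classifier can then be run on the output of `rejustify` for each factor, with the new line ends as positive labels.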
| Section | Dataset | AUC total | End AUC | Start AUC |
|---|---|---|---|---|
| Biological | Real | 0.875 | 0.743 | 0.749 |
| Biological | Rejust 0.62 | 0.555 | 0.519 | 0.530 |
| Biological | Rejust 1.00 | 0.631 | 0.515 | 0.555 |
| Biological | Rejust 1.62 | 0.679 | 0.527 | 0.552 |
| Herbal | Real | 0.885 | 0.712 | 0.756 |
| Herbal | Rejust 0.62 | 0.731 | 0.519 | 0.536 |
| Herbal | Rejust 1.00 | 0.742 | 0.526 | 0.554 |
| Herbal | Rejust 1.62 | 0.642 | 0.544 | 0.563 |

| Section | Control | Real − control AUC (total) |
|---|---|---|
| Biological | 0.62× | +0.32 |
| Biological | 1.00× | +0.24 |
| Biological | 1.62× | +0.20 |
| Herbal | 0.62× | +0.15 |
| Herbal | 1.00× | +0.14 |
| Herbal | 1.62× | +0.24 |
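The differences in this table can be re-derived from the "AUC total" column of the previous one. A short sketch, where the dictionary layout and the name `real_minus_control` are just illustrative choices and the numbers are copied from the table above:

```python
# Total AUC values copied from the re-justification table in this thread.
AUC_TOTAL = {
    "Biological": {"Real": 0.875, "0.62x": 0.555, "1.00x": 0.631, "1.62x": 0.679},
    "Herbal": {"Real": 0.885, "0.62x": 0.731, "1.00x": 0.742, "1.62x": 0.642},
}

def real_minus_control(auc_total):
    """Gap between the real text and each re-justified control, per section."""
    gaps = {}
    for section, row in auc_total.items():
        real = row["Real"]
        for control, value in row.items():
            if control != "Real":
                gaps[(section, control)] = round(real - value, 2)
    return gaps

for (section, control), gap in real_minus_control(AUC_TOTAL).items():
    print(f"{section:<10} {control}  {gap:+.2f}")
```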
quimqu > 15-04-2026, 03:12 PM
Fontanellean > 15-04-2026, 05:34 PM
(15-04-2026, 03:12 PM)quimqu Wrote: Line breaks in the Voynich seem to follow two constraints at the same time. Lines tend to fill the available space, but they also tend to end with certain types of word patterns and start with other types of patterns. The model is picking up both things.
This is not typical of ordinary prose, where line breaks are mostly arbitrary. It suggests that the text was produced with local structural preferences, not just written and then wrapped to fit the page.
That does not tell us exactly how the text was generated. It could be stylistic, procedural, or something else. But it does suggest that line breaks are part of the structure, not just a formatting afterthought.
quimqu > 15-04-2026, 07:07 PM
| Section | Group | End | Start | Total | Left H | Right H |
|---|---|---|---|---|---|---|
| Herbal | good | 1.91 | 1.27 | 3.18 | 1.15 | 1.95 |
| Herbal | weak | 0.00 | 0.00 | 0.00 | 3.37 | 3.37 |
| Herbal | interior | 0.27 | 0.31 | 0.57 | 6.24 | 6.19 |
| Biological | good | 2.20 | 2.37 | 4.57 | 2.39 | 2.31 |
| Biological | weak | 0.00 | 0.00 | 0.00 | 3.09 | 3.03 |
| Biological | interior | 0.84 | 0.87 | 1.71 | 5.38 | 5.38 |
Jorge_Stolfi > 16-04-2026, 04:56 AM
(15-04-2026, 07:07 PM)quimqu Wrote: So it seems that two things are happening at the same time. Lines fill the available space, but they also tend to end and start with specific families of patterns.
quimqu > 16-04-2026, 09:09 AM
nablator > 16-04-2026, 12:42 PM
(16-04-2026, 09:09 AM)quimqu Wrote: These end-like tokens tend to belong to specific families and shapes that are well known at line ends in EVA, for example patterns like -dy, -y, -l, -r, or short forms such as dy, dal, lo, which frequently appear in final position and have high end-like scores.
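The thread does not define the "end-like score", so the following is only one possible reading, labelled as an assumption: a smoothed log-odds of a word ending (its last `suffix_len` EVA characters) occurring in line-final position versus any other position. The function name `end_like_scores` and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter

def end_like_scores(lines, suffix_len=1, alpha=1.0):
    """lines: list of lines, each a list of word tokens (e.g. EVA strings).
    Returns {suffix: log(P(suffix | line-final) / P(suffix | elsewhere))},
    with add-alpha smoothing so unseen suffixes do not give infinities."""
    end_counts, other_counts = Counter(), Counter()
    for words in lines:
        if not words:
            continue
        for w in words[:-1]:
            other_counts[w[-suffix_len:]] += 1
        end_counts[words[-1][-suffix_len:]] += 1
    suffixes = set(end_counts) | set(other_counts)
    n_end = sum(end_counts.values())
    n_other = sum(other_counts.values())
    k = len(suffixes)
    scores = {}
    for s in suffixes:
        p_end = (end_counts[s] + alpha) / (n_end + alpha * k)
        p_other = (other_counts[s] + alpha) / (n_other + alpha * k)
        scores[s] = math.log(p_end / p_other)
    return scores
```

A positive score means the ending is over-represented at line ends. The same counting, applied to first characters of line-initial words, would give a start-like score.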
Stolfi Wrote: By the way: there are many pages where the lines are interrupted by plants. An intrusion would trigger the line breaking algorithm just as if it was the right rail. That is, the line-end anomalies should be observed on the last few words before the intrusion, and the line-start anomalies should be observed on the first word after it. I vaguely recall this having been tested and found to be true. Is it?
nablator > 16-04-2026, 01:19 PM
(15-04-2026, 07:07 PM)quimqu Wrote: So it seems that not all line breaks behave the same. There is a subset that fits very well the “good ending + good beginning” pattern, and another subset that barely fits it.