(12-04-2026, 02:30 PM)Jorge_Stolfi Wrote: (12-04-2026, 10:18 AM)quimqu Wrote: Well, I am not in the stage of creating a theory and testing it; I really have no clues. I am in the stage of observation, also part of the scientific method.
But you don't have to believe the theory. In fact, the scientific method works best when you set out to check a theory that you don't believe. Because then you try harder to find a good test that will disprove it.
Like, someone who believes in the LAAFU theory will naturally try to compute more statistics that reveal anomalies at the two extremities of the lines. But someone who does not believe in the theory would think first of doing tests that show that the apparent anomalies are due to other causes. Like the line-breaking bias. Hence my proposed test of redoing the line breaks with different margin widths.
Thorsten recently claimed that the change from "Language A" to "Language B" is gradual, and is due to the drift that one expects from the copy-and-mutate method. Someone who believes that claim may, for instance, look for ways to rearrange the bifolios of Herbal-A and/or Herbal-B so as to make successive pages maximally similar to each other. And I bet that there is indeed a rearrangement that makes the transition more gradual than the current one. But since I don't believe that theory, I am more interested in tests that could disprove it, if it is false. Any idea? Could your kind of analysis do that?
All the best, --stolfi
Dear Jorge,
I ran the analysis below with Claude and then applied my "failure protocol", which hopefully picked up any obvious issues. I'm being transparent about that, and I'm curious to see what is actually broken, if anything. I expect quimqu is using GPT/Claude as well, and given our discussion on the other thread, I remain curious whether this can add value.
The main tables below use ZLZI. A robustness section at the end shows that every finding replicates across all three transcriptions. All measurements use raw EVA characters. No theory or grammar is assumed — just character identity, word boundaries, and line boundaries.
I use six thematic sections (Botanical, Astrological, Balneological, Rosettes, Pharmaceutical, Stars) and, separately, five scribes identified by Lisa Fagin Davis. The Currier A/B dialect distinction is tagged at the folio level by scribe, not by section.
## Test 1: Re-breaking the lines
You suggested removing the real line breaks and re-breaking the text at different widths. If line-position effects are genuine production signals, they should disappear when the breaks move. If they are artefacts of margin alignment, they should reappear at any break point.
I preserved folio boundaries (so no token crosses a folio edge) and measured four things at each break width:
- **Gallows-initial**: first character of the first word is k, t, p, or f (note: the character-level analysis below shows this is driven by p and t; k is slightly depleted at line start)
- **Hapax-initial**: first word appears only once in the corpus
- **-m final**: last word on the line ends in the character m
- **AC reset**: Pearson correlation of consecutive word lengths, measured separately within lines and across line boundaries
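A minimal sketch of the re-breaking step and the three rate measurements, under stated assumptions: the input layout (a list of folios, each a list of lines, each a list of EVA words), the function names, and the restriction to lines of at least two words are illustrative choices, not necessarily what was run for the tables below.

```python
from collections import Counter

GALLOWS = set("ktpf")  # the four EVA gallows characters named above

def rebreak(folios, width):
    """Re-break each folio's word stream into lines of `width` words.

    `folios` is a list of folios; each folio is a list of lines; each line
    is a list of EVA words. Words never cross a folio boundary.
    """
    rebroken = []
    for lines in folios:
        words = [w for line in lines for w in line]
        rebroken.append([words[i:i + width] for i in range(0, len(words), width)])
    return rebroken

def share(words, predicate):
    """Fraction of `words` satisfying `predicate` (0.0 for an empty list)."""
    return sum(map(predicate, words)) / len(words) if words else 0.0

def position_stats(folios):
    """Line-position rates over lines of at least two words (assumed cutoff)."""
    lines = [ln for folio in folios for ln in folio if len(ln) >= 2]
    counts = Counter(w for folio in folios for ln in folio for w in ln)
    first = [ln[0] for ln in lines]
    rest = [w for ln in lines for w in ln[1:]]
    is_gallows = lambda w: w[0] in GALLOWS
    is_hapax = lambda w: counts[w] == 1
    ends_m = lambda w: w.endswith("m")
    return {
        "gallows_init": share(first, is_gallows),
        "gallows_else": share(rest, is_gallows),
        "hapax_init": share(first, is_hapax),
        "hapax_else": share(rest, is_hapax),
        "m_final": share([ln[-1] for ln in lines], ends_m),
        "m_penult": share([ln[-2] for ln in lines], ends_m),
    }
```

Calling `position_stats(folios)` gives the real-line row, and `position_stats(rebreak(folios, w))` for `w` in 4, 6, 8, 10, 12, 15, 20 gives the re-broken rows.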
### Results
| Line breaks | Gal. init | Gal. else | Hap. init | Hap. else | -m final | -m penult | AC within | AC cross |
|---|---|---|---|---|---|---|---|---|
| **Real lines** | **20.5%** | **6.0%** | **21.5%** | **12.2%** | **14.6%** | **1.7%** | **0.173** | **0.051** |
| Width 4 | 9.1% | 7.3% | 13.9% | 13.0% | 2.6% | 2.3% | 0.156 | 0.147 |
| Width 6 | 9.9% | 7.3% | 14.6% | 13.0% | 2.9% | 2.4% | 0.151 | 0.167 |
| Width 8 | 10.9% | 7.3% | 14.6% | 13.1% | 2.8% | 2.4% | 0.155 | 0.141 |
| Width 10 | 11.1% | 7.4% | 14.3% | 13.1% | 3.3% | 2.4% | 0.151 | 0.172 |
| Width 12 | 12.1% | 7.3% | 15.8% | 13.0% | 3.5% | 2.3% | 0.154 | 0.150 |
| Width 15 | 13.0% | 7.4% | 14.9% | 13.1% | 4.0% | 2.3% | 0.153 | 0.158 |
| Width 20 | 15.5% | 7.3% | 16.2% | 13.1% | 3.7% | 2.1% | 0.152 | 0.171 |
Every effect weakens or vanishes when line breaks are moved:
- Gallows-initial drops from a 3.4× ratio (20.5% vs 6.0%) to roughly 1.2–1.8× at widths 4–15, rising to 2.1× at width 20
- Hapax-initial drops from 1.8× to roughly 1.1–1.2×
- -m final drops from 8.6× (14.6% vs 1.7%) to roughly 1.1–1.8×
- The AC gap (0.173 within vs 0.051 across) closes: within stays near 0.15 and across rises to 0.14–0.17
A residual gallows-initial effect persists at wider re-break widths (particularly width 20: 15.5% vs 7.3%). Only 10.6% of width-20 boundaries coincide with real line boundaries, which is too few to explain the residual: coincidence alone would predict a gallows-initial rate of about 9%, well below the observed 15.5%. The residual may reflect local clustering of gallows-initial tokens near real line boundaries, but I have not fully accounted for it. The key finding is the dramatic collapse of all four effects at narrower widths, where coincidence with real boundaries is negligible.
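One way to arrive at a figure of about 9% is a simple mixture: boundaries that coincide with a real line start carry the real-line rate, and all others carry the baseline rate. This is a sketch under that assumption; taking the 7.3% "elsewhere" rate as the baseline is a choice made here for illustration.

```python
# Expected gallows-initial rate at width 20 if coincidence were the only cause.
coincide = 0.106         # share of width-20 boundaries that are real line boundaries
rate_real_start = 0.205  # gallows-initial rate at real line starts
rate_baseline = 0.073    # assumed rate at boundaries that are not real line starts

expected = coincide * rate_real_start + (1 - coincide) * rate_baseline
print(f"{expected:.1%}")  # 8.7%
```

The observed 15.5% is well above this, which is why coincidence does not account for the residual.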
These effects are tied to the real line breaks. Lines are functional units, not word-wrap.
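The AC reset measurement can be sketched as follows. It is a minimal illustration, assuming `lines` is the ordered list of lines of one continuous text (each line a list of words) and that "across" means the pair formed by the last word of one line and the first word of the next. The function names are hypothetical.

```python
from math import sqrt

def pearson(pairs):
    """Pearson correlation of (x, y) pairs; NaN if either side has no variance."""
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy) if sxx and syy else float("nan")

def ac_within_and_cross(lines):
    """Correlation of consecutive word lengths within lines and across line breaks."""
    within, cross = [], []
    prev_last = None
    for line in lines:
        if not line:
            continue
        lengths = [len(w) for w in line]
        within.extend(zip(lengths, lengths[1:]))   # neighbours on the same line
        if prev_last is not None:
            cross.append((prev_last, lengths[0]))  # pair straddling the line break
        prev_last = lengths[-1]
    return pearson(within), pearson(cross)
```

A within-line value well above the cross-line value, as in the real-line row of the table, is what is meant by a reset at the boundary.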
### Which characters drive the effects?
**Line end:**
| Final char | Line-end % | Elsewhere % | Ratio | Count |
|---|---|---|---|---|
| g | 1.7% | 0.1% | 21.9× | 74 |
| m | 14.6% | 0.9% | 15.8× | 636 |
| d | 2.5% | 1.5% | 1.7× | 110 |
| y | 39.8% | 40.8% | 1.0× | 1,738 |
| n | 15.1% | 16.2% | 0.9× | 660 |
| l | 11.3% | 16.2% | 0.7× | 493 |
| r | 9.3% | 15.7% | 0.6× | 405 |
| o | 1.4% | 3.7% | 0.4× | 61 |
**Line start:**
| Initial char | Line-start % | Elsewhere % | Ratio |
|---|---|---|---|
| p | 8.5% | 0.5% | 17.8× |
| t | 8.9% | 1.8% | 4.8× |
| y | 14.1% | 3.8% | 3.7× |
| f | 0.8% | 0.2% | 3.1× |
| d | 15.1% | 8.7% | 1.7× |
| k | 2.3% | 3.5% | 0.7× |
| c | 4.5% | 20.4% | 0.2× |
| a | 0.5% | 6.2% | 0.1× |
The "gallows-initial" effect is driven by p (17.8×) and t (4.8×). The character k is slightly depleted (0.7×). Line-end enrichment is concentrated in -m and -g, while -l, -r, and -o are actively depleted.
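The per-character ratios in the two tables can be computed along these lines. This is a sketch only: the function name and the `edge` parameter are made up for illustration, and "elsewhere" is taken to mean every word on the line other than the boundary word.

```python
from collections import Counter

def char_enrichment(lines, edge="end"):
    """Compare edge characters of boundary words with the same edge elsewhere.

    edge="end":   final character of the last word vs. final characters elsewhere.
    edge="start": initial character of the first word vs. initial characters elsewhere.
    Returns {char: (share_at_boundary, share_elsewhere, ratio)}.
    """
    idx = -1 if edge == "end" else 0
    lines = [ln for ln in lines if ln]
    boundary = Counter(ln[idx][idx] for ln in lines)
    if edge == "end":
        elsewhere = Counter(w[idx] for ln in lines for w in ln[:-1])
    else:
        elsewhere = Counter(w[idx] for ln in lines for w in ln[1:])
    n_b, n_e = sum(boundary.values()), sum(elsewhere.values())
    table = {}
    for ch in set(boundary) | set(elsewhere):
        p_b = boundary[ch] / n_b if n_b else 0.0
        p_e = elsewhere[ch] / n_e if n_e else 0.0
        ratio = p_b / p_e if p_e else float("inf")
        table[ch] = (p_b, p_e, ratio)
    return table
```

A ratio above 1 marks enrichment at the line edge (as for -m and -g at line end), and a ratio below 1 marks depletion (as for -l, -r, and -o).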
---
## Test 1b: LAAFU by scribe and by thematic section
### By Davis scribe
| Scribe | Lines | Tokens | Gal. init | Gal. else | Ratio | -m fin | AC w | AC c |
|---|---|---|---|---|---|---|---|---|
| S1 (Dialect A) | 1,489 | 10,448 | 17.9% | 7.3% | 2.5× | 9.7% | 0.152 | 0.021 |
| S2 (Dialect B) | 1,101 | 9,501 | 19.8% | 4.9% | 4.0× | 11.2% | 0.154 | 0.037 |
| S3 (Dialect B*) | 1,232 | 12,007 | 27.1% | 6.1% | 4.4× | 22.6% | 0.181 | 0.083 |
| S4 | 449 | 3,871 | 11.6% | 4.9% | 2.4× | 16.9% | 0.157 | 0.059 |
| S5 | 95 | 842 | 25.3% | 7.9% | 3.2× | 15.8% | 0.151 | 0.082 |
All five scribes show the LAAFU pattern: gallows enriched at line start (2.4–4.4×), -m enriched at line end, AC reset at boundaries. The production habit is shared regardless of dialect.
Scribe assignments are Davis's (2020, preliminary). Currier classified Scribe 1 as Dialect A, Scribes 2 and 3 as Dialect B (Scribe 3 writes the Stars section, which Currier called "modified B"). Scribes 4 and 5 were not separately identified by Currier.
### By thematic section
| Section | Lines | Gal. init | Gal. else | Ratio | -m fin | AC w | AC c |
|---|---|---|---|---|---|---|---|
| Botanical | 1,748 | 21.2% | 7.9% | 2.7× | 12.0% | 0.154 | 0.038 |
| Astrological | 320 | 6.6% | 3.7% | 1.8× | 12.2% | 0.154 | 0.089 |
| Balneological | 789 | 15.3% | 4.1% | 3.7× | 7.5% | 0.147 | 0.026 |
| Rosettes | 187 | 24.6% | 6.4% | 3.8× | 28.9% | 0.159 | 0.028 |
| Pharmaceutical | 238 | 20.6% | 3.5% | 5.9× | 13.0% | 0.146 | 0.000 |
| Stars | 1,084 | 26.5% | 6.1% | 4.3× | 22.5% | 0.188 | 0.077 |
The Astrological section has the weakest gallows-initial effect (6.6% vs 3.7%, ratio 1.8×) — consistent with its unusual circular/radial layout. All other sections show ratios of 2.7× or higher.
---
## Test 2: Vocabulary distance by scribe
I measured pairwise vocabulary distance (Jensen-Shannon divergence, log base 2) at the folio level, grouped by Davis scribe. Folios with fewer than 10 tokens were excluded.
| Comparison | Mean JSD | Folios / pairs |
|---|---|---|
| Within Scribe 1 | 0.802 | 113 fol / 6,328 pairs |
| Within Scribe 2 | 0.686 | 42 fol / 861 pairs |
| Within Scribe 3 | 0.695 | 32 fol / 496 pairs |
| Cross S1 ↔ S2 | 0.860 | 4,746 pairs |
| Cross S1 ↔ S3 | 0.861 | 3,616 pairs |
| Cross S2 ↔ S3 | 0.719 | 1,344 pairs |
Scribe 1 (Dialect A) is equally distant from Scribe 2 and Scribe 3 (0.860 vs 0.861). Scribes 2 and 3 (both Dialect B variants) are much closer to each other (0.719) than either is to Scribe 1. The within-S1 mean distance is higher (0.802) than the within-S2 and within-S3 means because Scribe 1 spans multiple thematic sections (Botanical + Pharmaceutical), while Scribes 2 and 3 are each concentrated in one section.
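For reference, the distance measure can be sketched as below: Jensen-Shannon divergence in base 2 between the word-frequency distributions of two folios, averaged over all pairs. The representation of a folio as a flat list of words and the function names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

def jsd(words_a, words_b):
    """Jensen-Shannon divergence (log base 2, range 0..1) of two word lists."""
    ca, cb = Counter(words_a), Counter(words_b)
    na, nb = len(words_a), len(words_b)
    total = 0.0
    for w in set(ca) | set(cb):
        p, q = ca[w] / na, cb[w] / nb
        m = (p + q) / 2
        if p:
            total += 0.5 * p * log2(p / m)
        if q:
            total += 0.5 * q * log2(q / m)
    return total

def mean_pairwise_jsd(folios, min_tokens=10):
    """Mean JSD over all unordered pairs of folios with at least `min_tokens` words."""
    kept = [f for f in folios if len(f) >= min_tokens]
    pairs = list(combinations(kept, 2))
    return sum(jsd(a, b) for a, b in pairs) / len(pairs)

def mean_cross_jsd(folios_a, folios_b, min_tokens=10):
    """Mean JSD over all pairs taking one folio from each group."""
    a = [f for f in folios_a if len(f) >= min_tokens]
    b = [f for f in folios_b if len(f) >= min_tokens]
    return sum(jsd(x, y) for x in a for y in b) / (len(a) * len(b))
```

With base-2 logarithms the divergence is 0 for identical distributions and 1 for distributions with no shared words, so values of 0.7 to 0.86 indicate that any two folios share relatively little vocabulary mass. The pair counts in the table follow from the folio counts: 113 folios give 113 × 112 / 2 = 6,328 within-group pairs, and 113 × 42 = 4,746 cross pairs.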
### A-ness by scribe
A-ness = distance to S2 centroid / (distance to S1 centroid + distance to S2 centroid). Higher = more A-like:
| Scribe | A-ness | Folios |
|---|---|---|
| S1 (Dialect A) | 0.522 ± 0.007 | 113 |
| S4 | 0.496 ± 0.009 | 30 |
| S5 | 0.482 ± 0.011 | 7 |
| S3 (Dialect B*) | 0.470 ± 0.013 | 32 |
| S2 (Dialect B) | 0.454 ± 0.017 | 42 |
Scribe 4 (Currier's "Astrological, mostly A") sits near the midpoint (0.496). Currier classified this section as "mostly A," but the overall vocabulary profile is intermediate between Dialects A and B. This may reflect that Currier's A/B distinction was based on specific features (frequency of particular symbol groups, unattached finals) rather than overall vocabulary distance.
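The A-ness score itself is a one-line ratio. The sketch below takes the two centroid distances as given. How the centroids are built (for example, pooled word frequencies per scribe) and which distance is used are not specified here and are left to the caller.

```python
def a_ness(dist_to_s1, dist_to_s2):
    """A-ness = d(S2) / (d(S1) + d(S2)); 0.5 means equidistant from both centroids."""
    total = dist_to_s1 + dist_to_s2
    return dist_to_s2 / total if total else 0.5  # identical to both: treat as midpoint

print(a_ness(0.5, 0.5))  # 0.5
```

A folio closer to the S1 centroid than to the S2 centroid scores above 0.5, and one closer to S2 scores below 0.5, so the 0.454 to 0.522 range in the table is a fairly narrow band around the midpoint.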
### Drift within Scribe 1
If dialect drift is real, later Scribe 1 folios should be closer to Dialect B than earlier ones. Scribe 1 covers 112 folios: 96 Botanical (f1–f56) and 16 Pharmaceutical (f88–f102).
| Sample | Folios | r (position vs dist to S2) | Critical r (p=0.05) |
|---|---|---|---|
| S1 Botanical only | 96 | +0.070 | ±0.201 |
| All S1 | 112 | −0.057 | ±0.186 |
Neither correlation is significant. Within the Botanical section (96 folios of continuous text), there is no drift toward Dialect B: r = +0.070, whereas drift toward B would make the distance to S2 shrink with position and give a negative r. Across all Scribe 1 folios, r = −0.057 (p = 0.55). No drift is detected at either scope.
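The critical values and the p-value can be checked with a short calculation. This sketch assumes the usual two-sided t-test for a Pearson correlation with n − 2 degrees of freedom. The t quantiles are hard-coded from standard tables, and the p-value uses a normal approximation to the t distribution, which is close at these sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def critical_r(n, t_crit):
    """Smallest |r| significant at the level implied by `t_crit`, for n points."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)

def approx_p(r, n):
    """Two-sided p-value for r, via a normal approximation to the t statistic."""
    t = r * sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))

# Two-sided 5% t quantiles from standard tables: df = 94 and df = 110.
print(round(critical_r(96, 1.9855), 3))   # 0.201
print(round(critical_r(112, 1.9818), 3))  # 0.186
print(round(approx_p(-0.057, 112), 2))    # 0.55
```

Both observed correlations are well inside the critical bounds, so the absence of a detectable trend does not depend on the exact threshold.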
---
## Robustness across transcriptions
All key results replicate across ZLZI, Takahashi (TTVE), and your own transcription (JSLI):
| Metric | ZLZI | TTVE | JSLI |
|---|---|---|---|
| Lines (≥2 tokens) | 4,366 | 4,393 | 1,250 |
| Tokens | 36,669 | 37,072 | 9,358 |
| Gallows ratio (real lines) | 3.4× | 3.7× | 2.7× |
| Gallows ratio (width 6 re-break) | 1.4× | 1.5× | 1.3× |
| -m final (real lines) | 14.6% | 16.1% | 13.9% |
| -m final (width 6 re-break) | 2.9% | 3.0% | 3.3% |
| AC within | 0.173 | 0.174 | 0.176 |
| Drift r (S1 Botanical folios) | −0.050 | −0.049 | −0.045 |
No conclusion depends on the choice of transcriber.
## Summary
| Question | Answer | Key numbers |
|---|---|---|
| Are lines functional units? | Yes | All four effects weaken sharply or vanish under re-breaking; a gallows-initial residual remains at wide widths |
| Is LAAFU shared across scribes? | Yes — all five scribes show it | Gallows ratio 2.4–4.4×; -m enrichment 9.7–22.6% |
| Are Dialects A and B separable? | Yes | Cross S1↔S2 JSD = 0.860; within-S2 = 0.686; S2↔S3 = 0.719 (B variants closer to each other) |
| Is A-to-B gradual drift? | No drift detected | r = +0.070 within S1 Botanical (96 fol); r = −0.057 across all S1 (112 fol); neither significant |
| How does Scribe 4 (Astrological) fit? | Near the midpoint | A-ness = 0.496; Currier said "mostly A" but vocabulary is intermediate |
Best regards,
Edward