I decided to start another thread on this. While it ties directly into my work on the copy/mutate ledger, this deserves to be tied into my previous work on the bigram <ed> and to have its own post because of the... "Huh?" factor. This may already be known, but I'm just discovering it, so pardon me if I'm stepping on someone else's work.
In my previous posts on the <ed> bigram, I try to describe the differences between Currier A, which is Scribe 1, and Currier B, which is Scribe 2+. Those differences are dominated by the contrast between the bigrams <ho> and <ed>, which I defined as the ED0 and ED+ regimes. My work on the generator focuses on Scribe 1 because of this apparent "regime shift", to try to gain stability in the generator's output.
Today I decided to explore that a little further to see what effects I would need to anticipate should I decide to model Scribe 2+, and here are the interesting results I got.
First, I had Codex create a basic machine learning script that scans the pages written by each scribe (excluding any pages/sheets classified as being from two different scribes) and looks for syllabic chunks, i.e. things that resemble syllables. After creating five data files, one per scribe, I had Codex create a second script that combines and compares those results to see how different scribes used different chunks. Here's the result:
Both charts are sorted by Scribe 1 counts for comparison.
Total count of top chunks used by each scribe:
Normalized count of top chunks used by each scribe:
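For anyone who wants to reproduce the two views (raw and normalized), the comparison step can be sketched roughly as follows. This is only a minimal sketch, not the script Codex actually produced: the chunking itself is assumed to have already happened, and the `chunk_table` helper, its parameters, and the toy chunk lists are all hypothetical placeholders.

```python
from collections import Counter

def chunk_table(chunks_by_scribe, reference="Scribe 1", top_n=10):
    """Return rows of (chunk, raw_counts, shares), sorted by the reference scribe.

    raw_counts[scribe] = how often that scribe used the chunk.
    shares[scribe]     = raw count / all chunks written by that scribe.
    """
    counts = {s: Counter(chunks) for s, chunks in chunks_by_scribe.items()}
    totals = {s: sum(c.values()) for s, c in counts.items()}
    # Rank chunks by the reference scribe's raw count, descending.
    ranked = [chunk for chunk, _ in counts[reference].most_common(top_n)]
    rows = []
    for chunk in ranked:
        raw = {s: counts[s][chunk] for s in counts}
        share = {s: (raw[s] / totals[s] if totals[s] else 0.0) for s in counts}
        rows.append((chunk, raw, share))
    return rows

# Toy placeholder data, NOT real counts from the manuscript.
toy = {
    "Scribe 1": ["ho"] * 6 + ["ch"] * 3 + ["ed"] * 1,
    "Scribe 2": ["ho"] * 2 + ["ch"] * 3 + ["ed"] * 5,
}

for chunk, raw, share in chunk_table(toy):
    print(chunk, raw, {s: round(v, 2) for s, v in share.items()})
```

Normalizing by each scribe's own total matters because the scribes wrote very different amounts of text, so raw counts alone would mostly reflect how much each scribe wrote.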
Here's my interpretation of what I'm seeing:
To me, this looks cumulative and directional. The later scribes are not replacing the earlier ecology; they are amplifying parts of it and reducing their use of other parts.
And it's monotonic: from scribe to scribe, usage of a given chunk moves consistently in one direction without reversing.
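The "monotonic" reading can be checked a bit more strictly by testing, chunk by chunk, whether the normalized share only ever rises or only ever falls across the scribe order. A rough sketch, assuming the shares are already computed per scribe; the `trend` helper, its `tol` parameter, and the example numbers are hypothetical placeholders, not measured values:

```python
def trend(shares, tol=0.0):
    """Classify a sequence of per-scribe shares (ordered Scribe 1..N).

    Returns "rising" if no step drops by more than tol,
    "falling" if no step climbs by more than tol,
    "flat" if both hold, and "mixed" otherwise.
    """
    steps = [b - a for a, b in zip(shares, shares[1:])]
    never_down = all(d >= -tol for d in steps)
    never_up = all(d <= tol for d in steps)
    if never_down and never_up:
        return "flat"
    if never_down:
        return "rising"
    if never_up:
        return "falling"
    return "mixed"

# Placeholder shares for Scribes 1-5, NOT measured values.
examples = {
    "ed": [0.01, 0.08, 0.10, 0.12, 0.15],
    "ho": [0.12, 0.05, 0.04, 0.03, 0.02],
    "ch": [0.09, 0.11, 0.08, 0.10, 0.09],
}
for chunk, shares in examples.items():
    print(chunk, trend(shares))
```

Counting how many of the top chunks come out as "rising" or "falling" rather than "mixed" would put a number on the impression from the charts. A small nonzero `tol` lets tiny reversals pass as noise.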
Taking this method of ML chunking one step further, I had Codex examine these syllables by quire and sheet and create a PCA chart. In this chart, a dot represents a sheet. The number above the dot represents the quire number.
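For reference, a sheet-level PCA of this kind can be sketched with plain NumPy. This is a minimal sketch under assumptions, not the script that produced the chart: each sheet is assumed to be one row of normalized chunk shares, and the `pca_scores` helper, the quire labels, and the numbers are hypothetical placeholders.

```python
import numpy as np

def pca_scores(freq_matrix, n_components=2):
    """Project rows (sheets) onto the top principal components.

    freq_matrix: shape (n_sheets, n_chunks), one row of chunk shares per sheet.
    Returns (scores, explained_variance_ratio).
    """
    X = np.asarray(freq_matrix, dtype=float)
    X = X - X.mean(axis=0)  # center each chunk column
    # SVD of the centered data: rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    total = np.sum(S ** 2)
    ratio = (S[:n_components] ** 2) / total if total > 0 else np.zeros(n_components)
    return scores, ratio

# Placeholder rows: (quire label, chunk shares). NOT real manuscript data.
sheets = [
    (1, [0.60, 0.30, 0.10]),
    (1, [0.55, 0.35, 0.10]),
    (2, [0.20, 0.30, 0.50]),
    (2, [0.25, 0.25, 0.50]),
]
scores, ratio = pca_scores([row for _, row in sheets])
for (quire, _), (pc1, pc2) in zip(sheets, scores):
    print(f"quire {quire}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
print("explained variance ratio:", np.round(ratio, 3))
```

The sign of each component is arbitrary, so only the relative positions of the dots matter, not which side of the axis a group lands on.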
Possible explanations, since this does not look like random drift:
- Source-pool inheritance: Different scribes may have preferentially copied from different quire/sheet pools, which matches my proposed sheet-source pool idea for page creation. Furthermore, if pages were created using source sheets, the chunk ecology may suggest that later scribes preferentially reused sheets already created by earlier scribes and may have preferred the most recent scribe's work. (Sorry, I had to list this one first because it's my working theory, not because it carries any other specific weight.)
- Regime reweighting: The scribes may have used slightly different weighting systems for selecting or constructing words while still operating within the same broader production framework.
- Progressive stabilization: Over time, productive word families may have become increasingly dominant, with prefixes and endings becoming more specialized and certain chunks becoming structurally “sticky.”
- Scribal training lineage: Later scribes may have learned from already-shifted examples. Scribe 5 inherits an ecology already shaped by Scribe 4, Scribe 4 inherits one shaped by Scribe 3, and so on.
- Section/topic dependence: I have not specifically tested this yet, but different scribes are known to work in specific sections, so some chunk ecology differences could reflect section-specific production behavior rather than purely scribal differences.
And there are certainly other plausible explanations as well. What I'm seeing, though, is that the chunk ecology appears structured and directional rather than random.