29-12-2025, 08:24 PM
I haven't seen this analysis posted here before. It's been suggested that repeating word sequences (like sheol sheol sheol) are caused by scribal error. If that were the case, you would expect the repetition rate to differ among the scribal hands, reflecting their different levels of experience, accuracy, etc. To test this, I wrote some code to compare word repetition across the five known hands. Here are the raw counts (from the RF EVA transcription):
Hand | Total Words | 2-Word Repetitions | 3-Word Repetitions
------------------------------------------------------------
1    |       8,626 |                 64 |                  1
2    |       8,947 |                 60 |                  2
3    |      11,547 |                 62 |                  1
4    |         654 |                  2 |                  0
5    |         866 |                  1 |                  0
I then did a Bayesian analysis, modeling each hand's repetition probability as a Beta distribution and calculating the probability that the hands genuinely differ in their repetition rates. Hands with more words (1, 2 and 3) yield narrower, 'tighter' distributions because there is more data. The low-resource hands (4 and 5) have broader distributions: their observed counts are consistent with a wider range of underlying repetition probabilities. For 3-word repetitions, there isn't enough data to draw meaningful conclusions beyond noting that nothing indicates a statistically significant difference. For 2-word repetitions, here are the results.
[Attached figure: 2-word repetition results by hand]
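For concreteness, here is one way to write the model down. This is a sketch under my own assumptions, not necessarily the exact setup used: each word is treated as one Bernoulli trial, so a hand's total word count n_h is the number of trials and its 2-word repetition count k_h the number of successes, and each hand's rate p_h gets a uniform Beta(1, 1) prior. Under those assumptions the posterior means for hands 1 and 3 match the 0.753% and 0.546% quoted in the summary that follows to three decimal places.

```latex
% Assumed model: binomial likelihood with a uniform Beta(1,1) prior per hand h
k_h \sim \mathrm{Binomial}(n_h, p_h), \qquad p_h \sim \mathrm{Beta}(1, 1)

% Conjugate update and posterior mean
p_h \mid k_h \sim \mathrm{Beta}(1 + k_h,\; 1 + n_h - k_h), \qquad
\mathbb{E}[p_h \mid k_h] = \frac{k_h + 1}{n_h + 2}

% Hands 1 and 3, using the counts in the table
\mathbb{E}[p_1] = \frac{64 + 1}{8626 + 2} = \frac{65}{8628} \approx 0.753\%, \qquad
\mathbb{E}[p_3] = \frac{62 + 1}{11547 + 2} = \frac{63}{11549} \approx 0.546\%

% Comparison probability, with f_h the posterior density of p_h
P(p_1 > p_3) = \int_0^1 \int_0^{p_1} f_1(p_1)\, f_3(p_3)\, dp_3\, dp_1
```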
In summary: mostly, the hands have repetition rates that are consistent with each other. The largest statistically meaningful difference seems to be between hand 1 and hand 3. The hand 1 mean rate is 0.753% and the hand 3 mean rate is 0.546%; according to this model, the probability that hand 1's rate exceeds hand 3's is 96.6%. Interestingly, hand 4 seems consistent with all the other hands (except hand 1), even though hand 4 mostly writes in "labelese" rather than "prose". This suggests that the repetitions may actually be a feature of the language rather than a mistake.
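The comparison can be sketched with a Monte Carlo estimate. This is only an illustration, not the script used for the results above: it assumes a uniform Beta(1, 1) prior, treats each word as one trial, and assumes the hands' posteriors are independent. The helper names (`posterior_mean`, `prob_greater`, `credible_interval`) are hypothetical. It uses only Python's standard library (`random.Random.betavariate`), and also estimates 95% credible intervals to show why hands 4 and 5 come out broader.

```python
import random

# Counts from the table above: hand -> (total words, 2-word repetitions)
COUNTS = {
    1: (8626, 64),
    2: (8947, 60),
    3: (11547, 62),
    4: (654, 2),
    5: (866, 1),
}

# Assumption: uniform Beta(1, 1) prior on each hand's repetition rate
PRIOR_A, PRIOR_B = 1.0, 1.0


def posterior_params(n_words, n_reps):
    """Conjugate Beta posterior for a binomial rate (each word = one trial)."""
    return PRIOR_A + n_reps, PRIOR_B + n_words - n_reps


def posterior_mean(hand):
    a, b = posterior_params(*COUNTS[hand])
    return a / (a + b)


def sample_rates(hand, n_samples, rng):
    """Draw n_samples repetition rates from the hand's Beta posterior."""
    a, b = posterior_params(*COUNTS[hand])
    return [rng.betavariate(a, b) for _ in range(n_samples)]


def prob_greater(hand_x, hand_y, n_samples=50_000, seed=0):
    """Monte Carlo estimate of P(rate_x > rate_y), posteriors assumed independent."""
    rng = random.Random(seed)
    xs = sample_rates(hand_x, n_samples, rng)
    ys = sample_rates(hand_y, n_samples, rng)
    return sum(x > y for x, y in zip(xs, ys)) / n_samples


def credible_interval(hand, level=0.95, n_samples=50_000, seed=0):
    """Equal-tailed credible interval estimated from sorted posterior samples."""
    rng = random.Random(seed)
    xs = sorted(sample_rates(hand, n_samples, rng))
    lo = xs[int((1 - level) / 2 * n_samples)]
    hi = xs[int((1 + level) / 2 * n_samples) - 1]
    return lo, hi


if __name__ == "__main__":
    for hand in COUNTS:
        lo, hi = credible_interval(hand)
        print(f"hand {hand}: mean {posterior_mean(hand):.3%}, "
              f"95% interval [{lo:.3%}, {hi:.3%}]")
    print(f"P(hand 1 > hand 3) ~ {prob_greater(1, 3):.3f}")
```

With these assumptions the estimated P(hand 1 > hand 3) lands in the mid-90s percent, in line with the 96.6% above; the exact value depends on the prior and on Monte Carlo noise.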
I also ran this analysis on the version of the EVA transcription by Lisa Fagin Davis in which some of the gallows characters are treated as substitutions for one another. The results are broadly similar, though the probability that hand 1's rate exceeds hand 3's drops to around 84%.
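To illustrate why merging gallows can change the counts, here is a small sketch. The mapping below (EVA "p" as a variant of "t", "f" as a variant of "k") is a hypothetical example chosen for illustration; it is not necessarily the set of substitutions in the Lisa Fagin Davis transcription. The function names are also hypothetical.

```python
from itertools import groupby

# Hypothetical mapping for illustration only: treat EVA gallows "p" as a
# variant of "t" and "f" as a variant of "k". The actual substitutions in
# the Lisa Fagin Davis transcription may differ.
GALLOWS_MAP = str.maketrans({"p": "t", "f": "k"})


def normalize_gallows(word):
    """Apply the hypothetical gallows mapping to one EVA word."""
    return word.translate(GALLOWS_MAP)


def count_repeat_runs(words, normalize=None):
    """Count runs of two or more identical consecutive words.

    If `normalize` is given, words are compared after normalization, so words
    that differ only in merged characters can become repetitions.
    """
    if normalize is not None:
        words = [normalize(w) for w in words]
    return sum(1 for _, group in groupby(words) if sum(1 for _ in group) >= 2)


words = ["qopchedy", "qotchedy", "okaiin"]
print(count_repeat_runs(words))                                # 0
print(count_repeat_runs(words, normalize=normalize_gallows))   # 1
```

Merging characters can create new matches, but it can also join two separate runs into one longer run, so the net effect on the counts is not guaranteed to go in one direction.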
My script has some limitations; for example, it doesn't count word repetitions that cross line boundaries. But this seems unlikely to change the result.
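A minimal sketch of per-line counting, where runs never continue across a line break, might look like this. It is not the original script: it assumes each line has already been stripped of IVTFF locus tags and inline comments, with words separated by "." as in EVA. It also assumes a run of exactly two identical words counts as one 2-word repetition and a run of three or more counts as one 3-word repetition; the original definitions may differ.

```python
import re
from itertools import groupby


def split_words(line):
    """Split a cleaned EVA line into words on '.' and whitespace."""
    return [w for w in re.split(r"[.\s]+", line.strip()) if w]


def count_repetitions(lines):
    """Count runs of identical consecutive words, one line at a time.

    Runs never continue across a line break. Assumption: a run of exactly two
    identical words is one 2-word repetition; a run of three or more is one
    3-word repetition.
    """
    counts = {"total_words": 0, "two": 0, "three_plus": 0}
    for line in lines:
        words = split_words(line)
        counts["total_words"] += len(words)
        for _, group in groupby(words):
            run = sum(1 for _ in group)
            if run == 2:
                counts["two"] += 1
            elif run >= 3:
                counts["three_plus"] += 1
    return counts


print(count_repetitions(["qokeedy.qokeedy.chol", "daiin.daiin", "sheol.sheol.sheol.dy"]))
# {'total_words': 9, 'two': 2, 'three_plus': 1}
```

Here "chol.daiin" followed by "daiin.shy" on the next line would not count as a repetition. A cross-line variant would concatenate the words of consecutive lines (for example, within a paragraph) before grouping.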
What are my conclusions? Well, as is no surprise with the VMS, it's hard to conclude anything definite. The consistency of repetition rates among the hands seems to indicate that the repetitions are a feature of the language, not mistakes. The higher repetition rate for hand 1 vs hand 3 could be due either to mistakes or to features of Currier A (the language hand 1 predominantly writes in).