The problem with the auto-copy hypothesis is that, in the end, it does not *really* work.
This is difficult to test, because it is not formulated very precisely: there is only the vague notion of 'small changes'. As soon as one gives a more precise definition of what counts as a small change, it becomes testable.
What one can do is take one page of the MS (and one could repeat it for all pages) and check the distribution of Levenshtein distances, to try to come up with a definition that could work.
The hypothesis says that every word is created by taking an earlier word, making a small change to it (or no change), and then writing that word down.
This is not tested in any of the papers I have seen. Instead there are many statistics. Interesting statistics, no doubt about that. But these statistics do not allow one to distinguish between two options:
- the author deliberately generated words according to the above principle
- the appearance of similar words is a side effect of something else.
So let's do the test. I have taken folio f58r. This has a lot of text in three paragraphs.
For every word, one can compare it with all previous words and find the one with the smallest distance.
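To make the test concrete, here is a minimal sketch (in Python, not Torsten's actual code) of that comparison: a plain Levenshtein distance, and for every word token the minimum distance to all tokens that precede it on the page. The function names are my own.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance over the characters of a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def min_dist_to_previous(tokens):
    """For every token, the minimum distance to all preceding tokens
    (None for the very first token, which has no predecessor)."""
    result = []
    for i, tok in enumerate(tokens):
        dists = [levenshtein(tok, earlier) for earlier in tokens[:i]]
        result.append((tok, min(dists) if dists else None))
    return result
```

Running this over the word tokens of f58r in reading order is, in essence, the computation behind the counts that follow; for a few hundred tokens the quadratic number of comparisons is not a problem.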
Using the most recent ZL transliteration (version 1r), and ignoring uncertain spaces (*), this gives 344 words. Of these, exactly 25% have a minimum L distance of three or more.
If one does not count the words in the first lines of paragraphs (as per Torsten's latest algorithm), there remain 321 words, of which 73 have a minimum L distance of three or more. That is 22.7%.
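The percentages above then come from a simple counting step. The sketch below assumes each token has been annotated with its minimum distance (from the sketch above) and with a flag marking whether it sits in the first line of a paragraph; that flag, and the function name, are my own bookkeeping, not part of any transliteration format.

```python
# Counting step: each record is (token, min_dist, in_first_line), where
# min_dist comes from min_dist_to_previous() above (None for the first token)
# and in_first_line flags tokens in the first line of a paragraph.
def fraction_distant(records, threshold=3, skip_first_lines=False):
    kept = [(tok, d) for tok, d, first in records
            if d is not None and not (skip_first_lines and first)]
    distant = sum(1 for _, d in kept if d >= threshold)
    return distant, len(kept), 100.0 * distant / len(kept)

# Tiny illustration with made-up records:
demo = [("daiin", None, True), ("dain", 1, True),
        ("qokeey", 4, False), ("qokey", 1, False)]
print(fraction_distant(demo, skip_first_lines=True))   # (1, 2, 50.0)
```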
Now one may argue whether an L distance of 3 is 'small' or not. I would argue that it is not. We also have to keep in mind that all changes have to follow complicated rules, and that the average word token length is around 5. Some examples of L distance 3 changes on f58r (a quick check of these pairs follows the list):
- olchokal from olchear
- dShor from Sholy
- ytalody from otaly
- airaldy from arary
- olaraly from otaly
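These pairs can be checked directly with the levenshtein() helper from the first sketch:

```python
# The listed pairs, checked with the levenshtein() helper defined earlier.
pairs = [("olchokal", "olchear"),
         ("dShor",    "Sholy"),
         ("ytalody",  "otaly"),
         ("airaldy",  "arary"),
         ("olaraly",  "otaly")]
for word, source in pairs:
    print(word, "<-", source, levenshtein(word, source))   # distance 3 in each case
```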
There are also numerous words, even further down the page, that do not look like any previous word.
There are 22 words with an L distance of 4 or more.
- Line 27 has Sheetchy
- Line 33 has ShocTHhy
- Line 37 has chkaiinolfcham (L distance 8)
Now, it has been argued that the L distance should be computed over glyphs, not over Eva characters. However, in almost all of Torsten's papers it has been computed using Eva, and in the majority of cases this actually makes no difference (a sketch of a glyph-level distance follows below):
- ch and sh differ only by one (ch and Sh)
- s and sh differ only by one (s and Sh)
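To make this concrete, here is one way to compute the distance over glyphs instead of over Eva characters: map Eva character sequences to single glyph symbols first, then run the same distance over the symbol sequences. The glyph inventory below (ch, sh and the four pedestalled gallows) is only an illustration; which sequences count as single glyphs is itself a debatable choice.

```python
import re

# Illustrative glyph inventory: longer Eva sequences first, so that e.g. "ckh"
# is matched before "ch". Which sequences count as single glyphs is an
# assumption made for this sketch, not a settled fact.
GLYPHS = ["ckh", "cth", "cfh", "cph", "ch", "sh"]
GLYPH_RE = re.compile("|".join(GLYPHS) + "|.")

def to_glyphs(eva_word):
    """Split an Eva word into glyph symbols rather than single characters."""
    return GLYPH_RE.findall(eva_word)

def levenshtein_seq(a, b):
    """The same Levenshtein distance as before, but over arbitrary sequences."""
    prev = list(range(len(b) + 1))
    for i, xa in enumerate(a, start=1):
        curr = [i]
        for j, xb in enumerate(b, start=1):
            cost = 0 if xa == xb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# "ch" -> "sh" is one substitution either way: over Eva characters (c -> s)
# and over glyphs (ch -> sh), so the two variants usually agree.
print(levenshtein_seq(to_glyphs("chedy"), to_glyphs("shedy")))  # 1
print(levenshtein_seq(list("chedy"), list("shedy")))            # 1
```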
Note (*): the choice to ignore uncertain spaces is unfavourable to Torsten's hypothesis. Treating them as word spaces as well would have only a minor impact on the result; it simply happened that ignoring them was the behaviour of the file I used.
Having looked at this in some detail, I can say that the impact is only small: there will still be a large fraction of words with a significant L distance.
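For reference, the difference between the two treatments of uncertain spaces amounts to nothing more than a different tokenization. The sketch below assumes the usual transliteration convention that '.' marks a word space and ',' an uncertain space; the example line is made up.

```python
line = "pchodaiin,Shey.qokeey"   # made-up example; '.' = word space, ',' = uncertain space

ignored   = line.replace(",", "").split(".")   # uncertain spaces ignored (words joined)
as_spaces = line.replace(",", ".").split(".")  # uncertain spaces treated as word spaces

print(ignored)    # ['pchodaiinShey', 'qokeey']    -> 2 word tokens
print(as_spaces)  # ['pchodaiin', 'Shey', 'qokeey'] -> 3 word tokens
```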