EDIT: the results in this post don't seem to break down under random permutation of words in the MS, so likely they don't mean anything.
I'm not a big fan of the self-citation hypothesis, not because I think it's right or wrong, but because it's largely irrelevant to my line of investigation. Self-citation is compatible with one-to-many ciphers; how much it is used depends largely on how bored or hurried the encoder is, and it tells little about the encoding itself. So treat the below "proof" of self-citation with a grain of salt: it's just a 20-minute effort using OpenAI to create a Python script, with not much independent verification on my part. It is highly likely that I'm just repeating a computation that has already been published.
The basic idea: we take the EVA transliteration, remove all spaces, ligature marks, rare character marks and paragraph marks, and collapse all alternative readings to the first option. Then from the resulting text we take the 10000 rarest shortest substrings between 4 and 12 characters long (not crossing new lines). For each of these substrings we find in the text the nearest preceding and the nearest following substring with a Levenshtein distance <= 1 (so it is either exactly the same substring or a variation of it with 1 character added, removed or replaced). We limit the shortest length of the substrings to 4 because for shorter strings there would be too many matching candidates (for example, substring AB of length 2 will match any nearby A or B, and substring ABC of length 3 will match any AB, AC or BC). Given that we ignore the spaces, the edit distance of gluing two sequences together or separating them is 0.
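The search step can be sketched roughly as below. This is not the script from the post, just a minimal illustration under my own assumptions: the function names `within_one_edit` and `nearest_match` are made up, the distance is measured between start positions, and a candidate is not allowed to overlap the target substring (otherwise the target would trivially match itself with one character added).

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if the Levenshtein distance between a and b is at most 1."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one replaced character
    return a[i:] == b[i + 1:]  # exactly one extra character in b


def nearest_match(text: str, pos: int, n: int, direction: int):
    """Distance in characters from pos to the start of the nearest
    non-overlapping near-match of text[pos:pos+n], or None if there is none.
    direction is -1 for preceding, +1 for following (assumed convention)."""
    target = text[pos:pos + n]
    q = pos - 1 if direction < 0 else pos + n
    while 0 <= q < len(text):
        for m in (n - 1, n, n + 1):  # candidate lengths within one edit
            end = q + m
            if end > len(text):
                continue
            if direction < 0 and end > pos:
                continue  # candidate would overlap the target
            cand = text[q:end]
            if "\n" in cand:
                continue  # candidates must not cross line breaks
            if within_one_edit(target, cand):
                return abs(q - pos)
        q += direction
    return None
```

For example, in the toy string `"abcd....abzd....abcd"` the substring `abzd` at position 8 has a one-replacement neighbour 8 characters away in each direction. This brute-force scan is slow on a full transliteration, so a real run would probably want an index of substrings instead.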
Then we'll plot three charts - the distribution of the nearest preceding distances, the distribution of the nearest following distances and the distribution of the closest distances (which is the min of the first two for each instance).
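To make the "closest" series concrete, here is a small sketch of how the three distributions could be tabulated before plotting. The names `closest_distance` and `histogram`, the use of `None` for "no match found in that direction", and the bin width of 500 characters are all my own assumptions for illustration, not taken from the original script.

```python
from collections import Counter


def closest_distance(prev, nxt):
    """Minimum of the preceding and following distances, ignoring a
    direction with no match (None); None if neither direction matched."""
    found = [d for d in (prev, nxt) if d is not None]
    return min(found) if found else None


def histogram(distances, bin_width=500):
    """Count distances per bin; keys are the lower edge of each bin."""
    counts = Counter()
    for d in distances:
        if d is not None:
            counts[(d // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical (preceding, following) distances for four substrings.
pairs = [(120, 3000), (None, 40), (700, None), (None, None)]
closest = [closest_distance(p, f) for p, f in pairs]
```

The resulting dictionaries can be passed to any plotting tool as bar heights. Substrings with no match in one direction still get a closest distance as long as the other direction has one, which is why the combined chart can look much tighter than either one-sided chart.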
And we get the following picture:
[attachment=10941]
What is remarkable here? For many rare substrings there can be no similar substring all the way forward to the end of the manuscript, or all the way back to the beginning of the manuscript. The tails on the preceding and following graphs are substantial.
However, there is almost always a similar substring within a couple of thousand characters if we look in both directions. Given that the folios have likely been rearranged, I'd say most rare substrings have a doppelgänger within a couple of pages at most, and the majority on the same page.
This seems to show that there is a lot of short range pairwise similarity in the text, as if many rare sequences in the text are tied to only one other nearby locus.
(Edit: please see the run with randomly shuffled strings below.)
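A shuffled control of this kind could be produced along the following lines. This is only a sketch under my own assumptions: `shuffle_words` is a hypothetical helper, words are assumed to be whitespace-separated at this stage (before spaces are stripped), and the number of words per line is kept unchanged so that line lengths stay roughly comparable.

```python
import random


def shuffle_words(lines, seed=0):
    """Randomly permute all words across the text while keeping the
    original number of words on each line."""
    rng = random.Random(seed)  # fixed seed so the control is repeatable
    per_line = [line.split() for line in lines]
    pool = [w for words in per_line for w in words]
    rng.shuffle(pool)
    out, start = [], 0
    for words in per_line:
        out.append(" ".join(pool[start:start + len(words)]))
        start += len(words)
    return out


lines = ["daiin chol shedy", "qokeedy ol"]
shuffled = shuffle_words(lines, seed=1)
```

If the distance distributions look the same on the shuffled text as on the original, the short-range similarity is explained by the word inventory alone and not by the word order.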
Python script: [link]
CSV with results for individual substrings: [link]