Thanks SherriMM,
I think you have raised something of genuine significance. But perhaps the way to analyse this further is not to look at 3 or higher character string repeats but to look at the distribution of 2-character repeats.
For instance, in the line-starting characters of the linked page (using the GC transliteration), a q line is followed 7 times by one of the ch variants, and there appears to be a pattern. This is curious.
[attachment=11595]
In f83r, s lines seem to appear too often. Curious also.
[attachment=11594]
Taking only paragraph text from the GC transliteration and not including lines spanning page breaks, the matrix of occurrences for the character pairs is given here.
( counts )
For instance, there are 80 lines starting d that are followed by a line starting y. But if you look at the matrix of affinities, the ratios of counts against expected values, then you get something rather unexpected. d followed by d occurs 50 times, but this is 0.52 of what would be expected if the lines were randomly shuffled. But also look at o-o (0.25), q-q (0.3), t-t (0.59). These doubles all occur less often than expected. Yet if you look at s-s (1.42), it is high. s seems to have a liking for itself more than for other characters.
( affinities )
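As a rough illustration of how the two matrices relate, here is a minimal Python sketch. It assumes the text has already been reduced to blocks of line-initial glyphs (one list per paragraph or page, so that no pair spans a break), and it approximates the "randomly shuffled" expectation as row total times column total divided by the grand total. The function names and the toy data are hypothetical and are not taken from the GC transliteration.

```python
from collections import Counter

def pair_counts(blocks):
    """Count (start of line i, start of line i+1) pairs within each block.

    `blocks` is a list of lists of line-initial glyphs. Pairs are only formed
    inside a block, so nothing is counted across a page or paragraph break.
    """
    counts = Counter()
    for starts in blocks:
        counts.update(zip(starts, starts[1:]))
    return counts

def affinities(counts):
    """Observed count divided by the count expected under independence.

    Assumption: expected(a, b) = row_total(a) * col_total(b) / total, which is
    an approximation to what a random shuffle of the lines would give.
    """
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (a, b), n in counts.items():
        row[a] += n
        col[b] += n
    return {(a, b): n * total / (row[a] * col[b]) for (a, b), n in counts.items()}

# Toy data only, not real manuscript counts.
toy_blocks = [["d", "y", "d", "d"], ["s", "s", "q", "ch"]]
toy_counts = pair_counts(toy_blocks)
toy_affinities = affinities(toy_counts)
```

A value near 1 means the pair turns up about as often as chance would suggest; values well below or above 1 are the swings discussed here.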
Also look at some of the highs. q-Sh and q-ch occur 2.18 and 3.75 times as often as would be expected, respectively. o-q occurs 134 times, which is 2.01 times what would be expected. Also look at some of the lows. s-ch hardly ever occurs. These are big swings from parity.
Applying statistical hypothesis testing to this data, to put a confidence level on the hypothesis that the effects are not random, should not be necessary. The swings from parity are just too big.
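For anyone who does want a number anyway, a simple shuffle test is one way to get it. The sketch below is only an illustration under my own assumptions: the input is again blocks of line-initial glyphs, the lines are shuffled across all blocks while block sizes are kept, and `shuffle_test` and the toy data are hypothetical names and values, not real manuscript counts.

```python
import random

def count_pair(blocks, a, b):
    """Count how often a line starting `a` is directly followed by one starting `b`."""
    return sum(
        1
        for starts in blocks
        for x, y in zip(starts, starts[1:])
        if (x, y) == (a, b)
    )

def shuffle_test(blocks, a, b, trials=1000, seed=0):
    """Compare the observed count of a-b pairs with counts after random shuffles.

    Returns (observed, mean of shuffled counts, fraction of shuffles whose
    count is at least as far from that mean as the observed count is).
    """
    rng = random.Random(seed)
    observed = count_pair(blocks, a, b)
    pool = [s for starts in blocks for s in starts]
    sizes = [len(starts) for starts in blocks]
    simulated = []
    for _ in range(trials):
        rng.shuffle(pool)
        it = iter(pool)
        # Re-cut the shuffled pool into blocks of the original sizes.
        shuffled = [[next(it) for _ in range(n)] for n in sizes]
        simulated.append(count_pair(shuffled, a, b))
    mean = sum(simulated) / trials
    as_extreme = sum(1 for c in simulated if abs(c - mean) >= abs(observed - mean))
    return observed, mean, as_extreme / trials

# Toy data only: ten "s" lines in a row followed by ten "d" lines.
toy_blocks = [["s"] * 10 + ["d"] * 10]
observed, shuffled_mean, fraction = shuffle_test(toy_blocks, "s", "s")
```

If the fraction returned is close to zero, the observed count is one that random shuffling almost never produces.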
So, how to explain these anomalies?
If the manuscript were in some natural language then it would be expected that the narrative would just wrap to the next line if there was no more space for a word at the end of the previous line. Line starting characters would be independent of each other and there would be reasonable parity between observed and expected repeats.
Likewise with any cypher hypothesis that transforms words according to some algorithm.
So what remains is that these anomalies are evidence of human choice and selection: that the text of the VMS was artificially constructed and meaningless, and that the writers had some private method of generating text, had their favourite character strings, and added enough variability to give the constructed text a semblance of genuineness, but, with the exception of s, did not want to repeat line-initial characters too often.