25-04-2017, 02:07 PM
(25-04-2017, 12:34 AM)DonaldFisk Wrote: These look like coincidences to me. You have about 240 pages of text. Suppose for the sake of argument the words are random. Then you'll find places where you get the same word repeated several times, or in the same line position, or similar words close together. But (since it's random) claiming significance in this is like seeing faces in clouds, or hearing voices in static. People are predisposed to spotting patterns.
It is not just in some places. It happens everywhere.
There are two different patterns. The first is the vertical pattern described by Schinner, who presents statistical evidence for it in his paper. I have calculated the proportion of identical words appearing near each other for the whole VMS. You have already implemented such a feature for your sample text. Nearly all of the lines in your sample text start with [p].
The second pattern is that similar words co-occur throughout the text. This pattern has been described elsewhere. I have also calculated the frequencies of repeated words within 20 lines for the whole VMS.
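A minimal sketch of how a "repeated within 20 lines" figure could be computed, in case someone wants to try it on a transcription or on generated sample text. This is not necessarily the calculation I used: the function name `repeat_fraction`, the whitespace tokenisation and the choice to look only at preceding lines are assumptions made for illustration, and the toy lines are made up.

```python
from collections import deque


def repeat_fraction(lines, window=20):
    """Fraction of word tokens that also occur in one of the `window` preceding lines.

    `lines` is an iterable of strings with words separated by whitespace.
    Only exact matches count; words repeated within the same line do not.
    """
    recent = deque(maxlen=window)  # word sets of the most recent lines
    repeated = 0
    total = 0
    for line in lines:
        words = line.split()
        seen = set().union(*recent)  # all words from the preceding lines
        repeated += sum(1 for w in words if w in seen)
        total += len(words)
        recent.append(set(words))
    return repeated / total if total else 0.0


# Toy input, not real transcription lines.
toy_lines = [
    "daiin chol chor",
    "chol shol",
    "qokeedy chedy",
    "chedy qokedy chol",
]
print(repeat_fraction(toy_lines, window=20))
print(repeat_fraction(toy_lines, window=1))
```

To test the "random words" argument, the same function can be run on the original text and on a copy with the word order shuffled. If the words were placed at random, both values should come out about the same.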
In a way, the difference between Currier A and Currier B is also based on this pattern. For instance, words containing [ed] are very rare in Currier A but common in Currier B. Compare the frequencies of the words [cheody], [chedy] and [qokeedy] across the VMS:
Herbal in Currier A [cheody] x 8 [chedy] x 1 [qokeedy] x 0 (word count: 8087)
Pharmaceutical (A) [cheody] x 18 [chedy] x 1 [qokeedy] x 0 (word count: 2529)
Astronomical [cheody] x 8 [chedy] x 4 [qokeedy] x 0 (word count: 2136)
Cosmological [cheody] x 7 [chedy] x 24 [qokeedy] x 4 (word count: 2691)
Herbal in Currier B [cheody] x 7 [chedy] x 62 [qokeedy] x 9 (word count: 3233)
Stars (B) [cheody] x 33 [chedy] x 190 [qokeedy] x 137 (word count: 10673)
Biological (B) [cheody] x 0 [chedy] x 210 [qokeedy] x 153 (word count: 6911)
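Since the sections differ a lot in size, the raw counts are easier to compare as occurrences per 1000 words. A small sketch of that normalisation, using the counts listed above; the helper names `per_thousand` and `rate_table` are just illustrative.

```python
# Counts copied from the list above:
# (section, [cheody], [chedy], [qokeedy], word count)
ROWS = [
    ("Herbal in Currier A", 8, 1, 0, 8087),
    ("Pharmaceutical (A)", 18, 1, 0, 2529),
    ("Astronomical", 8, 4, 0, 2136),
    ("Cosmological", 7, 24, 4, 2691),
    ("Herbal in Currier B", 7, 62, 9, 3233),
    ("Stars (B)", 33, 190, 137, 10673),
    ("Biological (B)", 0, 210, 153, 6911),
]


def per_thousand(count, total):
    """Occurrences per 1000 words of running text."""
    return 1000.0 * count / total


def rate_table(rows):
    """Map each section name to its per-1000-word rates, rounded to 2 decimals."""
    return {
        name: tuple(round(per_thousand(c, total), 2) for c in counts)
        for name, *counts, total in rows
    }


for section, rates in rate_table(ROWS).items():
    print(f"{section:22s} cheody={rates[0]:6.2f} chedy={rates[1]:6.2f} qokeedy={rates[2]:6.2f}")
```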
On the other hand, words like [chol] and [chor] are common in Currier A but rare in Currier B:
Herbal in Currier A [chol] x 228 [chor] x 155 (word count: 8087)
Pharmaceutical (A) [chol] x 45 [chor] x 24 (word count: 2529)
Astronomical [chol] x 8 [chor] x 2 (word count: 2136)
Cosmological [chol] x 19 [chor] x 8 (word count: 2691)
Herbal in Currier B [chol] x 13 [chor] x 6 (word count: 3233)
Stars (B) [chol] x 62 [chor] x 19 (word count: 10673)
Biological (B) [chol] x 14 [chor] x 1 (word count: 6911)
This is certainly not a coincidence. The fact that rare glyph sequences occur together is only one aspect of this pattern.
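For anyone who wants a number for "not a coincidence", one rough check is a chi-square test on a 2x2 table: [chol] versus all other words, in Herbal A versus Biological B. The sketch below uses the counts given above (228 of 8087 and 14 of 6911). It treats every word token as an independent draw, which is a simplification (the clustering of similar words is the very thing under discussion), so the result is only an indication. The function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

        [[a, b],
         [c, d]]

    where rows are sections and columns are 'target word' / 'other words'.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table sums to zero")
    return n * (a * d - b * c) ** 2 / margins


# [chol] in Herbal A: 228 of 8087 words; in Biological B: 14 of 6911 words.
herbal_a_chol, herbal_a_total = 228, 8087
bio_b_chol, bio_b_total = 14, 6911

stat = chi_square_2x2(
    herbal_a_chol, herbal_a_total - herbal_a_chol,
    bio_b_chol, bio_b_total - bio_b_chol,
)
# With 1 degree of freedom, values above about 3.84 are significant at the 5% level.
print(f"chi-square statistic: {stat:.1f}")
```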