OK, let me try to explain the score function.
Step 1:
Imagine there is a group of words that together make up 10% of the text, so 90% of the words in the text belong to other groups. Let's call this group of words "lord".
Q: What is the chance that "the next word" is a member of the "lord" group?
A: 10%, if you assume the order of the words is completely random.
Now imagine a second group of words; let's call this group "the". Say that around 5% of the words in the text belong to this group "the".
Now, when parsing the text, whenever we encounter a word from the group "the", the chance that "the next word" is from the group "lord" is much higher than the 10% from the answer above. It's closer to 40%.
I see that as a signal that is significantly above the noise level.
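A minimal Python sketch of the Step 1 comparison, assuming the text is already split into a list of words and each group is given as a set of word types. The function name `transition_rates` and the toy sentence are hypothetical, for illustration only. The base rate here is counted over "next word" positions, which is practically the same as the share of the whole text.

```python
def transition_rates(words, group_a, group_b):
    """Return (base rate, conditional rate) for words of group_b.

    base rate:        share of all "next word" positions filled by a group_b word
    conditional rate: the same share, counted only right after a group_a word
    """
    pairs = list(zip(words, words[1:]))  # (current word, next word)
    base = sum(1 for _, nxt in pairs if nxt in group_b) / len(pairs)
    after_a = [nxt for cur, nxt in pairs if cur in group_a]
    if not after_a:
        raise ValueError("no word from group_a is followed by another word")
    conditional = sum(1 for nxt in after_a if nxt in group_b) / len(after_a)
    return base, conditional


# Toy text; "the" and "lord" are single-word stand-ins for the two groups.
text = "the lord said to the lord of the land and the people".split()
base, conditional = transition_rates(text, group_a={"the"}, group_b={"lord"})
print(f"P(lord) = {base:.2f}, P(lord | after the) = {conditional:.2f}")
# P(lord) = 0.18, P(lord | after the) = 0.50
```

In a real text the groups would hold many word types each; the comparison stays the same.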
Step 2:
There is always some variation, some noise. How far above the noise level is that 40%?
This is where the central limit theorem comes to the rescue: if you add up enough independent "samples" from (almost) any distribution, the total starts to behave like a normal (Gaussian) distribution.
So imagine the Voynich manuscript, the Vulgata and the King James Bible were written by throwing dice to decide what the next word would be.
The noise level (sigma, the standard deviation) of such a count grows in proportion to the square root of the number of throws.
If you throw a die ten thousand times, the expected number of sixes is one thousand six hundred and sixty-six (10,000 / 6 ≈ 1,666.7), give or take about thirty-seven: sqrt(10,000 × 1/6 × 5/6) ≈ 37.3.
That typical amount of noise, thirty-seven, is the sigma.
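A small Python sketch of the dice numbers, assuming every throw is independent so the number of sixes is a binomial count with sigma = sqrt(n · p · (1 - p)). The helper `count_mean_sigma` is a hypothetical name; the simulation just repeats the 10,000-throw experiment to show that the measured spread lands near the same value.

```python
import math
import random
import statistics


def count_mean_sigma(n, p):
    """Expected count and standard deviation for n independent throws,
    each succeeding with probability p (a binomial count)."""
    return n * p, math.sqrt(n * p * (1 - p))


n, p = 10_000, 1 / 6
mean, sigma = count_mean_sigma(n, p)
print(f"expected sixes: {mean:.1f}, sigma: {sigma:.1f}")  # expected sixes: 1666.7, sigma: 37.3

# Check by simulation: repeat the 10,000-throw experiment 200 times
# and measure the spread of the six-counts directly.
rng = random.Random(42)
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(200)]
print(f"simulated sigma: {statistics.stdev(counts):.1f}")
```

Quadrupling the throws to 40,000 doubles sigma (to about 74.5), which is the square-root law in action.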
Step 3:
The software produces these giant tables with some positive values but very many negative values. These are the actual transition frequencies compared with the expected transition frequencies, expressed in units of sigma. When you read negative twenty in the table, it means that the actual transition rate was twenty sigma below the expected transition rate.
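A sketch of how such a table could be built, assuming a simple independence model: for the transition a -> b, the expected count is (number of times a is followed by a word) × (share of next-word slots taken by b), with a binomial sigma. This model and the name `transition_z_table` are assumptions for illustration; the exact formulas the software uses are not spelled out here.

```python
import math
from collections import Counter


def transition_z_table(words):
    """Return {(a, b): z} with z = (observed - expected) / sigma per word pair.

    Assumed model: the next word is an independent draw, so a -> b is a
    binomial count with n = occurrences of a as a current word and
    p = overall share of b among next words.
    """
    pairs = list(zip(words, words[1:]))
    n_pairs = len(pairs)
    observed = Counter(pairs)                # actual transition counts
    first = Counter(a for a, _ in pairs)     # how often each word is followed by something
    second = Counter(b for _, b in pairs)    # how often each word appears as the next word
    table = {}
    for a, n_a in first.items():
        for b, n_b in second.items():
            p = n_b / n_pairs
            expected = n_a * p
            sigma = math.sqrt(n_a * p * (1 - p))
            if sigma == 0:
                continue  # b is the only possible next word: no spread to measure
            table[(a, b)] = (observed[(a, b)] - expected) / sigma
    return table


words = "a b a b a b".split()
table = transition_z_table(words)
print(round(table[("a", "b")], 3), round(table[("a", "a")], 3))  # 1.414 -1.414
```

Pairs that never occur get negative values, which is why a real table is full of them: most possible word pairs are never seen.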
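Step 4:
For pure noise, the average of the squared deviations is one sigma squared; that is the definition of the standard deviation (the root mean square of the deviations). The average of the absolute deviations is a bit smaller, about 0.8 sigma.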
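A quick Python check of what pure noise looks like in these units, assuming the noise is standard normal (z values with sigma = 1). The mean of z² comes out near 1, while the mean of |z| comes out near sqrt(2/π) ≈ 0.80.

```python
import math
import random
import statistics

# Pure noise: 100,000 z values drawn from a standard normal distribution.
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

mean_abs = statistics.fmean([abs(v) for v in z])
mean_sq = statistics.fmean([v * v for v in z])
print(f"mean |z| = {mean_abs:.3f} (theory: {math.sqrt(2 / math.pi):.3f})")
print(f"mean z^2 = {mean_sq:.3f} (theory: 1.000)")
```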
The left number in the score tables is just the straightforward average of all these transition deviations, each divided by its own sigma (taking absolute values).
To accentuate the extremes a little, I chose to also show the average of the squares of these numbers.
This is very similar to the root mean square (RMS).
Strictly speaking, the score of 280 should be square-rooted to get the RMS (about 16.7), but I was lazy, and bigger numbers are more beautiful.
Because it is not really the RMS but the RMS squared, I just call it "the score".
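A sketch of the two summary numbers, assuming the input is the list of z values from the table (each deviation already divided by its own sigma). Which cells the original software averages over is not stated, so `score_summary` and the example values are hypothetical.

```python
import math


def score_summary(z_values):
    """Summarise z-scores (deviations already divided by their own sigma).

    Returns (mean_abs, score, rms):
      mean_abs - average of |z|
      score    - average of z**2, i.e. the RMS squared
      rms      - square root of the score
    """
    z = list(z_values)
    if not z:
        raise ValueError("need at least one z value")
    mean_abs = sum(abs(v) for v in z) / len(z)
    score = sum(v * v for v in z) / len(z)
    return mean_abs, score, math.sqrt(score)


# Made-up z values, only to show the arithmetic.
mean_abs, score, rms = score_summary([-20, 10, 0, 2])
print(mean_abs, score, round(rms, 2))  # 8.0 126.0 11.22
print(round(math.sqrt(280), 1))        # 16.7: the RMS behind a score of 280
```

Squaring before averaging lets the single -20 dominate the score far more than it dominates the mean of |z|, which is the "accentuate the extremes" effect.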
The score of King James Genesis with the words shuffled is around 20; without shuffling, it is around 300.
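A sketch of the shuffle comparison on a toy text rather than Genesis, using an assumed independence model for the expected counts and sigmas. Shuffling keeps every word frequency but destroys word order, so the score falls back toward the noise level. The names are hypothetical, and the absolute numbers from this sketch are not expected to match the 20 and 300 above, since the exact expected-value and sigma formulas of the original software are not given.

```python
import math
import random
from collections import Counter


def transition_score(words):
    """Mean of z**2 over all (current word, next word) type pairs.

    Assumed model (not necessarily the original software's): for a pair a -> b,
    expected = n_a * p_b and sigma = sqrt(n_a * p_b * (1 - p_b)), where n_a is how
    often a is followed by a word and p_b is b's share of all next-word slots.
    """
    pairs = list(zip(words, words[1:]))
    observed = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    z_squared = []
    for a, n_a in first.items():
        for b, n_b in second.items():
            p = n_b / len(pairs)
            sigma = math.sqrt(n_a * p * (1 - p))
            if sigma > 0:
                z_squared.append(((observed[(a, b)] - n_a * p) / sigma) ** 2)
    return sum(z_squared) / len(z_squared)


def shuffled_score(words, seed=0):
    """Score of the same words in a random order: the 'dice' baseline."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return transition_score(shuffled)


# Toy text with rigid word order, repeated so the counts are large.
text = ("the lord said unto moses " * 200).split()
print(f"original: {transition_score(text):.0f}, shuffled: {shuffled_score(text):.1f}")
```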
I hope this clarifies the procedure.