The Voynich Ninja
Word repetition analysis of the different hands - Printable Version




Word repetition analysis of the different hands - anejati - 29-12-2025

I haven't seen this analysis posted here before. It has been suggested that repeated word sequences (like sheol sheol sheol) are caused by scribal error. If that were the case, you would expect differences among the scribal hands, reflecting different levels of experience, scribal accuracy, etc. To test this, I wrote some code to analyze how word repetition varies among the 5 known hands. In summary, these are the results (from the RF EVA transcription):

Code:
Hand | Total Words | 2-Word Repetitions | 3-Word Repetitions
------------------------------------------------------------
1    | 8,626       | 64                 | 1
2    | 8,947       | 60                 | 2
3    | 11,547      | 62                 | 1
4    | 654         | 2                  | 0
5    | 866         | 1                  | 0




I then did a Bayesian analysis, modeling the repetition probability of each hand as a Beta distribution and calculating the likelihood that the different hands actually represent different repetition rates. The hands with more words (1, 2, 3) lead to a narrower, 'tighter' distribution because we have more data. The low-resource hands (4 & 5) have broader distributions (the repetition rates are consistent with a larger range of underlying repetition probabilities). For 3-word repetitions, there isn't enough data to draw meaningful conclusions except to say that there's nothing to indicate any statistically significant differences. For 2-word repetitions, here are the results.

   


In summary: the hands mostly have repetition rates that are consistent with each other. The largest statistically meaningful difference seems to be between Hand 1 and Hand 3. The Hand 1 mean rate is 0.753% and the Hand 3 mean rate is 0.546%; the probability that Hand 1 > Hand 3 is 96.6% according to this model. Interestingly, Hand 4 seems consistent with all the other hands (except Hand 1) even though Hand 4 seems to write mostly in "labelese" and not "prose", which suggests that the repetitions may be a feature of the language rather than a mistake.
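A minimal sketch of this kind of comparison, not the script used for the post. It assumes a uniform Beta(1, 1) prior (which reproduces the quoted mean rates to within rounding), treats the total word count as the number of trials, and estimates the probability by Monte Carlo sampling. All function names are made up for illustration.

```python
import random


def posterior_params(repeats, total_words, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters after seeing `repeats` in `total_words` trials."""
    return prior_a + repeats, prior_b + (total_words - repeats)


def posterior_mean(repeats, total_words):
    """Mean of the Beta posterior under the assumed uniform prior."""
    a, b = posterior_params(repeats, total_words)
    return a / (a + b)


def prob_greater(counts_x, counts_y, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_x > rate_y) for independent Beta posteriors."""
    rng = random.Random(seed)
    ax, bx = posterior_params(*counts_x)
    ay, by = posterior_params(*counts_y)
    wins = sum(rng.betavariate(ax, bx) > rng.betavariate(ay, by) for _ in range(draws))
    return wins / draws


# (2-word repetitions, total words) taken from the results table in this post.
hand1 = (64, 8626)
hand3 = (62, 11547)

print(f"Hand 1 mean rate: {posterior_mean(*hand1):.3%}")
print(f"Hand 3 mean rate: {posterior_mean(*hand3):.3%}")
print(f"P(Hand 1 > Hand 3) ~ {prob_greater(hand1, hand3):.3f}")
```

The estimate varies slightly with the seed and the number of draws, and a different prior would shift both the means and the probability.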

I also ran this analysis on Lisa Fagin Davis's version of the EVA transcription, in which some of the gallows characters are treated as substitutions for each other. The results are broadly similar, and I get around an 84% likelihood that Hand 1 > Hand 3.

My script has some limitations; for example, I haven't counted word repetitions that cross line boundaries. But this seems unlikely to change the result.
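To make the line-boundary limitation concrete, here is a small sketch of one way such a count could work. It is an illustration only: the function names, the rule that a run of exactly 2 counts as a 2-word repetition and a run of 3 or more as a 3-word repetition, and the toy lines are all assumptions, not the actual script or real transcription lines.

```python
from itertools import groupby


def run_lengths(words):
    """Lengths of maximal runs of identical consecutive words."""
    return [len(list(group)) for _, group in groupby(words)]


def count_repetitions(lines, cross_lines=False):
    """Return (runs of exactly 2, runs of 3 or more).

    `lines` is a list of lines, each a list of word strings. With
    cross_lines=True the lines are joined first, so a word that ends one
    line and starts the next is counted as a repetition.
    """
    sequences = [[w for line in lines for w in line]] if cross_lines else lines
    doubles = triples = 0
    for words in sequences:
        for n in run_lengths(words):
            if n == 2:
                doubles += 1
            elif n >= 3:
                triples += 1
    return doubles, triples


# Toy data: the second line starts with the word that ends the first.
sample = [
    ["daiin", "chol", "chol", "shey"],
    ["shey", "qokedy", "qokedy", "qokedy"],
]
print(count_repetitions(sample))                    # within lines only
print(count_repetitions(sample, cross_lines=True))  # also across line breaks
```

Running both variants on the same transcription would show directly how many repetitions the within-line count misses.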

What are my conclusions? Well, it's no surprise when it comes to the VMS, but it's hard to conclude anything. The consistency of repetition rates among the hands seems to indicate that the repetitions are a language feature, not mistakes. The higher rate of repetitions for Hand 1 vs Hand 3 could be due either to mistakes or to properties of Currier A (which is predominantly what Hand 1 writes in).


RE: Word repetition analysis of the different hands - Jorge_Stolfi - 29-12-2025

(29-12-2025, 08:24 PM)anejati Wrote: So to test this, I wrote some code to analyze the variations of word repetition among the 5 different known hands. In summary, these are the results (from the RF EVA transcription):

Code:
Hand | Total Words | 2-Word Repetitions | 3-Word Repetitions
------------------------------------------------------------
1    | 8,626       | 64                 | 1
2    | 8,947       | 60                 | 2
3    | 11,547      | 62                 | 1
4    | 654         | 2                  | 0
5    | 866         | 1                  | 0

Thanks for the intriguing and useful results!  

However, different "hands" may mean different topics; and it seems that word statistics are largely specific to topic, as one would expect.  Even the difference between Herbal-A and Herbal-B may be a consequence of those texts having been taken from two different sources. 

Can you see a difference between two hands on the same section (Herbal-A, Herbal-B, Bio, etc.)?

All the best, --stolfi
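A rough sketch of how such a per-section comparison could be tallied. The record layout (section, hand, words) and the labels in the toy data are hypothetical placeholders, not a real transcription format or real folio assignments. Note that this version counts adjacent equal pairs, so a triple contributes two repeats.

```python
from collections import defaultdict


def tally_by_section_and_hand(records):
    """Sum word counts and adjacent-repeat counts per (section, hand).

    `records` is an iterable of (section, hand, words) tuples, one per
    line of text. Repeats are counted within a line only.
    """
    totals = defaultdict(lambda: [0, 0])  # (section, hand) -> [words, repeats]
    for section, hand, words in records:
        cell = totals[(section, hand)]
        cell[0] += len(words)
        cell[1] += sum(a == b for a, b in zip(words, words[1:]))
    return {key: tuple(value) for key, value in totals.items()}


# Toy records for illustration only.
records = [
    ("Herbal-A", 1, ["chol", "chol", "daiin"]),
    ("Herbal-A", 1, ["shol", "daiin"]),
    ("Herbal-B", 2, ["qokedy", "qokedy", "qokedy"]),
]
for (section, hand), (words, repeats) in sorted(tally_by_section_and_hand(records).items()):
    print(section, hand, words, repeats)
```

The per-cell counts could then be compared within one section at a time, so that topic and hand are not mixed together.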


RE: Word repetition analysis of the different hands - nablator - 29-12-2025

(29-12-2025, 08:24 PM)anejati Wrote: So to test this, I wrote some code to analyze the variations of word repetition among the 5 different known hands. In summary, these are the results (from the RF EVA transcription):

Code:
Hand | Total Words | 2-Word Repetitions | 3-Word Repetitions
------------------------------------------------------------
1    | 8,626       | 64                 | 1
2    | 8,947       | 60                 | 2
3    | 11,547      | 62                 | 1
4    | 654         | 2                  | 0
5    | 866         | 1                  | 0

Quick check on reduplications: 267 lines total

You should have a lot more in the "2-Word Repetitions" column.

Quick check on triple words:

For LFD hand 1 I have 2: [link] and f47r: chol chol chol

For LFD hand 2 I have 4:
f40r: okaiin okaiin okaiin
f75r: qokedy qokedy qokedy qokedy
f79v: qokedy qokedy qokedy
f86v3: ytaiin ytaiin ytaiin

For LFD hand 3 I have 2:
f108v: qokeedy qokeedy qokeedy
f104v: sheol sheol sheol

For LFD hand 4 I have 1:
fRos: otody otody otody


RE: Word repetition analysis of the different hands - anejati - 29-12-2025

I primarily wanted to focus on the question of scribal accuracy. Word repetitions across the Currier languages have probably been looked at quite a lot already. But it's worth asking whether Currier language differences explain most of the discrepancy between hands 1 and 3. The results are interesting.

Language A Mean Repetition Rate: 0.7273%
Language B Mean Repetition Rate: 0.5787%
Confidence that A > B:          93.0%

So based on this, the confidence that hand 1 and hand 3 differ in word repetitions (96.6%) is actually higher than the confidence that lang A and lang B differ (93.0%), but not by a lot.


I think the main issue preventing any solid conclusions is that there's a lot of correlation between hands and langs, so separating out the differences is difficult. The most statistically significant differences seem to be between hand 1 and hand 3, which are also the hands that use the most different langs.


RE: Word repetition analysis of the different hands - anejati - 29-12-2025

(29-12-2025, 10:27 PM)nablator Wrote: Quick check on reduplications: 267 lines total

You should have a lot more in the "2-Word Repetitions" column.

Quick check on triple words:

For LFD hand 1 I have 2: [link] and f47r: chol chol chol

For LFD hand 2 I have 4:
f40r: okaiin okaiin okaiin
f75r: qokedy qokedy qokedy qokedy
f79v: qokedy qokedy qokedy
f86v3: ytaiin ytaiin ytaiin

For LFD hand 3 I have 2:
f108v: qokeedy qokeedy qokeedy
f104v: sheol sheol sheol

For LFD hand 4 I have 1:
fRos: otody otody otody

What EVA transcription file are you using?


RE: Word repetition analysis of the different hands - nablator - 29-12-2025

(29-12-2025, 10:32 PM)anejati Wrote: What EVA transcription file are you using?

RF1a-n or RF1b-er.

egrep "\b(\w+)\.\1\b" RF1b-er.txt | wc -l
267

egrep "\b(\w+)\.\1\.\1\b" RF1b-er.txt | wc -l
10
The one on f81r.5 may or may not be counted depending on how you interpret ",".
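For readers without egrep, the same two checks can be sketched in Python. The regular expressions are the ones from the commands above; the helper name, the `comma_as_separator` option (one possible reading of ","), and the toy lines are assumptions for illustration. Like `egrep ... | wc -l`, this counts matching lines, not individual repetitions, and every line with a triple also matches the double pattern.

```python
import re

DOUBLE = re.compile(r"\b(\w+)\.\1\b")      # word.word
TRIPLE = re.compile(r"\b(\w+)\.\1\.\1\b")  # word.word.word


def count_matching_lines(lines, pattern, comma_as_separator=False):
    """Count lines containing at least one match of `pattern`."""
    total = 0
    for line in lines:
        if comma_as_separator:
            # Treat an uncertain space "," like a certain word break ".".
            line = line.replace(",", ".")
        if pattern.search(line):
            total += 1
    return total


# Toy lines in dot-separated style; not copied from the transcription file.
sample = [
    "daiin.chol.chol.shey",
    "qokedy.qokedy.qokedy.dal",
    "shol,shol.daiin",
    "otol.otoldy.chey",  # no match: the trailing \b rejects a longer second word
]
print(count_matching_lines(sample, DOUBLE))
print(count_matching_lines(sample, DOUBLE, comma_as_separator=True))
print(count_matching_lines(sample, TRIPLE))
```

To run it on a real file, the lines would be read from the transcription instead of the `sample` list.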


RE: Word repetition analysis of the different hands - anejati - 29-12-2025

Code:
Hand | Total Words | 2-Word Repetitions | 3-Word Repetitions
------------------------------------------------------------
1    | 10,633      | 82                 | 2
2    | 10,934      | 71                 | 4
3    | 11,921      | 64                 | 1
4    | 2,938       | 8                  | 0
5    | 902         | 1                  | 0

And:
Comparison: Hand 1 vs Hand 3
Hand 1 Mean Rate: 0.780%
Hand 3 Mean Rate: 0.545%
Probability that Hand 1 > Hand 3: 98.6%
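As a cross-check on the updated table, the posterior mean and spread for each hand can be computed in closed form. This is a sketch under the assumption of a uniform Beta(1, 1) prior, which matches the Hand 1 and Hand 3 means quoted above to within rounding; the function name is illustrative.

```python
from math import sqrt

# (2-word repetitions, total words) per hand, from the updated table above.
UPDATED = {1: (82, 10633), 2: (71, 10934), 3: (64, 11921), 4: (8, 2938), 5: (1, 902)}


def beta_mean_sd(repeats, total_words, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of the Beta posterior for a repetition rate."""
    a = prior_a + repeats
    b = prior_b + total_words - repeats
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


for hand, (k, n) in UPDATED.items():
    mean, sd = beta_mean_sd(k, n)
    print(f"Hand {hand}: mean {mean:.3%}, sd {sd:.3%}, sd/mean {sd / mean:.2f}")
```

The sd/mean column makes the point about data volume visible: the hands with few words and few repetitions have a much larger relative uncertainty than hands 1 to 3.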


RE: Word repetition analysis of the different hands - dashstofsk - 30-12-2025

This can be explained partly by the fact that hand 1 words are shorter. Look at the length distributions below (paragraph data from the GC transliteration, with characters 101-C converted to ee). Also, hand 3 words are mostly from quire 20 and hand 2 words are mostly from quire 13. It is known that quire 13 words have less variability than in the rest of the manuscript: there is a high frequency of just a few words, with 9 words making up 20% of the total and qok words dominating. In quire 20, 20% of the text is made up of 16 words.

All this will add bias to the frequencies of repeats.

Hand 1:
Length | Words | Share
     1 |   586 |  5.4 %
     2 |  1468 | 13.6 %
     3 |  3318 | 30.7 %
     4 |  2309 | 21.4 %
     5 |  1804 | 16.7 %
     6 |   918 |  8.5 %
     7 |   300 |  2.8 %
     8 |    72 |  0.7 %
     9 |    14 |  0.1 %
    10 |     5 |  0.0 %

( avg. = 3.71 )


Hand 2:
Length | Words | Share
     1 |   374 |  3.3 %
     2 |  1516 | 13.5 %
     3 |  2341 | 20.8 %
     4 |  2772 | 24.6 %
     5 |  2383 | 21.2 %
     6 |  1234 | 11.0 %
     7 |   542 |  4.8 %
     8 |    77 |  0.7 %
     9 |    18 |  0.2 %
    10 |     7 |  0.1 %
    11 |     1 |  0.0 %

( avg. = 4.04 )

Hand 3:
Length | Words | Share
     1 |   631 |  4.8 %
     2 |  1922 | 14.5 %
     3 |  2414 | 18.3 %
     4 |  3034 | 23.0 %
     5 |  2740 | 20.7 %
     6 |  1606 | 12.2 %
     7 |   689 |  5.2 %
     8 |   139 |  1.1 %
     9 |    31 |  0.2 %
    10 |     8 |  0.1 %

( avg. = 4.05 )
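One way to gauge the bias described in this post is a chance baseline: if words were drawn independently according to their frequencies, the probability that two neighbouring words are identical is the sum of the squared frequencies. The sketch below assumes that independence (which the real text need not satisfy) and uses toy vocabularies, not manuscript data; the function name is made up.

```python
from collections import Counter


def chance_repeat_rate(words):
    """Probability that two independent draws from the word frequencies match."""
    counts = Counter(words)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


# Toy vocabularies: one dominated by a single word, one evenly spread.
concentrated = ["qokedy"] * 6 + ["chedy"] * 2 + ["shedy", "ol"]
spread = [f"w{i}" for i in range(10)]

print(f"concentrated: {chance_repeat_rate(concentrated):.2f}")
print(f"spread:       {chance_repeat_rate(spread):.2f}")
```

A section where a few words dominate gets a higher baseline, so comparing each hand's observed repeat rate with its own baseline would separate vocabulary concentration from any scribal effect.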