Uncertain spaces as evidence of verbose glyph pairs
kckluge > Yesterday, 05:24 AM
Anyone who's been around the Ninja (or Voynich MS-related discussions in general) is familiar with the existence of certain glyph pairs with unusually high frequencies, which contribute significantly to the low conditional second-order entropy of the text. To pick an obvious example, here are the 10 most frequent glyphs following Currier 'O' (EVA 'o') in running-text lines of ZL_ivtff_1b.txt, converted to Currier:
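kgram:    OF      OE      OP      OR      O8      OB      O2      OC      OA      OX
(EVA):    ok      ol      ot      or      od      op      os      oe      oa      ockh
Rank:     1       2       3       4       5       6       7       8       9       10
Count:    5620    5242    3244    2531    2035    519     382     322     257     179
REFreq:   0.2609  0.2434  0.1506  0.1175  0.0945  0.0241  0.0177  0.0150  0.0119  0.0083
RECmFrq:  1.0000  0.7391  0.4957  0.3451  0.2276  0.1331  0.1090  0.0912  0.0763  0.0644
(REFreq is each k-gram's share of the table's total; RECmFrq is the share remaining from that rank downward, i.e. 1 minus the REFreq of every higher-ranked k-gram.)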
Note the steep drop from OR (11.75%) and O8 (9.45%) to OB (2.41%) and O2 (1.77%).
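For anyone wanting to reproduce a table like this from their own transliteration, here is a minimal Python sketch. It assumes each line has already been reduced to one character per Currier glyph, with '.' (certain space) and ',' (uncertain space) as the only word-break markers; `successor_table` and `conditional_entropy` are hypothetical helper names, not the script behind the numbers above. The second function computes the conditional second-order entropy h2 = H(next glyph | current glyph), which a skewed successor distribution like O's pulls down.

```python
import math
from collections import Counter

SEPARATORS = ".,"  # assumed IVTFF markers: '.' certain space, ',' uncertain space


def successor_table(text, first, top=10):
    """Rank glyph pairs beginning with `first`, skipping pairs that end in a space.

    Returns (pair, count, REFreq, RECmFrq) rows: REFreq is the pair's share of
    all counted pairs beginning with `first`; RECmFrq is 1 minus the shares of
    every higher-ranked pair.
    """
    counts = Counter(
        text[i:i + 2]
        for i in range(len(text) - 1)
        if text[i] == first and text[i + 1] not in SEPARATORS
    )
    total = sum(counts.values())
    rows, remaining = [], 1.0
    for pair, n in counts.most_common(top):
        share = n / total
        rows.append((pair, n, round(share, 4), round(remaining, 4)))
        remaining -= share  # mass left for the lower-ranked pairs
    return rows


def conditional_entropy(text):
    """h2 = H(X[n+1] | X[n]) in bits, over every adjacent pair in `text` as given."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter()
    for (a, _), n in pairs.items():
        firsts[a] += n  # how often each glyph starts a pair
    total = sum(pairs.values())
    # Sum over pairs of p(a, b) * -log2 p(b | a)
    return -sum(n / total * math.log2(n / firsts[a]) for (a, _), n in pairs.items())
```

For a whole transliteration, one option is to join the cleaned lines with '.' so that no pair spans a line break, then call `successor_table(joined, "O")`.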
If certain glyph pairs go together as a unit, uncertain spaces immediately before and/or after them may reflect an unconscious hesitation on the part of the scribe. Here are the 20 most frequent glyph pairs with an uncertain space after them:
kgram:    OE,     AR,     OR,     AE,     CO,     89,     C9,     SO,     AM,     AT,
(EVA):    ol,     ar,     or,     al,     eo,     dy,     ey,     cho,    aiin,   air,
Rank:     1       2       3       4       5       6       7       8       9       10
Count:    387     207     181     165     121     101     96      58      55      51
REFreq:   0.2043  0.1093  0.0956  0.0871  0.0639  0.0533  0.0507  0.0306  0.0290  0.0269
RECmFrq:  1.0000  0.7957  0.6864  0.5908  0.5037  0.4398  0.3865  0.3358  0.3052  0.2761

kgram:    S9,     AN,     4O,     P9,     O2,     F9,     C8,     ZO,     OP,     C2,
(EVA):    chy,    ain,    qo,     ty,     os,     ky,     ed,     sho,    ot,     es,
Rank:     11      12      13      14      15      16      17      18      19      20
Count:    34      25      24      20      20      19      19      18      14      14
REFreq:   0.0180  0.0132  0.0127  0.0106  0.0106  0.0100  0.0100  0.0095  0.0074  0.0074
RECmFrq:  0.2492  0.2313  0.2181  0.2054  0.1948  0.1843  0.1742  0.1642  0.1547  0.1473
...and here are the 20 most frequent glyph pairs with an uncertain space before them:
kgram:    ,SC     ,AM     ,ZC     ,FC     ,8A     ,AE     ,FA     ,OE     ,AR     ,89
Rank:     1       2       3       4       5       6       7       8       9       10
Count:    195     162     149     126     124     101     97      86      82      70
AllFreq:  0.0011  0.0009  0.0008  0.0007  0.0007  0.0006  0.0005  0.0005  0.0005  0.0004
REFreq:   0.0910  0.0756  0.0695  0.0588  0.0579  0.0471  0.0453  0.0401  0.0383  0.0327
RECmFrq:  1.0000  0.9090  0.8334  0.7639  0.7051  0.6472  0.6001  0.5548  0.5147  0.4764

kgram:    ,SO     ,4O     ,FS     ,OR     ,AN     ,PA     ,AT     ,PC     ,EF     ,AJ
Rank:     11      12      13      14      15      16      17      18      19      20
Count:    55      54      45      42      39      29      28      27      26      26
AllFreq:  0.0003  0.0003  0.0002  0.0002  0.0002  0.0002  0.0002  0.0001  0.0001  0.0001
REFreq:   0.0257  0.0252  0.0210  0.0196  0.0182  0.0135  0.0131  0.0126  0.0121  0.0121
RECmFrq:  0.4438  0.4181  0.3929  0.3719  0.3523  0.3341  0.3206  0.3075  0.2949  0.2828
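Both of the uncertain-space tables above can be produced in a single pass over the text. A minimal Python sketch, again assuming one character per Currier glyph, '.' for certain and ',' for uncertain spaces, and that the locus identifier at the start of each IVTFF line (which contains a comma of its own) and any inline comments have already been stripped; `uncertain_space_pairs` is a hypothetical name, not the tool used for the counts above.

```python
from collections import Counter

SEPARATORS = ".,"  # assumed IVTFF markers: '.' certain space, ',' uncertain space


def uncertain_space_pairs(lines):
    """Count two-glyph sequences directly before and directly after each ','.

    Returns two Counters keyed like the tables: "OE," (pair, then an uncertain
    space) and ",SC" (uncertain space, then pair).
    """
    pair_then_space, space_then_pair = Counter(), Counter()
    for line in lines:
        for i, ch in enumerate(line):
            if ch != ",":
                continue
            left, right = line[max(0, i - 2):i], line[i + 1:i + 3]
            # Only count genuine glyph pairs: two characters, neither a space marker.
            if len(left) == 2 and not any(c in SEPARATORS for c in left):
                pair_then_space[left + ","] += 1
            if len(right) == 2 and not any(c in SEPARATORS for c in right):
                space_then_pair["," + right] += 1
    return pair_then_space, space_then_pair
```

Calling `.most_common(20)` on either Counter gives the ranking, and dividing each count by `sum(counter.values())` gives a REFreq-style share.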
None of those counts is huge relative to the total number of glyph pairs in the running text, but there is at least a weak signal for some of the most obvious candidates, such as OE, OR, AE, and AR, and the various word-final A<x> combinations such as AM, AN, AT, and AJ.
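One way to put a number on "at least a weak signal" would be to compare how often a given pair is followed by an uncertain space with how often glyph pairs in general are. The sketch below computes that ratio; it is a simple enrichment measure offered as an illustration, not the method behind the tables above, and `uncertain_space_enrichment` is a hypothetical name. The same assumptions apply: one character per Currier glyph, '.' and ',' as space markers, locus identifiers already stripped.

```python
SEPARATORS = ".,"  # assumed IVTFF markers: '.' certain space, ',' uncertain space


def uncertain_space_enrichment(lines, pair):
    """(share of `pair` occurrences followed by ',') / (same share for all pairs).

    A value well above 1 means `pair` is followed by an uncertain space more
    often than glyph pairs are on average; returns nan if nothing to compare.
    """
    pair_total = pair_hits = all_total = all_hits = 0
    for line in lines:
        for i in range(len(line) - 1):
            bigram = line[i:i + 2]
            if any(c in SEPARATORS for c in bigram):
                continue  # only pairs made of two glyphs
            hit = int(line[i + 2:i + 3] == ",")  # 1 if an uncertain space follows
            all_total += 1
            all_hits += hit
            if bigram == pair:
                pair_total += 1
                pair_hits += hit
    if pair_total == 0 or all_hits == 0:
        return float("nan")
    return (pair_hits / pair_total) / (all_hits / all_total)
```

With counts this small, a significance test (for example a binomial test of `pair_hits` against the baseline rate) would be the natural next step before reading much into any single ratio.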
Of course, the above results need to be taken with an appropriate grain of salt, given disagreements between transcribers about whether something is a clear or an uncertain space, or whether there is an uncertain space in a given position at all. Nevertheless, I thought it was worth throwing out there as something to think about.
Happy New Year to all readers & posters on the Ninja, and best wishes for a happy & healthy 2026.