The Voynich Ninja
vord-final [-ckhy]: curious statistics


vord-final [-ckhy]: curious statistics - geoffreycaveney - 22-09-2020

The Voynich vord ending [-dy] is well known for its ubiquitous presence in the ms text. But recently I have found some rather striking statistics about another interesting vord ending: [-ckhy].

Of course [-ckhy] cannot be anywhere near as frequent as [-dy], because the ligature [ckh] itself only occurs with modest frequency in the ms text. But what is striking to me is just how large a portion of all occurrences of [ckh] in fact appear as part of this particular vord ending [-ckhy].

I have followed here Koen's and Marco's verbose cipher analysis ideas, by which [ok] is treated as a single unit distinct from [k]. Thus I treat [ockh] as a distinct unit different from [ckh]. My analysis here focuses on [ckh] and [-ckhy] without [o] before them. 

It is probably well known that vord-final [-dy], occurring 6690 times, constitutes over half of all 12505 occurrences of [d] in the Voynich ms text.

I find it notable that vord-final [-ckhy] (without preceding [o]), occurring 360 times, also constitutes over half of all 706 occurrences of [ckh] (without preceding [o]) in the ms text as well.

No other glyphs share this level of frequency of occurrence as vord-final [glyph+y]. Vord-final [-ockhy] makes up 42% of all [ockh]. Curiously, the preference is reversed for [cth], where final [-octhy] makes up 43% of all [octh], but final [-cthy] without preceding [o] only makes up 37.5% of all [cth] without preceding [o]. There would be no reason to expect such discrepancies between [ckh], [ockh], [cth], and [octh] if the ms text were merely generated by some kind of automatic self-copying algorithmic method. 
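As an illustration of how percentages of this kind might be reproduced, here is a minimal Python sketch. It assumes the text is available as a list of vords in EVA transliteration; the function name final_y_share, its parameters, and the toy sample list are hypothetical and are not taken from any tool used for the counts above.

```python
import re

def final_y_share(words, unit, exclude_prefix="o"):
    """Count how often `unit` occurs directly before a vord-final [y].

    Occurrences of `unit` preceded by `exclude_prefix` are skipped, so that
    [ockh] is not counted as [ckh]. Pass exclude_prefix=None to count every
    occurrence. Returns (final_count, total_count, share).
    """
    # A negative lookbehind drops occurrences preceded by the excluded glyph.
    guard = f"(?<!{re.escape(exclude_prefix)})" if exclude_prefix else ""
    anywhere = re.compile(guard + re.escape(unit))
    at_end = re.compile(guard + re.escape(unit) + "y$")
    total = sum(len(anywhere.findall(w)) for w in words)
    final_count = sum(1 for w in words if at_end.search(w))
    share = final_count / total if total else 0.0
    return final_count, total, share

# Toy sample for illustration only, not real corpus counts.
sample = ["chckhy", "ckhy", "chockhy", "ckhey", "qokeedy"]
print(final_y_share(sample, "ckh"))                       # [ckh] without preceding [o]
print(final_y_share(sample, "ockh", exclude_prefix=None))  # [ockh] as its own unit
```

Treating the [o] exclusion as a lookbehind keeps [ckh] and [ockh] in separate tallies, in line with the verbose cipher assumption described above.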

It is interesting to note that the special top-line ligature glyphs have a somewhat less frequent occurrence before final [y]: vord-final [-cphy] only makes up 23% of all [cph], and final [-cfhy] only 31% of all [cfh]. Like [cth], but unlike [ckh], these percentages rise somewhat with a preceding [o]: final [-ocphy] makes up 39% of all [ocph], and final [-ocfhy] 40% of all [ocfh]. It should be noted that the sample sizes are getting extremely small for these last glyph sequences: Here 40% means just 6 out of 15. 

Thus we have the curious fact that [ckh] is the only one of the ligature glyphs which is more likely to occur before vord-final [y] without a preceding [o] than it is with a preceding [o]. 

It is probably not surprising that vord-final [-ey] is also rather frequent, making up 27.6% of all [e]. All other glyphs occur much less frequently before vord-final [y]: final [-oshy] is 19% of all [osh], and final [-ochy] is 15% of all [och], but these percentages decline significantly to 6% and 9% without the [o]. All other glyphs, with or without preceding [o], occur less than 10% of the time before vord-final [y]. 

===============

Therefore I was curious to look into this very frequent vord-final [-ckhy] (without preceding [o]) more deeply. In fact, it turns out that the vast majority of occurrences of vord-final [-ckhy] without preceding [o] comprise just five Voynich ms vords: [chckhy] (140), [shckhy] (60), [checkhy] (47), [ckhy] (39), and [sheckhy] (35). Together they make up 321 of the 360 vord-final [-ckhy] without [o]. No other such Voynich vord type occurs more than 3 times in the ms text. 

We thus arrive at the striking conclusion that these five Voynich vords [chckhy], [shckhy], [checkhy], [ckhy], and [sheckhy] constitute a sizable 45.5% of all [ckh] without preceding [o] in the ms text. 
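The arithmetic behind these figures can be checked in a few lines of Python. The counts are the ones quoted in this post; the variable names are only illustrative.

```python
# Counts of the five most frequent [-ckhy] vords, as quoted above.
top_five = {"chckhy": 140, "shckhy": 60, "checkhy": 47, "ckhy": 39, "sheckhy": 35}

FINAL_CKHY = 360  # vord-final [-ckhy] without preceding [o]
ALL_CKH = 706     # all [ckh] without preceding [o]

covered = sum(top_five.values())
share_of_final = covered / FINAL_CKHY
share_of_all = covered / ALL_CKH

print(covered)                  # 321
print(f"{share_of_final:.1%}")  # 89.2%
print(f"{share_of_all:.1%}")    # 45.5%
```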

Even the famous [chedy], [shedy], [qokeedy], et al., cannot match this feat: The five most frequent vord-final [-dy] vords [chedy], [shedy], [qokeedy], [qokedy], and [dy] constitute only 17.4% of all occurrences of [d] without preceding [o] in the ms text. 

For comparison I also checked these five specific [-ckhy] vords with [cth] substituted for [ckh]: I found 79 [chcthy], 31 [shcthy], 28 [checthy], 111 [cthy], and 20 [shecthy]. Together they constitute 34% of all [cth] without preceding [o]. Most striking is the much greater frequency of [cthy] itself as a vord, compared to [ckhy] itself. All the other vords occur more frequently with [ckh] than with [cth]. Again, these are unexpected discrepancies that cannot be easily explained by an automatic self-copying algorithmic method. 

Finally, I checked these five [-ckhy] vords with the preceding [o] inserted: I found 21 [chockhy], 5 [shockhy], 10 [cheockhy], 13 [ockhy], and 2 [sheockhy]. Together they constitute 25.4% of all [ockh]. 

In conclusion, I think it may be worth looking more deeply into the five Voynich vords [chckhy], [shckhy], [checkhy], [ckhy], and [sheckhy] that make up a striking 45.5% of all occurrences of the ligature [ckh] without a preceding [o] in the ms text. Also, the one vord [cthy] that occurs more frequently than [ckhy] may be worthy of particular attention and investigation as well. 

Geoffrey


RE: vord-final [-ckhy]: curious statistics - Emma May Smith - 22-09-2020

I wouldn't regard [ckhy] as a word ending like [dy]. The ending [dy] can attach to a word containing a bench gallows, so the two don't fit into the same position within words.

I believe the situation which you're calling attention to should be seen the opposite way round: [ckh] doesn't occur as often at the start of words as might be expected. See the stats below:

[ckhy] 39
[ckh*] 196
[ockhy] 13
[chockhy] 21
[cheockhy] 10
[chckhy] 140
[checkhy] 47

[cthy] 111
[cth*] 498
[octhy] 10
[chocthy] 18
[cheocthy] 5
[chcthy] 79
[checthy] 28

In all the situations where [ckh] is at the word start, it is less than half as common as [cth]. While 50% of [cth] is at the word start, only 20% of [ckh] is. Yet where it is word-internal, [ckh] is at least as common as [cth], or even more so, in pure number of tokens.
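The comparison can be made explicit by taking the ratio of [ckh] to [cth] in each context. This is a small Python sketch using only the counts listed above; the grouping into "initial" and "internal" and the variable names are illustrative.

```python
# (context, [ckh] count, [cth] count), taken from the lists above.
initial = [
    ("ckhy / cthy", 39, 111),
    ("ckh* / cth*", 196, 498),
]
internal = [
    ("ockhy / octhy", 13, 10),
    ("chockhy / chocthy", 21, 18),
    ("cheockhy / cheocthy", 10, 5),
    ("chckhy / chcthy", 140, 79),
    ("checkhy / checthy", 47, 28),
]

def ratios(rows):
    """Return {context: ckh_count / cth_count} for each row."""
    return {name: ckh / cth for name, ckh, cth in rows}

initial_ratios = ratios(initial)
internal_ratios = ratios(internal)

for name, r in {**initial_ratios, **internal_ratios}.items():
    print(f"{name:22s} {r:.2f}")
```

Every word-initial ratio comes out below 0.5, and every word-internal ratio is above 1.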

We see a similar but much weaker pattern with [k] and [t], and there are stronger patterns between these two, which I discuss in my blog post on the distribution of [k] vs. [t]. I think the answer is that the relationship between [k, ckh] and [t, cth] is more than simply being in the same class of glyphs. They have related distributions which may be reflections of the underlying text.


RE: vord-final [-ckhy]: curious statistics - geoffreycaveney - 23-09-2020

Thank you Emma, very interesting. Yes, the divergence between the frequency of vord-initial [cth] and relative infrequency of vord-initial [ckh] is striking indeed. 

I'm glad you also linked to your blog post about the distribution of [k] vs. [t]. I am still digesting all the details of the arguments in this post, but as a basic summary I think I can state that the main point is to try to explain the much greater frequency of [k] than [t], which is particularly pronounced in Currier B. Out of the approximately 4000 more occurrences of [k] than [t], about 1000 of them are due to the much greater frequency of [lk] than of [lt]. About another 1000 of them are due to the strikingly greater frequency of [qoke] than of [qote]. On the other hand, at the start of a line that is not the first line of a paragraph, [t] is much more frequent than [k], although this only covers about 250 occurrences in total. Emma discussed many more details and ideas about possible explanations in her blog post, but those are the main points that I took away from a first reading of it. If I understand the final part of Emma's blog post correctly, she may be suggesting that in some of these environments [k] and [t] may be in what linguists would call complementary distribution, where one occurs in certain environments and the other occurs in other environments. However, this is at most a partial tendency, and cannot be applied to all occurrences of [k] and [t]. 

Concerning [qoke] vs. [qote], I would return to Koen's and Marco's verbose cipher analysis again: According to Koen's best version, [qok] and [qot] may each be single units that may be entirely separate and distinct from [k], [t], [ok], and [ot]! In fact, just checking the statistics again now, I observe that in general [qok] occurs over 3100 times, while [qot] only occurs 1130 times! This covers roughly another 1000 "extra" occurrences of [k] compared with [t] (beyond the roughly 1000 already covered by [qoke] vs. [qote]). Perhaps [qok] is simply a letter/sound/phoneme that is much more frequent than the letter/sound/phoneme represented by [qot]? Just as one hypothetical example, this would make sense if [qok] represented "s" and [qot] represented "z". 

[ok] and [ot] without preceding [q] do not have such a great frequency difference, only 2981 vs. 2736 by a simple count from voynichese.com. But then [k] and [t] without preceding [o] have a great frequency difference again, about 1750 more [k]. [lk] vs. [lt] explains about 1000 of these, but there remain about 750 to account for. 
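The bookkeeping for these "extra" occurrences of [k] can be laid out as a short Python calculation. It uses the rounded figures quoted in this thread (approximately 4000 in total, over 3100 [qok], about 1000 for [lk] vs. [lt]), so the results are only approximate, and the variable names are illustrative.

```python
TOTAL_EXCESS_K = 4000  # approximate surplus of [k] over [t], as quoted above
qok, qot = 3100, 1130  # [qok] is "over 3100", so this is a lower bound
ok, ot = 2981, 2736    # [ok] and [ot] without preceding [q]
LK_EXCESS = 1000       # approximate surplus of [lk] over [lt]

qo_excess = qok - qot                                # surplus inside [qok]/[qot]
o_excess = ok - ot                                   # surplus inside [ok]/[ot]
bare_excess = TOTAL_EXCESS_K - qo_excess - o_excess  # [k]/[t] without preceding [o]
unexplained = bare_excess - LK_EXCESS                # still to be accounted for

print(qo_excess, o_excess, bare_excess, unexplained)  # 1970 245 1785 785
```

Given the rounding in the inputs, 1785 and 785 are in line with the "about 1750" and "about 750" stated above.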

Emma, I think you may be interested to read and study an article on the relative frequency of different consonants in a wide variety of global languages in initial vs. non-initial position. While the title refers to Egyptian, this seems misleading as the article is a general study of universal cross-linguistic tendencies. The section titles include "the most frequent consonant", "labials frequent initially", "sonorants rare initially", "r more frequent than l", "stops compared by manner of articulation", "voiced and voiceless stops", "stops compared by place of articulation", "affricates", "voiceless stops in initial vs. non-initial position", "voiced stops in initial vs. non-initial position", "s and z", "voiced fricatives", "š and s", and "č and š". Linguistically-minded analysts of the Voynich ms such as yourself, Emma, may find much of interest in this general cross-linguistic study of consonant types and positional occurrences, I think. 

Geoffrey