One of the questions from my talk on Voynich Manuscript Day was about the differences between words ending [ey] and [dy], and the word immediately following them. I thought I would provide some more statistics and thoughts around this, and also ask for the opinions of others. (There were other questions from my talk too, but I'm sorry, I didn't have a chance to record them. I'm happy to look into anything people want me to follow up.)
All the scores below are for a given feature (glyph or biglyph) occurring in the word immediately following a word ending [ey] or [edy]. They are expressed as z-scores, and I'll set the "interesting" threshold at +/- 2, which corresponds to roughly 95% confidence. So most are likely to be valid/worth considering seriously. The data is all Language B, with feature 1 needing to occur at least 100 times in the text, and feature 2 needing to occur at least 200 times in the distribution +/- 8 around feature 1. (These are the same thresholds as for the data I presented.)
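As a rough check on what a +/- 2 threshold means, here is a minimal Python sketch. It assumes the scores behave like standard normal z-scores, which is an assumption and not something stated about the data; the function name `two_sided_tail` is only illustrative.

```python
import math

def two_sided_tail(z):
    """Probability that a standard normal value lies beyond +/- z."""
    return math.erfc(abs(z) / math.sqrt(2))

# Chance of a score at or beyond +/- 2 arising by luck alone,
# under the normality assumption above.
p = two_sided_tail(2.0)
print(f"{p:.3f}")
```

Under that assumption a single score beyond +/- 2 would turn up by chance a little under 5% of the time, so when many pairs are scanned a few such scores are expected even without any real pattern.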
(I've changed [dy] to [edy] to more clearly identify [dy] as a potential suffix. I believe (though it's not important to these patterns) that [ey] + [dy] = [edy]. Hence we're actually looking at the difference caused by the addition of this suffix. Of course, what "suffix" even means here is anybody's guess.)
Each group of scores below shows a case where one pattern is beyond +/- 2 and the other is different in some way. I'll try to provide some thoughts, but would welcome others.
End--Start
('ey.', '.d') 0.4
('ey.', '.da') 1.6
('edy.', '.d') -2.9
('edy.', '.da') -2.3
Words ending [edy] are less likely to be followed by words starting [d]. Note that if [a] = [y] then part of this might be avoiding [dy.dy] across a space. (Though see below about words ending [dy].)
('ey.', '.k') 2.6
('ey.', '.ka') 3.2
('ey.', '.ke') 0.7
('ey.', '.te') 2.9
('edy.', '.k') -2.6
('edy.', '.ka') -2.8
('edy.', '.ke') -2.4
('edy.', '.te') 0.3
The contrast before words beginning [k] was mentioned in the talk. It's interesting that [te] has the same kind of bias, but not so extreme.
('ey.', '.l') 3.3
('ey.', '.lk') 3.2
('ey.', '.lo') 2.3
('ey.', '.lsh') 2.5
('edy.', '.l') 1.9
('edy.', '.lk') 0.9
('edy.', '.lo') 0.0
('edy.', '.lsh') 0.8
Words beginning [l] are more common after [ey], though after [edy] they're not less common than average, just not so strongly positive.
('ey.', '.ot') -2.1
('edy.', '.ot') 1.4
The pattern is reversed before words beginning [ot]. (Though other words beginning [o] show no difference, so maybe this is just noise?)
('ey.', '.r') 3.0
('ey.', '.ra') 3.2
('edy.', '.r') 1.4
('edy.', '.ra') 0.4
Much stronger preference for [ey] before [r] than for [edy]. But not sure what to make of this.
End--End
('ey.', 'edy.') -2.1
('ey.', 'ey.') 2.2
('edy.', 'edy.') 2.1
('edy.', 'ey.') 0.0
We can see the contrast between [ey] and [edy] before other words ending [edy]. It may be that words ending [edy] cluster, but I'm not sure that would account for the negative score after [ey]. Likewise, there's a similar (but less strong) contrast before words ending [ey].
('ey.', 'in.') 2.8
('ey.', 'ain.') 2.5
('ey.', 'iin.') 2.3
('edy.', 'in.') -0.7
('edy.', 'ain.') 0.5
('edy.', 'iin.') -1.1
All kinds of words ending with a combination of [i] and [n] are preferred after [ey] rather than after [edy].
('ey.', 'o.') -0.2
('edy.', 'o.') -2.3
A smaller contrast with words ending [o].
Thoughts?
In my opinion, the statistics demonstrate that [*ey] and [*edy], although related, behave differently. One finding is that the word following an [ey]-final is more likely to start with a gallow than a word following an [edy]-final. It might be interesting to check if the [edy]-final is more strongly associated with other word starts than the [ey]-final. To me it looks like [qo*] is more likely after [edy] than after [ey]. Another idea to explore is whether there is a minimal distance between an occurrence of a [d]-glyph and the next occurrence of a gallow glyph. At least within words the [d]-glyph and gallow glyphs very rarely follow each other. The same might be true for word breaks.
A possible explanation for the end-to-end pattern for [*ey] and [*edy] is that the occurrence of a word final increases the likelihood of it reoccurring as the final of the next word. It seems that other word finals, like [*d], [*l], [*r], and [*s], also have an increased likelihood of reoccurring. It might be interesting to check whether, besides the tendency to avoid [*n.*n], other exceptions to this pattern exist. Side note: Similar start-to-start patterns might also exist. There are for instance sequences like "ckhy.ckho.ckhy" or "lkaiin.lkchey.lkain.lror" (the latter on f115v).
Another interesting question for future research might be to check if differences in these patterns exist between Currier A and B, or even between different Currier B sections. One impression I have is that the patterns observable in Currier A often appear more pronounced in Currier B.
(07-08-2024, 09:20 AM)Torsten Wrote: In my opinion, the statistics demonstrate that [*ey] and [*edy], although related, behave differently. One finding is that the word following an [ey]-final is more likely to start with a gallow than a word following an [edy]-final. It might be interesting to check if the [edy]-final is more strongly associated with other word starts than the [ey]-final. To me it looks like [qo*] is more likely after [edy] than after [ey]. Another idea to explore is whether there is a minimal distance between an occurrence of a [d]-glyph and the next occurrence of a gallow glyph. At least within words the [d]-glyph and gallow glyphs very rarely follow each other. The same might be true for word breaks.
On words beginning [qo], it seems like both [ey] and [edy] are followed by [qo] with the same preference. There doesn't appear to be any difference, regardless of which glyph comes after [qo]. The same is (mostly) true for words beginning [o], where the preferences are mixed, but not hugely dissimilar except for [ot].
('ey.', '.qok') 3.5
('ey.', '.qol') 3.1
('ey.', '.qot') 2.7
('ey.', '.ok') 1.2
('ey.', '.ol') -2.5
('ey.', '.op') -0.8
('ey.', '.or') -1.5
('ey.', '.ot') -2.1
('edy.', '.qok') 3.7
('edy.', '.qol') 2.5
('edy.', '.qot') 3.5
('edy.', '.ok') 0.7
('edy.', '.ol') -2.8
('edy.', '.op') 0.8
('edy.', '.or') -1.8
('edy.', '.ot') 1.4
If [q]-insertion is a thing (and that's a big question), then it seems to be operating (mostly) the same after both [ey] and [dy]. So the preference for [q] remains due to [y].
Quote:A possible explanation for the end-to-end pattern for [*ey] and [*edy] is that the occurrence of a word final increases the likelihood of it reoccurring as the final of the next word. It seems that other word finals, like [*d], [*l], [*r], and [*s], also have an increased likelihood of reoccurring. It might be interesting to check whether, besides the tendency to avoid [*n.*n], other exceptions to this pattern exist. Side note: Similar start-to-start patterns might also exist. There are for instance sequences like "ckhy.ckho.ckhy" or "lkaiin.lkchey.lkain.lror" (the latter on f115v).
So these are the numbers I get for End--End repeats:
[d.], [d.]: 0.7 (But this is from a very low number of occurrences.)
[l.], [l.]: -1.7
[r.], [r.]: 1.8
[s.], [s.]: 2.1
[in.], [in.]: -2.1
And these from Start--Start:
[.k], [.k]: -1.5
[.t], [.t]: 1.5 (With a reduced threshold.)
[.d], [.d]: 0.1
[.l], [.l]: 2.2
[.r], [.r]: -1.1 (with a reduced threshold.)
[.ch], [.ch]: -1.6
[.sh], [.sh]: -1.8
The picture is mixed in both cases, and those with a score within +/- 2 might not be showing much unless contrasted with something else. I would be interested to know what others think. The key is putting together all the smaller bits of evidence and finding something bigger.
Maybe something like Patrick's transitional probability could be used here: what are the chances that one word ending moves to another, and what is the order which results? If (and it's a very big if) word endings were meaningful in some way, then those kinds of repeating patterns could be revealing.
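One way such a tabulation could look is sketched below in Python. This is not Patrick's method, which is not described in this thread; it is just a plain first-order count of which ending follows which. The `ENDINGS` list and the function names are hypothetical choices for illustration, and the input is the short f115v sequence quoted earlier.

```python
from collections import Counter, defaultdict

# Hypothetical ending inventory, longest first so that [edy] wins over [y].
ENDINGS = ["edy", "ey", "in", "y", "r", "l", "o"]

def ending_of(word):
    """Return the first listed ending that the word finishes with."""
    for ending in ENDINGS:
        if word.endswith(ending):
            return ending
    return "other"

def transition_probs(words):
    """Estimate P(next ending | current ending) from consecutive words."""
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[ending_of(current)][ending_of(following)] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }

words = "lkaiin.lkchey.lkain.lror".split(".")
probs = transition_probs(words)
print(probs)
```

On real data the words would come from a transcription file, and transitions across line ends would probably need to be excluded or treated separately.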
Quote:Another interesting question for future research might be to check if differences in these patterns exist between Currier A and B, or even between different Currier B sections. One impression I have is that the patterns observable in Currier A often appear more pronounced in Currier B.
That's absolutely something which needs to be done. You are right.
Torsten wrote:
In my opinion, the statistics demonstrate that [*ey] and [*edy], although related, behave differently.
A possible explanation for the end-to-end pattern for [*ey] and [*edy] is that the occurrence of a word final increases the likelihood of it reoccurring as the final of the next word. It seems as also other word finals, like [*d], [*l], [*r], and [*s] have an increased likelihood of reoccurring.
I know that the Slavic/Slovenian theory is ignored by the experts; however, this theory best explains the grammar, including the peculiar ey and edy suffixes. The Voynich Manuscript was written before the letter j came into use. The y was substituted for the semivowels yat and yer, which had no equivalent in Latin, therefore they were dropped.
[attachment=8978]
Because there are no diacritic markers and no separate letters for different sounding vowels and semivowels, the short vowels and semivowels were most of the time not written. The Voynich dy acquired different meanings, depending on which sound was dropped: dy is used for the conjunction 'that, so that', which would be used in conditional clauses. It was pronounced as di in dignity. In the 16th century, the dropped semivowels were inserted and y was replaced with i or j. Since then, dy as a conjunction changed to di or de, and still later to da.
The verb dy was changed to daj or dej, because it replaced the ai or ei semivowel. Both forms were used in the Middle Ages.
In the word chdy (in blue square), the e is missing after che, which would be read as chedy, but since chedy is already a correct spelling, it is more likely that chdy stands for che dy (if you give).
Dy (daj, dej) is used in Slovenian as the imperative of the verb dati (give!). In old Slovenian, many verbs were formed from a noun by adding the verb dy, and were later shortened, like English take a look - look. Example: CHUD DY (marveling give), shortened to CHUDY (after the double letter dd was reduced to d). There are also some words where d belongs to the root, like chudy (marvel) -where one.
As for the suffix -ej: this is the most common suffix in the Slovenian singular imperative, like rchey (say!), or a word ending, like key (what).
In the words where e is long and stressed, it is written as e, and when followed by the suffix -ti, -dy, it becomes -edy. Examples: TEDY (tedaj - then, that time), otedy (otedi - heal!), chedy (chedi - clean).
In the VM .
Also, the spelling of d and t was inconsistent, so that d was often used for t. This brings us to another Slavic/Slovenian suffix -ty, spelled as -dy. In the VM, the suffix -ti is used for verbs in infinitive form, like dati (to give). If misspelled, it would be written as dady.
I am surprised that nobody noticed the suffix -udy (marked green in my picture)
[attachment=8979]
Hi Emma,
I have checked some counts for the entire text and obtained a slightly different picture. (I used the Takahashi transcription.)
3600 instances of 'ey.'
3531 instances of 'edy.'
2774 instances of '.qok'
677 instances of '.qot'
244 instances of '.qol'
4701 instances of '.qo'
('ey.', '.qok') 569 instances
('ey.', '.qot') 92 instances
('ey.', '.qol') 61 instances
('edy.', '.qok') 896 instances
('edy.', '.qot') 162 instances
('edy.', '.qol') 86 instances
('ey.', '.qo') 918 instances
('edy.', '.qo') 1376 instances
The counts for the entire text suggest that '.qo' is about 1.5 times more common after 'edy.' than after 'ey.'
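The arithmetic behind that comparison can be checked with a few lines of Python. This is only a sketch using the whole-text counts quoted above; the variable names are illustrative, and the result is a ratio of raw proportions, not a z-score.

```python
# Counts quoted above (Takahashi transcription, entire text).
n_ey, n_edy = 3600, 3531              # words ending 'ey.' / 'edy.'
qo_after_ey, qo_after_edy = 918, 1376  # of those, followed by a '.qo' word

rate_ey = qo_after_ey / n_ey      # share of 'ey.' words followed by '.qo'
rate_edy = qo_after_edy / n_edy   # share of 'edy.' words followed by '.qo'
ratio = rate_edy / rate_ey

print(f"{rate_ey:.3f} {rate_edy:.3f} {ratio:.2f}")
```

So roughly a quarter of 'ey.' words and nearly four in ten 'edy.' words are followed by '.qo', which is where the factor of about 1.5 comes from.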
The counts for '.ot' for the whole text look very similar:
('ey.', '.ot') 106 instances
('edy.', '.ot') 104 instances
Can you share your raw counts? Which transcription file did you use?
Counts on voynicheses.com
[qo] 5289
[ey] 4025
[edy] 4151
The distribution of [ey] and [edy] is also interesting: [ey] is used in a larger number of folios than [edy].
Some folios have a large number of words containing [qo], [ey] and [edy]
[qo*ey] 711
[qo*edy] 965
There are several chains of words that include the structures [qo*ey] and [qo*edy]
The measures I'm using are z-scores, not token counts. The z-scores show how unusual the occurrence of a feature is in one position of a distribution. If the distribution of a feature is very flat, then a small increase or decrease in occurrence will show up as quite significant. If the distribution is very uneven, then even a relatively large increase or decrease in a position won't be significant. Thus token counts might look the same, but have different meanings.
I think z-scores do have flaws, but:
1) Wordbreak combinations which can be seen with frequency counts also show up well with z-scores.
2) Elements of Latin morphology (as mentioned above) also show up with z-scores.
The benefit is that the z-score is a single measure which, while not perfectly comparable, lets us identify things of interest which we might otherwise miss.
I would like to upload all the data I have (I'm more than willing to share) but the upload limit here is lower than the size of the file. So I'll share a few results to demonstrate the different counts/z-scores. (I also ran through an example in my Voynich Day talk, and it is available in the slides.)
For [ey.], [ey.], as an End--End pattern (the distributions are broadly, but not totally, symmetrical):
Counts: ['ey.', 'ey.', [(1936, 289), (1981, 298), (2035, 301), (2086, 314), (2147, 325), (2202, 356), (2259, 385), (2320, 434), (2314, 434), (2266, 385), (2199, 356), (2132, 325), (2067, 314), (2013, 301), (1947, 298), (1889, 289)]]
z-scores: (('ey.', 'ey.'), [-0.8, -0.7, -0.9, -0.7, -0.6, 0.2, 0.9, 2.1, 2.2, 0.8, 0.2, -0.5, -0.6, -0.7, -0.5, -0.5])
The distribution is quite flat, ranging from about 15% likelihood to 17% in positions +/- 2-8, but then peaking at 18.7% immediately adjacent (434 out of 2320). That is, the difference in likelihood between the adjacent position and the next highest is almost the same size as the total variation in the rest of the distribution.
For [ey.], [edy.], the End--End pattern has the following distribution:
Counts: ['ey.', 'edy.', [(1936, 376), (1981, 384), (2035, 376), (2086, 386), (2147, 368), (2202, 348), (2259, 364), (2320, 403), (2314, 342), (2266, 397), (2199, 359), (2132, 365), (2067, 372), (2013, 368), (1947, 362), (1889, 355)]]
z-scores: (('ey.', 'edy.'), [1.4, 1.3, 0.7, 0.7, -0.3, -1.3, -1.1, -0.2, -2.1, -0.1, -1.0, -0.4, 0.3, 0.5, 0.7, 0.9])
We can clearly see the numerical dip immediately following [ey] (the ninth entry), being more than 50 tokens lower than either side (342 against 403 and 397). That's a decrease of about 14-15%, and the lowest count in the distribution, which gives us a z-score of -2.1.
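For anyone wanting to check the arithmetic, here is a minimal Python sketch that recomputes the scores from the two count lists above. Two things in it are assumptions inferred from the posted numbers, not a confirmed description of the method: that each pair is (tokens of feature 1 with a word at that offset, of which feature 2), and that each position's rate is scored against the mean and sample standard deviation of all 16 positions. The function name `positional_z` is illustrative.

```python
from statistics import mean, stdev

def positional_z(pairs):
    """z-score of each position's rate against all positions in the window."""
    rates = [hits / total for total, hits in pairs]
    m = mean(rates)
    s = stdev(rates)  # sample standard deviation (an assumption)
    return [round((r - m) / s, 1) for r in rates]

# Count lists copied from the post: offsets -8..-1, then +1..+8.
ey_ey = [(1936, 289), (1981, 298), (2035, 301), (2086, 314),
         (2147, 325), (2202, 356), (2259, 385), (2320, 434),
         (2314, 434), (2266, 385), (2199, 356), (2132, 325),
         (2067, 314), (2013, 301), (1947, 298), (1889, 289)]
ey_edy = [(1936, 376), (1981, 384), (2035, 376), (2086, 386),
          (2147, 368), (2202, 348), (2259, 364), (2320, 403),
          (2314, 342), (2266, 397), (2199, 359), (2132, 365),
          (2067, 372), (2013, 368), (1947, 362), (1889, 355)]

z_ey_ey = positional_z(ey_ey)
z_ey_edy = positional_z(ey_edy)
print(z_ey_ey)
print(z_ey_edy)
```

Under these assumptions the recomputed values agree with the posted z-scores to within about 0.1, which suggests (but does not prove) that this is close to how the scores were produced.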
Neither of these would be very easy to spot otherwise, especially in a mass of data from hundreds of pairs. Moreover, it shows relationships which are not an absolute yes/no, but a nuanced preference/dispreference: words ending [ey] are more likely to be followed by another word ending [ey], and less likely to be followed by one ending [edy]. Individually I don't think these patterns tell us much, but if we can link several together into a bigger theory, then we might gain some insight.
(Koen, or anybody with the authority: can I add a bigger file to this post? About 305 kb, but all text so no risk of viruses.)
I've been intrigued by the same question: word-final -ey vs. -dy (or in fact -edy).
First of all, word-final -y is so frequent that it almost carries no information. I am not aware of any letter in any language that is as superfluous. To a lesser extent this also goes for the extended ending -dy.
However, there are some surprising features. Let's look at words ending -o. These are not frequent, but also not that rare. Here are the most frequent ones (counts from the RF file version 1a).
Code:
120 sho
112 o
74 cheo
73 cho
55 qo
45 sheo
20 qokeeo
19 lo
16 do
15 okeeo
14 okeo
13 oteo
With the exception of 'qo', all words also exist (and are usually frequent) with the 'o' replaced by 'y'.
Many of these words end in -eo (becoming -ey) but there is only one case where the word ends -do and this is the stand-alone word do. This holds for much longer than the 12 most frequent words above.
As usual, I do not know what it tells us, apart from the fact that it is a significant feature.
Whether it means anything or not, EY and EDY seem to lean in different directions when the text is broken into sections. Of particular note is the rarity of EDY in the pharmaceutical section.
However EY and EDY do have a similar propensity to end a line, which is relatively low considering how often they end words, especially when compared to just DY or AM.