(19-08-2015, 11:15 PM)Anton Wrote: I counted 114 unique words that can be repeated. It is only 1.7% of the total of 6818 unique words. Is that too many as compared with natural languages?
Is not the repetition issue too exaggerated?
Going back to the first page of the thread and Anton's original comments: I don't think the repetition issue is exaggerated. On the contrary, I think it is an important feature of Voynichese. It might not be a unique feature, but it is very likely a meaningful one.
I have checked Virgil's Aeneid, La Divina Commedia and Matthioli.
- I considered an excerpt of the Aeneid just slightly longer than the VMS (37K words). It contains 7 exact repetitions (6 unique repeating words) and 12236 different words: 0.05% of them repeat.
- La Divina Commedia has almost twice as many words as the VMS. It contains 33 exact repetitions (31 unique repeating words) and 9871 different words: 0.3% of them repeat.
- The excerpt from Matthioli I used is more than 100K words long. It contains 9 exact repetitions (8 unique repeating words) and 21910 different words: 0.04% of them repeat.
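For anyone who wants to reproduce this kind of count on another text, here is a minimal Python sketch. It is not the exact procedure I used, and the names `tokenize` and `repetition_stats` are just illustrative. It assumes that words are compared after lowercasing and dropping punctuation, that a run like W.W.W counts as two repetitions, and that the percentage is unique repeating words divided by different words (which matches the figures above).

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and drop punctuation, so that
    # "Ricorditi, ricorditi!" counts as an exact repetition.
    return re.findall(r"\w+", text.lower())

def repetition_stats(words):
    """Count exact consecutive repetitions (the W.W pattern)."""
    # One entry per adjacent pair of identical words.
    repeats = Counter(a for a, b in zip(words, words[1:]) if a == b)
    distinct = len(set(words))
    return {
        "tokens": len(words),
        "distinct_words": distinct,
        "repetitions": sum(repeats.values()),
        "unique_repeating_words": len(repeats),
        # Share of distinct words that occur at least once as W.W.
        "percent_repeating": 100 * len(repeats) / distinct if distinct else 0.0,
    }

if __name__ == "__main__":
    sample = "Ricorditi, ricorditi! E se io"
    print(repetition_stats(tokenize(sample)))
```

Different tokenization choices (for instance how apostrophes or line breaks are handled) can shift the counts slightly, so figures from different tools are only roughly comparable.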
I think this kind of phenomenon largely depends on the individual style of writing (so it might not strictly depend on the "language" and could vary greatly between different texts written in the same language). It's interesting to see that Dante used repetitions to stress specific meanings. The most frequent use, it seems to me, is to intensify an imperative.
Volsersi verso me le buone scorte;
e Virgilio mi disse: «Figliuol mio,
qui può esser tormento, ma non morte.
Ricorditi, ricorditi! E se io
sovresso Gerion ti guidai salvo,
che farò ora presso più a Dio?» (Purg. 27.19-24)
My gentle escorts turned to me,
and Virgil said: “My son, though there may be
suffering here, there is no death.
Remember, remember! If I guided you to safety
even upon the back of Geryon,
then now, closer to God, what shall I do?”
I think the high number of repetitions in Dante doesn't depend on the Italian language, but more on the poet's style. But there could be languages in which some similar pattern is common and not "poetic".
The number of unique repeating words in the VMS is more than 5 times greater than the highest figure observed in these comparisons. But the difference in actual (non-unique) repetitions is much greater: close to 10 times the number of repetitions found in longer texts.
I also want to share some data about non-exact repetitions. This is a histogram of 'quasi-repetitions' of the following types:
pW.W
W.pW
Ws.W
W.Ws
Where:
W is a generic word at least 4 EVA characters long
p is a prefix of 1 or 2 characters
s is a suffix of 1 or 2 characters
. is a space separating two words
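The four patterns defined above can be detected with something like the following Python sketch. It is only an illustration under some assumptions: the function names are made up, the input is a list of EVA words already split on "." and cleaned of locus tags and line-end markers, and when a pair fits both a prefix and a suffix reading the prefix reading wins.

```python
from collections import Counter

def classify_pair(a, b, min_len=4, max_affix=2):
    """Return (pattern, affix) if a.b is a quasi-repetition, else None."""
    a_longer = len(a) > len(b)
    longer, shorter = (a, b) if a_longer else (b, a)
    k = len(longer) - len(shorter)  # length of the candidate affix
    # W must have at least min_len characters; the affix 1..max_affix.
    if not (1 <= k <= max_affix) or len(shorter) < min_len:
        return None
    if longer[k:] == shorter:       # extra characters at the start
        return ("pW.W" if a_longer else "W.pW", longer[:k])
    if longer[:-k] == shorter:      # extra characters at the end
        return ("Ws.W" if a_longer else "W.Ws", longer[-k:])
    return None

def quasi_repetition_histogram(words):
    """Count (pattern, affix) matches over all adjacent word pairs."""
    hist = Counter()
    for a, b in zip(words, words[1:]):
        match = classify_pair(a, b)
        if match:
            hist[match] += 1
    return hist

if __name__ == "__main__":
    line = "okaldy.qokaldy.otor.oldar.qotar.otar.otardam"
    print(quasi_repetition_histogram(line.split(".")))
```

Word length here is counted in characters of the EVA transcription, so the result depends on the transcription alphabet being used.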
Some examples.
q- pre:
<f116r.17,+P0>dain.chey.
qokeey.okeey.lain.okeey.qol.chedy<$>
q- post:
<f33r.2,+P0>ytchedy.qokar.cheky.
okaldy.qokaldy.otor.oldar.qotar.otar.otardam
o- pre:
<f84v.39,+P0>lshedy.qol.aiin.okey.
olchey.lchey.olshedy.shckhy.soly
o- post:
<f86v6.36,+P0>dairal.daiin.qokar.choltal.cthdy.qokeey.
lkaiin.olkaiin.araiin
[attachment=1642]
In an earlier post, I mentioned that qo- words account for most exact repetitions (of the W.W pattern).
This analysis of prefix and suffix quasi-repetitions (of course there are quasi-repetitions that don't match these patterns) confirms the correlation with q-.
Other notes:
- q- and o- contribute to quasi repetitions both individually and in the qo- combination.
- suffixes seem to play a minor role in the phenomenon: the top six variants are all prefixes.
- q- has no preference between the pre- and post- variants (qW.W and W.qW have very similar frequencies)
These are some triple quasi-repetitions that exemplify the mobility of the q- prefix.
<f31r.10,+P0>tolshso.
okedy.okedy.qokedy.qokeedy.dar.shedshey-
<f84v.14,+P0>dshey.olkeey.dol.ol.otedy.
okedy.okedy.qokedy.dal.dar.ol.chedy.sain-
<f75r.45,+P0>sshedy.shckhy.qokey.okedy.sarol.oty.
otedy.qotedy.otedy.okaiin-
<f99v.38,+P0>ol.cheey.
qokeol.okeol.okeol.shokol.ykey-
<f108v.8,+P0>ysheedy.okeedy.oteedy.
qokeedy.okeedy.okeedy.chedal.okar.qoteedar.oty-
I find all these repetitions and quasi-repetitions extremely fascinating. Both the correlation with q- and the high frequency with which these repetitions occur make clear that they are not random or coincidental.
One should also consider that there are 27 words that occur fewer than 30 times and still show exact consecutive occurrences.
The probability of any single one of these cases happening by coincidence is less than 1 in a thousand. Given the number of distinct words in the manuscript, one could expect a few such cases, not 27.
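A rough way to check a figure like this is a shuffle model: if all N tokens of the text were placed in random order, a word occurring n times would be expected to form n(n-1)/N adjacent identical pairs, and that expectation is also an upper bound on the probability of seeing at least one such pair. The Python sketch below is only an illustration of that model, not necessarily the calculation behind the estimate above. The function names are made up, line breaks are ignored, and the 37,000-token total and the count of 5 in the usage lines are assumed round figures.

```python
from itertools import permutations

def expected_chance_repeats(count, total_tokens):
    """Expected number of adjacent identical pairs for a word occurring
    `count` times when all `total_tokens` tokens are randomly shuffled."""
    return count * (count - 1) / total_tokens

def brute_force_mean(tokens, word):
    """Average number of adjacent `word`.`word` pairs over every ordering
    of `tokens`. Only feasible for tiny lists; used to check the formula."""
    total = 0
    n_perms = 0
    for perm in permutations(tokens):
        total += sum(1 for x, y in zip(perm, perm[1:]) if x == y == word)
        n_perms += 1
    return total / n_perms

if __name__ == "__main__":
    # Sanity check of the formula on a toy "text".
    print(brute_force_mean(list("aabc"), "a"), expected_chance_repeats(2, 4))
    # Assumed figures: a word seen 5 times in a text of about 37,000 tokens.
    print(expected_chance_repeats(5, 37000))
```

Since the expectation grows roughly with the square of the count, the chance level for a given word depends strongly on how rare it is.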
Cases like qopchedy (having a total of 32 occurrences and 2 exact repetitions) are also noteworthy.