I have generated reduplication/quasi-reduplication plots including all the language samples in the text corpora by Koen (621 texts) and Cham (54 samples).
Thanks to Koen and Brian, who shared their corpora, and to Jonas Alin, who pointed out Cham's corpus in another thread.
This is the overall plot:
[attachment=4259]
The plot is made unreadable by the very high reduplication rate of a single outlier N-PML (all samples labelled with the N- prefix are from Cham's corpus). The text is described as "Sabir, a.k.a. Mediterranean Lingua Franca - Extracts from The Bourgeois Gentleman by Molière (1670) 240 words". It apparently is an attempt to reproduce spoken language.
In this text, almost 12% of all pairs of consecutive words are identical. On the other hand, quasi-reduplication never occurs.
A fragment from this text (reduplication highlighted):
Se ti sabir,
Ti respondir;
Se non sabir,
Tazir, tazir.
Mi star Mufti:
Ti qui star ti?
Non intendir:
Tazir, tazir.
Mahametta per Giordina
Mi pregar sera é mattina:
Voler far un Paladina
Dé Giourdina, Dé Giourdina.
Dar turbanta, é dar scarcina,
Con galera é brigantina,
Per deffender Palestina.
Mahametta, etc.
Star bon Turca Giourdina?
Hi valla.
Hu la ba ba la chou ba la ba ba la da.
Ti non star furba?
No, no, no.
Non star furfanta?
No, no, no.
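The two rates discussed here can be computed along the following lines. This is only a minimal sketch, not the code used for the plots: the tokenization (lowercase, letters only) and the definition of quasi-reduplication (two consecutive words that differ by exactly one substituted, inserted or deleted character) are my assumptions, and the function names are hypothetical.

```python
import re

def tokenize(text):
    # Assumption: lowercase the text and keep only runs of letters.
    return re.findall(r"[^\W\d_]+", text.lower())

def is_quasi(a, b):
    # Assumption: "quasi-reduplication" = the two words are exactly one
    # character edit apart (substitution, insertion or deletion).
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def pair_rates(words):
    # Fraction of consecutive word pairs that are identical (reduplication)
    # or one edit apart (quasi-reduplication).
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0, 0.0
    redup = sum(a == b for a, b in pairs)
    quasi = sum(is_quasi(a, b) for a, b in pairs)
    return redup / len(pairs), quasi / len(pairs)

words = tokenize("Se non sabir, Tazir, tazir. No, no, no.")
print(pair_rates(words))
```

On this tiny excerpt, three of the seven consecutive pairs are identical and none is a quasi-reduplication; the figures for the full files of course depend on the exact tokenization used.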
If one removes N-PML from the data-set, the plot looks like this:
[attachment=4258]
Blue samples correspond to the VMS.
The orange square is the text generated by Timm and Schinner's software.
Purple samples are from Cham's corpus, yellow samples from Koen's.
The large green circles are samples I have chosen for further discussion.
The red line is X=Y: it makes it easy to see that reduplication is typically more frequent than quasi-reduplication. Though nothing comes close to the VMS, here are a couple of texts that exhibit both phenomena, with more quasi-reduplication than reduplication:
N-EMY is a Mayan (Ch’olti’) dictionary: Bocabulario Grande by Fray Francisco Morán (1695).
From the text:
agua menuda puz puz ; palpal ha
azul - color yax yax
apuntalar tontei ; nostenahib, el tribo
a donde, por donde - tuba
alisar yulyul, yuhlin
agua clara, berde, azul - yaxha
ageno, agena yantal
abergonsar tzubalez. tzublez
It seems that most occurrences of reduplication are Mayan expressions. For instance, in Ch’olti’ "blue" is "yax yax". Quasi-reduplication occurs both in Ch’olti’ and in Spanish, where variants of the same word are given (for instance, the Spanish "ageno, agena" are the masculine and feminine forms of the word now spelled "ajeno", "belonging to someone else").
Slav_.m is a text collected by Koen, originally named "Slav_NovumTestamentum.txt". According to Google Translate, the text is Bulgarian.
Perfect reduplication only occurs twice:
да.да (yes yes)
нет.нет (no no)
All occurrences of quasi-reduplication are concentrated in the first few lines, which describe the genealogy of Christ:
фарес родил есрома есром родил арама арам родил аминадава аминадав родил наассона
fares rodil esroma esrom rodil arama aram rodil aminadava aminadav rodil naassona
Perez the father of Hezron, Hezron the father of Ram, Ram the father of Amminadab, Amminadab the father of Nahshon
These are consecutive occurrences of the names of people in two different cases: in all pairs, the two words differ only in the final character. Since the file is rather short (fewer than 3000 words), these occurrences of quasi-reduplication carry considerable weight.
I have also investigated sequences of three words in one of these two forms: X X X' (perfect reduplication, followed by quasi-reduplication) and X' X X (quasi-reduplication followed by perfect reduplication). I have compared the rate of the occurrences in the original file with the rate in a random scrambling of the file (averaging on 20 different scrambles for each file).
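The comparison just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the actual script: X' is taken to be a word exactly one character edit away from X, the rate is expressed as occurrences per word, and the function names and the fixed random seed are hypothetical.

```python
import random

def one_edit_apart(a, b):
    # Assumption: X' differs from X by one substitution, insertion or deletion.
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def count_triples(words):
    # Count three-word windows of the form X X X' or X' X X.
    n = 0
    for x, y, z in zip(words, words[1:], words[2:]):
        if x == y and one_edit_apart(y, z):      # X X X'
            n += 1
        elif y == z and one_edit_apart(x, y):    # X' X X
            n += 1
    return n

def triple_rates(words, n_scrambles=20, seed=0):
    # Rate in the original order vs. mean rate over random scrambles.
    rng = random.Random(seed)
    size = max(len(words), 1)
    original = count_triples(words) / size
    total = 0
    for _ in range(n_scrambles):
        shuffled = list(words)
        rng.shuffle(shuffled)
        total += count_triples(shuffled)
    return original, total / n_scrambles / size

sample = "more fastly than that that is receyued".split()
print(triple_rates(sample))
```

With a short sample like this one the scrambled rate is dominated by chance; on full-length files, averaging over 20 scrambles gives a steadier baseline.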
Here the plot immediately makes clear that the whole VMS is quite different from anything else. Again, the red line is X=Y: the position of the VMS samples means that the two triple patterns are much more frequent in the actual manuscript than in the scrambled versions. This was to be expected to some extent, since it had already been observed that both reduplication and quasi-reduplication appear to be more frequent than in scrambled data.
[attachment=4257]
Anyway, these triple patterns are not easily produced by random order: their extensive presence in the VMS might confirm that reduplication and quasi-reduplication are related and tend to appear consecutively.
The vast majority of the other data fall at (0,0), i.e. triple patterns never appear in either the original file or its scrambled versions.
In order to have a look at the detail of the other files, I plotted the data in logarithmic scale. To move samples away from the origin, I simply added a very small quantity (0.0001) to both measures (I understand that this is a clumsy solution). Please remember that logarithmic scale reduces the apparent distance between far away samples: the VMS is not as close to the rest as it appears to be here.
[attachment=4256]
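The offset used for the logarithmic plot amounts to something like the sketch below. The base-10 logarithm and the function name are my assumptions; only the 0.0001 offset comes from the description above.

```python
import math

EPS = 0.0001  # small offset added to both measures, as described above

def log_coords(rate_a, rate_b, eps=EPS):
    # Shift both rates away from zero, then take log10 (assumed base),
    # so that samples sitting at (0,0) get a finite position on the plot.
    return math.log10(rate_a + eps), math.log10(rate_b + eps)

# A file with no triple patterns at all, in the original or when scrambled:
print(log_coords(0.0, 0.0))
# One occurrence in 13417 words against 37 in 36293 (counts from the post):
print(log_coords(1 / 13417, 37 / 36293))
```

All the files at the origin collapse onto a single point determined only by the offset, which is why the choice of 0.0001 affects how far the remaining samples appear from that cluster.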
Erasm.h might be the file that comes closest to the VMS (though with a frequency one order of magnitude lower). The complete file name in Koen's corpus is ErasmusProverbsLatinEnglish.txt. The sample contains 13417 words, about a third as many as the VMS, and it includes a single "triple pattern" (X' X X), while 37 appear in the whole VMS.
nothyng stycketh more fastly than that that is receyued and taken of pure youth not yet infected wyth peruerse and croked maners or opinions
This single case looks entirely accidental. Also, it occurs in a text that does not have a high frequency of reduplication and quasi-reduplication (see the other plots above).
Only two files, both in Cham's corpus, contain 2 occurrences of triple patterns:
N-HAW - An 1861 Hawaiian text.
aia la ke kau nei hoi
ka hoku pakipika
iluna o ke aouli
ka lani kiekie
lehulehu
lehulehu
lelulehu no lakou
o keia pepa aole ia na kekahi haole aole aole hoi na ka mea hookahi aka na na kanaka
I have not tried to understand what these patterns mean. From the graph above, one can see that reduplication is quite frequent in Hawaiian. I guess it may be a feature of the language (as for the Mayan language discussed above).
N-FIN - the Finnish Kalevala
This is the collection of poetry that Jonas Alin pointed out in the thread I linked above. The poems (called "Runes") were transcribed by Elias Lönnrot in the first half of the 19th century; he travelled through the country in search of the last people familiar with an oral tradition that likely dated back many centuries. The content of the poems is mostly pagan, with only limited references to Christianity.
The relatively high number of reduplications and quasi-reduplications is one of the consequences of the strongly alliterative style of the poems.
I had a few exchanges on Reddit with a native speaker with a degree in Finnish language (u/Vilmiira), who helped me understand something of the two occurrences of triple patterns in the Kalevala.
(Rune 11, 180)
Enkä huoli huitukoille, huitukoille, haitukoille;
This means something like "I have no interest in carefree people, carefree people, carefree people"
Vilmiira Wrote: Huitukoille = huitukka + (o)i + lle, where the (o)i indicates plural form, and lle is the ending meaning to or for someone.
huitukka has a connotation of a young, carefree girl [actually a person] who runs around and could also indicate somewhat loose morals, but it's not a really bad word. Haitukka is a version of this, it does not have a separate meaning but rather a poetic version, that doesn't really exist otherwise.
(Rune 42, 219)
Itse seppo Ilmarinen, toinen lieto Lemminkäinen, nepä tuossa soutelevat, soutelevat, joutelevat selviä selän vesiä, lake'ita lainehia.
The blacksmith, Ilmarinen, with the flighty Lemminkainen, they are rowing, rowing, gliding over the clear waters of the sea, over the waste of waves.
Here I render joutelevat as "gliding" (on the basis of the English translation by Crawford)
Vilmiira Wrote: Soutelevat = soutaa + ele + vat, where soutaa is the root verb "to row", the ele-suffix creates a new frequentative version of the verb (to row continuously or in more relaxed manner), and vat-suffix makes it plural.
...
Here again, the word joutelevat cannot really be translated, at least with my knowledge. It seems to be a similar thing to the previous example, where the word is purposefully taken to look almost the same.
From what Vilmiira wrote, I understand that the Finnish language had and still has the possibility of slightly altering a word for expressive reasons. This alteration may introduce a slightly different meaning, but sometimes the meaning appears to be unaltered.
In the case of the Kalevala, there is little doubt that reduplication and quasi-reduplication are due to a single cause: alliteration. Another alliterative feature of this text that could make it comparable with Voynichese is a preference for consecutive words with the same initial sounds. This is an example from an Academia paper by Frog and Eila Stepanova.
[attachment=4255]
But coming back to reduplication, quasi-reduplication and the triple patterns, one should note that the Kalevala is nowhere near the frequencies seen in the VMS. In Voynichese, reduplication and quasi-reduplication appear in an amazing 3% of consecutive word pairs. In the Kalevala, they are one order of magnitude rarer (0.3% total).
Similarly, the two triple patterns occur about once every 1000 Voynich words (37 occurrences in 36293 words), while in the Kalevala they are about 35 times rarer (2 occurrences in 68156 words).
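The comparison of the two triple-pattern frequencies can be checked directly from the counts quoted above. A small sketch, with hypothetical helper names:

```python
# Counts quoted in the post: (triple-pattern occurrences, total words).
VMS = (37, 36293)
KALEVALA = (2, 68156)

def words_per_occurrence(occurrences, words):
    # Average number of words between two occurrences.
    return words / occurrences

def frequency_ratio(a, b):
    # How many times more frequent the pattern is in a than in b.
    return (a[0] / a[1]) / (b[0] / b[1])

ratio = frequency_ratio(VMS, KALEVALA)
print(f"VMS: one triple pattern every {words_per_occurrence(*VMS):.0f} words")
print(f"Kalevala: one every {words_per_occurrence(*KALEVALA):.0f} words")
print(f"Ratio: {ratio:.1f}")
```

The ratio comes out between 34 and 35, so the gap is a factor of roughly 35 rather than a full two orders of magnitude.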
If the VMS is meaningful and if similar Voynichese words correspond to similar words in some language, the text appears to make a uniquely intensive use of alliteration. I am sure that alliterative compositions were traditional in many other languages in addition to Finnish and related languages, but I don't expect that finding actual examples will be easy.