I am not completely sure I haven't made substantial mistakes. So take this with a grain of salt. Also, I don't know if this has been discussed before.
These graphs show the expected (red) and actual (green) counts of consecutive word pairs in which the last letter of the first word is the same as the first letter of the second word. For instance (where '.' represents a word boundary):
but.these
others.said
seventh.hour
I considered:
a classical Latin text (De Bello Gallico). 51328 words
early modern English (the gospel of John from King James Bible). 19131 words
a medieval Occitan text. 15461 words
medieval Italian (Divine Comedy). 105682 words
VMS - Takahashi's EVA transcription. 37718 words
VMS Currier-D'Imperio transcription, extracted via ivtt. 16120 words
The expected number of occurrences of each letter X in the pattern -X.X- is computed as follows:
T is the total number of consecutive word pairs
E is the number of these pairs in which the first word ends with -X
S is the number of these pairs in which the second word starts with X-
The expected number of X.X is T*(E/T)*(S/T)=E*S/T
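The computation above can be sketched in Python. This is only an illustration, not the script used for the graphs: the function name `boundary_counts` and the input layout (a list of word lists, with pairs formed only between consecutive words of the same list) are assumptions, and words are assumed to be non-empty.

```python
from collections import Counter

def boundary_counts(lines):
    """Return {letter: (expected, actual)} for X.X word boundaries.

    `lines` is an iterable of word lists (hypothetical input layout);
    pairs are formed only between consecutive words of the same list.
    """
    ends = Counter()     # E: pairs whose first word ends with X
    starts = Counter()   # S: pairs whose second word starts with X
    actual = Counter()   # observed pairs of the form -X.X-
    total = 0            # T: all consecutive word pairs
    for words in lines:
        for first, second in zip(words, words[1:]):
            total += 1
            ends[first[-1]] += 1
            starts[second[0]] += 1
            if first[-1] == second[0]:
                actual[first[-1]] += 1
    # Expected count for each letter is E * S / T
    return {
        x: (ends[x] * starts[x] / total, actual[x])
        for x in sorted(set(ends) | set(starts))
    }

sample = [["but", "these", "others", "said"]]
result = boundary_counts(sample)
```

On this toy sample there are three pairs, two of which (t.t and s.s) repeat a letter across the boundary. A real text would need to be tokenized first, and the treatment of line or paragraph breaks would change T.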
For instance, in the VMS Takahashi transcription there are 32053 word couples.
In 1082 of these, the first word ends with EVA: -o ( 1082/ 32053 = 3.4%)
In 7315 couples, the second word starts with o- ( 7315/ 32053 = 22.8%)
We expect 3.4% * 22.8% * 32053 ≈ 247 occurrences of o.o (E*S/T = 1082*7315/32053; plotted in red), but we find only 114 actual occurrences (plotted in green).
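As a quick check of the arithmetic, the counts quoted above for EVA o can be plugged into E*S/T directly. The variable names are only for illustration.

```python
T = 32053   # consecutive word pairs (Takahashi transcription)
E = 1082    # pairs whose first word ends with -o
S = 7315    # pairs whose second word starts with o-
actual = 114

expected = E * S / T          # about 246.9, i.e. 247 after rounding
ratio = actual / expected     # about 0.46

print(round(E / T * 100, 1), round(S / T * 100, 1), round(expected))
```

With these numbers the observed o.o count is a little under half of the expected one.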
While all the texts show some difference between expected and actual counts, I think it is clear that for Latin and English the differences are small and go in both directions, while for Occitan and Italian they are much larger and all due to actual counts being smaller than expected. I believe this is because Occitan and Italian spelling is subject to euphonic transformations that avoid repeating the same vowel across word boundaries.
For instance, in Occitan, the final -e of the preposition 'de' is dropped before 'e-'
juntturas.de.las.cambas
fuelhas.d.erba
The same thing happens with the definite articles:
lo.matin
l.omme
One can also note that the most apparent shift from expected values in Latin corresponds to a.a. In my opinion, this is due to one of the very few euphonic variants of Latin: the two versions of the preposition a/ab, the second one being used before words starting with a vowel.
a milibus
a germanis
ab inimicis
ab aliis
ab armis
In my opinion, this evidence (if confirmed) could support the word-boundary transformations discussed by Emma.
It is clear that, if what we observe in the VMS is due to euphonic transformations, these are quite different from those that happen in Latin languages. In these languages, transformations are limited to short and frequent words (mostly prepositions and articles) and typically affect the end of the words. The phenomena discussed by Emma affect longer words and mostly seem to happen at the beginning of words (I am thinking in particular of the [link text not preserved] on the basis of the ending of the preceding word).