I forgot to mention:
The shuffling is done with the Fisher-Yates algorithm, also known as the Knuth shuffle, so here the words are re-sorted very "randomly" (across line boundaries).
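For readers unfamiliar with the algorithm, here is a minimal sketch of a Fisher-Yates shuffle applied across line boundaries. It assumes each line is a list of word strings and that the shuffled words are poured back into lines with the original word counts. The function names are hypothetical, and this is not necessarily how the actual test script handles the transcription.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """In-place Fisher-Yates (Knuth) shuffle: walk from the last index down,
    swapping each element with a uniformly chosen element at or before it."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint bounds are inclusive: 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items


def shuffle_across_lines(lines, rng=random):
    """Pool the words of all lines, shuffle the pool, then refill the lines
    with their original word counts (hypothetical layout-preserving step)."""
    words = [w for line in lines for w in line]  # new list; input is untouched
    fisher_yates_shuffle(words, rng)
    result, pos = [], 0
    for line in lines:
        result.append(words[pos:pos + len(line)])
        pos += len(line)
    return result
```

Passing a seeded `random.Random(...)` as `rng` makes a run reproducible, which is handy when comparing the shuffled baseline with the real text.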
When checking several folios, I noticed that very short words are often written at the end of a line, where they look as if they had been "appended". Of course, this influences the results.
(14-07-2020, 11:47 AM)bi3mw Wrote: When checking several folios, it is noticeable that very short words are often written at the end of the line which look like "appended". Of course this has an influence on the results.
Currier noted something similar:
"The ends of the lines contain what seems to be, in many cases, meaningless symbols: little groups of letters which don't occur anywhere else,
and just look as if they were added to fill out the line to the margin."
There seems to be some debate, though, as to exactly what Currier meant.
I wouldn't be surprised if Currier was referring to the phenomenon I described. One could almost think that there is some kind of obfuscation (for whatever purpose).
What I can say after my tests is that very short words probably do not occur very often at the beginning of a line, not even in slightly modified form. Otherwise this would have shown up in the results, at least in the folios I tested. For example, in folio 115v, words one to three letters long occur zero times at the beginning of a line but nine times at the end.
As I said, a Levenshtein distance below 3 is rare, and with similarly short word pairs there are only a few possible substitutions.
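A small sketch may make the point about short words concrete. The Levenshtein distance between two words can never exceed the length of the longer one, so for a pair of one- or two-letter words a value below 3 is guaranteed rather than informative. The sketch assumes, as a guess not confirmed in the post, that the test pairs the last word of each line with the first word of the following line; `line_boundary_distances` is a hypothetical helper and lines are lists of words.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in b so each row stays small
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute if different
        prev = curr
    return prev[-1]


def line_boundary_distances(lines):
    """Distance between the last word of each line and the first word of the
    next line (assumed pairing); empty lines are skipped."""
    return [levenshtein(cur[-1], nxt[0])
            for cur, nxt in zip(lines, lines[1:]) if cur and nxt]
```

For example, `levenshtein("ol", "dy")` can only be 0, 1 or 2, whatever the letters are.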
Edit: in the entire VMS, short words occur much more often at the end of a line than at the beginning (word length one to two letters: 153 at the beginning vs. 537 at the end).
Here are the word lists (one to two letters). I have removed all lines in the VMS corpus that contain only one word (for the Levenshtein test), so the result is only approximate.
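The count above can be sketched roughly as follows, assuming the transcription has already been split into lines of word strings (parsing an actual transcription file is not shown). As in the post, one-word lines are skipped; setting `max_len=3` would give the per-folio count for words of one to three letters. The function name is hypothetical.

```python
def count_short_words_at_edges(lines, max_len=2):
    """Count words of 1..max_len letters at the start and at the end of lines,
    skipping lines with fewer than two words."""
    start = end = 0
    for words in lines:
        if len(words) < 2:
            continue  # one-word lines removed, as described above
        if 1 <= len(words[0]) <= max_len:
            start += 1
        if 1 <= len(words[-1]) <= max_len:
            end += 1
    return start, end
```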
Hi Matthias,
this is an interesting subject that definitely deserves more investigation!
Something partially related was discussed in another thread. See also the previous posts in that thread and Elmar Vogt's related paper.
It seems that words at the end of lines also tend to be shorter in normal printed texts. I doubt that this is an effect of hyphenation: short words are more likely to be squeezed in at the end of a line, where there would not be room for a longer word, but I don't think I ever checked whether this assumption is correct.
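One possible way to check that assumption on an ordinary text is to wrap it greedily at a fixed width with no hyphenation and compare the average length of line-final words with that of line-initial words. This is only a rough sketch: it assumes a fixed width in characters and single spaces, whereas real typesetting uses proportional fonts and justification. Both helper names are hypothetical.

```python
def greedy_wrap(words, width):
    """Fill each line with as many words as fit in `width` characters
    (single spaces, no hyphenation). A word longer than `width` gets a line to itself."""
    lines, current, used = [], [], 0
    for w in words:
        extra = len(w) if not current else len(w) + 1  # +1 for the space
        if current and used + extra > width:
            lines.append(current)
            current, used = [w], len(w)
        else:
            current.append(w)
            used += extra
    if current:
        lines.append(current)
    return lines


def mean_first_and_last_lengths(lines):
    """Mean length of line-initial and line-final words, ignoring the last
    line (its break is not forced by the width)."""
    body = lines[:-1]
    firsts = [len(line[0]) for line in body]
    lasts = [len(line[-1]) for line in body]
    return sum(firsts) / len(firsts), sum(lasts) / len(lasts)
```

Running this on a long plain-text novel at a few different widths would show whether line-final words come out shorter on average.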
(15-07-2020, 02:42 PM)MarcoP Wrote: It seems that words at the end of lines tend to be shorter also in normal printed texts.
I'm not sure about that, but in the VMS there are noticeably many short words at the end of the line. It often looks as if they were appended so that the lines reach the right margin evenly, so they may be filler words. As I said, this possibility would have a strong influence on the result; it is discussed further in another thread.