16-07-2020, 04:55 AM
16-07-2020, 05:38 AM
Stephen Carlson Wrote: How do we know if a vord is broken across lines?
We don't even know if Voynichese "words" are broken across spaces.

16-07-2020, 01:54 PM
Of course one has to assume that a word is a word (as it appears to be); otherwise, investigations at the word level are not possible. With a few exceptions, the word separations are clearly recognizable by spaces. One can work with this.
You could merge the short words at the end of a line with the first word of the following line, and then check against the VMS dictionary to see whether the merged word already exists or a new one was created. If the merged words already exist remarkably often, this would speak for a word division at the line break. At least that would be a start.
16-07-2020, 07:09 PM
I performed the experiment as described above:
1. Find all lines in the VMS corpus that end with a short word (1 to 2 letters; 537 lines).
2. Prepend the short word to the first word of the following line.
3. Search for the resulting words in the VMS dictionary and count their occurrences.
![[Image: merged_list.png]](https://wwwhomes.uni-bielefeld.de/mwille2/VMS/merged_list.png)
Edit: Since I again used the corpus from which all lines with only one word were deleted, the result is only approximate. One could also check the result again with selected folios.
The hits under step 3 were also counted if the word occurred within a longer word.
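The three steps above could be sketched roughly as follows. This is only an illustration, not the script actually used: the function names, the toy lines, and the toy dictionary are all assumptions, and word length is assumed to be measured in transcription characters.

```python
from collections import Counter

def merge_across_line_breaks(lines, max_len=2):
    """Steps 1 and 2: join a short line-final word to the first word of the next line."""
    merged = []
    for current, following in zip(lines, lines[1:]):
        if not current or not following:
            continue  # skip empty lines
        last = current[-1]
        if len(last) <= max_len:
            merged.append(last + following[0])
    return merged

def count_substring_hits(candidates, dictionary_counts):
    """Step 3: sum the counts of dictionary words that contain each candidate."""
    return {
        cand: sum(n for word, n in dictionary_counts.items() if cand in word)
        for cand in candidates
    }

# Toy data only (hypothetical lines, not taken from the manuscript).
lines = [
    ["daiin", "chedy", "ol"],
    ["daiin", "shedy", "qokeey"],
    ["chol", "or"],
    ["aiin", "dar"],
]
dictionary_counts = Counter(word for line in lines for word in line)
dictionary_counts["oldaiin"] += 1  # pretend the dictionary also contains this word

merged = merge_across_line_breaks(lines)
hits = count_substring_hits(merged, dictionary_counts)
print(merged)
print(hits)
```

Because the lookup is a substring match, a candidate is also counted when it only occurs inside a longer dictionary word, as noted above.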
17-07-2020, 10:44 AM
Hi bi3mw,
So out of 537 mergewords, 68 form valid words; that's about 12.7%.
Is it possible to count how many times each mergeword is created?
17-07-2020, 11:29 AM
Hi RobGea,
Most of the words were created only once, but there are a few exceptions.
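Counting how often each mergeword is created could look something like this. It is a minimal sketch with a made-up `merged` list standing in for the real output of the merge step.

```python
from collections import Counter

# Hypothetical merge output; the real list would come from the corpus.
merged = ["oldaiin", "oraiin", "oldaiin", "dydain"]

creation_counts = Counter(merged)

# Mergewords created exactly once versus more than once.
created_once = [word for word, n in creation_counts.items() if n == 1]
created_repeatedly = {word: n for word, n in creation_counts.items() if n > 1}

print(created_once)
print(created_repeatedly)
```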
17-07-2020, 01:10 PM
Thanks bi3mw.
The top 20 most common words in the VMS (Adelaide.edu study) are:
daiin, ol, chedy, aiin, shedy, chol, or, ar, chey, dar, qokeey, qokeedy, shey, qokain, qokedy, dy, qokaiin, al, dal, s.
From that, we would expect mergewords such as ol-daiin, ol-chedy and or-daiin
to be more likely to exist, simply because they are compounds of the most frequent words.
And indeed oldaiin has the highest count in your list.
I think it's impossible to exclude the idea that a 'vord is broken across lines'.
'oldaiin' being the most frequent is what we would expect to find if words were never broken across line breaks.
Where within a word a break occurs (if such breaks exist) would affect these results.
Noteworthy is that ol-chedy does not appear as a mergeword.
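One way to make the "simply because they are frequent" argument concrete is an independence baseline: if line-final and line-initial words were unrelated, the expected number of times a pair (a, b) straddles a line break is roughly N × P(a ends a line) × P(b starts a line), with N the number of line breaks. The sketch below is an assumption about how such a baseline might be computed; nothing like it was run in this thread, and the lines are toy data.

```python
from collections import Counter

def observed_and_expected_pairs(lines):
    """Compare observed line-break pairs with an independence baseline."""
    breaks = [
        (cur[-1], nxt[0])
        for cur, nxt in zip(lines, lines[1:])
        if cur and nxt
    ]
    n = len(breaks)
    end_counts = Counter(a for a, _ in breaks)    # words ending a line
    start_counts = Counter(b for _, b in breaks)  # words starting the next line
    observed = Counter(breaks)
    # Expected count under independence: n * (end/n) * (start/n).
    expected = {
        (a, b): end_counts[a] * start_counts[b] / n
        for a in end_counts
        for b in start_counts
    }
    return observed, expected

# Toy lines only (hypothetical, not taken from the manuscript).
lines = [
    ["chedy", "ol"],
    ["daiin", "or"],
    ["daiin", "ol"],
    ["chedy", "ol"],
    ["daiin"],
]
observed, expected = observed_and_expected_pairs(lines)
print(observed[("ol", "daiin")], expected[("ol", "daiin")])
```

A pair that is observed about as often as the baseline predicts carries little evidence either way; only a clear excess or deficit would be informative.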
17-07-2020, 01:45 PM
I have created another plot that counts only exact matches when searching the dictionary, so this time only whole words are considered. The result is much more clearly arranged:
there is basically only 1 for a hit and 0 for no hit.
![[Image: merged_list_exact.png]](https://wwwhomes.uni-bielefeld.de/mwille2/VMS/merged_list_exact.png)
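The difference between the earlier substring search and this exact search can be illustrated with a small sketch. The helper names and the toy vocabulary are assumptions made for the example, not the actual dictionary.

```python
def substring_hits(candidate, vocabulary):
    """Count vocabulary entries that contain the candidate anywhere."""
    return sum(1 for word in vocabulary if candidate in word)

def exact_hit(candidate, vocabulary):
    """Return 1 if the candidate is itself a vocabulary entry, else 0."""
    return 1 if candidate in vocabulary else 0

# Toy vocabulary (hypothetical entries, for illustration only).
vocabulary = {"oldaiin", "choldaiin", "daiin", "ol"}

for candidate in ["oldaiin", "olchedy"]:
    print(candidate, substring_hits(candidate, vocabulary), exact_hit(candidate, vocabulary))
```

With exact matching each distinct candidate contributes at most one hit, which is why the second plot shows essentially only 0 and 1.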
These are the "real" mergewords found in the dictionary (36):
Code:
oldaiin
dydain
alchol
ordaiin
ysho
alchor
aldaiin
aldar
amy
aral
arol
dor
dychedy
dydchy
dytchdy
dytchor
llor
ochor
oll
olsaiin
olsain
oly
orcheos
ory
ry
soar
sokeey
sol
syty
vo
yokor
ypchor
ys
yshain
yshey
yshol
Conclusion: in the whole VMS there are only 47 hits.
18-07-2020, 02:42 PM
(17-07-2020, 01:10 PM)RobGea Wrote: from that, we would expect to find that the mergewords ol-daiin ,ol-chedy, or-daiin,ol-daiin,etc
would be more likely to exist, simply because they are compounds of the most frequent words.
And indeed oldaiin has the highest count in your list.
Yes, you would expect a word like "oldaiin" to appear at the top of the list. However, six occurrences as a mergeword in the entire manuscript is not many. From this one could draw conclusions about whether words are actually broken across lines.
On the other hand, "oldaiin" occurs only 9 times as a whole word in the manuscript. The combination ol-daiin within a line is therefore also rather rare.
18-07-2020, 04:06 PM
Thread split as requested 
