The Voynich Ninja

Pages: 1 2 3

How do we know if a vord is broken across lines?

Stephen Carlson Wrote:How do we know if a vord is broken across lines?

We don't even know if Voynichese "words" are broken across spaces. :)

Of course one has to assume that a word is a word (as it appears to be). Otherwise investigations at the word level are not possible. With a few exceptions, the word separations are clearly recognizable by spaces. One can work with this.

(16-07-2020, 04:55 AM)Stephen Carlson Wrote: How do we know if a vord is broken across lines?

You could merge the short words at the end of a line with the first word of the following line, and then check against the VMS dictionary to see whether the merged word already exists or whether a new one was created. If the merged word already exists remarkably often, this would argue for a word division at the line break. At least that would be a start.

I performed the experiment as described above:

1. Find all lines in the VMS corpus with a short word at the end (1 to 2 letters; 537 found).
2. Prepend the found word to the first word of the following line.
3. Search for the resulting words in the VMS dictionary and count their occurrences.
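The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the script actually used: it assumes the transliteration is available as a list of text lines with words separated by whitespace, that "letters" can be approximated by character count, and that the "VMS dictionary" is simply the word frequencies of the same corpus. The function names `merge_candidates` and `count_in_dictionary` and the toy lines are hypothetical.

```python
from collections import Counter


def merge_candidates(lines, max_len=2):
    """Steps 1 and 2: glue a short line-final word onto the next line's first word."""
    merged = []
    for current, following in zip(lines, lines[1:]):
        cur_words = current.split()
        next_words = following.split()
        if not cur_words or not next_words:
            continue  # skip empty lines
        last = cur_words[-1]
        if len(last) <= max_len:  # "short" = 1 to 2 characters (assumption)
            merged.append(last + next_words[0])
    return merged


def count_in_dictionary(merged, corpus_words):
    """Step 3: look up each merged word and report how often it occurs as a whole word."""
    freq = Counter(corpus_words)
    return {word: freq[word] for word in merged}


# Toy data, not real manuscript lines.
lines = [
    "daiin chedy ol",
    "daiin shedy qokeey",
    "oldaiin chol ar",
    "chedy dar",
]
corpus_words = [w for line in lines for w in line.split()]

merged = merge_candidates(lines)
print(merged)                                      # ['oldaiin', 'archedy']
print(count_in_dictionary(merged, corpus_words))   # {'oldaiin': 1, 'archedy': 0}
```

Note that this sketch treats every consecutive pair of lines alike; a more careful version would probably not merge across paragraph or page boundaries.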

[Image: merged_list.png]

Edit: Since I again used the corpus in which all lines with only one word were deleted, the result is only approximate. One could also check the result again with selected folios.

The hits under 3. were also counted if the word occurred within a longer word.

Hi bi3mw,
So out of 537 mergewords, 68 form valid words; that's about 12.7%.
Is it possible to count how many times a mergeword is created ?

Hi RobGea,

most of the words were created only once, but there are a few exceptions.
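For anyone who wants to reproduce these two figures, here is a small sketch of how the creation counts and the share of valid mergewords might be computed. It assumes the mergewords are held in a plain list with one entry per line break and the dictionary is a set of word types; `creation_counts`, `valid_share` and the toy data are hypothetical names and values, not taken from the actual experiment.

```python
from collections import Counter


def creation_counts(merged):
    """How many times each mergeword is created, most frequent first."""
    return Counter(merged).most_common()


def valid_share(merged, vocabulary):
    """Fraction of mergeword instances that are found in the vocabulary."""
    if not merged:
        return 0.0
    valid = sum(1 for word in merged if word in vocabulary)
    return valid / len(merged)


# Toy data for illustration only.
merged = ["oldaiin", "oldaiin", "archedy", "dydain"]
vocabulary = {"oldaiin", "dydain", "chedy"}

print(creation_counts(merged))          # [('oldaiin', 2), ('archedy', 1), ('dydain', 1)]
print(valid_share(merged, vocabulary))  # 0.75

# The share quoted in the thread, from the two reported totals:
print(round(68 / 537 * 100, 1))         # 12.7
```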

Thanks bi3mw.

The top 20 most common words in the VMS (Adelaide.edu study) are:
daiin, ol, chedy, aiin, shedy, chol, or, ar, chey, dar, qokeey, qokeedy, shey, qokain, qokedy, dy, qokaiin, al, dal, s.

From that, we would expect the mergewords ol-daiin, ol-chedy, or-daiin, etc.
to be more likely to exist, simply because they are compounds of the most frequent words.

And indeed oldaiin has the highest count in your list.

I think it's impossible to exclude the idea that a 'vord is broken across lines'.
The most frequent mergeword, 'oldaiin', is what we would expect to find if words never broke across line breaks.
Where a break (if breaks exist) falls within a word would affect these results.
Noteworthy is that ol-chedy does not appear as a mergeword.

I have created another plot with only exact matches when searching against the dictionary. This time only whole words are considered. The result is much more "clearly arranged" ;)

There is basically only 1 for a hit and 0 for no hit.
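The difference between the two ways of counting can be illustrated with a short sketch. It assumes the dictionary is a set of distinct word types; `substring_hits` and `exact_hit` are hypothetical helper names, and the vocabulary below is toy data rather than the real VMS dictionary.

```python
def substring_hits(word, vocabulary):
    """First plot: count every dictionary entry that contains the word anywhere."""
    return sum(1 for entry in vocabulary if word in entry)


def exact_hit(word, vocabulary):
    """Second plot: 1 if the word is itself a dictionary entry, otherwise 0."""
    return int(word in vocabulary)


# Toy vocabulary for illustration only.
vocabulary = {"oldaiin", "qoldaiin", "chol", "dy"}

print(substring_hits("oldaiin", vocabulary))  # 2 (oldaiin, qoldaiin)
print(exact_hit("oldaiin", vocabulary))       # 1
print(substring_hits("ol", vocabulary))       # 3 (oldaiin, qoldaiin, chol)
print(exact_hit("ol", vocabulary))            # 0
```

Substring matching inflates the counts for short mergewords in particular, because a short string is more likely to turn up inside some longer word by chance.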

[Image: merged_list_exact.png]

These are the "real" mergewords found in the dictionary (36):

Code:
oldaiin
dydain
alchol
ordaiin
ysho
alchor
aldaiin
aldar
amy
aral
arol
dor
dychedy
dydchy
dytchdy
dytchor
llor
ochor
oll
olsaiin
olsain
oly
orcheos
ory
ry
soar
sokeey
sol
syty
vo
yokor
ypchor
ys
yshain
yshey
yshol

Conclusion: in the whole VMS there are only 47 hits.

(17-07-2020, 01:10 PM)RobGea Wrote: From that, we would expect the mergewords ol-daiin, ol-chedy, or-daiin, etc.
to be more likely to exist, simply because they are compounds of the most frequent words.

And indeed oldaiin has the highest count in your list.

Yes, you would expect a word like "oldaiin" to appear at the top of the list. However, six occurrences as a mergeword in the entire manuscript is not very frequent. From this one could draw conclusions about whether there are actually word breaks across lines.

On the other hand, "oldaiin" occurs only 9 times as a whole word in the manuscript. The composition of ol-daiin within a line is therefore also rather rare.

Thread split as requested :)

Stephen Carlson

-JKP-

bi3mw

bi3mw

RobGea

bi3mw

RobGea

bi3mw

bi3mw

Koen G