[split] Merged words across lines - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: [split] Merged words across lines (/thread-3291.html)

[split] Merged words across lines - Stephen Carlson - 16-07-2020

How do we know if a vord is broken across lines?

RE: Levenshtein distance line by line - -JKP- - 16-07-2020

Stephen Carlson Wrote: How do we know if a vord is broken across lines?

We don't even know if Voynichese "words" are broken across spaces.

RE: Levenshtein distance line by line - bi3mw - 16-07-2020

Of course one has to assume that a word is a word (as it appears to be). Otherwise investigations at the word level are not possible. With a few exceptions, the word separations are clearly recognizable by spaces. One can work with this.

(16-07-2020, 04:55 AM)Stephen Carlson Wrote: How do we know if a vord is broken across lines?

You could merge the short words at the end of a line with the first word of the following line, and then check against the VMS dictionary to see whether the merged word already exists or a new one was created. If the merged words already exist remarkably often, this would speak for a word division at the line break. At least that would be a start.

RE: Levenshtein distance line by line - bi3mw - 16-07-2020

I performed the experiment as described above:
1. Find all lines with a short word at the end in the VMS corpus (1 to 2 letters; 537 found).
2. Append the found word as a prefix to the first word of the following line.
3. Search for the resulting words in the VMS dictionary and count their occurrences.

edit: Since I again used the corpus where all lines with only one word were deleted, the result is only approximate. One could also check the result again with selected folios. The hits under 3. were also counted if the word occurred within a longer word.

RE: Levenshtein distance line by line - RobGea - 17-07-2020

Hi bi3mw,
So out of 537 mergewords, 68 form valid words; that's about 12.7%.
Is it possible to count how many times a mergeword is created?
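
For readers who want to try the three steps themselves, here is a minimal Python sketch. It is not bi3mw's actual script. It assumes the transliteration is available as a list of text lines with space-separated words, that "1 to 2 letters" means 1 to 2 transliteration characters, and that consecutive list entries are consecutive manuscript lines (paragraph and folio boundaries are ignored). The function names and the toy data are made up for illustration and are not real VMS counts.

```python
def merge_candidates(lines, max_len=2):
    """Steps 1 and 2: join a short line-final word to the next line's first word."""
    merged = []
    for current, following in zip(lines, lines[1:]):
        cur_words = current.split()
        next_words = following.split()
        if not cur_words or not next_words:
            continue  # skip empty lines
        last = cur_words[-1]
        if len(last) <= max_len:
            merged.append(last + next_words[0])
    return merged


def substring_hits(merged, vocabulary):
    """Step 3: keep merged forms that occur in the vocabulary,
    including inside a longer word (as in the first run)."""
    return [m for m in merged if any(m in word for word in vocabulary)]


# Toy data only (assumption): invented lines and an invented vocabulary.
lines = [
    "chedy qokeey ol",
    "daiin shedy ar",
    "chol dal",
]
vocabulary = {"oldaiin", "qoarchol", "chedy"}

merged = merge_candidates(lines)
hits = substring_hits(merged, vocabulary)
print(merged)
print(hits)
print(f"{len(hits)} of {len(merged)} mergewords found")
```

Because the match is a substring test, a merged form counts as a hit even when it only appears inside a longer vocabulary entry, which tends to inflate the hit rate compared with whole-word matching.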

RE: Levenshtein distance line by line - bi3mw - 17-07-2020

Hi RobGea,
most of the words were created only once, but there are a few exceptions.

RE: Levenshtein distance line by line - RobGea - 17-07-2020

Thanks bi3mw.
The top 20 commonest words in the VMS (Adelaide.edu study) are:
daiin, ol, chedy, aiin, shedy, chol, or, ar, chey, dar, qokeey, qokeedy, shey, qokain, qokedy, dy, qokaiin, al, dal, s.
From that, we would expect the mergewords ol-daiin, ol-chedy, or-daiin, ol-daiin, etc. to be more likely to exist, simply because they are compounds of the most frequent words. And indeed oldaiin has the highest count in your list.
I think it's impossible to exclude the idea that a 'vord is broken across lines'. That 'oldaiin' is the most frequent is what we would expect to find if words never broke across line breaks. Where a break (if such breaks exist) falls within a word would also affect these results.
Noteworthy is that ol-chedy does not appear as a mergeword.

RE: Levenshtein distance line by line - bi3mw - 17-07-2020

I have created another plot with only exact matches when searching against the dictionary. This time only whole words are considered. The result is much more "clearly arranged".

These are the "real" mergewords found in the dictionary (36):
Code:
oldaiin

Conclusion: in the whole VMS there are only 47 hits.

RE: Levenshtein distance line by line - bi3mw - 18-07-2020

(17-07-2020, 01:10 PM)RobGea Wrote: from that, we would expect to find that the mergewords ol-daiin, ol-chedy, or-daiin, ol-daiin, etc

Yes, you would expect a word like "oldaiin" to appear at the top of the list. However, six occurrences as a mergeword in the entire manuscript is not very frequent. From this one could draw conclusions about whether there are actually word breaks across lines or not. On the other hand, "oldaiin" occurs only 9 times as a whole word in the manuscript.
The composition of ol-daiin within a line is therefore also rather rare.

RE: [split] Merged words across lines - Koen G - 18-07-2020

Thread split as requested
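
The comparison made earlier in the thread, between how often a form arises as a mergeword and how often it occurs as a whole word, can be sketched in Python as follows. This is only an illustration under assumptions: the corpus is a list of lines with space-separated words, "short" means at most 2 transliteration characters, and only exact whole-word matches count. The helper names and the toy lines are hypothetical and do not reproduce the figures quoted above.

```python
from collections import Counter


def whole_word_counts(lines):
    """Count every space-separated word in the corpus."""
    return Counter(word for line in lines for word in line.split())


def mergeword_counts(lines, max_len=2):
    """Count how often each merged form is created at a line break."""
    counts = Counter()
    for current, following in zip(lines, lines[1:]):
        tail, head = current.split(), following.split()
        if tail and head and len(tail[-1]) <= max_len:
            counts[tail[-1] + head[0]] += 1
    return counts


def compare(lines, max_len=2):
    """Map each mergeword that also exists as a whole word to
    (times created at a line break, times seen as a whole word)."""
    words = whole_word_counts(lines)
    merges = mergeword_counts(lines, max_len)
    return {m: (n, words[m]) for m, n in merges.items() if m in words}


# Toy data only (assumption): invented lines, not a real transliteration.
lines = [
    "shedy oldaiin ol",
    "daiin chey ol",
    "daiin qokedy",
]
print(compare(lines))
```

Reporting both numbers side by side makes it easier to judge whether a handful of mergeword hits is notable relative to how rare the whole word is in the first place.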