15-02-2023, 10:49 AM
Hi Yulia,
It's great to hear from you!
The hypothesis you discuss addresses one of the major problems of Voynichese, and it has been put forward in the past (e.g. by [link not available], but I am sure by others as well).
A first problem I see is that repetition is quite pervasive in the VMS. Where do we stop throwing stuff away? Your example includes a repetition with an edit distance of 2 (from9 9from). Is that the maximum edit distance for irrelevant words? Edit distance (which I am using here to keep things simple) is a totally anachronistic way to decide what to keep and what to dump, so what should be used instead? In a sequence of similar words, do we keep the first one, the last one, or is there some different criterion?
Example:
<f82v.8,+P0> qokain.sheol.qoteedy.chedy.qokey.qokedy.qokol.chedy.chedy.lchy
EVA edit distances between consecutive words:
qokain sheol 6
sheol qoteedy 6
qoteedy chedy 4
chedy qokey 4
qokey qokedy 1
qokedy qokol 3
qokol chedy 5
chedy chedy 0
chedy lchy 3
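For anyone who wants to reproduce these numbers, here is a minimal sketch. It assumes plain Levenshtein distance (insertions, deletions and substitutions each cost 1) computed on the EVA characters, with each EVA letter counted as one symbol. The function name `levenshtein` is just my own label.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute all cost 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# The line from f82v quoted above, words separated by '.'
line = "qokain.sheol.qoteedy.chedy.qokey.qokedy.qokol.chedy.chedy.lchy"
words = line.split(".")

# Distance between each word and the one that follows it
distances = [levenshtein(a, b) for a, b in zip(words, words[1:])]

for a, b, d in zip(words, words[1:], distances):
    print(a, b, d)
```

Treating multi-character EVA groups (such as the benches) as single glyphs would change some of these values, which is one more arbitrary choice hiding in the method.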
Let's say we remove every word whose edit distance from the preceding word is 3 or less. I am keeping the first word of each sequence:
- qokey for qokey.qokedy.qokol
- chedy for chedy.chedy.lchy
We dump 4/10 of the words and are left with:
qokain sheol qoteedy chedy qokey chedy
These 6 words still seem to show a rigid structure (low character entropy) and repetitions (though they now alternate: 3 qo words + 3 bench-e words, with chedy occurring twice and 4 consecutive -y words).
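The dumping step above can be sketched as follows. This is only one possible reading of the procedure: consecutive words are chained into a run whenever the distance to the immediately preceding word is at or below the threshold, and one word per run is kept. The names `collapse_runs`, `threshold` and `keep` are my own, not anything taken from the hypothesis under discussion.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def collapse_runs(words, threshold=3, keep="first"):
    """Group consecutive similar words into runs and keep one word per run.

    A word joins the current run when its distance to the word right
    before it is <= threshold (an assumption; comparing against the
    first word of the run would be another option).
    """
    if not words:
        return []
    runs = [[words[0]]]
    for prev_word, word in zip(words, words[1:]):
        if levenshtein(prev_word, word) <= threshold:
            runs[-1].append(word)   # similar enough: extend the run
        else:
            runs.append([word])     # too different: start a new run
    index = 0 if keep == "first" else -1
    return [run[index] for run in runs]


words = "qokain.sheol.qoteedy.chedy.qokey.qokedy.qokol.chedy.chedy.lchy".split(".")

kept_first = collapse_runs(words, threshold=3, keep="first")
kept_last = collapse_runs(words, threshold=3, keep="last")

print(" ".join(kept_first))
print(" ".join(kept_last))
```

Switching `keep` from "first" to "last" already changes two of the six surviving words, which illustrates how much the outcome depends on rules that are essentially arbitrary.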
Also, subtler repetitions like [link not available] will probably be almost unaffected by dumping consecutive similar words (unless the threshold is so high that you dump most of the manuscript).
To summarize, two of the problems I see are:
- There are countless ways of deciding what to dump and what to keep among similar consecutive words (see how different Pardis's approach is from the edit-distance method above).
- Whatever we do, I am afraid we throw away much of the text and the result still has all the problems of Voynichese, minus the consecutive repetition of the same word.