I have attached a docx file [attachment=6422] with the text converted from EVA. In general, I started by translating the Voynich symbols into letters of the alphabet so that the alphabet contained at least 22 letters; otherwise it is difficult for me to imagine the text being useful. My goal was to compare the percentage ratios of letters and n-grams in the VMs text with data on texts in real languages (Latin, Italian, Greek and Turkish). Of course, I didn't select the comparison texts by length, as I just took the figures from a reference site, but I think there will be no fundamental differences, since the Voynich figures are much higher than those in texts in these languages. If I have more time, maybe I'll do more accurate comparisons.
If we assume the Voynich text contains both vowels and consonants, we can suppose that at least "o", "e" and "y" must be vowels, and possibly "a" and "ch" or "i" as well, since the most frequent letters in languages such as Latin, Italian, Spanish, French, Greek and Turkish are vowels.
The most frequent letter:
Latin "I" takes ~11.5 %, "E" - 11.35 %;
Greek "A" - ~10.75 %;
Italian "E" - ~11.5 %;
Turkish "A" - ~11.55 %.
The most frequent letter in EVA: "o" takes 13.3 %.
Letter  Count   Share
A       14281    7.46 %
C       13314    6.95 %
D       12973    6.77 %
E       20070   10.48 %
F         505    0.26 %
G          96    0.05 %
H       17856    9.32 %
I       11732    6.12 %
K       10934    5.71 %
L       10518    5.49 %
M        1116    0.58 %
N        6141    3.21 %
O       25468   13.30 %
P        1630    0.85 %
Q        5423    2.83 %
R        7456    3.89 %
S        7387    3.86 %
T        6944    3.63 %
V           9    0.00 %
X          35    0.02 %
Y       17655    9.22 %
Z           2    0.00 %
Sum:   191545
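For reference, a table like the one above can be reproduced from any plain-text transliteration with a few lines of Python. This is only a sketch; the sample string below is illustrative, not the real corpus:

```python
from collections import Counter

def letter_frequencies(text):
    """Count alphabetic characters and return (count, percent) per letter."""
    letters = [c.upper() for c in text if c.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    return {ch: (n, 100.0 * n / total) for ch, n in sorted(counts.items())}

# Tiny EVA-like snippet for demonstration only:
sample = "qokeedy qokain okaiin chedy"
for ch, (n, pct) in letter_frequencies(sample).items():
    print(f"{ch} {n} {pct:.2f}%")
```

Running the same function over the full transliteration file (after stripping any EVA markup) should reproduce the counts and shares in the table.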
The most frequent Voynich letters in the 22-letter transliteration, where the combinations "ch", "sh", "ckh", "cth", "cph", "cfh" are treated as single letters, "ee" is treated as a separate letter not equal to "e", and "q" is treated as a null: "o" - 15.76 %, "y" - 10.93 %, "a" - 8.84 %.
Of course, in my "22 letters" version (see the attached document), the percentage of the letter "o" grew because the total letter count decreased when the EVA n-grams were joined. Either way, "o" accounts for too large a share in both versions.
Meanwhile, the most frequent bigram in all the mentioned natural languages stays around 2.1-2.4 %:
Latin "ER" - 2.4 %;
Greek "TO" - 2.32 %;
Italian "TO" - 2.13 %;
Turkish "AR" - 2.19 %.
As for the EVA, the most frequent bigrams are: "ch" - 7.33 %, "he" - 5.44 %, "dy" - 4.55 %, "ai" - 4.43 %; and about 10 more bigrams also exceed the ~2.5 % level, almost twice over. "ch" exceeds this level at least three times over, so I can hardly imagine that it is really a bigram (2 letters).
The "22 letters" version gives a somewhat better picture, but not by much:
1. dy (DI) - 5.69 % (4.41 % without spaces);
2. ai (AY) - 5.54 % (4.27 % without spaces);
3. ok (ON) - 5.05 % (3.97 % without spaces);
4. ol (OL) - 4.71 % (3.74 % without spaces);
5. in (YX) - 3.96 % (3.83 % without spaces);
6. che (VE) - 3.54 % (2.73 % without spaces);
+ a few more bigrams that also exceed the acceptable level.
Moving spaces (or changing token borders) doesn't help much either, as the most frequent letters "o" and "y" often appear at token borders, forming another quite frequent bigram, "YO" (5.14 %).
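N-gram shares with and without token borders can be checked mechanically. A minimal sketch, where the sample string is illustrative and stripping spaces before counting is what makes border bigrams like "yo" appear:

```python
from collections import Counter

def ngram_frequencies(text, n=2, cross_spaces=False):
    """Percentage share of each n-gram.

    cross_spaces=False: n-grams are counted inside tokens only.
    cross_spaces=True: spaces are stripped first, so n-grams that
    straddle token borders (like "yo") are counted as well.
    """
    units = [text.replace(" ", "")] if cross_spaces else text.split()
    grams = Counter()
    for u in units:
        for i in range(len(u) - n + 1):
            grams[u[i:i + n]] += 1
    total = sum(grams.values())
    return {g: 100.0 * c / total for g, c in grams.most_common()}

sample = "daiin chedy qokedy"          # illustrative, not the real corpus
print(ngram_frequencies(sample))                     # within-token bigrams
print(ngram_frequencies(sample, cross_spaces=True))  # border bigrams included
print(ngram_frequencies(sample, n=3))                # trigrams like "edy"
```

The same call with n=3 gives the trigram shares discussed below, so both comparisons come from one function.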
Moreover, the Voynich text breaks all records for trigram frequency. Apart from the doubtful trigrams "aii" and "iin", which occur far more often than is normal, the trigram "edy" amounts to more than 3.64 % in both EVA and my "22 letters" version. This is completely anomalous for a natural language. The biggest problem of the Voynich text is that the most frequent symbols (o, y, e, i, d, a) appear next to each other. This is what you have many times called high predictability, and it doesn't seem to have a solution in the usual way.
So how can it be explained? Which cipher, language, dialect can be supposed in such conditions? Whether this issue can be solved with a certain approach or it is impossible?
Earlier we talked about the bigram cipher. Is it possible to detect correct bigrams in some way?
My initial idea was that the most frequent symbols have different values depending on their position in a token, but I am not sure whether this can be checked either; I don't know how to make a transliteration in that case.
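The positional hypothesis can at least be tested before committing to any transliteration: count each symbol separately in token-initial, medial, and final position, and see whether its share differs sharply between the slots. A rough sketch, with an illustrative token list:

```python
from collections import Counter, defaultdict

def positional_counts(tokens):
    """Count each character separately by its position in the token:
    'initial', 'medial', or 'final' (a 1-char token counts as 'initial')."""
    pos = defaultdict(Counter)
    for tok in tokens:
        for i, ch in enumerate(tok):
            if i == 0:
                slot = "initial"
            elif i == len(tok) - 1:
                slot = "final"
            else:
                slot = "medial"
            pos[slot][ch] += 1
    return pos

# Illustrative tokens, not the real corpus:
pos = positional_counts("daiin chedy qokedy otedy".split())
for slot in ("initial", "medial", "final"):
    print(slot, dict(pos[slot]))
```

If, say, "y" turned out to be almost exclusively final while "q" is almost exclusively initial, that would be evidence for position-dependent values without having to guess the mapping first.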