The Voynich Ninja

How to recombine glyphs to increase character entropy?
(21-04-2022, 10:19 AM)Koen G Wrote: ...representing the bench glyph as a single character is an obvious choice.
If I understand correctly, have you ever tried to calculate entropy with "ch" as a single letter?
If you did and published it and I don't remember, sorry.
(21-04-2022, 10:54 AM)Ruby Novacna Wrote:
(21-04-2022, 10:19 AM)Koen G Wrote: ...representing the bench glyph as a single character is an obvious choice.
If I understand correctly, have you ever tried to calculate entropy with "ch" as a single letter?
If you did and published it and I don't remember, sorry.

In the FSG, Currier and v101 alphabets, 'ch' is a single letter, and so are several others.
The bigram entropy using these alphabets is higher than for Eva, but still much lower than for plain texts in most of the interesting languages.
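To make "bigram entropy" concrete, here is a minimal Python sketch of conditional character entropy, H(next glyph | current glyph), estimated from adjacent pairs, with an option to merge multi-glyph units such as "ch" into single symbols first. It assumes a plain transcription string where "." separates words (the separator is simply counted as another symbol); it is an illustration, not the exact procedure behind the FSG, Currier or v101 figures.

```python
from collections import Counter
from math import log2


def tokenize(text, units=("ch",)):
    """Split text into symbols, treating each listed unit as one symbol.

    Longer units are tried first, so e.g. "ckh" would win over "ch".
    """
    units = sorted(units, key=len, reverse=True)
    symbols, i = [], 0
    while i < len(text):
        for u in units:
            if text.startswith(u, i):
                symbols.append(u)
                i += len(u)
                break
        else:  # no unit matched: take a single character
            symbols.append(text[i])
            i += 1
    return symbols


def conditional_entropy(symbols):
    """Estimate H(X[n+1] | X[n]) in bits from adjacent symbol pairs."""
    pairs = Counter(zip(symbols, symbols[1:]))
    left = Counter(a for a, _ in zip(symbols, symbols[1:]))
    total = sum(pairs.values())
    h = 0.0
    for (a, b), n in pairs.items():
        # weight -log2 P(b | a) by the joint probability P(a, b)
        h -= (n / total) * log2(n / left[a])
    return h


sample = "qokeedy.chedy.chol.daiin.shedy.qokaiin.chey"  # toy EVA-like string
print(conditional_entropy(list(sample)))      # every EVA glyph separate
print(conditional_entropy(tokenize(sample)))  # "ch" counted as one symbol
```

On a real transcription, the difference between the two printed values shows how much a single merge like "ch" changes the figure.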
(21-04-2022, 11:53 AM)ReneZ Wrote: The bigram entropy using these alphabets is higher than for Eva, but still much lower than for plain texts in most of the interesting languages.
Thanks, Rene!
In that case, another question: did the comparison texts deal with the same subjects as those presumed for the Voynich manuscript?
A pharmaceutical recipe or a lab protocol would, by definition, have a restricted lexicon, in my opinion.
The entropy we are talking about is character entropy. This means: if I give you a glyph, how easily can you predict the next one? In EVA, if I give you "c", you can very easily predict "h". Of course this is one of the first transformations to perform. As René says, however, tackling the most obvious candidates like "ch" does not bring us anywhere close to any text in any normal language. Even if all cases where EVA potentially splits glyphs are rectified, Voynichese still has a ridiculously low character entropy. So I tried going even further, by also combining glyphs that are clearly separate, and then trying thousands of different combinations.

The language of the comparison text does not matter and the subject does not matter: Voynichese is in a whole different league. Its characters are extremely predictable, even if we eliminate effects of EVA.
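The "give you a glyph, predict the next one" test can be sketched as a successor distribution: for a chosen glyph, count what follows it and turn the counts into probabilities. A minimal Python sketch, assuming a plain EVA-like string; the sample text is invented for illustration.

```python
from collections import Counter


def successors(text, glyph):
    """Return P(next | glyph) as a dict, estimated from adjacent characters."""
    following = Counter(b for a, b in zip(text, text[1:]) if a == glyph)
    total = sum(following.values())
    return {g: n / total for g, n in following.most_common()}


sample = "chedy.qokchy.cthey.shol.chol"  # toy EVA-like string
print(successors(sample, "c"))  # {'h': 0.75, 't': 0.25}
```

The more probability mass sits on a single successor, the less information the following glyph carries.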
(21-04-2022, 03:19 PM)Koen G Wrote: The language of the comparison text does not matter and the subject does not matter
This seems like bad news to me: does it mean that statistics from thousands of different combinations « and other hocus-pocus » can never help us understand the text?
However, hope never dies.
My goal was not to understand the text, it was to see how statistics would behave if I performed certain manipulations.

I am still on vacation, so I'm just typing a quick explanation on my phone (currently on the train from Verona to Venice).

What I hoped to understand better is: where does the information in a Voynichese word sit? Or rather, where does it not sit? Omit all cases of EVA "n" from the transcription of a page, and your transcription remains essentially the same. EVA-n does not add any information at all, because it is 100% predictable (bar some rare exceptions, as usual).
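The claim that EVA "n" adds no information can be checked as a round trip: delete every "n", then try to restore it with a rule. The sketch below uses a deliberately simple, hypothetical rule (append "n" to any word ending in "i"), assuming "." separates words. Words where "n" is not word-final after "i", or words that genuinely end in "i", are exactly the "rare exceptions" that would break the round trip.

```python
def drop_glyph(text, glyph="n"):
    """Remove every occurrence of one glyph."""
    return text.replace(glyph, "")


def restore_n(text, sep="."):
    """Hypothetical rule: a word ending in "i" gets its final "n" back."""
    return sep.join(w + "n" if w.endswith("i") else w for w in text.split(sep))


def round_trips(text):
    """True if dropping "n" loses nothing that the rule cannot recover."""
    return restore_n(drop_glyph(text)) == text


sample = "daiin.qokaiin.chedy.ain.dar.okain"
print(drop_glyph(sample))   # daii.qokaii.chedy.ai.dar.okai
print(round_trips(sample))  # True
```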

I never wanted to offer an optimal solution through this exercise. Rather, I wanted to learn more about the actual information density of Voynichese, which isn't as bad as raw EVA makes it look.

At the level of vocabulary, things are a bit more hopeful, in my opinion. As you read through the text, new vocabulary is introduced at an expected, linguistically plausible rate. Reduplication patterns are an issue, but these might be caused by convergent encoding at character level.
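The rate at which new vocabulary appears is commonly examined with a vocabulary growth curve: the number of distinct word types seen after each successive token. A minimal Python sketch, assuming words are separated by "."; comparing the curve against a reference text is not shown.

```python
def vocabulary_growth(words):
    """Number of distinct word types seen after each successive token."""
    seen, curve = set(), []
    for w in words:
        seen.add(w)
        curve.append(len(seen))
    return curve


tokens = "daiin.chedy.qokeedy.daiin.chol.chedy.shol".split(".")
print(vocabulary_growth(tokens))  # [1, 2, 3, 3, 4, 4, 5]
```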
I have attached a docx file [attachment=6422] with the text converted from EVA. I started by transliterating the Voynich symbols into letters of the alphabet so that there were at least 22 letters; otherwise, it is hard for me to imagine how the text could be useful. My goal was to compare the relative frequencies of letters and n-grams in the VMs text with the data for texts in real languages (Latin, Italian, Greek and Turkish). Of course, I didn't select the texts by length, as I just used the site [link], but I think there will be no fundamental differences, since the Voynich figures are much higher than those for texts in these languages. If I have more time, I may make more accurate comparisons.
If we assume the Voynich text contains both vowels and consonants, we can suppose that at least "o", "e" and "y" must be vowels, and possibly also "a" and "ch" or "i", since the most frequent letters in languages such as Latin, Italian, Spanish, French, Greek and Turkish are vowels.

The most frequent letter:
Latin "I" takes ~11.5 %, "E" - 11.35 %;
Greek "A" - ~10.75 %;
Italian "E" - ~11.5 %;
Turkish "A" - ~11.55 %.

The most frequent letter in EVA is "o", at 13.3 %. Counts and shares for all EVA letters:
Letter  Count   Share
A       14281   7,46%
C       13314   6,95%
D       12973   6,77%
E       20070  10,48%
F         505   0,26%
G          96   0,05%
H       17856   9,32%
I       11732   6,12%
K       10934   5,71%
L       10518   5,49%
M        1116   0,58%
N        6141   3,21%
O       25468  13,30%
P        1630   0,85%
Q        5423   2,83%
R        7456   3,89%
S        7387   3,86%
T        6944   3,63%
V           9   0,00%
X          35   0,02%
Y       17655   9,22%
Z           2   0,00%
Sum:  191545        
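A count-and-share table like the one above can be produced by counting every glyph and dividing by the total. This Python sketch assumes a transcription string where "." separates words and separators and line breaks are excluded from the total; how the actual transcription file handles uncertain readings is not specified in the post.

```python
from collections import Counter


def frequency_table(text, ignore=".\n "):
    """Return (glyph, count, percent) rows, most frequent first."""
    counts = Counter(g for g in text if g not in ignore)
    total = sum(counts.values())
    return [(g, n, 100 * n / total) for g, n in counts.most_common()]


sample = "qokeedy.chedy.chol.daiin.okaiin"  # toy EVA-like string
for glyph, n, pct in frequency_table(sample):
    print(f"{glyph.upper():<3}{n:>6}{pct:>8.2f}%")
```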

In my 22-letter transliteration, where the combinations "ch", "sh", "ckh", "cth", "cph" and "cfh" each stand for a single letter, "ee" stands for a separate letter distinct from "e", and "q" is treated as a null, the most frequent letters are "o" (15.76 %), "y" (10.93 %) and "a" (8.84 %).

Of course, in my "22 letters" version (see the attached document), the share of "o" grew because merging EVA n-grams reduced the total number of letters. Obviously, "o" takes too large a share in both versions.
Meanwhile, in all the normal languages mentioned, the most frequent bigram reaches only about 2.3 %:
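The 22-letter transliteration can be sketched as a greedy longest-match tokenizer: the multi-glyph combinations become single symbols, "ee" becomes its own symbol, and "q" is dropped as a null. Because each merge turns two or three EVA glyphs into one symbol while every "o" is kept, the total falls and the share of "o" rises. A minimal Python sketch, assuming a "."-separated EVA string; the unit list follows the post, but edge cases (e.g. "eee" becomes "ee" + "e" here) are handled by assumption.

```python
from collections import Counter

# Units from the post, longest first so "ckh" is matched before "ch".
UNITS = ("ckh", "cth", "cph", "cfh", "ch", "sh", "ee")
NULLS = {"q"}  # "q" treated as a null


def to_units(word):
    """Greedy longest-match split of one EVA word into transliteration units."""
    out, i = [], 0
    while i < len(word):
        unit = next((u for u in UNITS if word.startswith(u, i)), word[i])
        if unit not in NULLS:
            out.append(unit)
        i += len(unit)
    return out


def share(symbols, target):
    """Percentage of symbols equal to target."""
    return 100 * Counter(symbols)[target] / len(symbols)


words = "qokeedy.chedy.chol.daiin.okaiin".split(".")
raw = [g for w in words for g in w]               # one symbol per EVA glyph
merged = [u for w in words for u in to_units(w)]  # merged units, q dropped
print(f"o: {share(raw, 'o'):.2f}% raw, {share(merged, 'o'):.2f}% merged")
# o: 11.11% raw, 13.04% merged
```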
Latin "ER" - 2.4 %;
Greek "TO" - 2.32 %;
Italian "TO" - 2.13 %;
Turkish "AR" - 2.19 %.
As for EVA, the most frequent bigrams are "ch" (7.33 %), "he" (5.44 %), "dy" (4.55 %) and "ai" (4.43 %), and about 10 more bigrams also exceed the 2.5 % level, by almost a factor of two. "ch" exceeds this level roughly three times over, so I can hardly imagine that it is really a bigram (two letters).
The "22 letters" version looks slightly better, but not by much:
1. dy (DI) - 5.69 % (4.41 % without spaces);
2. ai (AY) - 5.54 % (4.27 % without spaces);
3. ok (ON) - 5.05 % (3.97 % without spaces);
4. ol (OL) - 4.71 % (3.74 % without spaces);
5. in (YX) - 3.96 % (3.83 % without spaces);
6. che (VE) - 3.54 % (2.73 % without spaces);
plus a few more bigrams that also exceed the normal level.
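Bigram shares can be counted in two ways, and the "with/without spaces" figures may correspond to them (this reading is an assumption): counting only pairs inside words, or deleting the separators and counting every adjacent pair, including pairs that straddle a word boundary. The second has a larger denominator, so individual shares tend to come out lower. A minimal Python sketch over a "."-separated EVA string:

```python
from collections import Counter


def bigram_shares(text, sep=".", within_words=True):
    """Percent share of each bigram, most frequent first.

    within_words=True: count pairs inside each word only.
    within_words=False: delete separators and count all adjacent pairs,
    so pairs spanning a word boundary are included.
    """
    pieces = text.split(sep) if within_words else [text.replace(sep, "")]
    counts = Counter(w[i:i + 2] for w in pieces for i in range(len(w) - 1))
    total = sum(counts.values())
    return {bg: 100 * n / total for bg, n in counts.most_common()}


sample = "chedy.daiin.okal.dy.ol.chedy"  # toy EVA-like string
for bg, pct in list(bigram_shares(sample).items())[:3]:
    print(bg, f"{pct:.2f}%")
```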

Moving spaces (i.e. changing word boundaries) doesn't help much either, all the more so as the most frequent glyphs "o" and "y" often appear at word boundaries, forming a new and quite frequent bigram "YO" (5.14 %).
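Pairs that form across word boundaries can be counted directly by pairing the last glyph of each word with the first glyph of the next. A minimal Python sketch, assuming "." separates words; the sample is invented for illustration.

```python
from collections import Counter


def boundary_pairs(text, sep="."):
    """Count (last glyph of a word, first glyph of the next word) pairs."""
    words = [w for w in text.split(sep) if w]
    return Counter((a[-1], b[0]) for a, b in zip(words, words[1:]))


sample = "chedy.okal.dy.ol.qokeey.otedy"
print(boundary_pairs(sample).most_common(1))  # [(('y', 'o'), 3)]
```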
Moreover, the Voynich text breaks all records for trigram frequency. Apart from the doubtful trigrams "aii" and "iin", which are much more frequent than normal, the trigram "edy" accounts for more than 3.64 % in both EVA and my "22 letters" version. This is absolutely anomalous for a normal language. The biggest problem of the Voynich text is that its most frequent symbols (o, y, e, i, d, a) appear next to each other. This is what you have often called high predictability. And it seems there is no solution to this in the usual way.
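The observation that the most frequent glyphs sit next to each other can be quantified as the share of within-word bigrams whose two glyphs both belong to a chosen set. A minimal Python sketch; the default set is the six glyphs named in the post, and running the same measure on a reference text with its own six most frequent letters would give a baseline.

```python
def adjacent_share(text, glyphs=frozenset("oyeida"), sep="."):
    """Percent of within-word bigrams whose two glyphs are both in `glyphs`."""
    pairs = [
        (w[i], w[i + 1]) for w in text.split(sep) for i in range(len(w) - 1)
    ]
    hits = sum(1 for a, b in pairs if a in glyphs and b in glyphs)
    return 100 * hits / len(pairs) if pairs else 0.0


sample = "qokeedy.chedy.daiin.okaiin.oteody"  # toy EVA-like string
print(f"{adjacent_share(sample):.1f}%")
```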
So how can this be explained? Which cipher, language or dialect could produce such conditions? Can this issue be solved with some particular approach, or is it impossible?
Earlier we talked about a bigram cipher. Is there some way to detect the correct bigrams?
My initial idea was that the most frequent symbols have different values depending on their position in a word, but I'm not sure whether this can be checked; I don't know how to make a transliteration in that case.
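Position dependence can at least be checked before any transliteration: count each glyph separately as word-initial, medial or final (plus single-glyph words). If the idea holds, a positional transliteration could simply tag each glyph with its position, so that e.g. initial and medial "o" become different symbols. A minimal Python sketch; the position classes and the one-letter tags are hypothetical choices, not an established scheme.

```python
from collections import Counter


def position(i, n):
    """Classify index i in a word of length n."""
    if n == 1:
        return "solo"
    if i == 0:
        return "initial"
    if i == n - 1:
        return "final"
    return "medial"


def positional_counts(text, sep="."):
    """Count (glyph, position) pairs over all words."""
    return Counter(
        (g, position(i, len(w)))
        for w in text.split(sep)
        for i, g in enumerate(w)
    )


def tag_positions(word):
    """Hypothetical positional transliteration: suffix I/M/F/S to each glyph."""
    return [g + position(i, len(word))[0].upper() for i, g in enumerate(word)]


counts = positional_counts("okal.qoky.chol.dy.o")
print(counts[("o", "initial")], counts[("o", "medial")])  # 1 2
print(tag_positions("oky"))  # ['oI', 'kM', 'yF']
```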
(21-04-2022, 05:54 PM)Searcher Wrote: I started by translating the Voynich symbols into letters of the alphabet so that the composition was at least 22 letters
I don't understand why you converted the text and in what way, I didn't recognise anything. Did you separate the A and B languages?
(21-04-2022, 06:13 PM)Ruby Novacna Wrote:
(21-04-2022, 05:54 PM)Searcher Wrote: I started by translating the Voynich symbols into letters of the alphabet so that the composition was at least 22 letters
I don't understand why you converted the text and in what way, I didn't recognise anything. Did you separate the A and B languages?
EVA   Letter   Count    Share
a   = A        14281    8,84%
m   = B         1116    0,69%
sh  = C         4666    2,89%
d   = D        12973    8,03%
e   = E        10580    6,55%
cph = F          216    0,13%
cfh = G          170    0,11%
f   = H          637    0,39%
y   = I        17655   10,93%
l   = L        10518    6,51%
t   = M         5994    3,71%
k   = N        10028    6,21%
o   = O        25468   15,76%
ckh = P          906    0,56%
p   = Q         1414    0,88%
r   = R         7456    4,61%
s   = S         2886    1,79%
cth = T          950    0,59%
ee  = U         4745    2,94%
ch  = V        11012    6,82%
n   = X         6176    3,82%
i   = Y        11732    7,26%
z   = z            2    0,00%
Sum:          161581
I transliterated EVA to get a text with an almost complete alphabet. I think that, in general, we could reach an alphabet of 24 letters. In my view, a text written mostly with 18 letters can't be a fully functional text.
This time, I tested the whole text of the VMs.
(21-04-2022, 06:54 PM)Searcher Wrote: I transliterated EVA to make almost full alphabet text.
All the letter-frequency calculations were already done 20 or 30 years ago; I don't remember whether a distinction was made between the A and B languages. Nick has spoken several times on his site about the need to treat them separately.

And I still don't understand why you would replace the EVA letters at this stage, since you don't know their true values anyway. Moreover, I didn't see EVA "q" in your alphabet.