quimqu > Yesterday, 08:18 AM
(Yesterday, 12:24 AM)Jorge_Stolfi Wrote: But please, please, please people: the character entropy is a property of the encoding. Not of the language, not of the text. IIff yyoouu jjuusstt wwrriittee eevveerryy lleetttteerr ttwwiiccee, the average per-character entropy will be halved. iF yOU raNdOmLY cAPitAliZe eACh lEtTer, you will add 1 to the average per-character entropy. If you insert two random capital letters after each letter, lMTiGUkSHeWP tNNhLOiZQsDB, you will add at least 3 to the per-character entropy.
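The letter-doubling case can be checked on a toy string. This is only a sketch under assumptions: `conditional_entropy` is a hypothetical helper, the sample sentence is made up, and "per-character entropy" is read here as the entropy of the next character given the preceding ones. The doubled text is measured with two characters of context so that the model can tell whether it is inside a pair.

```python
from collections import Counter
from math import log2

def conditional_entropy(text: str, order: int) -> float:
    """Empirical entropy (bits) of the next character given the previous `order` characters."""
    pair_counts = Counter((text[i - order:i], text[i]) for i in range(order, len(text)))
    context_counts = Counter()
    for (context, _), n in pair_counts.items():
        context_counts[context] += n
    total = sum(pair_counts.values())
    # H(next | context) = -sum p(context, next) * log2 p(next | context)
    return -sum(n / total * log2(n / context_counts[context])
                for (context, _), n in pair_counts.items())

# Toy sample (an assumption, not Voynich data); it contains no doubled letters.
sample = "the cat sat on the mat and the dog ran"
doubled = "".join(ch * 2 for ch in sample)

h_plain = conditional_entropy(sample, order=1)
h_doubled = conditional_entropy(doubled, order=2)
print(h_plain, h_doubled)
```

On this sample the second value is half the first: every second character of a pair is fully predictable, so only half of the positions carry any uncertainty. The text is the same in both cases; only the encoding changed.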
quimqu > Yesterday, 08:34 AM
(Yesterday, 02:37 AM)ReneZ Wrote: (24-06-2025, 09:57 PM)quimqu Wrote: The Voynich Manuscript, transcribed using the EVA alphabet
Others have already pointed this out, but it feels worth repeating: this explains a significant part of the behaviour of the graph.
Use modified transliteration alphabets and see what happens.
At least convert Ch, Sh, in and iin to single characters.
Also a question: are space characters treated just like normal text characters?
Your result is quite different from my results, where Voynichese becomes less predictable after 3 characters, but that is probably due to the way in which I treated space characters, namely as separators.
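The two ways of handling spaces can be made concrete with a small n-gram counter. This is a sketch, not anyone's actual pipeline: `ngram_counts` and its `spaces_as_separators` flag are hypothetical names, and the input line is just four words quoted elsewhere in this thread.

```python
from collections import Counter

def ngram_counts(text: str, n: int, spaces_as_separators: bool) -> Counter:
    """Count character n-grams across the whole line, or within words only."""
    # As separators: n-grams never cross a word boundary.
    # As characters: the space is one more symbol of the alphabet.
    chunks = text.split() if spaces_as_separators else [text]
    counts = Counter()
    for chunk in chunks:
        for i in range(len(chunk) - n + 1):
            counts[chunk[i:i + n]] += 1
    return counts

line = "qokeey okeey qokal qokar"  # illustrative words, not a real transliteration line
as_chars = ngram_counts(line, 3, spaces_as_separators=False)
as_separators = ngram_counts(line, 3, spaces_as_separators=True)
print(sum(as_chars.values()), sum(as_separators.values()))
```

With spaces as characters, trigrams such as `y o` enter the statistics and word endings help predict word beginnings. With spaces as separators, those trigrams disappear and every word starts without context, which is one plausible reason two analyses of the same text can diverge.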
quimqu > Yesterday, 08:43 AM
(24-06-2025, 10:17 PM)oshfdk Wrote: If I'm reading this graph correctly, then making a simple change in the transcription by treating ch and Sh as two distinct whole characters and merging a few more sequences into separate single glyphs, like qo, aiin, ain, would shrink the Voynich graph horizontally and bring it well within the normal range, closer to Romeo and Juliet. Is this so?
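A minimal sketch of such a re-transcription, assuming lowercase EVA input. The merge table and its placeholder symbols are hypothetical choices for illustration, not an established alphabet; longer sequences are tried first so that `aiin` is not split into `a` plus `iin`.

```python
import re

# Hypothetical merge table: each EVA sequence becomes one arbitrary uppercase placeholder.
MERGES = {"aiin": "A", "ain": "B", "iin": "I", "ch": "C", "sh": "S", "qo": "Q", "in": "N"}

# Regex alternation takes the first alternative that matches, so list longest first.
_pattern = re.compile("|".join(sorted(map(re.escape, MERGES), key=len, reverse=True)))

def merge_glyphs(eva_text: str) -> str:
    """Replace each listed EVA sequence with its single-character placeholder."""
    return _pattern.sub(lambda m: MERGES[m.group(0)], eva_text)

print(merge_glyphs("qokaiin chedy shol dain"))
```

The merged text is shorter, so an n-gram of a given length covers more of each word. That is the sense in which the curve would be compressed horizontally; whether it ends up in the range of natural-language texts has to be measured on the real transliteration.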
Koen G > Yesterday, 09:02 AM
oshfdk > Yesterday, 09:04 AM
(Yesterday, 09:02 AM)Koen G Wrote: I (as someone who doesn't speak statistese) am confused. If perplexity is basically another way to measure entropy, then why is CUVA higher than Romeo and Juliet? Even with EVA's entropy-reducing properties minimized, we should still end up well below English texts. Or French. Or the vast majority of Latin texts...
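On the relation between the two quantities: under the usual definition, perplexity is 2 raised to the entropy in bits, so it can be read as an effective number of equally likely choices. The sketch below uses plain symbol frequencies and hypothetical helper names; whether the graph in this thread was computed this way or from an n-gram model's cross-entropy is not stated here.

```python
from collections import Counter
from math import log2

def entropy_bits(symbols) -> float:
    """Shannon entropy, in bits, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())

def perplexity(symbols) -> float:
    """Perplexity as 2 ** entropy: the effective number of equally likely symbols."""
    return 2 ** entropy_bits(symbols)

print(perplexity("abcd"), perplexity("aaaa"))
```

Four equally frequent symbols give a perplexity of 4 and a single repeated symbol gives 1. Because the mapping is monotonic, a text with higher perplexity at a given n also has higher entropy at that n.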
ReneZ > Yesterday, 09:07 AM
(Yesterday, 08:34 AM)quimqu Wrote: Interestingly, CUVA produces higher perplexity beyond n=3, which might seem counterintuitive.
Bernd > Yesterday, 10:00 AM
(Yesterday, 09:07 AM)ReneZ Wrote: This is something that one might expect more with numbers than with text
This is what has always baffled me. On one hand, the structure of 'Voynichese' appears to be more consistent with numbers, Roman numerals or code-like mathematical operations (bookkeeping?), but it can be expressed in the form of a language-like seemingly readable and pronounceable text. How can such a transformation be accomplished?
ReneZ > Yesterday, 10:17 AM
Koen G > Yesterday, 10:18 AM
(Yesterday, 09:04 AM)oshfdk Wrote: Why so? As far as I know, Voynichese entropy is relatively low only for short n-gram lengths, like bigrams and trigrams, and gets much closer to normal with increasing length. This is exactly what happens in the original graph at the top of this thread.
Mauro > Yesterday, 10:35 AM
(Yesterday, 10:17 AM)ReneZ Wrote: Even if we don't strictly consider numbers, but rather some more generic 'enumeration' system, we will run into a problem that I see clearly in my mind, but may not be able to explain clearly.
The words okeey and qokeey can appear near each other, but differ only on the extreme left side of the word.
The words qokal and qokar are also similar and differ only on the extreme right.
Using computer terminology, if this were an enumeration system, it would appear to be neither big-endian nor little-endian, but rather both-endian.
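The two word pairs can be compared mechanically by measuring how much of each word is shared from the left and from the right. A small sketch, with `shared_ends` as a hypothetical helper name:

```python
from os.path import commonprefix

def shared_ends(a: str, b: str) -> tuple:
    """Return (shared prefix length, shared suffix length) for two words."""
    prefix = len(commonprefix([a, b]))
    # The shared suffix is the shared prefix of the reversed words.
    suffix = len(commonprefix([a[::-1], b[::-1]]))
    return prefix, suffix

print(shared_ends("okeey", "qokeey"))  # variation at the left edge
print(shared_ends("qokal", "qokar"))   # variation at the right edge
```

The first pair shares its whole right side (5 characters) and nothing on the left; the second shares 4 characters on the left and nothing on the right. In a positional number system, neighbouring values would be expected to vary consistently at one end only.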