27-06-2025, 12:18 PM
(27-06-2025, 11:34 AM)Koen G Wrote: So that's more like a polyalphabetic cipher with frequent shifting between alphabets. Sounds like a headache!
Indeed. With, say, 20 characters and 3 encoding possibilities for each one, there are 3^20 ≈ 3.5 billion different complete substitution alphabets to choose from at every character. That looks to me like more than enough freedom to get many different 'successful decodings', in many different languages, out of any given snippet of text.
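Just to make that count explicit, here is a minimal sketch in Python, using the same made-up figures (20 plaintext characters, 3 possible encodings each); nothing in it is specific to the Voynich MS:

Code:
# Toy count only, using the made-up figures above.
PLAINTEXT_CHARS = 20
ENCODINGS_PER_CHAR = 3

# A complete substitution alphabet fixes one encoding per plaintext
# character, so the independent choices multiply:
complete_alphabets = ENCODINGS_PER_CHAR ** PLAINTEXT_CHARS
print(complete_alphabets)      # 3486784401, i.e. about 3.5 billion

# At any single position, though, the encoder only chooses among the
# encodings of the one character being written:
print(ENCODINGS_PER_CHAR)      # 3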
But I agree that encoding this way would add randomness to the text (*), so both n-gram entropies and word entropies will decrease with respect to a natural language. I also agree with @Jorge_Stolfi here: n-gram entropies do not necessarily mean much, because they depend heavily on the transcription. Word entropies are more reliable: they only depend on a 'space' actually being a space, and on differently written words actually being different words.
(*) Provided the choice among the alphabets is random. But if a deterministic selection rule has to be added so that the text can actually be decoded (which looks indispensable), then the effect on entropy becomes much harder to determine. I suspect, but cannot prove, that it might well be zero.
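For what it's worth, here is a crude toy simulation of the two cases in that footnote. Everything in it is invented for illustration: the plaintext is just a repeated Latin phrase, each letter gets 3 single-symbol homophones, and the deterministic rule is "pick the homophone by the letter's position in the word, mod 3". It only prints the character-bigram conditional entropy (h2) and the word entropy for the plaintext and the two cipher variants, so one can see how much, if at all, each choice rule moves the figures.

Code:
import math
import random
import string
from collections import Counter


def shannon(counts):
    """Shannon entropy (bits) of a frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def char_bigram_entropy(text):
    """Conditional entropy H(next char | previous char), in bits."""
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = sum(bigrams.values())
    return -sum((c / total) * math.log2(c / firsts[a])
                for (a, b), c in bigrams.items())


def word_entropy(text):
    """Shannon entropy of the word-frequency distribution, in bits."""
    return shannon(Counter(text.split()))


# Made-up plaintext: a short Latin phrase repeated to get some bulk.
sample = ("in principio creavit deus caelum et terram "
          "terra autem erat inanis et vacua")
plain = " ".join([sample] * 20)

# Give every plaintext letter 3 distinct single-symbol homophones,
# drawn from an arbitrary pool of printable characters.
letters = sorted(set(plain) - {" "})
pool = iter(string.ascii_uppercase + string.digits + string.punctuation)
table = {ch: [next(pool) for _ in range(3)] for ch in letters}


def encode(text, rule):
    """Encode word by word; 'rule' picks a homophone from (letter, position)."""
    return " ".join(
        "".join(rule(ch, pos) for pos, ch in enumerate(word))
        for word in text.split()
    )


random.seed(0)
# (a) random choice among the homophones
random_cipher = encode(plain, lambda ch, pos: random.choice(table[ch]))
# (b) deterministic choice by position within the word
determ_cipher = encode(plain, lambda ch, pos: table[ch][pos % 3])

for label, text in [("plaintext", plain),
                    ("random homophones", random_cipher),
                    ("deterministic homophones", determ_cipher)]:
    print(f"{label:>25}:  h2 = {char_bigram_entropy(text):.3f} bits   "
          f"word entropy = {word_entropy(text):.3f} bits")

With this particular toy rule the word entropy comes out exactly unchanged, since every word type always receives the same spelling; whether a selection rule realistic enough for the Voynich would behave the same way is exactly the open question.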