(11-09-2025, 08:16 AM)oshfdk Wrote: So, can it actually encode 10000 symbols of Latin keeping the entropies low? Is there some source code to reproduce this?
Hello oshfdk, I made my Notebook available here: [link]
Note that in my latest tests, Culpepper's text was the one with the best JSD (Jensen-Shannon divergence) against the Voynich (EVA): 0.21 with all glyphs and 0.0998 with Latin characters only:
![Image: WIsUhRa.png](https://i.imgur.com/WIsUhRa.png)
(17-09-2025, 01:12 PM)quimqu Wrote: Note that in my latest tests, Culpepper's text was the one with the best JSD (Jensen-Shannon divergence) against the Voynich (EVA): 0.21 with all glyphs and 0.0998 with Latin characters only
Thanks!
Is it possible to compute the entropy with the residuals? Just to have a better understanding.
Also, what is the length of the ciphertext for 10000 Latin symbols?
(17-09-2025, 01:39 PM)oshfdk Wrote: Is it possible to compute the entropy with the residuals? Just to have a better understanding.
Also, what is the length of the ciphertext for 10000 Latin symbols?
The cipher is letter-by-letter at fixed positions, so word lengths and total length are preserved. If you feed it 10000 Latin letters, you get 10000 cipher glyphs (same number of words, same per-word length).
Computing entropy with residuals doesn't really apply, in my opinion. Residuals aren't part of the ciphertext; they're the little side notes that tell you which original token a shared glyph stood for.
Think of it like this: the scribe (or his "master") writes the ciphertext on one sheet, and on a separate slip he jots the residual numbers (only needed to reverse collisions). What we have in the Voynich is the sheet with glyphs, not the slip. Measuring entropy for "glyphs + residuals" would be measuring a different, two-stream encoding that isn't observed in the manuscript.
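If it helps, here is a minimal toy sketch of the two streams in Python. The tables and names (BUCKETS, encode_word, decode_word) are invented for illustration; they are not the real tables from my Notebook:

```python
# Toy sketch of the two-stream idea (hypothetical tables, not the real ones).
# At each in-word position, several plaintext letters can share one glyph
# (a "bucket"); the residual is the letter's index inside that bucket.
BUCKETS = {
    0: {"q": ["l", "b"], "d": ["a", "e"]},
    1: {"o": ["i", "e"], "y": ["r"]},
}

def encode_word(word):
    glyphs, residuals = [], []
    for pos, letter in enumerate(word):
        for glyph, bucket in BUCKETS[pos].items():
            if letter in bucket:
                glyphs.append(glyph)                    # goes on the sheet
                residuals.append(bucket.index(letter))  # goes on the slip
                break
    return "".join(glyphs), residuals

def decode_word(glyphs, residuals):
    # Both streams are needed to invert a collision (e.g. "l" vs "b" -> "q").
    return "".join(BUCKETS[pos][g][r]
                   for pos, (g, r) in enumerate(zip(glyphs, residuals)))

cipher, res = encode_word("be")
print(cipher, res)               # "qo" [1, 1] -- same length as the input
print(decode_word(cipher, res))  # "be"
```

One glyph per plaintext letter, so lengths are preserved, and the residual stream stays on the separate "slip".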
(17-09-2025, 02:32 PM)quimqu Wrote: The cipher is letter-by-letter at fixed positions, so word lengths and total length are preserved. If you feed it 10000 Latin letters, you get 10000 cipher glyphs (same number of words, same per-word length).
In your first post "ber" was encoded as "qoak" and "liber" as "chedys". Unless you remap "qo" and "ch" to a single glyph somewhere downstream, I assume the ciphertext should be longer than the plaintext? To put it differently, when you compute the entropy of the encoded text, do you count "qo" and "ch" as a single glyph or as a pair of glyphs?
(17-09-2025, 02:32 PM)quimqu Wrote: Computing entropy with residuals doesn't really apply, in my opinion. Residuals aren't part of the ciphertext; they're the little side notes that tell you which original token a shared glyph stood for.
I'm not sure I understand the reasoning here. From my point of view computing the entropy with the residual is a perfect metric to understand exactly how much entropy is offloaded to the residuals.
(17-09-2025, 03:09 PM)oshfdk Wrote: In your first post "ber" was encoded as "qoak" and "liber" as "chedys". Unless you remap "qo" and "ch" to a single glyph somewhere downstream, I assume the ciphertext should be longer than the plaintext? To put it differently, when you compute the entropy of the encoded text, do you count "qo" and "ch" as a single glyph or as a pair of glyphs?
You are right, sorry. Over the past few days I have been experimenting with a single-letter cipher and with letter/bigram ciphers (the first version included bigrams). I found that ciphering bigrams results in a very complex conversion table for the scribe, so I preferred to work on single letters. The code I shared can be set to work with bigrams if you wish, so you can easily see examples and the differences. If you cannot run the code, I can run it for you and post the results. Just tell me.
(17-09-2025, 03:09 PM)oshfdk Wrote: I'm not sure I understand the reasoning here. From my point of view, computing the entropy with the residuals is a perfect metric to understand exactly how much entropy is offloaded to them.
OK, I see your point. From that point of view, yes, you are right. I will try to calculate the entropy with the residuals, so we can see the numbers. I would guess it should be roughly the difference between the original entropy and the ciphered entropy, but let's check.
(17-09-2025, 01:39 PM)oshfdk Wrote: Is it possible to compute the entropy with the residuals? Just to have a better understanding.
OK, here are the entropies and some helpful calculations for Culpepper ciphered with PM_cipher in single-character mode (no bigrams):
Symbols at ambiguous slots (bucket > 1): 164,731 / 169,710 (97.07%). These are positions where the glyph's bucket has more than one possible token (ambiguity exists), regardless of which residual was chosen.
Symbols that actually required a residual (residual > 0): 96,425 / 169,710 (56.82%). These are cases where the default index (0) was not used, so a side note had to be written to disambiguate.
Residual bits/symbol, E[H(R | pos, g)] = 1.762. This is the expected residual information per glyph if you always transmit the residual index (including 0).
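For reference, the expectation E[H(R | pos, g)] can be estimated like this (a sketch; `records`, a list of (position, glyph, residual) triples collected during encoding, is an assumed input, not a variable from my Notebook):

```python
from collections import Counter, defaultdict
from math import log2

def expected_residual_entropy(records):
    """E[H(R | pos, g)] in bits/symbol, from (position, glyph, residual) triples."""
    groups = defaultdict(list)
    for pos, glyph, residual in records:
        groups[(pos, glyph)].append(residual)
    total = len(records)
    bits = 0.0
    for residuals in groups.values():
        counts, n = Counter(residuals), len(residuals)
        h = -sum(c / n * log2(c / n) for c in counts.values())
        bits += n / total * h   # weight each (pos, glyph) context by its frequency
    return bits
```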
=== Glyph entropies (bits/symbol): word-reset model ===
H1(G) = 3.387
H2(G) = 2.326
H3(G) = 2.002
H4(G) = 1.603
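In case anyone wants to reproduce these, this is roughly what I mean by Hn under the word-reset model (a sketch, assuming Hn is the conditional entropy of a glyph given up to n-1 preceding glyphs, with the context truncated at every word start; `words` is just a list of cipher words):

```python
from collections import Counter
from math import log2

def Hn(words, n):
    """Order-n conditional entropy in bits/symbol; contexts never
    cross a word boundary (the 'word-reset' model)."""
    ctx_counts, pair_counts = Counter(), Counter()
    for w in words:
        for i in range(len(w)):
            ctx = w[max(0, i - n + 1):i]   # up to n-1 preceding glyphs
            ctx_counts[ctx] += 1
            pair_counts[(ctx, w[i])] += 1
    total = sum(pair_counts.values())
    return -sum(c / total * log2(c / ctx_counts[ctx])
                for (ctx, g), c in pair_counts.items())
```

Hn(words, 1) reduces to the plain unigram entropy; higher orders condition on progressively longer contexts, reset at each word start.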
Approximate joint bits/symbol (glyph + residual side info):
H1(G) + residuals ≈ 5.149
H2(G) + residuals ≈ 4.088
H3(G) + residuals ≈ 3.765
H4(G) + residuals ≈ 3.365
(I was not sure at first whether this can be calculated with a simple addition, since entropy is not additive in general. The justification is the chain rule, H(G, R) = H(G) + H(R | G): the joint cost per symbol is the glyph entropy plus the conditional entropy of the residual given the glyph. Since I estimate the residual term conditionally on (position, glyph) rather than on the same context as each Hn(G), adding it to H2-H4 is an approximation rather than an exact joint entropy.)
I treated residuals as a side channel depending only on (position, glyph), so their expected cost is a constant add-on to every Hn(G).
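For the unigram case the addition matches the chain rule exactly, and that is easy to check numerically. A minimal sketch with made-up glyph and residual streams (none of these names or values come from my Notebook):

```python
from collections import Counter, defaultdict
from math import log2

def H(seq):
    counts, n = Counter(seq), len(seq)
    return -sum(c / n * log2(c / n) for c in counts.values())

def H_cond(residuals_by_glyph, total):
    """H(R | G): entropy of residuals within each glyph, weighted by p(g)."""
    return sum(len(rs) / total * H(rs) for rs in residuals_by_glyph.values())

# Toy streams, just to check additivity (made-up data):
glyphs    = list("qoqdqodqoq")
residuals = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0]

by_glyph = defaultdict(list)
for g, r in zip(glyphs, residuals):
    by_glyph[g].append(r)

joint  = H(list(zip(glyphs, residuals)))             # H(G, R)
summed = H(glyphs) + H_cond(by_glyph, len(glyphs))   # H(G) + H(R | G)
print(round(joint, 6), round(summed, 6))             # equal: chain rule holds
```

The two printed values coincide, which is just the chain rule for empirical distributions; for H2-H4 the same identity would require conditioning the residual term on the same n-gram context as the glyphs.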