Positional Mimic Cipher (PM-Cipher)


Positional Mimic Cipher (PM-Cipher) - quimqu - 10-09-2025

Dear Voynich ninjas,

I propose here a single-mode positional substitution cipher that

sharply reduces entropy at n=2
shows only a mild further entropy decay for n>2, matching a key distributional signature of Voynich text
preserves the token lengths of the original language
works programmatically on any language
is quill-and-paper friendly, so it could have been done in the 15th century (or even earlier)
... but it still cannot decipher the text, not yet...

It consists of just a position-by-position substitution table, plus a tiny ‘residuals’ note to resolve cases where one cipher glyph covers more than one letter (for example, ‘ch’ at position 1, see table below).

I call it Positional Mimic Cipher (PM-Cipher). It is a single-mode, position-by-position substitution using the Voynich EVA alphabet, with small per-position priority lists (‘residuals’) tweaked to imitate Voynich-style statistics. In practice, on a natural language sample the cipher yields typical figures around r≈0.90 for graphemes, r≈0.50 for in-word bigrams, H₂≈2.80, and length/Zipf shapes close to VMS B* while keeping the median token length at 4-5.

My design goals were, as said, a strong entropy drop at n=2 and only a mild additional drop for n>2. As Voynich token lengths look like those of natural language, I thought it would be interesting to keep the natural-language token lengths, and the cipher does. It should also mimic the overall grapheme/bigram shape, measured by correlations (how similarly do two frequency patterns rise and fall?), Jensen-Shannon divergence (how far apart are two probability distributions?) and the Zipf curve (similar slope and curvature). Oh, and the last goal was a must: fully doable with period materials, just paper and something to write with.
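To make the two comparison measures concrete, here is a minimal Python sketch. It assumes the correlation r is a Pearson correlation over relative frequencies and that the Jensen-Shannon divergence uses base-2 logarithms; the thread does not state either, and the function names `distribution`, `pearson` and `js_divergence` are made up for this illustration.

```python
import math
from collections import Counter


def distribution(items):
    """Relative frequency of each item (grapheme, bigram, ...)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def pearson(p, q):
    """Pearson correlation of two frequency tables over the union of their keys.

    Assumption: this is the 'r' meant in the post; missing keys count as 0.
    """
    keys = sorted(set(p) | set(q))
    xs = [p.get(k, 0.0) for k in keys]
    ys = [q.get(k, 0.0) for k in keys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)
```

In use, `p` would be the grapheme (or bigram) distribution of the ciphered text and `q` that of the Voynich transliteration being compared against.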

How to encrypt (quill & paper):

Take the plaintext as Latin letters (any language).
For the k-th letter in a word, look up its Voynich substitution in the table’s column p[k] (see table below).
Write the EVA grapheme you see there (sometimes multi-grapheme, e.g., ch).
If a table cell produces a multi-grapheme EVA that can come from different Latin letters, check a tiny “residuals” note for that position to pick the intended original (this only matters when decoding; for encoding you simply follow the table).

How to decrypt:

Read each EVA grapheme by position.
Use the same table, but in reverse (per position).
If an EVA entry maps to several Latin letters, consult the residuals note for that position to resolve it deterministically.
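The encryption and decryption steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_TABLE` is an invented three-column table (not the table from the image), the residual digit is taken to be the index of the intended Latin letter in the per-position priority list, and the priority order is simply the order in which letters appear in each toy column. Glyphs are kept as a list so that multi-character EVA units such as ch stay unambiguous.

```python
# Invented toy table: TOY_TABLE[k] maps a Latin letter at word position k
# (0-based here, so index 0 is "position 1" in the post) to an EVA glyph.
TOY_TABLE = [
    {"a": "qo", "e": "qo", "r": "ch", "t": "ch", "m": "sh"},
    {"a": "e", "e": "e", "r": "o", "t": "k", "m": "d"},
    {"a": "y", "e": "dy", "r": "y", "t": "l", "m": "n"},
]


def build_residual_lists(table):
    """Per position: EVA glyph -> ordered list of Latin letters sharing that glyph."""
    lists = []
    for column in table:
        by_glyph = {}
        for latin, glyph in column.items():
            by_glyph.setdefault(glyph, []).append(latin)
        lists.append(by_glyph)
    return lists


def encode_word(word, table, residual_lists):
    """Return (glyphs, residuals); the position counter restarts for every word."""
    glyphs, residuals = [], []
    for k, letter in enumerate(word):
        glyph = table[k][letter]
        glyphs.append(glyph)
        # Residual = which of the letters sharing this glyph was meant.
        residuals.append(residual_lists[k][glyph].index(letter))
    return glyphs, residuals


def decode_word(glyphs, residuals, residual_lists):
    """Reverse lookup per position, disambiguated by the residual digits."""
    return "".join(
        residual_lists[k][glyph][r]
        for k, (glyph, r) in enumerate(zip(glyphs, residuals))
    )


RESIDUAL_LISTS = build_residual_lists(TOY_TABLE)
glyphs, residuals = encode_word("tea", TOY_TABLE, RESIDUAL_LISTS)
# glyphs == ["ch", "e", "y"], residuals == [1, 1, 0]
```

The toy table only covers words of up to three letters over a, e, r, t, m; a real table would need one column per word position and one entry per plaintext letter.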

Why does it match Voynich-like stats? Positionality lets us emulate within-word structure (beginnings/ends differ) without ballooning word length. A light “residual” re-ordering step nudges unigram/bigram mass (e.g., boosting ey, ai, ii, ke and damping de, el, ry, da) while preserving token length and Zipf shape. The net effect is a large entropy drop at n=2 but a shallow slope for n>2, close to VMS behavior.
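One way to check the entropy behavior described here is to compute within-word conditional entropy for increasing n. The sketch below assumes that the entropy at order n means the entropy of a glyph given the n-1 glyphs before it inside the same word; the thread does not spell out the exact definition, and `conditional_entropy` is a name chosen for this illustration.

```python
import math
from collections import Counter


def conditional_entropy(tokens, n):
    """Entropy (bits) of the n-th symbol given the previous n-1, within words.

    `tokens` is a list of words; each word is a string or a list of glyphs.
    For n=1 the context is empty, so this is the plain unigram entropy.
    """
    ngrams = Counter()
    for token in tokens:
        for i in range(len(token) - n + 1):
            ngrams[tuple(token[i:i + n])] += 1

    total = sum(ngrams.values())
    if total == 0:
        return 0.0

    # Context counts are marginals of the n-gram counts, so p(x | context) <= 1.
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count

    return -sum(
        (count / total) * math.log2(count / contexts[gram[:-1]])
        for gram, count in ngrams.items()
    )
```

Running it for n = 1, 2, 3, ... on the plaintext, the ciphered text and a Voynich transliteration gives curves whose drop at n=2 and slope for n>2 can then be compared.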

Materials & practicality:

One printed table (below).
A small residuals slip (one or two pages): per position, a short priority list where EVA⇄Latin is many-to-one (e.g., for ch at position 1). (That “residuals booklet” could easily have existed and been lost; it’s small and personal.) Interesting note: the residuals could even be hidden in the MS text itself (a small point, a small symbol)... but I am still searching for them.

Core positional table example (from De Docta Ignorantia - Nicolaus von Kues):
[Image: xiRmh2y.png]

Examples

Latin: Auferre, trucidare, rapere falsis nominibus res publica, atque ubi solitudinem faciunt, pacem appellant
PM-Ciphered: aeryssi qotedisemy larysi qoallid shoderanyr lal shedlile akkee aka choledylidyl qoacheyrs shachyl arryndenn
Residuals: 0101220 311000030 102121 100101 002010011 111 1100000 03110 100 20003101113 1000112 10013 012100024

Some plots:
[Image: NvS8Atv.png]

[Image: pTVAtAY.png]

Note: most of these plots depend on the original natural-language text, for example the distribution of token lengths (the PM-Cipher leaves token lengths almost unchanged) and the Zipf curves. But in the following plots you can see that entropy behaves as it does in the MS (and so does perplexity, where the bump is even clearer):

[Image: XD6FUOd.png]

Limitations, advantages & next steps

A residuals booklet is required for strict decoding wherever one EVA entry covers several Latin letters; this is historically plausible but must be posited.
It may be possible to hide the residuals in small marks on the glyphs (or even to use different glyphs); this is under study.
Cross-text validation across languages: the code adapts the cipher table and residuals to whatever language is given as input. The goal would be to use as many residuals as possible, playing with different languages.
As said: it does not decipher the MS, but it is an easy-to-do cipher that fulfils the requirements (entropy, token lengths, etc.).
I am aware that the first lines of the paragraphs are different from the rest of the text. We could make a separate cipher table only for those lines, so that more gallows appear.

Happy to share it with you and discuss tests or alternative target profiles. If you think it is interesting enough, I would write and present a paper about it.

Feedback very welcome!

RE: Positional Mimic Cipher (PM-Cipher) - oshfdk - 10-09-2025

Looks cool, but it doesn't respect the CLS rules, as far as I can see?

RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 10-09-2025

(10-09-2025, 07:16 PM)oshfdk Wrote: Looks cool, but it doesn't respect the CLS rules, as far as I can see?

No, because of the natural language tested. The code tries to adapt the given natural language to the Voynich rules, and the result is what it is... In this case, De docta ignorantia is obviously not the text behind the Voynich, and quite surely Latin is not the language, or at least not the Latin used in De docta ignorantia. Hence the CLS rules are not fulfilled.

RE: Positional Mimic Cipher (PM-Cipher) - oshfdk - 10-09-2025

How does it work when encoding long texts, do you continue through the table or reset to column 1 at each word break?

RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 10-09-2025

(10-09-2025, 07:25 PM)oshfdk Wrote: How does it work when encoding long texts, do you continue through the table or reset to column 1 at each word break?

Reset at each word.

RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 10-09-2025

(10-09-2025, 07:16 PM)oshfdk Wrote: Looks cool, but it doesn't respect the CLS rules, as far as I can see?

Let me explain a bit more: we know, for example, that "qo" accounts for a significant % of word-initial positions. If in the natural language text (the input) the same % of words start with, let's say, "t" and "m", then "qo" will be assigned to "t" and "m" at the first position. In that case, we would find the same proportion of "qo" as the MS has. But De docta ignorantia has a much wider spread of word-initial letters, so we don't see as many "qo" as in the MS.
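A rough way to picture this frequency matching is a greedy assignment for a single word position. The sketch below is only an assumption about how such a step could work, not the actual code behind the table; the function name `assign_glyphs` and all frequencies are invented for the example.

```python
def assign_glyphs(letter_freq, glyph_target):
    """Greedily map Latin letters to EVA glyphs for one word position.

    letter_freq:  share of words in the plaintext having each letter at this position
    glyph_target: share of words in the target text having each glyph at this position
    Each letter (most frequent first) goes to the glyph that still has the
    largest unfilled share, so several letters can land on one glyph.
    """
    remaining = dict(glyph_target)
    mapping = {}
    for letter in sorted(letter_freq, key=letter_freq.get, reverse=True):
        glyph = max(remaining, key=remaining.get)
        mapping[letter] = glyph
        remaining[glyph] -= letter_freq[letter]
    return mapping


# Invented numbers: "t" and "m" together start 25% of the plaintext words,
# and "qo" should start 25% of the ciphered words.
first_position = assign_glyphs(
    {"a": 0.40, "s": 0.35, "t": 0.15, "m": 0.10},
    {"qo": 0.25, "ch": 0.40, "sh": 0.35},
)
# first_position == {"a": "ch", "s": "sh", "t": "qo", "m": "qo"}
```

With a flatter plaintext distribution no subset of letters fills a glyph's share exactly, which is one way to read the mismatch described above for De docta ignorantia.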

RE: Positional Mimic Cipher (PM-Cipher) - oshfdk - 10-09-2025

(10-09-2025, 07:29 PM)quimqu Wrote: Reset at each word.

This means that whenever a sequence of words is repeated in the text, the cipher will produce a repeated sequence in the ciphertext, while long repeated sequences are relatively rare in the Voynich MS. Also, this would mean that identical labels will have identical encoding and whenever a label appears in the text it should repeat the label verbatim, is this correct?

RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 10-09-2025

Let me explain: in the classical substitution ciphers one glyph = one original letter.

In the PM-Cipher:

- one original letter can be represented by different glyphs, depending on its position in the word
- AND at a given word position, one glyph can correspond to multiple letters (that's why we need the residuals).

So, two ciphered words that look the same (two qokedy) don't need to correspond to the same original word, because the "qo" might be the cipher of two (or three or four) different letters.

RE: Positional Mimic Cipher (PM-Cipher) - oshfdk - 10-09-2025

(10-09-2025, 08:16 PM)quimqu Wrote: Let me explain: in the classical substitution ciphers one glyph = one original letter.

In the PM-Cipher:

- one original letter can be represented by different glyphs, depending on its position in the word
- AND at a given word position, one glyph can correspond to multiple letters (that's why we need the residuals).

So, two ciphered words that look the same (two qokedy) don't need to correspond to the same original word, because the "qo" might be the cipher of two (or three or four) different letters.

This makes it even slightly worse from the point of view of repeated sequences, since some different plaintext sequences can map to the same ciphertext too. That is unlikely to make much difference, but the main problem remains: each repeating phrase (like "In this section we will study the properties of") will produce the same ciphertext.
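The objection can be checked mechanically by counting repeated word sequences before and after encoding. In the Python sketch below, `positional_shift` is a stand-in for any cipher that restarts at each word (it is not the PM-Cipher table), and `repeated_ngrams` is a helper name chosen for this illustration.

```python
from collections import Counter


def repeated_ngrams(words, n):
    """Word n-grams that occur more than once, with their counts."""
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram: count for gram, count in counts.items() if count > 1}


def positional_shift(word):
    """Stand-in positional cipher: shift the k-th letter (0-based) by k places.

    Lowercase a-z only. Like any scheme that resets at each word break,
    it always maps the same word to the same ciphertext word.
    """
    return "".join(
        chr((ord(c) - ord("a") + k) % 26 + ord("a")) for k, c in enumerate(word)
    )


plain = "in this section we study in this section we test".split()
cipher = [positional_shift(w) for w in plain]

plain_repeats = repeated_ngrams(plain, 4)
cipher_repeats = repeated_ngrams(cipher, 4)
# Both contain exactly one repeated 4-word sequence, occurring twice.
```

Applying `repeated_ngrams` to a Voynich transliteration and to a ciphered text of similar size would show how large the gap actually is.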

RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 10-09-2025

Well, I understand what you say. It is true that long sentences would be ciphered the same way, and that the Voynich does not have long repeated sentences.
There is one more thing in the equation to test. It is also known that the initial word of each sentence is restricted to a couple of glyphs. If those glyphs, for example, were keys that change the order of the residuals, we would also get different ciphers for the same word...