Dear Voynich ninjas,
I propose here a single-mode positional substitution cipher that
- sharply reduces entropy at n=2
- decays only mildly for n>2, matching a key distributional signature of Voynich text
- preserves the token lengths of the original language
- works programmatically on any language
- is quill-and-paper friendly, so it could have been carried out in the 15th century (or even earlier)
- ... but it still cannot decipher the text, not yet...
It consists of just a position-by-position substitution table, plus a tiny ‘residuals’ note to resolve cases where one cipher glyph covers more than one letter (for example, ‘ch’ at position 1, see table below).
I call it the Positional Mimic Cipher (PM-Cipher). It is a single-mode, position-by-position substitution using the Voynich EVA alphabet, with small per-position priority lists (‘residuals’) tweaked to imitate Voynich-style statistics. In practice, on a natural-language sample the cipher typically yields figures around:
- r≈0.90 for graphemes,
- r≈0.50 for in-word bigrams,
- H₂≈2.80, with length/Zipf shapes close to VMS B*, while keeping the median token length at 4-5.
My design goals were, as said, a strong entropy drop at n=2 and only a mild additional drop for n>2. Since Voynich token lengths look like those of natural language, I thought it would be interesting to keep the natural-language token lengths, and the cipher does. It should also mimic the overall grapheme/bigram shape, judged by correlation (how similarly do two frequency patterns rise and fall?), Jensen-Shannon divergence (how far apart are two probability distributions?) and the Zipf curve (similar slope and curvature). Oh, and the last goal was a must: it had to be fully doable with period materials, just paper and something to write with.
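The comparison measures named above can be computed with the standard library alone. This is a sketch, not the code behind the figures quoted earlier: I assume r is the Pearson correlation between two relative-frequency profiles taken over the union of their symbols (a missing symbol counts as zero), the Jensen-Shannon divergence is measured in bits (0 for identical distributions, 1 for disjoint ones), and the Zipf shape is summarized by a least-squares slope of log frequency against log rank. All function names are my own.

```python
import math
from collections import Counter


def distribution(symbols):
    """Relative frequencies of the items in an iterable of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def pearson(p, q):
    """Pearson r between two frequency profiles over the union of their symbols."""
    keys = sorted(set(p) | set(q))
    x = [p.get(k, 0.0) for k in keys]
    y = [q.get(k, 0.0) for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # Kullback-Leibler divergence of a from the mixture m; m[k] > 0 wherever a[k] > 0.
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def in_word_bigrams(words):
    """Character bigrams that do not cross word boundaries."""
    return [w[i:i + 2] for w in words for i in range(len(w) - 1)]


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)
```

For example, `pearson(distribution(cipher_words_joined), distribution(vms_words_joined))` would give the grapheme-level r, and the same call on `in_word_bigrams(...)` would give the bigram-level one; both input variables are placeholders for your own samples.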
How to encrypt (quill & paper):
- Write the plaintext in Latin letters (any language).
- For the k-th letter in a word, look up its Voynich substitution in the table’s column p[k] (see table below).
- Write the EVA grapheme you see there (sometimes multi-grapheme, e.g., ch).
- If a table cell produces a multi-grapheme EVA that can come from different Latin letters, check a tiny “residuals” note for that position to pick the intended original (this only matters when decoding; for encoding you simply follow the table).
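The encryption steps above can be sketched in a few lines of Python. This is a toy illustration, not the real PM-Cipher table: `TABLE` is a hypothetical three-column table over a five-letter alphabet, and columns past the last one are reused for longer words (an assumption). I also assume each residual digit is the rank of the plaintext letter among all letters sharing the same glyph in that column, which fits the example below having one residual digit per letter.

```python
# Hypothetical toy table: TABLE[k] maps the (k+1)-th letter of a word to an EVA glyph.
# Several letters may share one glyph in a column; the residual digit disambiguates them.
TABLE = [
    {"a": "o", "e": "o", "r": "ch", "s": "ch", "t": "qo"},  # position 1
    {"a": "e", "e": "e", "r": "d", "s": "k", "t": "k"},     # position 2
    {"a": "y", "e": "y", "r": "i", "s": "n", "t": "n"},     # position 3 and beyond
]


def encode_word(word):
    """Return (EVA string, residual digits) for one plaintext word."""
    glyphs, residuals = [], []
    for k, letter in enumerate(word.lower()):
        column = TABLE[min(k, len(TABLE) - 1)]  # reuse the last column for long words
        glyph = column[letter]
        # Letters sharing this glyph, in table order; the digit is our letter's rank.
        sharing = [other for other, g in column.items() if g == glyph]
        glyphs.append(glyph)
        residuals.append(str(sharing.index(letter)))
    return "".join(glyphs), "".join(residuals)


def encode_text(text):
    """Encode word by word, so word boundaries and token counts are preserved."""
    pairs = [encode_word(w) for w in text.split()]
    return " ".join(c for c, _ in pairs), " ".join(r for _, r in pairs)


if __name__ == "__main__":
    print(encode_text("rats eat tea"))
```

With this toy table, `encode_text("rats eat tea")` returns `("chenn oen qoey", "0010 101 010")`: three letters become three glyphs in "eat", while "rats" gains one character only because ch is a two-character glyph.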
How to decrypt:
- Read each EVA grapheme by position.
- Use the same table, but in reverse (per position).
- If an EVA entry maps to several Latin letters, consult the residuals note for that position to resolve it deterministically.
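Decryption can be sketched the same way. Again the tables are hypothetical (here simply the inverse of a toy three-column table): each column maps a glyph to its priority list of Latin letters, and I assume the residual digit indexes into that list. Splitting an EVA word into glyphs uses greedy longest match, which is my assumption and only works when no multi-character glyph can also be read as a sequence of shorter glyphs.

```python
# Hypothetical per-position residual slips: glyph -> priority list of Latin letters.
DECODE = [
    {"o": ["a", "e"], "ch": ["r", "s"], "qo": ["t"]},  # position 1
    {"e": ["a", "e"], "d": ["r"], "k": ["s", "t"]},    # position 2
    {"y": ["a", "e"], "i": ["r"], "n": ["s", "t"]},    # position 3 and beyond
]

# Longest glyphs first, so "ch" is tried before any single-letter glyph.
GLYPHS = sorted({g for column in DECODE for g in column}, key=len, reverse=True)


def split_glyphs(eva):
    """Greedy longest-match segmentation of an EVA word into glyphs."""
    out, i = [], 0
    while i < len(eva):
        for glyph in GLYPHS:
            if eva.startswith(glyph, i):
                out.append(glyph)
                i += len(glyph)
                break
        else:
            raise ValueError(f"no known glyph at {eva[i:]!r}")
    return out


def decode_word(eva, residuals):
    """Recover a plaintext word from its EVA string and residual digits."""
    glyphs = split_glyphs(eva)
    if len(glyphs) != len(residuals):
        raise ValueError("expected one residual digit per glyph")
    letters = []
    for k, (glyph, digit) in enumerate(zip(glyphs, residuals)):
        column = DECODE[min(k, len(DECODE) - 1)]  # reuse the last column for long words
        letters.append(column[glyph][int(digit)])
    return "".join(letters)


if __name__ == "__main__":
    print(decode_word("chenn", "0010"))
```

Without the residual digits, "oen" could equally be read as "aas", "eet" and so on; the digit string is what makes the reverse lookup deterministic.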
Why does it match Voynich-like stats? Positionality lets us emulate within-word structure (beginnings and ends differ) without ballooning word length. A light “residual” re-ordering step nudges unigram/bigram mass (e.g., boosting ey, ai, ii, ke and damping de, el, ry, da) while preserving token length and Zipf shape. The net effect is a large entropy drop at n=2 but a shallow slope for n>2, close to VMS behavior.
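To check the "large drop at n=2, shallow slope for n>2" behavior on your own samples, the conditional character entropy can be computed directly. This is a minimal sketch under my own assumptions, since the exact definitions used for the plots are not given: H_n is the entropy of a character given the preceding n-1 characters of the running text (spaces included, and ch counted as two characters), estimated from overlapping n-gram counts, and perplexity is taken as 2^H_n.

```python
import math
from collections import Counter


def conditional_entropy(text, n):
    """H_n in bits: entropy of a character given the previous n-1 characters."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count  # for n = 1 the context is the empty string
    total = sum(grams.values())
    return -sum(
        count / total * math.log2(count / contexts[gram[:-1]])
        for gram, count in grams.items()
    )


def entropy_profile(text, max_n=4):
    """List of (n, H_n, perplexity) for n = 1..max_n, with perplexity = 2 ** H_n."""
    profile = []
    for n in range(1, max_n + 1):
        h = conditional_entropy(text, n)
        profile.append((n, h, 2 ** h))
    return profile


if __name__ == "__main__":
    sample = "aeryssi qotedisemy larysi qoallid shoderanyr lal shedlile"
    for n, h, ppl in entropy_profile(sample):
        print(f"n={n}  H_n={h:.3f} bits  perplexity={ppl:.2f}")
```

On a sample this short the numbers are not meaningful (higher-order estimates collapse toward zero); a fair comparison with the VMS needs texts of similar, substantial length.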
Materials & practicality:
- One printed table (below).
- A small residuals slip (one or two pages): per position, a short priority list where EVA⇄Latin is many-to-one (e.g., for ch at position 1). (That “residuals booklet” could easily have existed and been lost; it’s small and personal.) Interesting note: the residuals could even be hidden in the MS text itself (a small dot, a small symbol)... but I am still searching for them.
Core positional table example (from De Docta Ignorantia - Nicolaus von Kues):
Examples
Latin: Auferre, trucidare, rapere falsis nominibus res publica, atque ubi solitudinem faciunt, pacem appellant
PM-Ciphered: aeryssi qotedisemy larysi qoallid shoderanyr lal shedlile akkee aka choledylidyl qoacheyrs shachyl arryndenn
Residuals: 0101220 311000030 102121 100101 002010011 111 1100000 03110 100 20003101113 1000112 10013 012100024
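The example can be checked mechanically: each plaintext word, its PM-ciphered token (counted in glyphs) and its residual string should all have the same length. In the sketch below I assume that ch, sh and qo are the only multi-character glyphs occurring in this sample; under that assumption all 13 words line up.

```python
import re

LATIN = ("Auferre, trucidare, rapere falsis nominibus res publica, "
         "atque ubi solitudinem faciunt, pacem appellant")
CIPHER = ("aeryssi qotedisemy larysi qoallid shoderanyr lal shedlile akkee "
          "aka choledylidyl qoacheyrs shachyl arryndenn")
RESIDUALS = ("0101220 311000030 102121 100101 002010011 111 1100000 03110 "
             "100 20003101113 1000112 10013 012100024")

# Assumption: ch, sh and qo are the only multi-character glyphs in this sample.
GLYPH = re.compile(r"ch|sh|qo|[a-z]")


def length_table():
    """(word, letters, glyphs, residual digits) for each aligned token."""
    words = re.findall(r"[A-Za-z]+", LATIN)
    tokens, digits = CIPHER.split(), RESIDUALS.split()
    assert len(words) == len(tokens) == len(digits)
    return [(w, len(w), len(GLYPH.findall(t)), len(d))
            for w, t, d in zip(words, tokens, digits)]


if __name__ == "__main__":
    for word, letters, glyphs, ndigits in length_table():
        print(f"{word:12} {letters:3} {glyphs:3} {ndigits:3}")
```

For instance, "solitudinem" has 11 letters, "choledylidyl" splits into 11 glyphs (ch o l e d y l i d y l), and "20003101113" has 11 digits.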
Some plots:
Note: most of these plots depend on the original natural-language text; for example, the token-length distribution (the PM-Cipher leaves token lengths almost unchanged) and the Zipf curves. The following plots, however, show that entropy behaves as it does in the MS (and so does perplexity, where the bump is even more visible):
Limitations, advantages & next steps
- A residuals booklet is required for strict decoding wherever an EVA glyph stands for more than one Latin letter; this is historically plausible, but its existence must be posited.
- The residuals could be hidden in small marks on the glyphs (or even encoded as distinct glyphs); this is under study.
- Cross-text validation across languages: the code adapts the cipher table and residuals to whatever language it is given. The goal would be to have as many residuals in use as possible, by experimenting with different languages.
- As said, it does not decipher the MS, but it is an easy-to-use cipher that meets the requirements (entropy, token length, etc.).
- I am aware that the first lines of paragraphs differ from the rest of the text. We could build a separate cipher table just for those lines, one that makes more gallows appear.
Happy to share it with you and discuss tests or alternative target profiles. If you think it is interesting enough, I would write and present a paper on it.
Feedback very welcome!