The Voynich Ninja
Positional Mimic Cipher (PM-Cipher) - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Positional Mimic Cipher (PM-Cipher) (/thread-4921.html)



RE: Positional Mimic Cipher (PM-Cipher) - ReneZ - 11-09-2025

On a more detailed note, keep in mind that the second-order entropy (either the unconditional or the conditional one) is a single value representing a 2-dimensional matrix of probability values.
There are infinitely many different such matrices (probability distributions) that all lead to the same entropy.

To get a really good match, one should be able to approach the entire distribution.
Note that this would be a major achievement, and a sign that one is potentially onto something.

(Of course, a Naibbe-type cipher, which recomposes the text from known segments of it, may be able to mimic it quite well due to its very nature.)


RE: Positional Mimic Cipher (PM-Cipher) - oshfdk - 11-09-2025

(11-09-2025, 08:19 AM)quimqu Wrote: The results I posted in reply to René's comment about entropy are the results of decoding the full De docta ignorantia text.

Yes, I have the code. I need to clean it up a bit and publish it. I would like to publish a paper about this together with the code, but first I wanted some feedback from the ninja community on whether it is really interesting or not.

As far as I understand, the lowering of entropy happens because of three factors: 1) some of the uncertainty is off-loaded to residuals; 2) some characters are encoded via ciphertext bigrams, so the cipher is a bit verbose; 3) for short texts some of the entropy is captured by the structure of the table.

While 1 and 2 are easy to explain and would work for text of any size, 3 should not really scale with the size of the text.
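To make factor 1 concrete, here is a toy sketch (the l/r merge is invented for illustration; it is not how the PM-Cipher actually assigns residuals): merging two plaintext letters into one ciphertext glyph and recording the distinction as a 1-bit residual lowers the first-order entropy of the ciphertext stream, while the missing information shows up in the residual stream.

Code:
from collections import Counter
from math import log2

def h1(seq):
    """First-order (unigram) entropy in bits per symbol."""
    counts = Counter(seq)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

plaintext = "the quick brown fox jumps over the lazy dog " * 50

ciphertext, residuals = [], []
for ch in plaintext:
    if ch in "lr":                  # invented merge: l and r share one glyph
        ciphertext.append("L")
        residuals.append("0" if ch == "l" else "1")   # 1-bit residual disambiguates
    else:
        ciphertext.append(ch)

print("H1 plaintext :", round(h1(plaintext), 3))
print("H1 ciphertext:", round(h1(ciphertext), 3))   # lower: some uncertainty moved out
print("H1 residuals :", round(h1(residuals), 3))    # the missing bits live here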

This one looks interesting to me personally, because in principle it's realistic for a book. Residuals can be encoded via some character traits (like curvy/pointy l's, etc) and overall you don't even need a table this huge, something like 8-9 columns should be more than enough. To make it mimic more of the properties of Voynichese it should probably be a one-to-many cipher, but maybe instead of resetting the table at each word break the encoding can just keep rolling, only resetting when reaching a terminal character in the table (say, y / n).

Overall, I suspect this cipher primarily keeps the entropy low by expanding the effective character set via residuals. Nothing wrong with that, but essentially this is the same old idea that the actual alphabet of Voynichese is not ~15 common characters, but ~30-40 common characters with two or more distinct l's, two or more distinct r's, etc.


RE: Positional Mimic Cipher (PM-Cipher) - nablator - 11-09-2025

(11-09-2025, 08:10 AM)quimqu Wrote: Yes, the tricky thing is to adapt it to the Voynich style and try to keep the residuals as low as possible.

If you manage to make the ciphertext Voynichese-like (which requires more constraints: a low conditional character entropy and weak positional rules such as forcing n at position 6+ are not good enough), you will find that your "residuals" contain as much information as (or more than) the ciphertext, making it difficult to sweep them under the carpet.


RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 11-09-2025

(11-09-2025, 08:28 AM)ReneZ Wrote: To get a really good match, one should be able to approach the entire distribution.
Note that this would be a major achievement, and a sign that one is potentially onto something.

Ready to try it! But maybe first I should get a text that has a similar distribution of word lengths. De docta ignorantia has quite a lot of words longer than the Voynich's, and I think this might affect the entropy.

You can see in the left plot here how different they are:
[Image: pTVAtAY.png]

Any idea of a text with these characteristics? Can anyone help?


RE: Positional Mimic Cipher (PM-Cipher) - Jorge_Stolfi - 11-09-2025

(11-09-2025, 04:30 PM)quimqu Wrote: Ready to try it! But maybe first I should get a text that has a similar distribution of word lengths.

Would the following do?

It is the Vietnamese translation of the Pentateuch. The file main.wds has one word per line. Lines that start with "a " are words; lines that start with "p " are punctuation, and in particular "p =" is the end of a paragraph. Other lines may be ignored. The files are described in more detail in the "#"-comments at the beginning. In particular:

# #  This file uses a standard Vietnamese encoding in ASCII, VIQR (see
# #  table at end of this file) -- except that some accents were
# #  remapped to avoid confusion with puctuation ("?"->"@{ß}", "("->"@{µ}",
# #  "."->"@{°}"). See the table at the end of this file and the script
# #  "fix-encoding".

The encoding is described in detail in the linked file. It was the standard way to write e-mail in Vietnamese before Unicode. There are a few hyphenated two-syllable compounds, but most words should be one syllable long.
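For anyone who wants to feed main.wds into a statistics pipeline, a minimal reader following the description above might look like this (the "a " / "p " conventions and the "#" comments are from the file; the function name and everything else are placeholders, not Jorge's tooling):

Code:
def read_wds(path):
    """Read a main.wds-style file: 'a <word>' lines are words, 'p =' ends a
    paragraph, '#' comment lines and other prefixes are ignored."""
    paragraphs, words = [], []
    with open(path, encoding="ascii", errors="replace") as f:
        for raw in f:
            line = raw.rstrip("\n")
            if line.startswith("a "):
                words.append(line[2:])
            elif line.startswith("p ="):        # end of paragraph
                if words:
                    paragraphs.append(words)
                    words = []
            # '#' comments and other punctuation lines are skipped
    if words:
        paragraphs.append(words)
    return paragraphs

# Example: one dot-separated string per paragraph.
# for par in read_wds("main.wds"):
#     print(".".join(par))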

I also have samples in Mandarin Chinese, but they are in the logographic script, so they are useless for character-level statistics.  One should use a romanized transcription like pinyin instead.  The files are also in an old pre-Unicode encoding (GB) and conversion to Unicode will not be exactly trivial.

All the best, --jorge


RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 11-09-2025

(11-09-2025, 08:42 AM)oshfdk Wrote: Nothing wrong with that, but essentially this is the same old idea that the actual alphabet of Voynichese is not ~15 common characters, but ~30-40 common characters with two or more distinct l's, two or more distinct r's, etc.

These slight glyph differences could perfectly well be the residual marks, so there is no need for a supplementary booklet with the residuals.


RE: Positional Mimic Cipher (PM-Cipher) - oshfdk - 11-09-2025

(11-09-2025, 08:27 PM)quimqu Wrote: These slight glyph differences could perfectly well be the residual marks, so there is no need for a supplementary booklet with the residuals.

Yes. The main issue then is that the table is not really needed for this method to work, and we are back in the realm of expanded alphabet/microwriting solutions: starting with Newbold's, and including at least half a dozen proposals from some active participants on this forum.

Edit: I'm not saying the underlying idea is wrong. I'm just saying there hasn't been a lot of progress, even though theories along these lines have been in existence for over 100 years.


RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 12-09-2025

(11-09-2025, 08:28 AM)ReneZ Wrote: On a more detailed note, keep in mind that the second-order entropy (either the unconditional or the conditional one) is a single value representing a 2-dimensional matrix of probability values.
There are infinitely many different such matrices (probability distributions) that all lead to the same entropy.

To get a really good match, one should be able to approach the entire distribution.
Note that this would be a major achievement, and a sign that one is potentially onto something.

(Of course, a Naibbe-type cipher, which recomposes the text from known segments of it, may be able to mimic it quite well due to its very nature.)

Hi René,

Following your point about second-order entropy being just a single scalar, I’ve been comparing full bigram distributions directly. For each text I build the 2D bigram probability matrix (word-reset, normalized), flatten it, and compute the Jensen–Shannon divergence (JSD) against the Voynich matrix. (For those who don't know about it, JSD is 0 when two distributions are identical and approaches 1 as they diverge; it’s not a “percent similarity,” but it’s a clean, symmetric way to quantify how close the entire matrix is). I also plot the corresponding heatmaps so the structure of transitions is visible rather than compressed into one number.
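For reference, a minimal sketch of that computation, assuming both texts arrive as dot-separated strings over the same alphabet (the function and variable names are placeholders, not the actual pipeline):

Code:
import numpy as np

def bigram_matrix(text, alphabet):
    """Word-reset bigram probability matrix: pairs are counted inside words
    only; 'text' is a dot-separated string of words over 'alphabet'."""
    idx = {c: i for i, c in enumerate(alphabet)}
    m = np.zeros((len(alphabet), len(alphabet)))
    for word in text.split("."):
        for a, b in zip(word, word[1:]):
            m[idx[a], idx[b]] += 1
    return m / m.sum()

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence, base 2: 0 for identical distributions, at most 1."""
    p, q = p.ravel() + eps, q.ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Usage sketch (voynich_eva and cipher_eva are placeholder variables):
# alphabet = sorted(set(voynich_eva.replace(".", "")) | set(cipher_eva.replace(".", "")))
# print(jsd(bigram_matrix(voynich_eva, alphabet), bigram_matrix(cipher_eva, alphabet)))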

Here is a table with the results:
Text                                 JSD
Vietnamese with special chars        0.504
Vietnamese without special chars     0.362
Romeo and Juliet                     0.299
De docta ignorantia                  0.331
In Psalmum David CXVIII Expositio    0.316
Lazarillo de Tormes                  0.340
Tirant lo Blanch                     0.317

My optimization work tries to minimize the JSD while steering H₂ toward the Voynich level (≈2.1). The goal is not just to match a single entropy value but to approach the full bigram distribution. I'm attaching heatmaps of the lowest-JSD cases and of the Voynich so you can see the contrasts.

[Image: 9syL2Rt.png]

[Image: e9jxKXF.png]

[Image: 9Oyifyz.png]

[Image: cJktZ3L.png]

[Image: 2B56qmW.png]
[Image: K0x8bzf.png]
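For completeness, here is how the conditional variant of H₂ can be read off the same word-reset bigram matrix as in the sketch above (just an illustrative snippet, not the optimization code itself; if the unconditional second-order entropy is meant instead, the formula changes accordingly):

Code:
import numpy as np

def conditional_h2(bigram_probs):
    """Conditional second-order entropy H(next | current), in bits, from a
    normalized bigram probability matrix such as the one built above."""
    p_joint = bigram_probs / bigram_probs.sum()
    p_first = p_joint.sum(axis=1, keepdims=True)   # marginal of the first character
    with np.errstate(divide="ignore", invalid="ignore"):
        info = np.where(p_joint > 0, np.log2(p_joint / p_first), 0.0)
    return float(-np.sum(p_joint * info))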
It is obvious that I need to find better-suited texts, in terms of both language and style of writing (a scientific text may have a better JSD than a romance text in the same language).

If you think this is of interest and anyone wants to check any other text, please let me know. I am aware that old Slavic languages may have good patterns, but I have no texts. I could set up a loop to check different texts. If they can come in lowercase and as a single string with a dot between words, that would be perfect. If not, it will take me longer...


RE: Positional Mimic Cipher (PM-Cipher) - Jorge_Stolfi - 12-09-2025

(12-09-2025, 10:45 AM)quimqu Wrote: Vietnamese without special chars

In case you haven't noticed, those "special chars" are modifiers for the letters.  It makes no sense to exclude them.

Quote:I’ve been comparing full bigram distributions directly. For each text I build the 2D bigram probability matrix (word-reset, normalized), flatten it, and compute the Jensen–Shannon divergence (JSD) against the Voynich matrix.

First, this exercise is meaningless if the two texts use digraphs and trigraphs to encode single phonemes, or discard some phonetic information,  in different ways -  as is the case for all languages tested, including Voynichese.  

Second, IIUC, the JSD assumes that the two distributions P, Q are defined on the same set of elements, and compares the probabilities assigned by P and Q to each element. Thus if P is the letter-pair distribution of a text, and Q is the letter-pair distribution of the same text encoded with a Caesar cipher, the JSD should come out huge. Do you take this into account, and try to vary the mapping between letters of the two languages?
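A toy demonstration of that second point (the pangram text and the shift of 3 are made up for illustration): the bigram distributions of a text and of its Caesar-shifted copy live on the same alphabet but in disjoint cells of the matrix, so the JSD comes out large even though the underlying text is identical.

Code:
from collections import Counter
from math import log2

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def bigram_dist(text):
    """Bigram distribution over ALPHABET; pairs involving spaces are skipped."""
    pairs = Counter((a, b) for a, b in zip(text, text[1:]) if a != " " and b != " ")
    total = sum(pairs.values())
    return {(a, b): pairs.get((a, b), 0) / total for a in ALPHABET for b in ALPHABET}

def jsd(p, q, eps=1e-12):
    m = {k: 0.5 * (p[k] + q[k]) for k in p}
    kl = lambda x, y: sum(x[k] * log2((x[k] + eps) / (y[k] + eps)) for k in x)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

text = "the quick brown fox jumps over the lazy dog " * 20
caesar = text.translate(str.maketrans(ALPHABET, ALPHABET[3:] + ALPHABET[:3]))

print(jsd(bigram_dist(text), bigram_dist(text)))     # 0.0: identical distributions
print(jsd(bigram_dist(text), bigram_dist(caesar)))   # large: the cells no longer line up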


RE: Positional Mimic Cipher (PM-Cipher) - quimqu - 12-09-2025

Thank you, Jorge.
  
(12-09-2025, 12:56 PM)Jorge_Stolfi Wrote: Second, IIUC, the JSD assumes that the two distributions P, Q are defined on the same set of elements, and compares the probabilities assigned by P and Q to each element. Thus if P is the letter-pair distribution of a text, and Q is the letter-pair distribution of the same text encoded with a Caesar cipher, the JSD should come out huge. Do you take this into account, and try to vary the mapping between letters of the two languages?

Just to clarify: I'm not comparing raw Vietnamese (or other natural languages) against the Voynich with JSD. All my JSD results for the cipher are EVA vs EVA. The cipher outputs EVA glyphs, and the Voynich reference is in EVA too, so both distributions are defined on exactly the same alphabet. I think this is the right way to do it, but if it is not, please feel free to tell me...

(12-09-2025, 12:56 PM)Jorge_Stolfi Wrote: In case you haven't noticed, those "special chars" are modifiers for the letters. It makes no sense to exclude them.

Yes, I noticed. I think I need to reconstruct the modifiers into Unicode or something similar, so that I get letter+modifier as a single char, and re-run. Do you think this would help? Some modifiers are separate chars, like here: "O^zya". If I worked with Unicode, "O^" would be a single char. One question: can I lowercase the capitalized chars, or are they really different (e.g. O vs. o)?
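One possible way to do that without going all the way to Unicode is simply to tokenize a letter plus its trailing marks as one symbol. The modifier set below is my reading of standard VIQR, and the file remaps "?", "(" and "." to other sequences, so it would need to be checked against the table at the end of the file before trusting any counts:

Code:
# Hypothetical VIQR-aware tokenizer: a base letter plus any trailing diacritic
# marks becomes one token, so "O^" in "O^zya" is treated as a single symbol.
MODIFIERS = set("^(+'`?~.")   # assumed VIQR marks; adjust to the file's own table

def tokenize_viqr(word):
    tokens = []
    for ch in word:
        if ch in MODIFIERS and tokens:
            tokens[-1] += ch          # attach the mark to the preceding letter
        else:
            tokens.append(ch)
    return tokens

print(tokenize_viqr("O^zya"))   # ['O^', 'z', 'y', 'a']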

Thanks again!!!