The Voynich Ninja

Full Version: The Naibbe cipher
(14-08-2025, 12:25 AM)davidma Wrote: As I recall from your post, quimqu, no other cipher had that distinctive bump that the VM has, but the Naibbe one does.

No, syllabic substitution also did the trick and produced the bump.
Addendum to [link not shown]:

You can also create a heat map in which the labels are displayed in Voynichese. This requires that the font (usually “eva1.ttf”) be installed on your PC. In the following line in the Python script, the path to the font must be adjusted (line 59):

custom_font_path = "/home/me/.local/share/fonts/eva1.ttf"
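
If you are not sure where the font ended up after installation, a small helper can search the usual font folders for it. This is only a sketch: the name `find_font` and the folder list are assumptions of mine, not part of the script, and the folders differ from system to system.

```python
from pathlib import Path

# Hypothetical helper, not part of heatmap_prefix_suffix.py.
# The folders below are assumptions (common Linux, macOS and Windows locations).
DEFAULT_FONT_DIRS = [
    Path.home() / ".local/share/fonts",
    Path.home() / ".fonts",
    Path("/usr/share/fonts"),
    Path("/usr/local/share/fonts"),
    Path.home() / "Library/Fonts",
    Path("C:/Windows/Fonts"),
]


def find_font(filename="eva1.ttf", font_dirs=DEFAULT_FONT_DIRS):
    """Return the first file whose name matches (case-insensitive), or None."""
    wanted = filename.lower()
    for folder in font_dirs:
        folder = Path(folder)
        if not folder.is_dir():
            continue
        # rglob also walks subfolders, since fonts are often grouped by family
        for candidate in sorted(folder.rglob("*")):
            if candidate.is_file() and candidate.name.lower() == wanted:
                return str(candidate)
    return None

# Usage: custom_font_path = find_font() or "/home/me/.local/share/fonts/eva1.ttf"
```

If `find_font()` returns None, the font is probably not installed in any of the listed folders and the path has to be set by hand.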


The script is called as already described; the optional second argument (25 here) sets how many of the top prefixes and suffixes are shown:

python heatmap_prefix_suffix.py parsed.txt 25


Code:
#!/usr/bin/env python3
import sys
from collections import Counter
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import font_manager as fm

def extract_prefix_suffix(word):
    parts = word.split('/')
    if len(parts) < 3:
        return [], [], None
    stem_index = len(parts) // 2
    prefix = parts[:stem_index]
    suffix = parts[stem_index+1:]
    return prefix, suffix, parts[stem_index]

def main():
    if len(sys.argv) < 2:
        print("Usage: python heatmap_prefix_suffix.py output_segmented.txt [N]")
        sys.exit(1)

    filename = sys.argv[1]
    top_n = 20  # default number of top prefixes and suffixes
    if len(sys.argv) >= 3:
        try:
            top_n = int(sys.argv[2])
        except ValueError:
            print("Parameter N must be an integer.")
            sys.exit(1)

    prefix_suffix_counts = Counter()

    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            words = line.strip().split()
            for w in words:
                prefix, suffix, stem = extract_prefix_suffix(w)
                for pre in prefix:
                    for suf in suffix:
                        prefix_suffix_counts[(pre, suf)] += 1

    if not prefix_suffix_counts:
        print("No prefix-suffix combinations found.")
        sys.exit(1)

    data = [{"prefix": pre, "suffix": suf, "count": count}
            for (pre, suf), count in prefix_suffix_counts.items()]
    df = pd.DataFrame(data)

    # Select top N prefixes and suffixes by total count
    top_prefixes = df.groupby('prefix')['count'].sum().nlargest(top_n).index
    top_suffixes = df.groupby('suffix')['count'].sum().nlargest(top_n).index

    pivot = df.pivot(index='prefix', columns='suffix', values='count').fillna(0)
    pivot_top = pivot.loc[top_prefixes, top_suffixes]

    # Load the local font
    custom_font_path = "/home/me/.local/share/fonts/eva1.ttf"
    custom_font = fm.FontProperties(fname=custom_font_path, size=12)

    # Plot
    plt.figure(figsize=(18, 12))
    ax = sns.heatmap(
        pivot_top,
        annot=True,
        fmt=".0f",
        cmap="YlGnBu",
        cbar_kws={"shrink": 0.5}
    )

    plt.title(f"Top {top_n} Prefix-Suffix Combinations")
    plt.xlabel("Suffix")
    plt.ylabel("Prefix")

    # Axis tick labels rendered with the local font
    ax.set_xticklabels(ax.get_xticklabels(), fontproperties=custom_font, rotation=45, ha='right')
    ax.set_yticklabels(ax.get_yticklabels(), fontproperties=custom_font, rotation=0)

    plt.subplots_adjust(bottom=0.2)
    plt.show()

if __name__ == "__main__":
    main()
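
To make the splitting rule concrete: the script treats each word as "/"-separated segments, takes the middle segment as the stem, everything before it as prefixes and everything after it as suffixes. The snippet below repeats the logic of `extract_prefix_suffix` and runs it on a few made-up tokens. The segmentations are purely illustrative and are not taken from parsed.txt.

```python
def extract_prefix_suffix(word):
    # Same logic as in the script above
    parts = word.split('/')
    if len(parts) < 3:
        return [], [], None
    stem_index = len(parts) // 2
    return parts[:stem_index], parts[stem_index + 1:], parts[stem_index]


# Made-up example tokens (assumed segmentations, for illustration only)
samples = ["qo/k/edy", "qo/k/e/dy", "ch/edy", "daiin"]
results = {w: extract_prefix_suffix(w) for w in samples}

for w, (prefix, suffix, stem) in results.items():
    print(f"{w:10} prefix={prefix} stem={stem} suffix={suffix}")
```

Two things follow from this rule. Words with fewer than three segments contribute nothing to the heat map. With an even number of segments the stem is the first segment of the second half, so there is one more prefix than suffix. Since every prefix is paired with every suffix, a word with two prefixes and two suffixes adds four counts.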

[attachment=11259]
[attachment=11260]

Here are the text files for parsing:
[attachment=11265]
[attachment=11266]

qokchy tched qokeody alar qokaiin shedy qolcheody shg ain

GRATIASTIBIAGO
(05-09-2025, 07:52 PM)RobGea Wrote:

qokchy tched qokeody alar qokaiin shedy qolcheody shg ain

GRATIASTIBIAGO

We have a winner!
About the scripts posted by Magnesium in the other thread:

I tried naibbe.py and decrypt_naibbe.py on a Latin text (1000 short lines) with default settings: no problem.

Despite this setting in naibbe.py:
UNAMBIGUOUS=True #True means bigram token generation avoids accidentally creating unigram word types. Strongly recommend True

I had an ambiguity at decryption every few lines, for example: ep(ma|it)aphium

20250724 Naibbe Cipher Paper Wrote:
2.7 Minimizing ambiguities during encryption and decryption
We can also preemptively eliminate ambiguity at the encryption stage. If we happen to encrypt a plaintext bigram as a word type reserved for unigram use, we can simply re-encrypt the bigram (by redrawing cards) so that it no longer matches a unigram word type.

naibbe.py Wrote: Total ambiguity retries due to prefix+suffix collisions with unigrams: 798

No issue, but I'm just wondering: if these retries do not eliminate ambiguities completely, should there be more retries?

After 4 attempts I got a non-ambiguous ciphertext for "epitaphium", so it is possible: chedy dchdy qokaiin yte shy qokedy sheedy qokar.
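
For reference, the retry idea from section 2.7 can be sketched roughly as follows. This is not the code from naibbe.py: the tables, the function name `encrypt_bigram` and the retry cap are invented for illustration, and the only assumption carried over is that a bigram becomes a prefix plus a suffix.

```python
import random

# Toy tables, invented for illustration; NOT the real Naibbe tables.
PREFIXES = {"i": ["qok", "ch", "d"], "a": ["sh", "ot"]}  # options for the first letter
SUFFIXES = {"t": ["edy", "aiin"], "p": ["ol", "ey"]}     # options for the second letter
UNIGRAM_TYPES = {"daiin", "chedy", "shol"}               # word types reserved for unigrams


def encrypt_bigram(bigram, rng, max_retries=100):
    """Redraw prefix and suffix until the token is not a unigram word type.

    Returns (token, number_of_retries). Raises if the cap is exhausted.
    """
    first, second = bigram
    for retries in range(max_retries + 1):
        token = rng.choice(PREFIXES[first]) + rng.choice(SUFFIXES[second])
        if token not in UNIGRAM_TYPES:
            return token, retries
    raise RuntimeError(f"no clean token for {bigram!r} after {max_retries} retries")


rng = random.Random(0)
for bigram in ["it", "ap", "it"]:
    token, retries = encrypt_bigram(bigram, rng)
    print(bigram, "->", token, f"({retries} retries)")
```

A loop like this only rules out bigram tokens that look like unigram word types. It does not notice when two different bigrams can produce the same token, which would explain ambiguities surviving the retries.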
(05-09-2025, 10:08 PM)nablator Wrote: I tried naibbe.py and decrypt_naibbe.py on a Latin text (1000 short lines) with default settings: no problem.

Despite this setting in naibbe.py:
UNAMBIGUOUS=True #True means bigram token generation avoids accidentally creating unigram word types. Strongly recommend True

I had an ambiguity at decryption every few lines, for example: ep(ma|it)aphium

20250724 Naibbe Cipher Paper Wrote:
2.7 Minimizing ambiguities during encryption and decryption
We can also preemptively eliminate ambiguity at the encryption stage. If we happen to encrypt a plaintext bigram as a word type reserved for unigram use, we can simply re-encrypt the bigram (by redrawing cards) so that it no longer matches a unigram word type.

naibbe.py Wrote: Total ambiguity retries due to prefix+suffix collisions with unigrams: 798

No issue, but I'm just wondering: if these retries do not eliminate ambiguities completely, should there be more retries?

After 4 attempts I got a non-ambiguous ciphertext for "epitaphium", so it is possible: chedy dchdy qokaiin yte shy qokedy sheedy qokar.

In principle yes, there should be a loop that checks bigram-bigram ambiguity. I haven’t gotten there yet; I am working on it. As an interim solution I included the ambiguous decryption formatting under the UNAMBIGUOUS condition so that bigram-bigram ambiguity is at least visible.
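
To illustrate what bigram-bigram ambiguity means and how the parenthesised output makes it visible, here is a rough sketch with an invented table. It is not the code in decrypt_naibbe.py, and the token assignments are made up even where the token strings look familiar. The idea: build a reverse lookup from each token to every plaintext chunk that could have produced it, and print the alternatives in parentheses whenever there is more than one.

```python
from collections import defaultdict

# Toy table, invented for illustration: plaintext chunk -> tokens it may encrypt to.
TOY_TABLE = {
    "ep": ["chedy"],
    "it": ["dchdy", "qoty"],
    "ma": ["dchdy"],      # shares a token with "it": bigram-bigram ambiguity
    "ap": ["qokaiin"],
    "hi": ["yte"],
    "um": ["shy"],
}


def build_reverse(table):
    """Map each token to the set of plaintext chunks that can produce it."""
    reverse = defaultdict(set)
    for chunk, tokens in table.items():
        for token in tokens:
            reverse[token].add(chunk)
    return reverse


def is_ambiguous(token, reverse):
    return len(reverse.get(token, ())) > 1


def decrypt(ciphertext, reverse):
    """Decrypt token by token; show alternatives as (a|b), unknown tokens as ?."""
    out = []
    for token in ciphertext.split():
        options = sorted(reverse.get(token, ()))
        if not options:
            out.append("?")
        elif len(options) == 1:
            out.append(options[0])
        else:
            out.append("(" + "|".join(options) + ")")
    return "".join(out)


REVERSE = build_reverse(TOY_TABLE)
print(decrypt("chedy dchdy qokaiin yte shy", REVERSE))  # ep(it|ma)aphium
print(decrypt("chedy qoty qokaiin yte shy", REVERSE))   # epitaphium
```

An encryption loop could call a check like `is_ambiguous` on each freshly drawn bigram token and redraw whenever it returns True, in the same way as the existing unigram collision check.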
(05-09-2025, 10:08 PM)nablator Wrote: About the scripts posted by Magnesium in the other thread:

No issue, but I'm just wondering: if these retries do not eliminate ambiguities completely, should there be more retries?

After 4 attempts I got a non-ambiguous ciphertext for "epitaphium", so it is possible: chedy dchdy qokaiin yte shy qokedy sheedy qokar.

I have added "naibbe_v2.py" to the Dropbox folder, along with an associated Jupyter notebook. Its version of the unambiguous loop should take care of bigram-bigram ambiguity without too much drag on performance.