14-08-2025, 04:47 PM
Addendum to the earlier post:
You can also create a heat map in which the labels are displayed in Voynichese. This requires that the font (usually “eva1.ttf”) be installed on your PC. The path to the font must be adjusted in the following line of the Python script (line 59):
custom_font_path = "/home/me/.local/share/fonts/eva1.ttf"
The call is made as already described, where 25 is the adjustable number of top prefixes and suffixes (N) to plot:
python heatmap_prefix_suffix.py parsed.txt 25
Code:
#!/usr/bin/env python3
import sys
from collections import Counter
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import font_manager as fm


def extract_prefix_suffix(word):
    parts = word.split('/')
    if len(parts) < 3:
        return [], [], None
    stem_index = len(parts) // 2
    prefix = parts[:stem_index]
    suffix = parts[stem_index + 1:]
    return prefix, suffix, parts[stem_index]


def main():
    if len(sys.argv) < 2:
        print("Usage: python heatmap_prefix_suffix.py output_segmented.txt [N]")
        sys.exit(1)

    filename = sys.argv[1]
    top_n = 20  # default number of top prefixes and suffixes
    if len(sys.argv) >= 3:
        try:
            top_n = int(sys.argv[2])
        except ValueError:
            print("Parameter N must be an integer.")
            sys.exit(1)

    prefix_suffix_counts = Counter()
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            words = line.strip().split()
            for w in words:
                prefix, suffix, stem = extract_prefix_suffix(w)
                for pre in prefix:
                    for suf in suffix:
                        prefix_suffix_counts[(pre, suf)] += 1

    if not prefix_suffix_counts:
        print("No prefix-suffix combinations found.")
        sys.exit(1)

    data = [{"prefix": pre, "suffix": suf, "count": count}
            for (pre, suf), count in prefix_suffix_counts.items()]
    df = pd.DataFrame(data)

    # Select top N prefixes and suffixes by total count
    top_prefixes = df.groupby('prefix')['count'].sum().nlargest(top_n).index
    top_suffixes = df.groupby('suffix')['count'].sum().nlargest(top_n).index
    pivot = df.pivot(index='prefix', columns='suffix', values='count').fillna(0)
    pivot_top = pivot.loc[top_prefixes, top_suffixes]

    # Load the local EVA font
    custom_font_path = "/home/me/.local/share/fonts/eva1.ttf"
    custom_font = fm.FontProperties(fname=custom_font_path, size=12)

    # Plot
    plt.figure(figsize=(18, 12))
    ax = sns.heatmap(
        pivot_top,
        annot=True,
        fmt=".0f",
        cmap="YlGnBu",
        cbar_kws={"shrink": 0.5}
    )
    plt.title(f"Top {top_n} Prefix-Suffix Combinations")
    plt.xlabel("Suffix")
    plt.ylabel("Prefix")

    # Axis tick labels rendered with the local font
    ax.set_xticklabels(ax.get_xticklabels(), fontproperties=custom_font, rotation=45, ha='right')
    ax.set_yticklabels(ax.get_yticklabels(), fontproperties=custom_font, rotation=0)
    plt.subplots_adjust(bottom=0.2)
    plt.show()


if __name__ == "__main__":
    main()
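For orientation: the script expects each word in the parsed file to be a slash-separated segmentation with at least three parts; the middle part is taken as the stem, everything before it as prefixes and everything after it as suffixes. A minimal check with a made-up word (the actual segmentation in the attached files may differ):
Code:
# "qo/k/edy" is a made-up example word, not necessarily how the attached files segment it.
word = "qo/k/edy"
parts = word.split('/')
stem_index = len(parts) // 2
print(parts[:stem_index], parts[stem_index], parts[stem_index + 1:])
# -> ['qo'] k ['edy']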
[attachment=11259]
[attachment=11260]
Here are the text files for parsing:
[attachment=11265]
[attachment=11266]
05-09-2025, 10:08 PM
About the scripts posted by Magnesium in the other thread:
I tried naibbe.py and decrypt_naibbe.py on a Latin text (1000 short lines) with default settings: no problem.

Despite this setting in naibbe.py:
UNAMBIGUOUS=True #True means bigram token generation avoids accidentally creating unigram word types. Strongly recommend True
I had an ambiguity at decryption every few lines, for example: ep(ma|it)aphium
20250724 Naibbe Cipher Paper Wrote:2.7 Minimizing ambiguities during encryption and decryption
We can also preemptively eliminate ambiguity at the encryption stage. If we happen to encrypt a plaintext bigram as a word type reserved for unigram use, we can simply re-encrypt the bigram (by redrawing cards) so that it no longer matches a unigram word type.
naibbe.py Wrote:Total ambiguity retries due to prefix+suffix collisions with unigrams: 798
No issue, but I'm just wondering: if these retries do not eliminate ambiguities completely, should there be more retries?
After 4 attempts I got a non-ambiguous ciphertext for "epitaphium", so it is possible: chedy dchdy qokaiin yte shy qokedy sheedy qokar.
05-09-2025, 10:52 PM
(05-09-2025, 10:08 PM)nablator Wrote: I tried naibbe.py and decrypt_naibbe.py on a Latin text (1000 short lines) with default settings: no problem.
Despite this setting in naibbe.py:
UNAMBIGUOUS=True #True means bigram token generation avoids accidentally creating unigram word types. Strongly recommend True
I had an ambiguity at decryption every few lines, for example: ep(ma|it)aphium
20250724 Naibbe Cipher Paper Wrote:2.7 Minimizing ambiguities during encryption and decryption
We can also preemptively eliminate ambiguity at the encryption stage. If we happen to encrypt a plaintext bigram as a word type reserved for unigram use, we can simply re-encrypt the bigram (by redrawing cards) so that it no longer matches a unigram word type.
naibbe.py Wrote:Total ambiguity retries due to prefix+suffix collisions with unigrams: 798
No issue, but I'm just wondering: if these retries do not eliminate ambiguities completely, should there be more retries?
After 4 attempts I got a non-ambiguous ciphertext for "epitaphium", so it is possible: chedy dchdy qokaiin yte shy qokedy sheedy qokar.
In principle yes, there should be a loop that checks bigram-bigram ambiguity. I haven’t gotten there yet; I am working on it. As an interim solution I included the ambiguous decryption formatting under the UNAMBIGUOUS condition so that bigram-bigram ambiguity is at least visible.
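Roughly, such a loop would look like the sketch below. This is not the actual naibbe.py code: encrypt_bigram(), decode_candidates(), and unigram_word_types are placeholders for the real card draws, token parsing, and reserved word-type list, and MAX_RETRIES is an arbitrary cap.
Code:
# Sketch only: the helper functions and the unigram word-type set are placeholders,
# not the actual naibbe.py implementation.
MAX_RETRIES = 100  # arbitrary cap on redraws


def encrypt_bigram_unambiguously(bigram, encrypt_bigram, decode_candidates, unigram_word_types):
    for _ in range(MAX_RETRIES):
        token = encrypt_bigram(bigram)        # redraw cards for this bigram
        if token in unigram_word_types:
            continue                          # collides with a unigram word type: redraw
        if len(decode_candidates(token)) > 1:
            continue                          # bigram-bigram ambiguity: redraw
        return token
    raise RuntimeError(f"no unambiguous word found for {bigram!r} after {MAX_RETRIES} redraws")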
06-09-2025, 01:33 AM
(05-09-2025, 10:08 PM)nablator Wrote: About the scripts posted by Magnesium in the other thread:
No issue, but I'm just wondering: if these retries do not eliminate ambiguities completely, should there be more retries?
After 4 attempts I got a non-ambiguous ciphertext for "epitaphium", so it is possible: chedy dchdy qokaiin yte shy qokedy sheedy qokar.
I have added "naibbe_v2.py" to the Dropbox folder, along with an associated Jupyter notebook. Its version of the unambiguous loop should take care of bigram-bigram ambiguity without too much drag on performance.