bi3mw > 23-11-2025, 11:30 AM
#!/usr/bin/env python3
# coding: utf-8
"""
DoubleMetaphone + Levenshtein Hybrid Decoder
--------------------------------------------------------
1. Load Latin words from wordlist.txt
2. Build a DoubleMetaphone index
3. For each Voynich word:
   - Compute its DoubleMetaphone codes
   - Find matching or phonetically similar Latin words
   - Rank candidates using Levenshtein distance
   - Output the best candidate (or <no match>)
"""
import sys
from metaphone import doublemetaphone
from collections import defaultdict
from tqdm import tqdm
# ---------------------------------------------------------
# Levenshtein distance
# ---------------------------------------------------------
def levenshtein(a, b):
    if not a:
        return len(b)
    if not b:
        return len(a)
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        new_dp = [i]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                new_dp.append(dp[j-1])
            else:
                new_dp.append(1 + min(dp[j-1], dp[j], new_dp[-1]))
        dp = new_dp
    return dp[-1]
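# Quick sanity check for levenshtein():
# levenshtein("kitten", "sitting") == 3, levenshtein("flaw", "lawn") == 2.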
# ---------------------------------------------------------
# Load Latin words
# ---------------------------------------------------------
def load_wordlist(path="wordlist.txt"):
    words = []
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            word = line.strip().lower()
            if word:
                words.append(word)
    return words
# ---------------------------------------------------------
# Normalize a Voynich word (remove non-letters)
# ---------------------------------------------------------
def normalize_voynich_word(w):
    return "".join(c.lower() for c in w if c.isalpha())
# ---------------------------------------------------------
# Build Metaphone index
# ---------------------------------------------------------
def build_metaphone_index(words):
    index = defaultdict(list)
    for w in words:
        m1, m2 = doublemetaphone(w)
        if m1:
            index[m1].append(w)
        if m2 and m2 != m1:
            index[m2].append(w)
    return index
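# Each phonetic key maps to every word that produces it; a word is filed under
# both its primary and secondary code when the two differ.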
# ---------------------------------------------------------
# Combined DoubleMetaphone + Levenshtein matching
# ---------------------------------------------------------
def hybrid_match(v_word, index, top_n=1):
    if not v_word:
        return ["<no match>"]
    m1, m2 = doublemetaphone(v_word)
    candidates = []
    # 1) Exact metaphone matches
    if m1 in index:
        candidates.extend(index[m1])
    if m2 in index:
        candidates.extend(index[m2])
    # 2) If too few candidates, try similar metaphone keys
    if len(candidates) < 5 and m1:
        prefix = m1[:2]
        for key in index:
            if key.startswith(prefix):
                candidates.extend(index[key])
    # 3) Still nothing? → no match
    if not candidates:
        return ["<no match>"]
    # 4) Compute Levenshtein distances
    scored = []
    for cand in candidates:
        score = levenshtein(v_word, cand[:len(v_word)])
        scored.append((score, cand))
    scored.sort(key=lambda x: x[0])
    best_score = scored[0][0]
    # All equally good candidates
    best = [w for s, w in scored if s == best_score]
    return best[:top_n]
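# Note: each candidate is truncated to len(v_word) before scoring, so the
# distance measures prefix similarity rather than full-word edit distance.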
# ---------------------------------------------------------
# MAIN
# ---------------------------------------------------------
def main():
    if len(sys.argv) < 3:
        print("Usage: python3 phonetic_levenshtein.py voyn2latin.txt output.txt")
        sys.exit(1)
    infile = sys.argv[1]
    outfile = sys.argv[2]

    print("Loading Latin wordlist …")
    latin_words = load_wordlist("wordlist.txt")
    print(f"{len(latin_words)} Latin words loaded.")

    print("Building DoubleMetaphone index …")
    index = build_metaphone_index(latin_words)
    print(f"Index contains {len(index)} metaphone keys.")

    print("Decoding …")
    with open(infile, "r", encoding="utf8") as f:
        lines = f.readlines()

    output_lines = []
    for line in tqdm(lines, desc="Lines"):
        if not line.strip():
            output_lines.append("")
            continue
        words = line.split()
        decoded_line = []
        for w in words:
            normalized = normalize_voynich_word(w)
            matches = hybrid_match(normalized, index, top_n=1)
            decoded_line.append(matches[0])
        output_lines.append(" ".join(decoded_line))

    with open(outfile, "w", encoding="utf8") as f:
        for row in output_lines:
            f.write(row + "\n")

    print("Done →", outfile)


if __name__ == "__main__":
    main()
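If you just want to poke at the matching step without a full wordlist.txt, here is a minimal sketch (assuming the script is saved as phonetic_levenshtein.py, as in the usage line; the toy wordlist and test tokens below are made up for illustration):

from phonetic_levenshtein import build_metaphone_index, hybrid_match

# Purely illustrative wordlist and Voynich-like tokens
toy_words = ["aqua", "herba", "radix", "stella", "aquila"]
toy_index = build_metaphone_index(toy_words)

for token in ["akua", "redix", "stelo", "qqq"]:
    print(token, "->", hybrid_match(token, toy_index, top_n=3))

Nothing here depends on the real wordlist; it just makes it easy to see which candidates the prefix-Levenshtein ranking prefers.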
ReneZ > 23-11-2025, 11:43 AM
Koen G > 23-11-2025, 12:06 PM
Koen G > 23-11-2025, 12:09 PM
Quote: It reads like a didactic fragment using the imagery of winter, salt, roots, and a “rough-scaled thing” to describe the process of enduring hardship until clarity returns.
In essence, the text seems to be about:
- Holding steady through a difficult period by clinging to small certainties (“the smallest crumb,” “the root that keeps her safe”).
- During this waiting, people rehearse what they know, retelling familiar sayings to maintain morale.
- As long as they preserve their inner “salt” — their integrity or resilience — even the most stubborn, frightening problem (“the rough-scaled thing”) can eventually be handled.
So the metaphor points toward patience, memory, and inner discipline as the means to survive adversity until conditions improve.
Rafal > 23-11-2025, 12:21 PM
Philipp Harland > 23-11-2025, 12:27 PM
(23-11-2025, 12:06 PM)Koen G Wrote: This is great work! All you are missing is the dreaded "interpretative step" and you've got a Solution going.
Maybe the next step can be to run a section through an LLM and ask it to force coherent meaning out of it.
To me, this is indistinguishable from a classic solution.
nablator > 23-11-2025, 01:10 PM
(23-11-2025, 11:30 AM)bi3mw Wrote: 5388 Latin words were extracted. That is definitely sufficient.
MarcoP > 24-11-2025, 07:14 AM
Koen G > 24-11-2025, 10:58 AM
(24-11-2025, 07:14 AM)MarcoP Wrote:
A few years ago, I played with a more naive implementation of a similar idea. I think that using English (either modern or Shakespeare's) makes it easier to understand the gap between the word-salad output and meaningful text.
MarcoP > 24-11-2025, 11:08 AM
(24-11-2025, 10:58 AM)Koen G Wrote: Although with Latin, we are much closer to mimicking what actual "solutions" feel like, with obscurity of the language being a prerequisite.