The Voynich Ninja

Full Version: VMS to pseudo-Latin (not a solution!)
First of all: this is NOT a serious solution for the VMS! Since I have seen many “solutions” involving pseudo-Latin lately, I just want to demonstrate how easy it is to replicate them with any word list. Anyone can reproduce the procedure described below. The only thing you may need to install is Metaphone for Python:

pip install Metaphone

First, you can map the most common letters in VMS to the most common letters in the Latin alphabet. This step isn't really necessary, but it's “nicer”.

cat RF1a-n-x7.txt | tr oehyacdiklrstnqpmfgxzv ieutasrnmocdlpbqghfxyz > voyn2latin.txt
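The tr mapping above is just a frequency alignment: the most common VMS letter becomes the most common Latin letter, and so on down the list. A minimal stdlib sketch of how such a mapping can be derived (the function names and the idea of feeding it two raw texts are my own illustration, not part of the original procedure):

```python
from collections import Counter

def frequency_mapping(src_text, dst_text):
    """Map the i-th most common letter of src_text to the
    i-th most common letter of dst_text (crude frequency alignment)."""
    src_order = [c for c, _ in Counter(c for c in src_text.lower() if c.isalpha()).most_common()]
    dst_order = [c for c, _ in Counter(c for c in dst_text.lower() if c.isalpha()).most_common()]
    n = min(len(src_order), len(dst_order))
    return dict(zip(src_order[:n], dst_order[:n]))

def apply_mapping(text, mapping):
    # Letters without a counterpart pass through unchanged.
    return "".join(mapping.get(c, c) for c in text)
```

Running this on a transliteration file and a Latin corpus would reproduce a table like the one passed to tr above.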

Now you create a word list from any Latin text. I took an excerpt from Regimen Sanitatis. 5388 Latin words were extracted. That is definitely sufficient.
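Extracting the word list is the trivial part; a sketch of what "5388 Latin words were extracted" amounts to (the regex-based tokenization is my assumption, any equivalent would do):

```python
import re

def build_wordlist(text):
    """Extract the unique lowercase words (letters only) from a Latin text."""
    return sorted(set(re.findall(r"[a-z]+", text.lower())))
```

Writing the result to wordlist.txt, one word per line, gives the file the script below expects.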


Finally, run the following Python script. This will take a few minutes. The script checks the Levenshtein distance and the phonetic similarity of the words in voyn2latin.txt against the word list. The words found are inserted accordingly and written to the output.

python phonetic_levenshtein.py voyn2latin.txt mapped.txt


Code:
#!/usr/bin/env python3
# coding: utf-8
"""
DoubleMetaphone + Levenshtein Hybrid Decoder
--------------------------------------------------------
1. Load Latin words from wordlist.txt
2. Build a DoubleMetaphone index
3. For each Voynich word:
  - Compute its DoubleMetaphone codes
  - Find matching or phonetically similar Latin words
  - Rank candidates using Levenshtein distance
  - Output the best candidate (or <no match>)
"""

import sys
from metaphone import doublemetaphone
from collections import defaultdict
from tqdm import tqdm

# ---------------------------------------------------------
# Levenshtein distance
# ---------------------------------------------------------
def levenshtein(a, b):
    if not a:
        return len(b)
    if not b:
        return len(a)

    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        new_dp = [i]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                new_dp.append(dp[j-1])
            else:
                new_dp.append(1 + min(dp[j-1], dp[j], new_dp[-1]))
        dp = new_dp
    return dp[-1]


# ---------------------------------------------------------
# Load Latin words
# ---------------------------------------------------------
def load_wordlist(path="wordlist.txt"):
    words = []
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            word = line.strip().lower()
            if word:
                words.append(word)
    return words


# ---------------------------------------------------------
# Normalize a Voynich word (remove non-letters)
# ---------------------------------------------------------
def normalize_voynich_word(w):
    return "".join(c.lower() for c in w if c.isalpha())


# ---------------------------------------------------------
# Build Metaphone index
# ---------------------------------------------------------
def build_metaphone_index(words):
    index = defaultdict(list)
    for w in words:
        m1, m2 = doublemetaphone(w)
        if m1:
            index[m1].append(w)
        if m2 and m2 != m1:
            index[m2].append(w)
    return index


# ---------------------------------------------------------
# Combined DoubleMetaphone + Levenshtein matching
# ---------------------------------------------------------
def hybrid_match(v_word, index, top_n=1):
    if not v_word:
        return ["<no match>"]

    m1, m2 = doublemetaphone(v_word)
    candidates = []

    # 1) Exact metaphone matches
    if m1 in index:
        candidates.extend(index[m1])
    if m2 in index:
        candidates.extend(index[m2])

    # 2) If too few candidates, try similar metaphone keys
    if len(candidates) < 5 and m1:
        prefix = m1[:2]
        for key in index:
            if key.startswith(prefix):
                candidates.extend(index[key])

    # 3) Still nothing? → no match
    if not candidates:
        return ["<no match>"]

    # 4) Compute Levenshtein distances
    scored = []
    for cand in candidates:
        # Compare against the candidate's prefix of matching length,
        # so longer Latin words are not penalized for their tails.
        score = levenshtein(v_word, cand[:len(v_word)])
        scored.append((score, cand))

    scored.sort(key=lambda x: x[0])
    best_score = scored[0][0]

    # All equally good candidates
    best = [w for s, w in scored if s == best_score]

    return best[:top_n]


# ---------------------------------------------------------
# MAIN
# ---------------------------------------------------------
def main():
    if len(sys.argv) < 3:
        print("Usage: python3 phonetic_levenshtein.py voyn2latin.txt output.txt")
        sys.exit(1)

    infile = sys.argv[1]
    outfile = sys.argv[2]

    print("Loading Latin wordlist …")
    # Note: the wordlist path is hardcoded; it is not one of the CLI arguments.
    latin_words = load_wordlist("wordlist.txt")
    print(f"{len(latin_words)} Latin words loaded.")

    print("Building DoubleMetaphone index …")
    index = build_metaphone_index(latin_words)
    print(f"Index contains {len(index)} metaphone keys.")

    print("Decoding …")

    with open(infile, "r", encoding="utf8") as f:
        lines = f.readlines()

    output_lines = []

    for line in tqdm(lines, desc="Lines"):
        if not line.strip():
            output_lines.append("")
            continue

        words = line.split()
        decoded_line = []

        for w in words:
            normalized = normalize_voynich_word(w)
            matches = hybrid_match(normalized, index, top_n=1)
            decoded_line.append(matches[0])

        output_lines.append(" ".join(decoded_line))

    with open(outfile, "w", encoding="utf8") as f:
        for row in output_lines:
            f.write(row + "\n")

    print("Done →", outfile)


if __name__ == "__main__":
    main()


If you want to do this to yourself, you can run mapped.txt through a translator. The result is just as meaningless as most of the Latin “solutions” I've seen. If you actually find a meaningful sentence, you can keep it ;)
Spot on!

This would be no. 2 on the list of unserious solutions that, while perhaps no better than serious proposals, are at least as good as them.
This is great work! All you are missing is the dreaded "interpretative step" and you've got a Solution going.
Maybe the next step can be to run a section through an LLM and ask it to force coherent meaning out of it.

[attachment=12575]

To me, this is indistinguishable from a classic solution.
You can then ask ChatGPT to interpret the metaphor it just wrote:

Quote:It reads like a didactic fragment using the imagery of winter, salt, roots, and a “rough-scaled thing” to describe the process of enduring hardship until clarity returns.

In essence, the text seems to be about:

Holding steady through a difficult period by clinging to small certainties (“the smallest crumb,” “the root that keeps her safe”).
During this waiting, people rehearse what they know, retelling familiar sayings to maintain morale.
As long as they preserve their inner “salt” — their integrity or resilience — even the most stubborn, frightening problem (“the rough-scaled thing”) can eventually be handled.

So the metaphor points toward patience, memory, and inner discipline as the means to survive adversity until conditions improve.
Nice job!

At moments it almost makes sense and seems related to plants and medicine:

diminute solet solet sequitur sic aqueis alicui solet immundo
licitis alius pomerium sed solet tolerat sic scilicet scire


diminished usually usually follows so watery to someone usually unclean
permissible another orchard but usually tolerates so obviously know

On second thought, it doesn't :)
(23-11-2025, 12:06 PM)Koen G Wrote: This is great work! All you are missing is the dreaded "interpretative step" and you've got a Solution going.
Maybe the next step can be to run a section through an LLM and ask it to force coherent meaning out of it.



To me, this is indistinguishable from a classic solution.

Dreaded would be the correct word.

Reminder: Artificial Stupidity may be good for doing your homework, but not for analyzing something like the VMS. I don't know how people don't get this by now. I'm trying to be measured and polite, but fhoooooooohhh... There are so many people who don't know that LLMs are built on transformers, which are modelled on rudimentary ANNs, which are just functions that take numbers in and spit them back out, transmogrified.
(23-11-2025, 11:30 AM)bi3mw Wrote: 5388 Latin words were extracted. That is definitely sufficient.

You need roughly 400,000 word forms to cover most of the classical Latin vocabulary found in real texts, not counting medieval spellings and specialized subjects like medicine, alchemy, herbals...
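This coverage gap is easy to quantify: with a small word list, the matcher has to force nearly every Voynich word onto one of a few thousand forms. A throwaway helper for measuring how much of a token stream a word list actually covers (the function and its name are my own illustration):

```python
def coverage(wordlist, tokens):
    """Fraction of tokens that appear in the wordlist."""
    vocab = set(wordlist)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)
```

Run against the tokens of any real Latin text, a 5388-word list would score far below 1.0, which is exactly why the hybrid matcher above never answers "<no match>" honestly: it falls back to phonetically vaguely similar words instead.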

A few years ago, I played with a more naive implementation of a similar idea. I think that using English (either modern or Shakespeare's) makes it easier to understand the gap between the word-salad output and meaningful text.
(24-11-2025, 07:14 AM)MarcoP Wrote:

A few years ago, I played with a more naive implementation of a similar idea. I think that using English (either modern or Shakespeare's) makes it easier to understand the gap between the word-salad output and meaningful text.

That's a good point. The vast majority of the population can't see how meaningless the Latin is. It would be much more poignant in English.

Although with Latin, we are much closer to mimicking what actual "solutions" feel like, with obscurity of the language being a prerequisite.
(24-11-2025, 10:58 AM)Koen G Wrote: Although with Latin, we are much closer to mimicking what actual "solutions" feel like, with obscurity of the language being a prerequisite.

Yes, that's probably why English is not popular among "solvers".