24-09-2025, 07:15 PM
Hi all,
I’m still learning how best to present this work here. I know the forum has seen plenty of “AI slop,” so I want to make clear up front: this is not an AI translation. What I’m sharing below is a small demo showing why naïve code breaks completely on Voynich EVA text, and how a very simple rule-based parser (prefix/suffix/infix checks) produces consistent partial results across EVA lines.
It’s not perfect — many tokens still come out as “[?]” — but that’s part of the point: it’s mechanical and testable, not free-form invention. My goal is to invite feedback on whether this kind of structured, token-level approach looks like a credible path forward, and if so, how to make it stronger.
1) Naïve approach (fails)
# Naïve dictionary: expects exact token matches → fails on real EVA strings
rules = {
    "chedy": "herb",
    "qokchdy": "root extract",
    "ody": "base matter",
    "she": "fire/calcination",
    "dol": "water cycle",
    "oram": "joint/limb",
}
eva_line = "ychedy shetshdy qotar okedy qokal saiin ol karar odeeed"
decoded = [rules.get(tok, "[?]") for tok in eva_line.split()]
print(" ".join(decoded))
Expected output
[?] [?] [?] [?] [?] [?] [?] [?] [?]
Why it breaks: EVA tokens are variable (prefixes, suffixes, infixes). Exact-match lookup doesn’t work.
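To make the failure concrete, here is a small sketch that lists, for each token in the same line, which dictionary keys sit inside the token without being equal to it. The helper name near_misses is my own, made up for illustration; the dictionary and the EVA line are the ones from the snippet above.

```python
# Keys copied from the naive dictionary above.
rules = {
    "chedy": "herb",
    "qokchdy": "root extract",
    "ody": "base matter",
    "she": "fire/calcination",
    "dol": "water cycle",
    "oram": "joint/limb",
}
eva_line = "ychedy shetshdy qotar okedy qokal saiin ol karar odeeed"

def near_misses(token, keys):
    """Hypothetical helper: keys that occur inside `token` but are not equal to it."""
    return [k for k in keys if k in token and k != token]

# Map each token to the keys it contains as a proper substring.
report = {tok: near_misses(tok, rules) for tok in eva_line.split()}

for tok, hits in report.items():
    print(tok, "| exact match:", tok in rules, "| keys inside token:", hits)
```

No token is an exact key, but a token such as ychedy carries the key chedy behind one extra leading glyph. An exact-match lookup cannot see that kind of partial overlap, which is what the prefix/suffix/infix tests in the next section try to handle.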
2) Rule-based parsing (prefix/suffix/infix)
# Minimal, reproducible rule-based decoder using prefix/suffix/infix tests
def decode_token(t):
    # suffix rules
    if t.endswith("ody"): return "base matter"
    if t.endswith("ram"): return "joint/limb"
    if t.endswith("dy") and t.startswith("qokc"):
        return "root extract"  # qokchdy / qokchedy variants
    # prefix rules
    if t.startswith("che"): return "herb/plant"
    if t.startswith("she"): return "fire/calcination"
    if t.startswith("oked"): return "preparation/infusion"
    if t.startswith("qok"): return "boil/infuse (qok- class)"
    if t.startswith("kar"): return "vessel/container"
    # infix rule
    if "dol" in t or "qodal" in t:
        return "water/cycle/liquid"
    # bridging/repetition token often seen
    if t == "saiin": return "again/repeat"
    return "[?]"
eva_line = "ychedy shetshdy qotar okedy qokal saiin ol karar odeeed"
decoded = [decode_token(tok) for tok in eva_line.split()]
print(" ".join(decoded))
Expected output (example)
[?] fire/calcination [?] preparation/infusion boil/infuse (qok- class) again/repeat [?] vessel/container [?]
Point: Same text that the naïve code couldn’t read now yields mechanical, rule-driven partial readings—no “AI translation,” just explicit token logic.
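Because decode_token is an if-chain, the first rule that matches decides the reading, so rule order is part of the hypothesis. The sketch below restates the same rules as an ordered table and reports which rule fired, plus every rule that would have matched. The names RULES, trace_token and all_matches, and the rule labels, are my own for illustration; the tests and glosses are copied from decode_token.

```python
# Each entry: (label, test, gloss). First match wins, mirroring the if-chain
# in decode_token. Labels are illustrative only.
RULES = [
    ("suffix -ody", lambda t: t.endswith("ody"), "base matter"),
    ("suffix -ram", lambda t: t.endswith("ram"), "joint/limb"),
    ("qokc-...-dy", lambda t: t.startswith("qokc") and t.endswith("dy"), "root extract"),
    ("prefix che-", lambda t: t.startswith("che"), "herb/plant"),
    ("prefix she-", lambda t: t.startswith("she"), "fire/calcination"),
    ("prefix oked-", lambda t: t.startswith("oked"), "preparation/infusion"),
    ("prefix qok-", lambda t: t.startswith("qok"), "boil/infuse (qok- class)"),
    ("prefix kar-", lambda t: t.startswith("kar"), "vessel/container"),
    ("infix dol/qodal", lambda t: "dol" in t or "qodal" in t, "water/cycle/liquid"),
    ("exact saiin", lambda t: t == "saiin", "again/repeat"),
]

def trace_token(t):
    """Return (gloss, label of the rule that fired), or ("[?]", None)."""
    for label, test, gloss in RULES:
        if test(t):
            return gloss, label
    return "[?]", None

def all_matches(t):
    """Labels of every rule that matches, to show where ordering decides the gloss."""
    return [label for label, test, _ in RULES if test(t)]

for tok in "qokchdy okedy karar qotar".split():
    print(tok, trace_token(tok), all_matches(tok))
```

For qokchdy two rules match (the qokc-...-dy rule and the general qok- prefix), and only the ordering makes it "root extract". Printing the fired rule next to each gloss makes that kind of dependency easy to spot and to argue about.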
3) Cross-folio consistency check (multiple EVA lines)
# Two additional EVA lines (from f85r1 examples used above)
eva_lines = [
    "kchedar yteol okchdy qokedy otor odor or chedy otechdy dal cphedy",
    "oees aiin olkeeody ors cheey qokchdy qotol okar otar otchy dkam",
]
for i, line in enumerate(eva_lines, 1):
    decoded = [decode_token(tok) for tok in line.split()]
    print(f"Line {i}:", line)
    print("Decoded :", " | ".join(decoded), "\n")
Expected output (example)
Line 1: kchedar yteol okchdy qokedy otor odor or chedy otechdy dal cphedy
Decoded : [?] | [?] | [?] | boil/infuse (qok- class) | [?] | [?] | [?] | herb/plant | [?] | [?] | [?]
Line 2: oees aiin olkeeody ors cheey qokchdy qotol okar otar otchy dkam
Decoded : [?] | [?] | base matter | [?] | herb/plant | root extract | [?] | [?] | [?] | [?] | [?]
Points this demonstrates:
• Consistency: tokens like chedy → herb/plant, qokchdy → root extract, …ody → base matter are read the same way across lines.
• Reproducibility: anyone can run this and see the same partial outputs.
• Non-hallucinatory: when no rule matches, the code says “[?]”, instead of inventing prose.
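One simple way to put a number on "partial" is to count how many tokens get a gloss and how often each gloss appears. This is only a sketch: the coverage function and the one-rule toy_decoder are hypothetical names of mine, and toy_decoder is there only so the snippet runs on its own. Passing decode_token from section 2 instead would measure the actual rule set.

```python
from collections import Counter

UNKNOWN = "[?]"

def coverage(lines, decoder):
    """Return (decoded_count, total_count, Counter of glosses) over all tokens."""
    glosses = Counter()
    total = 0
    for line in lines:
        for tok in line.split():
            glosses[decoder(tok)] += 1
            total += 1
    decoded = total - glosses[UNKNOWN]
    return decoded, total, glosses

# Stand-in decoder with a single rule, used only to keep this snippet standalone.
# Swap in decode_token from section 2 to measure the real rules.
def toy_decoder(tok):
    return "herb/plant" if tok.startswith("che") else UNKNOWN

eva_lines = [
    "kchedar yteol okchdy qokedy otor odor or chedy otechdy dal cphedy",
    "oees aiin olkeeody ors cheey qokchdy qotol okar otar otchy dkam",
]

decoded, total, glosses = coverage(eva_lines, toy_decoder)
print(f"decoded {decoded} of {total} tokens ({decoded / total:.0%})")
for gloss, n in glosses.most_common():
    print(f"{n:3d}  {gloss}")
```

Tracking this figure as rules are added or removed gives a mechanical check on whether a change really reads more tokens, and running the same count on other folios shows whether the rules carry over beyond the lines they were written for.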
I know this is only a partial framework, and there are still many unsolved tokens. That’s intentional, since I don’t want to overfit or make guesses where the rules don’t yet apply. If you see flaws in the rules, or if you think better tests would expose the weaknesses (or strengths) of this approach, I’d really like to hear about it. I’m aiming for something reproducible and mechanical, not “mystical translation.”
Best Regards,
Francis