09-09-2025, 03:34 PM
Hello,
Rather than trying to decode the VMS outright, I think we can answer other questions about the manuscript with the help of AI. For example, we should be able to decide with high probability whether the text carries real meaning or not, and I propose a system for doing that.
My own take: what best fits the evidence is that someone commissioned the VMS to be exactly what it is: a mysterious, impressive book that nobody can read because it contains no meaning. This is highly plausible and fits the physical, linguistic, and cultural evidence of the time. I believe it's a pseudo-text, made to look like a language but carrying no actual meaning, produced to impress, and I think AI can help us decide one way or the other.
The question to explore is: what kind of system is this text most like?
The idea is to generate large sets of synthetic manuscripts under different assumptions and see which "universe" the Voynich statistically belongs to. For example:
- Ciphered Latin/Hebrew/Italian texts (various substitution styles)
- Real languages reformatted to look Voynich-like
- Constructed languages with invented grammar
- Structured pseudo-languages (rules for prefixes/suffixes, but no meaning)
- Shorthand/abbreviation systems treated as full glyph sets
Then we can measure each synthetic universe against a Voynich "fingerprint panel" (word shapes, entropy, Zipf’s law, affix patterns, section differences, etc.). Rather than asking "what does it say?", this approach asks "what system is it most like?" If structured pseudo-language consistently fits better than ciphered Latin or conlang universes, that's powerful evidence.
This wouldn’t solve the translation, but it would be an important step in understanding the MS and it would be one box checked off.
Does this kind of “synthetic benchmarking” sound worth trying? Has anyone attempted something like this at scale?
Anyway, here's where AI did a lot of the work in building an outline for how the experiment might go with only off-the-shelf tools. The goal is to see which universe (ciphered language, real language, conlang, structured pseudo-language, shorthand/abbreviation, etc.) best reproduces the Voynich’s full statistical “fingerprint.”
To be clear, I don't have expertise in this kind of research. I'm only looking at where AI can point us, to help check off some boxes and let those with the expertise run with it.
1) Define the universes (generate many fakes)
Make 200–2,000 synthetic manuscripts, each matched in length and page/line structure to the VM. Each fake follows one hypothesis with tunable knobs:
A. Ciphered Natural Language
- Source corpora: medieval Latin, Italian, Occitan, Hebrew (public domain).
- Ciphers to implement:
- Simple monoalphabetic substitution
- Homophonic substitution (n symbols per plaintext char)
- Syllabic substitution (map digraphs/trigraphs)
- Nomenclator (frequent words → special symbols)
- Extras: line-initial/line-final rules, abbreviation expansion (Latin breviographs), occasional nulls.
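A minimal sketch of two of these cipher generators (monoalphabetic and homophonic), assuming the plaintext corpus is already loaded as a string; the numbered glyph pool here is a placeholder, not a real EVA-style inventory:

```python
import random

def make_glyph_pool(n, rng):
    """Placeholder symbol inventory; a real run would map to an EVA-like glyph set."""
    pool = ["g{:03d}".format(i) for i in range(n)]
    rng.shuffle(pool)
    return pool

def monoalphabetic_cipher(plaintext, seed=0):
    """Simple substitution: each plaintext letter maps to exactly one symbol."""
    rng = random.Random(seed)
    alphabet = sorted({c for c in plaintext if c.isalpha()})
    table = dict(zip(alphabet, make_glyph_pool(len(alphabet), rng)))
    # Keep word boundaries so word-level statistics stay comparable.
    return " ".join("".join(table[c] for c in w if c in table) for w in plaintext.split())

def homophonic_cipher(plaintext, homophones=3, seed=0):
    """Homophonic substitution: each letter maps to one of several symbols,
    chosen per occurrence, which flattens the symbol-frequency distribution."""
    rng = random.Random(seed)
    alphabet = sorted({c for c in plaintext if c.isalpha()})
    pool = make_glyph_pool(len(alphabet) * homophones, rng)
    table = {c: pool[i * homophones:(i + 1) * homophones] for i, c in enumerate(alphabet)}
    return " ".join("".join(rng.choice(table[c]) for c in w if c in table) for w in plaintext.split())

if __name__ == "__main__":
    latin = "in principio creavit deus caelum et terram"
    print(monoalphabetic_cipher(latin))
    print(homophonic_cipher(latin))
```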
B. Real Language (no cipher) shaped to VM layout
- Rewrap real Latin/Italian texts into Voynich-like lines/paragraphs and enforce a few layout quirks (e.g., frequent “q-” line-initial tokens) to probe the effect of mise-en-page alone.
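A toy illustration of the rewrapping step, assuming a words-per-line profile has already been extracted from the VM transcription (the `[4, 5, 6]` profile below is a placeholder):

```python
import random

def rewrap(text, words_per_line_profile, seed=0):
    """Rewrap a real text into lines whose word counts follow a target profile
    sampled from the VM (the profile passed here is a placeholder)."""
    rng = random.Random(seed)
    words, lines, i = text.split(), [], 0
    while i < len(words):
        n = rng.choice(words_per_line_profile)
        lines.append(" ".join(words[i:i + n]))
        i += n
    return "\n".join(lines)

print(rewrap("gallia est omnis divisa in partes tres quarum unam incolunt belgae", [4, 5, 6]))
```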
C. Conlang (meaningful but invented)
- Generators:
- Finite-state morphology (prefix–stem–suffix classes).
- PCFG (probabilistic context-free grammar) with phonotactics.
- Dialects: Currier-A/B style parameter shifts (suffix set, token length).
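A rough sketch of the finite-state morphology idea, with invented (placeholder) affix and phonotactic inventories; the key difference from Universe D is that stems come from a fixed lexicon with Zipf-like reuse, so "content words" recur the way they would in a meaningful text:

```python
import random

# Placeholder inventories; in a real run these are the tunable knobs.
PREFIXES = ["qo", "o", "ch", "sh", ""]
ONSETS = ["k", "t", "d", "l", "r", ""]
NUCLEI = ["a", "e", "o", "ai", "ee"]
SUFFIXES = ["dy", "n", "in", "aiin", "y", ""]

def make_lexicon(n_stems, rng):
    """Invent a fixed stem lexicon so the same 'words' recur across the text."""
    return ["".join([rng.choice(ONSETS), rng.choice(NUCLEI), rng.choice(ONSETS)])
            for _ in range(n_stems)]

def generate_conlang(n_tokens, n_stems=300, seed=0):
    rng = random.Random(seed)
    lexicon = make_lexicon(n_stems, rng)
    weights = [1.0 / (i + 1) for i in range(n_stems)]   # Zipf-like stem reuse
    tokens = [rng.choice(PREFIXES) + rng.choices(lexicon, weights=weights)[0] + rng.choice(SUFFIXES)
              for _ in range(n_tokens)]
    return " ".join(tokens)

print(generate_conlang(30))
```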
D. Structured Pseudo-Language (no semantics)
- Automata with rules like:
- prefix ∈ {qo, q, o, ch, …}, stem over Σ = {a, i, e, o, y}, suffix ∈ {dy, n, in, ain, aiin, …}
- position-dependent variants (line-initial gets more “q”)
- tunable affix productivity, stem entropy, and run-lengths
- Also include a human-plausible generator: Markov/HMM with simple constraints to simulate “fast fluent scribbling.”
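A minimal automaton of this kind, with placeholder affix sets and one simple position-dependent rule; affix productivity, stem length, and the line-initial bias are the knobs to sweep:

```python
import random

PREFIXES = ["qo", "q", "o", "ch", ""]
STEM_CHARS = "aieoy"
SUFFIXES = ["dy", "n", "in", "ain", "aiin", ""]

def pseudo_word(rng, line_initial=False):
    """One meaningless token built from prefix + stem + suffix rules."""
    # Position-dependent variant: line-initial tokens favour q-prefixes.
    prefixes = ["qo", "q"] * 3 + PREFIXES if line_initial else PREFIXES
    stem = "".join(rng.choice(STEM_CHARS) for _ in range(rng.randint(1, 4)))
    return rng.choice(prefixes) + stem + rng.choice(SUFFIXES)

def pseudo_page(n_lines=30, words_per_line=(6, 10), seed=0):
    """One synthetic page with VM-like line structure but no semantics."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        n = rng.randint(*words_per_line)
        lines.append(" ".join(pseudo_word(rng, line_initial=(i == 0)) for i in range(n)))
    return "\n".join(lines)

print(pseudo_page(5))
```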
E. Shorthand/Abbreviation Universe
- Start with Latin prose; compress using a learned set of ~30–60 breviographs/abbreviations (e.g., -us, -rum, -que), then hide the mapping (treat the breviographs as glyphs). Vary aggressiveness.
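A toy version of the compression step; the five-entry table below is a placeholder for the ~30–60 breviographs a real run would curate or learn:

```python
# Placeholder breviograph table: common Latin sequences -> single stand-in glyphs.
BREVIOGRAPHS = {"rum": "#", "que": "&", "per": "%", "pro": "+", "us": "@"}

def abbreviate(latin_text, table=BREVIOGRAPHS):
    """Greedily replace frequent sequences with single 'glyphs', then treat the
    result as an opaque symbol stream (the mapping is hidden from the analysis)."""
    out = []
    for word in latin_text.split():
        for seq, glyph in sorted(table.items(), key=lambda kv: -len(kv[0])):
            word = word.replace(seq, glyph)
        out.append(word)
    return " ".join(out)

print(abbreviate("dominus vobiscum et cum spiritu tuo per omnia saecula saeculorum"))
```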
2) Build the Voynich “fingerprint panel”
Compute the same metrics for the true VM and for every synthetic manuscript:
Token/Type structure
- Zipf slope & curvature; Heaps’ law α, K
- Word-length distribution; KS/AD distance vs VM
- Vocabulary growth by page/quire
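A sketch of these first panel metrics, assuming each manuscript is already tokenized into a plain Python list of word strings (numpy/scipy from the tooling section):

```python
import collections
import numpy as np
from scipy import stats

def zipf_slope(tokens):
    """Slope of log-frequency vs log-rank over the word-frequency list."""
    freqs = np.array(sorted(collections.Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    return stats.linregress(np.log(ranks), np.log(freqs)).slope

def heaps_curve(tokens):
    """Vocabulary size after each token, for fitting Heaps' law V = K * N**alpha."""
    seen, growth = set(), []
    for t in tokens:
        seen.add(t)
        growth.append(len(seen))
    return growth

def word_length_ks(tokens_a, tokens_b):
    """Kolmogorov-Smirnov distance between two word-length distributions."""
    return stats.ks_2samp([len(t) for t in tokens_a], [len(t) for t in tokens_b]).statistic
```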
Local dependencies
- Char bigram/trigram distributions; JS divergence
- Conditional entropy H(Xₙ|Xₙ₋₁) and H(wordₙ|wordₙ₋₁)
- Mutual information vs distance (Hilberg-style curve)
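Sketches for the dependency metrics, computed in bits over a raw character string (word breaks kept as spaces):

```python
import collections
import math

def char_bigram_dist(text):
    """Normalized character-bigram distribution."""
    counts = collections.Counter(zip(text, text[1:]))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two discrete distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a):
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / m[k])
                   for k in keys if a.get(k, 0.0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def conditional_entropy(text):
    """H(X_n | X_{n-1}) over characters, in bits."""
    pairs = collections.Counter(zip(text, text[1:]))
    contexts = collections.Counter(text[:-1])
    total = sum(pairs.values())
    return -sum((c / total) * math.log2(c / contexts[a]) for (a, b), c in pairs.items())
```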
Morphology & segmentation
- Morfessor/BPE: number of subunits, affix productivity, stem/affix ratio
- Family proliferation: counts of {dain, daiin, daiiin}-type ladders
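Morfessor/BPE handle the segmentation itself; the ladder and productivity counts can be approximated directly, as in this sketch (the i-run collapsing rule and the candidate suffix list are assumptions for illustration, not established facts about the VM):

```python
import collections
import re

def ladder_families(tokens):
    """Group word types that differ only in the length of an internal i-run
    (e.g., dain / daiin / daiiin), one proxy for family proliferation."""
    families = collections.defaultdict(set)
    for t in set(tokens):
        families[re.sub(r"i+", "i", t)].add(t)
    return {k: sorted(v) for k, v in families.items() if len(v) > 1}

def suffix_productivity(tokens, suffixes=("aiin", "ain", "dy", "in", "y", "n")):
    """Share of word types ending in each candidate suffix (a crude productivity score)."""
    types = set(tokens)
    return {s: sum(t.endswith(s) for t in types) / len(types) for s in suffixes}
```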
Positional/structural signals
- Line-initial vs line-final token profiles
- Paragraph-initial bias
- Page/section (Herbal/Astro/Balne/Recipes) drift metrics (KL divergence)
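A sketch for the positional profiles and the drift metric, assuming each page is represented as a list of lines and each line as a list of tokens:

```python
import collections
import math

def positional_profiles(pages):
    """Counts of which tokens open and close lines across a set of pages."""
    first, last = collections.Counter(), collections.Counter()
    for page in pages:
        for line in page:
            if line:
                first[line[0]] += 1
                last[line[-1]] += 1
    return first, last

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """Smoothed KL(P || Q) in bits between two token-count profiles
    (e.g., Herbal vs Recipes sections)."""
    keys = set(p_counts) | set(q_counts)
    p_tot = sum(p_counts.values()) + eps * len(keys)
    q_tot = sum(q_counts.values()) + eps * len(keys)
    return sum(((p_counts.get(k, 0) + eps) / p_tot) *
               math.log2(((p_counts.get(k, 0) + eps) / p_tot) /
                         ((q_counts.get(k, 0) + eps) / q_tot))
               for k in keys)
```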
Compressibility / model fit
- LZMA/PPM/ZPAQ ratios
- n-gram perplexity (n=3..7) trained on one half, tested on the other
- Tiny Transformer perplexity (character-level) trained on each universe, tested on VM (cross-perplexity)
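The compression ratios come straight from the standard library; the cross-perplexity sketch below uses crude add-epsilon smoothing, where a real run would use Kneser-Ney or the small character LM mentioned above:

```python
import bz2
import collections
import lzma
import math
import zlib

def compression_ratios(text):
    """Compressed size / raw size under three standard-library compressors."""
    raw = text.encode("utf-8")
    return {name: len(fn(raw)) / len(raw)
            for name, fn in (("lzma", lzma.compress), ("bz2", bz2.compress), ("zlib", zlib.compress))}

def ngram_cross_perplexity(train_text, test_text, n=4, eps=1e-6):
    """Character n-gram model fit on one text, perplexity measured on another."""
    grams = collections.Counter(train_text[i:i + n] for i in range(len(train_text) - n + 1))
    contexts = collections.Counter(train_text[i:i + n - 1] for i in range(len(train_text) - n + 2))
    vocab = max(len(set(train_text)), 1)
    log_prob, m = 0.0, 0
    for i in range(len(test_text) - n + 1):
        g = test_text[i:i + n]
        p = (grams.get(g, 0) + eps) / (contexts.get(g[:-1], 0) + eps * vocab)
        log_prob += math.log2(p)
        m += 1
    return 2 ** (-log_prob / max(m, 1))
```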
Clustering/embedding
- UMAP/t-SNE on character n-gram vectors; silhouette vs VM cluster
- Rank-order correlation (Kendall τ) of frequency lists
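A sketch using packages from the tooling section (CountVectorizer for char n-grams, umap-learn for the 2-D map); note the Kendall τ comparison only makes sense between samples that share a symbol space, e.g., two sections of the same transcription:

```python
from collections import Counter
import umap                                   # umap-learn
from scipy.stats import kendalltau
from sklearn.feature_extraction.text import CountVectorizer

def embed_corpora(texts, n=3, seed=0):
    """2-D UMAP embedding of character n-gram count vectors, one point per manuscript."""
    X = CountVectorizer(analyzer="char", ngram_range=(n, n)).fit_transform(texts)
    return umap.UMAP(n_components=2, random_state=seed).fit_transform(X)

def rank_correlation(tokens_a, tokens_b, top_k=200):
    """Kendall tau over the frequencies of shared top-k word types."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    shared = [w for w, _ in ca.most_common(top_k) if w in cb]
    tau, _pvalue = kendalltau([ca[w] for w in shared], [cb[w] for w in shared])
    return tau
```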
3) Scoring: which universe fits best?
Use multiple, complementary criteria:
- Distance aggregation: Normalize each metric to z-scores, then compute a weighted composite distance of each synthetic to the VM. Rank universes by median distance.
- Model selection via Approximate Bayesian Computation (ABC): treat generator knobs as priors, accept parameter settings whose synthetic stats fall within ε of the VM, and compare posterior mass across universes.
- Held-out validation: Fit knobs on half the VM; test distances on the other half (and per section).
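A sketch of the distance aggregation and the rejection-ABC step, assuming the fingerprint panel has been collected into a pandas DataFrame with one row per synthetic, one column per metric, and a 'universe' label (all names here are placeholders):

```python
import numpy as np
import pandas as pd

def z_distances(df, vm_row):
    """Composite z-scored distance of every synthetic manuscript to the real VM.
    df: one row per synthetic, metric columns plus a 'universe' label.
    vm_row: a Series with the same metric columns computed on the VM."""
    metrics = [c for c in df.columns if c != "universe"]
    mu = df[metrics].mean()
    sigma = df[metrics].std(ddof=0).replace(0, 1.0)
    z_syn = (df[metrics] - mu) / sigma
    z_vm = (vm_row[metrics] - mu) / sigma
    return np.sqrt(((z_syn - z_vm) ** 2).sum(axis=1))

# League table: median composite distance per universe.
#   df["distance"] = z_distances(df, vm_row)
#   print(df.groupby("universe")["distance"].median().sort_values())
#
# Rejection ABC: keep synthetics within epsilon of the VM and compare how much
# of the accepted mass each universe retains.
#   accepted = df[df["distance"] < epsilon]
#   print(accepted["universe"].value_counts(normalize=True))
```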
4) Robustness checks
- Ablations: remove line-position rules or suffix ladders—does fit collapse?
- Overfitting guard: ensure no generator is trained directly on VM tokens (only statistics), and verify generalization across sections.
- Adversarial baseline: try to force ciphered Latin to match VM—if it still lags pseudo-language on multiple metrics, that’s strong evidence.
5) Tooling (all off-the-shelf)
- Python: numpy, pandas, scikit-learn, matplotlib, networkx
- NLP/stat: morfessor, sentencepiece (BPE), nltk for n-grams
- Compressors: built-in lzma, bz2, zlib; optional PPMd via a Python wrapper
- Dimensionality reduction: umap-learn, scikit-learn (t-SNE/UMAP)
- Lightweight Transformers (optional): transformers with a tiny char-LM
6) Workflow & timeline (lean team)
Week 1–2: Data wrangling (VM EVA, Latin/Italian corpora), page/line schema, metric code scaffolding
Week 3–6: Implement generators A–E; unit tests; produce first 500 synthetics
Week 7–8: Compute full fingerprint panel; initial ranking
Week 9–10: ABC fitting per universe; robustness/ablations
Week 11–12: Write-up, plots, release code & datasets (repro pack)
7) Readouts you can trust (what “success” looks like)
- A league table: per-universe composite distance to VM (with error bars)
- Posterior plots: which parameter regions (e.g., high suffix productivity, low stem entropy) best match VM
- Confusion matrix from a classifier trained to tell universes apart using the fingerprint; if the VM gets classified as “structured pseudo-language” with high confidence, that’s decisive.
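One way to get that confusion matrix with off-the-shelf scikit-learn, assuming X holds the fingerprint metrics per synthetic and y its universe label (A–E); the random forest is just one reasonable default, not a prescribed choice:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

def universe_classifier(X, y, vm_features):
    """Train a classifier to tell universes apart from fingerprints alone,
    then ask it where the real VM falls."""
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    y_pred = cross_val_predict(clf, X, y, cv=5)        # out-of-fold predictions
    cm = confusion_matrix(y, y_pred)                   # which universes are separable?
    clf.fit(X, y)
    vm_probs = clf.predict_proba(np.asarray(vm_features).reshape(1, -1))[0]
    return cm, dict(zip(clf.classes_, vm_probs))
```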
8) “Citizen-science” version (solo, laptop-friendly)
- Implement Universe D (pseudo-language) and Universe A(1) (mono-substitution over Latin).
- Compute a mini fingerprint: Zipf slope, word-length KS, bigram JS, compression, Morfessor affix productivity.
- Generate 100 synthetics for each universe; plot distance distributions vs VM.
- If pseudo beats ciphered Latin on 4/5 metrics, you’ve got a publishable note.
9) Pitfalls & how to avoid them
- Layout leakage: VM line/page structure matters—always replicate it in synthetics.
- Cherry-picking metrics: pre-register the metric set; report all.
- Over-tuning: do ABC on one half; evaluate on the other.
- Section bias: score by section and overall; the winner should be consistent.