The Voynich Ninja - A One-Page Ledger Method for Generating Voynich-Like Text


Evidence for Local Copy–Mutation in the Scribe 1 Corpus - note: not peer reviewed.


Github Repository

Related paper

Beyond Currier A and B: ED-Defined Folio Regimes and Lexical Continuity in the Voynich Manuscript - note: not peer reviewed.

That paper builds on my earlier posts: The oddities of the bigram ED


So, there you have my work to date. My helmet is on, prepared to duck.

Ok, so the lack of comments so far tells me that my first post was either way too heavy on research-paper mode and too light on explanation, or I have stunned everyone into silence with my brilliance. I strongly suspect it’s the former.

So here’s the simpler version.

I think I’ve found strong evidence for a copy/mutate system inside the Voynich. This overlaps with a lot of Torsten Timm’s work, but I think it goes further. Most copy/mutate theories look only at nearby previous pages. There is definitely evidence for that. But in a real production environment, that would mean the scribe constantly flipping pages around to use as references. What I’m seeing instead is evidence that the copying and mutation operated at the SHEET level. In other words, the scribe could have had one or two sheets propped up nearby and repeatedly pulled words from them while writing a new page. The current page itself then becomes an additional local source. So the workflow becomes:

copy a word from a source sheet
slightly modify it
copy it again
mutate it again
reuse words already written on the current page
repeat hundreds of times
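
The loop above can be sketched in a few lines of Python. This is only an illustration of the workflow as described, not the generator from the repository: `write_page`, `mutate_once`, `p_mutate` and the `is_legal` hook are hypothetical names, the mutation is a plain random single-glyph edit, the alphabet is an arbitrary subset of EVA letters, and the three source words are f1r tokens quoted in the report further down.

```python
import random

def mutate_once(word, alphabet, rng):
    """Apply at most one random single-glyph edit (delete, insert or substitute)."""
    op = rng.choice(["delete", "insert", "substitute"])
    if op == "delete" and len(word) > 3:      # never shrink below 3 glyphs
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(alphabet) + word[i:]
    # substitution (also the fallback when a delete is not allowed);
    # it may pick the same glyph again and leave the word unchanged
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(alphabet) + word[i + 1:]

def write_page(source_words, n_words, alphabet, rng,
               p_mutate=0.5, is_legal=lambda w: True):
    """Copy words from the source sheet(s) and from the page written so far,
    mutating some of them on the way."""
    page = []
    for _ in range(n_words):
        pool = list(source_words) + page      # the current page is a source too
        word = rng.choice(pool)
        if rng.random() < p_mutate:
            candidate = mutate_once(word, alphabet, rng)
            if is_legal(candidate):           # hook for a ledger check
                word = candidate
        page.append(word)
    return page

source = ["sholdy", "shaiin", "cthoary"]      # tokens from f1r
page = write_page(source, 20, "acdhilorsty", random.Random(0))
print(" ".join(page))
```

The default `is_legal` accepts everything; plugging in a ledger-based check is what would keep the mutated words Voynich-like.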

The interesting part is what happens when you analyze the results.
Here’s f20v from the Zandbergen/Landini transcription.

FOLIO: f20v
QUIRE(S): 3 (82)
SHEET(S): 4 (82)
SCRIBE(S): 1 (82)
HAND(S): A1 (82)
CURRIER LANGUAGE(S): A (82)
TOTAL WORD INSTANCES LEN>=3: 75
UNIQUE TOKENS LEN>=3: 60
CORE TOKENS TESTED: 24
SAME-FOLIO ED1 DERIVED TOKENS: 51
PREEXISTING SOURCE CANDIDATES SEARCHED: 2153
ALL PRIOR-FOLIO MATCHES FOUND: 251

SOURCE-SHEET COVER
CORE TOKENS WITH AT LEAST ONE PRIOR-FOLIO MATCH: 24
EXACT SMALLEST SHEET COUNT FOUND: 4
COVER METHOD: exact full cover within max_sheets=6
SELECTED SHEETS:
quire 1, sheet 1: covers 17 core tokens; adds 17
quire 2, sheet 2: covers 17 core tokens; adds 4
quire 1, sheet 4: covers 15 core tokens; adds 2
quire 1, sheet 3: covers 14 core tokens; adds 1
COVERED CORE TOKENS / COVERABLE CORE TOKENS: 24 / 24
COVERED CORE TOKENS / TOTAL CORE TOKENS: 24 / 24
UNCOVERED COVERABLE CORE TOKENS: 0

RETAINED SOURCE SHEETS AFTER CORE PRUNING
q1s1
q2s2

SHEET CLASSIFICATION
q1s1: +17 [CORE]
q2s2: +4 [SECONDARY]
q1s4: +2 [RESIDUE]
q1s3: +1 [RESIDUE]

RESIDUE TOKENS (NEWLY ADDED BY RESIDUE SHEETS)
choldy | stripped choldy | f20v:1:8 -> cpholdy | stripped choldy | f4r:8:5 | q1s4 | ED0
shain | stripped shain | f20v:8:1 -> shain | stripped shain | f4r:3:7 | q1s4 | ED0
choraly | stripped choraly | f20v:8:2 -> chodaly | stripped chodaly | f3v:6:4 | q1s3 | ED1

RESIDUE CORE-RECHECK
RESOLVED ED1: choldy | stripped choldy | f20v:1:8 -> sholdy | stripped sholdy | f1r:1:9 | q1s1
RESOLVED ED1: shain | stripped shain | f20v:8:1 -> shaiin | stripped shaiin | f1r:22:2 | q1s1
RESOLVED ED2: choraly | stripped choraly | f20v:8:2 -> cthoary | stripped choary | f1r:3:6 | q1s1

RESIDUE SUMMARY
total: 3
ED0: 0
ED1: 2
ED2: 1
unresolved: 0

UNRESOLVED RESIDUE DIAGNOSTICS
none

The page contains:

75 total words
60 unique words
51 same-page ED1 derivations

The analyzer first removes the obvious same-page ED1 mutations to isolate the “core” vocabulary of the page.
It then searches the earlier manuscript for possible source matches and tries reducing those matches down into the smallest possible set of source sheets.
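
As a rough illustration of the first of those two steps, here is a minimal sketch of separating "core" words from same-page ED1 derivations. The rule used here is an assumption: walk the page in order and treat a word as derived if it is within edit distance 1 of an earlier core word (exact repeats included). That rule at least matches the arithmetic of the report (24 core + 51 derived = 75 instances), but the actual analyzer may do it differently. `edit_distance` and `split_core_and_derived` are hypothetical names, and the word list is made up from tokens quoted in the report.

```python
def edit_distance(a, b):
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def split_core_and_derived(page_words, min_len=3, max_ed=1):
    """Assumed rule: a word is 'derived' if an earlier core word on the
    same page is within max_ed edits of it; otherwise it becomes core."""
    core, derived = [], []
    for word in page_words:
        if len(word) < min_len:       # the report only counts LEN>=3
            continue
        if any(edit_distance(word, c) <= max_ed for c in core):
            derived.append(word)
        else:
            core.append(word)
    return core, derived

words = ["choldy", "sholdy", "shain", "shaiin", "ol", "choldy", "choraly"]
core, derived = split_core_and_derived(words)
print(core)     # ['choldy', 'shain', 'choraly']
print(derived)  # ['sholdy', 'shaiin', 'choldy']
```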
Result:

17 core tokens trace back to quire 1, sheet 1
a few others come from small secondary sheets
and even the leftover “residue” words eventually collapse back to q1s1 through ED1 or ED2

So despite the page looking diverse on the surface, most of the vocabulary ecology reduces back into a very small packet of source material centered around q1s1/f1r.
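
For readers wondering what "reducing to the smallest possible set of source sheets" could look like in code, here is a hedged sketch. The report's wording ("exact full cover within max_sheets=6", "capped greedy result") suggests an exact search with a greedy fallback, so that is what is shown, but this is a reconstruction, not the code from the repository. `smallest_sheet_cover` and the token names `t1`...`t5` are hypothetical.

```python
from itertools import combinations

def smallest_sheet_cover(coverage, targets, max_sheets=6):
    """coverage maps a sheet id to the set of core tokens it can explain.
    Try every combination of 1..max_sheets sheets for a full cover;
    if none exists, fall back to a capped greedy selection."""
    targets = set(targets)
    sheets = sorted(coverage)
    for k in range(1, max_sheets + 1):
        for combo in combinations(sheets, k):
            covered = set().union(*(coverage[s] for s in combo))
            if targets <= covered:
                return list(combo), "exact"
    # Greedy fallback: keep taking the sheet that adds the most new tokens.
    chosen, remaining = [], set(targets)
    while remaining and len(chosen) < max_sheets:
        best = max(sheets, key=lambda s: len(coverage[s] & remaining))
        gain = coverage[best] & remaining
        if not gain:                  # nothing left that any sheet explains
            break
        chosen.append(best)
        remaining -= gain
    return chosen, "greedy"

coverage = {
    "q1s1": {"t1", "t2", "t3"},
    "q2s2": {"t3", "t4"},
    "q1s4": {"t4"},
}
print(smallest_sheet_cover(coverage, {"t1", "t2", "t3", "t4"}))
```

The exhaustive search grows quickly with the number of candidate sheets, which is presumably why a cap such as `max_sheets` is needed at all.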

The manuscript may not have been generated from a giant hidden plaintext or complex cipher system at all. It may instead have been built recursively from a small rolling ecology of existing words, copied and slightly mutated over time while constrained by a simple glyph-adjacency ledger.

And that brings up part 2: the ledger.

It is basically a Voynich word validator.
Very simplified, it has 4 columns:

the Voynich glyph
allowed prefix followers
allowed midfix followers
allowed suffix followers

The ledger is built by looking at all the words on Scribe 1 pages and recording where glyphs are allowed to occur and what tends to follow them.

[attachment=15621]
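
A minimal sketch of how such a ledger could be built from a word list, under stated assumptions: each EVA letter is treated as one glyph, the first transition of a word goes in the prefix column, the last transition in the suffix column, and everything in between in the midfix column. `build_ledger` is a hypothetical name and the real ledger in the repository may be organised differently.

```python
from collections import Counter, defaultdict

def build_ledger(words, min_len=3):
    """Count, for every glyph, which glyphs follow it in each word position."""
    ledger = defaultdict(lambda: {"prefix": Counter(),
                                  "midfix": Counter(),
                                  "suffix": Counter()})
    for word in words:
        if len(word) < min_len:
            continue
        last = len(word) - 2          # index of the final transition
        for i in range(len(word) - 1):
            if i == 0:
                slot = "prefix"
            elif i == last:
                slot = "suffix"
            else:
                slot = "midfix"
            ledger[word[i]][slot][word[i + 1]] += 1
    return ledger

ledger = build_ledger(["fachys", "shain"])
print(dict(ledger["f"]["prefix"]))   # {'a': 1}
print(dict(ledger["h"]["midfix"]))   # {'y': 1, 'a': 1}
print(dict(ledger["y"]["suffix"]))   # {'s': 1}
```

The counts double as weights: a follower seen many times is a common transition, one seen once is rare.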

So if you wanted to create or validate a word:
Start with F.
The ledger says A is a valid prefix follower.
Then A allows C as a midfix.
C allows H.
H allows Y.
Y allows S as a suffix.
F → a → c → h → y → s
You just created a legal Voynich-style word.

For a mutation, you follow the ledger in the same way. Both
F → a → c → h → y → a → s
and
F → a → c → h → a → s
are valid words.

The real ledger is more complicated because each follower also has weighting attached to it. Some transitions are common, some are rare. That is how my generator validates mutations when it creates them. Copy a word from a source sheet, mutate one glyph, check whether the result is still legal according to the ledger, and if it is, the new word survives. There is obviously more going on than this, but that is the basic mechanism. From what I’m seeing so far, most Scribe 1 pages appear reducible to copy/mutate behavior from a single dominant source sheet, occasionally two, and only rarely three.
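
The validate-then-keep step can be sketched as follows. This is a toy, not the generator from the repository: the follower sets are the ones mentioned in this thread, the weights are invented, only substitutions are attempted, and `slot_for`, `is_legal` and `mutate_with_ledger` are hypothetical names.

```python
import random

# Toy ledger fragment. Follower sets come from examples in this thread;
# the weights are invented for illustration.
LEDGER = {
    "f": {"prefix": {"a": 5}},
    "a": {"prefix": {"r": 3, "t": 1},
          "midfix": {"c": 4, "i": 6, "l": 2, "n": 1, "r": 3},
          "suffix": {"s": 1, "t": 1}},
    "c": {"midfix": {"h": 9, "k": 1, "t": 2}},
    "h": {"midfix": {"y": 5, "a": 2}},
    "y": {"midfix": {"a": 1},
          "suffix": {g: 1 for g in "dehklorst"}},
}

def slot_for(i, n):
    """Ledger column for the transition from position i to i+1 in a word of length n."""
    if i == 0:
        return "prefix"
    if i == n - 2:
        return "suffix"
    return "midfix"

def is_legal(word, ledger):
    """A word is legal if every adjacent pair is allowed in its column."""
    if len(word) < 3:
        return False
    n = len(word)
    return all(word[i + 1] in ledger.get(word[i], {}).get(slot_for(i, n), {})
               for i in range(n - 1))

def mutate_with_ledger(word, ledger, rng, tries=20):
    """Replace one glyph with a weighted pick from the followers of the
    glyph before it; keep the result only if the whole word stays legal."""
    n = len(word)
    for _ in range(tries):
        i = rng.randrange(1, n)       # never replace the first glyph
        followers = ledger.get(word[i - 1], {}).get(slot_for(i - 1, n), {})
        if not followers:
            continue
        glyph = rng.choices(list(followers), weights=list(followers.values()))[0]
        candidate = word[:i] + glyph + word[i + 1:]
        if candidate != word and is_legal(candidate, ledger):
            return candidate          # the mutation survives
    return word                       # no legal mutation found

print(is_legal("fachys", LEDGER), is_legal("fachvas", LEDGER))  # True False
print(mutate_with_ledger("fachys", LEDGER, random.Random(1)))
```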

So that’s the theory in a nutshell:

Copy/mutate using sheets as the source while constrained by a glyph-adjacency ledger.

I can create this post by taking the right words and word parts from your post, but this doesn't mean this is what I did. Does your research show evidence that the Voynich MS can be created using copy and mutate or does it show evidence that the manuscript must have been created this way?

Maybe you can show this with a simple example. There is this word ddssShx on the Rosettes folio. Which other words is it based on, and which words are copy-and-mutate derivations of it?

(17-05-2026, 05:10 PM)oshfdk Wrote: I can create this post by taking the right words and word parts from your post, but this doesn't mean this is what I did. Does your research show evidence that the Voynich MS can be created using copy and mutate or does it show evidence that the manuscript must have been created this way?

Not that it MUST have been created this way. No. I’m making no claims that this definitively explains the Voynich. What I’m saying is that the manuscript can be reverse engineered into this kind of process, and the results are strong enough that I think it deserves serious consideration as a production method.

I’m not just visually imitating Voynich words. I built an analyzer first, then used the behavior it discovered to build a generator. That generator now reproduces a fair number of Scribe 1 statistical and structural behaviors. Not all of them by any means, but enough that the model is at least plausible. And honestly, if this IS close to the real method, I suspect reproducing the exact Voynich would be impossible anyway. Once you start trying to model human production behavior in Python, you quickly discover how messy humans actually are.

Also, this currently works best for Scribe 1. Scribe 2 is much harder.
The same general approach still partially works on Scribe 2, but Scribe 2 does not collapse neatly into the same compact 1–2 sheet packet structure that Scribe 1 does. That tells me one of two things:

either Scribe 2 was produced differently,
or the same basic process evolved/drifted into a more complex regime.

I suspect the latter, but I’m not prepared to make a strong claim on Scribe 2 yet because the reduction behavior is nowhere near as clean as Scribe 1.

But I will say this: I believe the Voynich has LIKELY managed to remain a mystery because a LOT of research has focused on specific sections rather than specific scribes. If you look at Currier and Davis and my work on the bigram ED, they all say that the scribes had very different vocabularies and likely different methods. If you mash all that together (Herbal in particular), you're shoving 2-5 methods of production together and trying to extract 1 set of results.

(17-05-2026, 04:27 PM)Dunsel Wrote: And that brings up part 2: the ledger.

It is basically a Voynich word validator.
Very simplified, it has 4 columns:

the Voynich glyph
allowed prefix followers
allowed midfix followers
allowed suffix followers

The ledger is built by looking at all the words on Scribe 1 pages and recording where glyphs are allowed to occur and what tends to follow them.

I don't follow you. What do you mean by 'prefix follower', 'midfix follower', 'suffix follower'? I.e., using the 'f-ledger table' shown in your previous post: if I have 'fa', what can follow the 'a'? 'r,t' (prefix followers), 'c,i,l,r,n' (midfix followers) or 's,t' (suffix followers)? And why?

(17-05-2026, 04:27 PM)Dunsel Wrote: So if you wanted to create or validate a word:
Start with F.
The ledger says A is a valid prefix follower.
Then A allows C as a midfix.
C allows H.
H allows Y.
Y allows S as a suffix.
F → a → c → h → y → s
You just created a legal Voynich-style word.

Still using your ledger, and the same columns I think you used, what forbids me from creating/validating 'fandao', which does not look much like a Voynich-style word?

(17-05-2026, 04:27 PM)Dunsel Wrote: For a mutation, you follow the ledger in the same way. Both
F → a → c → h → y → a → s
and
F → a → c → h → a → s
are valid words.

Yet again I don't understand. If you find 'fachyas', why does it matter that 'fachas', without the 'y', is a valid word? And are 'fachyas' and 'fachas' really two valid Voynichese words? There is only one word in the full text which starts with 'fach': fachys.

(17-05-2026, 05:17 PM)oshfdk Wrote: Maybe you can show this with a simple example. There is this word ddssShx on the Rosettes folio. Which other words is it based on, and which words are copy-and-mutate derivations of it?

That specific word, no. The reason is that the transcriptions I used identify it as a splat. I specifically strip every splat word out of my working corpus because there are at least as many ways to define splats as there are splats. So not that word. Also, that is Scribe 2. I have focused on Scribe 1 and have yet to figure out how Scribe 2 switched regimes and created what they did. However, if you go to my repo, the analyzer will tell you where it THINKS those sources are. Run it with core assignments and cluster details on.

f86v6

Code:
FOLIO: f86v6
QUIRE(S): 14 (482)
SHEET(S): 1 (482)
SCRIBE(S): 3 (482)
HAND(S): B3 (482)
CURRIER LANGUAGE(S): unknown
TOTAL WORD INSTANCES LEN>=3: 422
UNIQUE TOKENS LEN>=3: 266
CORE TOKENS TESTED: 84
SAME-FOLIO ED1 DERIVED TOKENS: 338
PREEXISTING SOURCE CANDIDATES SEARCHED: 13916
ALL PRIOR-FOLIO MATCHES FOUND: 1404

SOURCE CLUSTER SUMMARY
  matched source sheets: 40
    quire 14, sheet 1: 95 matches
    quire 9, sheet 1: 78 matches
    quire 13, sheet 5: 78 matches
    quire 11, sheet 1: 77 matches
    quire 13, sheet 2: 68 matches
    quire 13, sheet 1: 63 matches
    quire 13, sheet 3: 60 matches
    quire 10, sheet 1: 59 matches
    quire 13, sheet 4: 53 matches
    quire 6, sheet 3: 48 matches
  matched source folios: 168
    f76r (quire 13, sheet 2): 28 matches
    f58r (quire 8, sheet 2): 25 matches
    f85r1 (quire 14, sheet 1): 24 matches
    f66r (quire 8, sheet 1): 21 matches
    f79r (quire 13, sheet 5): 21 matches
    f80v (quire 13, sheet 5): 20 matches
    f84r (quire 13, sheet 1): 20 matches
    f86v5 (quire 14, sheet 1): 20 matches
    f58v (quire 8, sheet 2): 19 matches
    f79v (quire 13, sheet 5): 19 matches
    f82v (quire 13, sheet 3): 19 matches
    f70r2 (quire 10, sheet 1): 18 matches
    f80r (quire 13, sheet 5): 18 matches
    f85r2 (quire 14, sheet 1): 18 matches
    f76v (quire 13, sheet 2): 17 matches

SOURCE-SHEET COVER
CORE TOKENS WITH AT LEAST ONE PRIOR-FOLIO MATCH: 82
BEST-EFFORT SHEET SET FOUND: 3
COVER METHOD: no exact full cover within max_sheets=6; capped greedy result shown
SELECTED SHEETS:
  quire 14, sheet 1: covers 40 core tokens; adds 40
  quire 8, sheet 2: covers 33 core tokens; adds 13
  quire 13, sheet 2: covers 36 core tokens; adds 8
COVERED CORE TOKENS / COVERABLE CORE TOKENS: 61 / 82
COVERED CORE TOKENS / TOTAL CORE TOKENS: 61 / 84
UNCOVERED COVERABLE CORE TOKENS: 21
  alshdr, chcphar, chokain, cholkar, dairal, dairody, dytshy, lshechy, okeeeykeey, okeockhey, olkshed, opoly, orchcthy, orom, otolkshy, otyteeodaiin, qotardam, qotchdaiin, sarar, shckhor, shoifhy

RETAINED SOURCE SHEETS AFTER CORE PRUNING
  q14s1
  q8s2
  q13s2

SHEET CLASSIFICATION
  q14s1: +40  [CORE]
  q8s2: +13  [CORE]
  q13s2: +8  [SECONDARY]

RESIDUE TOKENS (NEWLY ADDED BY RESIDUE SHEETS)
  none

RESIDUE CORE-RECHECK
  none

RESIDUE SUMMARY
  total: 0
  ED0: 0
  ED1: 0
  ED2: 0
  unresolved: 0

UNRESOLVED RESIDUE DIAGNOSTICS
  none

PARENT DISTANCE HISTOGRAM (ASSIGNED PARENTS ONLY)
  dist  3:    5 (  8.9%)
  dist  6:    1 (  1.8%)
  dist  7:    3 (  5.4%)
  dist  20:    3 (  5.4%)
  dist  21:    11 ( 19.6%)
  dist  42:    1 (  1.8%)
  dist  56:    7 ( 12.5%)
  dist  57:    25 ( 44.6%)

(17-05-2026, 05:40 PM)Mauro Wrote: I don't follow you. What do you mean by 'prefix follower', 'midfix follower', 'suffix follower'? I.e., using the 'f-ledger table' shown in your previous post: if I have 'fa', what can follow the 'a'? 'r,t' (prefix followers), 'c,i,l,r,n' (midfix followers) or 's,t' (suffix followers)? And why?

Still using your ledger, and the same columns I think you used, what forbids me from creating/validating 'fandao', which does not look much like a Voynich-style word?

'Follower' may be a bad choice of word, except for the prefix follower. So it's like this.

Your alphabet is in the left column. That's what a word can start with: any letter in the Voynich alphabet. Next comes the prefix follower: only certain letters can come after that specific first letter. For example, if you start your word with the letter A, the only 2 options you have for the next letter are R and T. If you start with the letter F, only A can be the next letter; it follows the prefix. Once you have FA, you can look at the midfix column. A can have C, L, I, N or R as a letter that follows it as a midfix. If you select C, then you have FAC. Next, look at what midfix you can add after the letter C: H, K or T. Keep adding midfixes until you're ready to end the word. When you are ready, you'll see that the letter Y has D, E, H, K, L, O, R, S, T as possible suffixes.

So yes,
Fachys
Fachyd
Fachye
Fachyh... etc... are all valid words.
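
The walk described above can be written out as a small lookup. The table below contains only the follower lists named in this post (for A, F, C and Y), so it is a fragment for illustration, not the real ledger, and `next_glyphs` is a hypothetical helper.

```python
# Only the follower lists quoted in this post; everything else is missing.
LEDGER = {
    "a": {"prefix": "rt", "midfix": "cilnr"},
    "f": {"prefix": "a"},
    "c": {"midfix": "hkt"},
    "y": {"suffix": "dehklorst"},
}

def next_glyphs(stem, ending=False):
    """Glyphs the ledger allows after `stem`.
    After a single glyph, use the prefix column; when ending the word,
    use the suffix column; otherwise use the midfix column."""
    if len(stem) == 1:
        slot = "prefix"
    elif ending:
        slot = "suffix"
    else:
        slot = "midfix"
    return sorted(LEDGER.get(stem[-1], {}).get(slot, ""))

print(next_glyphs("f"))      # ['a']
print(next_glyphs("fa"))     # ['c', 'i', 'l', 'n', 'r']
print(next_glyphs("fac"))    # ['h', 'k', 't']
print(["fachy" + g for g in next_glyphs("fachy", ending=True)])
```

The last line lists the nine endings fachyd, fachye, fachyh, fachyk, fachyl, fachyo, fachyr, fachys and fachyt.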

That ledger I showed a picture of is not complete. It was designed to be short and simple. Whether you could create the word fandao depends on whether you could follow those letters in the ledger. If you could, it would be a 'legal' word. Does that mean it's a Voynich word? No. Not one you're familiar with. The purpose of the ledger is to validate mutations with known letter combinations the Voynich actually uses.

(17-05-2026, 05:40 PM)Mauro Wrote: Yet again I don't understand. If you find 'fachyas', why does it matter that 'fachas', without the 'y', is a valid word? And are 'fachyas' and 'fachas' really two valid Voynichese words? There is only one word in the full text which starts with 'fach': fachys.

The ledger is merely saying that if you want to mutate the word fachyas into fachas, then there are examples in the Voynich where Y follows the letter H and, if you remove the Y, there are examples in the Voynich where A follows the letter H. If you tried to create Fachvas, the ledger does not allow that, which means that nowhere in the Voynich does V follow the letter H. So, when MUTATING words, or creating random new ones, it constrains the choices to what it knows the Voynich does, and it doesn't allow really weird words to appear.
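
Put as code, the check for a deletion only has to look at the one new adjacency the deletion creates. The sketch below ignores the prefix/midfix/suffix columns and just asks whether a pair of neighbouring letters is attested anywhere in a word list. The word list is a handful of tokens quoted in this thread, not a real corpus, and `attested_pairs` and `deletion_is_attested` are hypothetical names.

```python
def attested_pairs(words):
    """Every pair of adjacent letters that occurs somewhere in the word list."""
    return {w[i:i + 2] for w in words for i in range(len(w) - 1)}

def deletion_is_attested(word, i, pairs):
    """Deleting word[i] puts word[i-1] next to word[i+1];
    the mutation is acceptable only if that pair is attested."""
    if i == 0 or i == len(word) - 1:
        return True                   # no new adjacency is created
    return word[i - 1] + word[i + 1] in pairs

corpus = ["fachys", "shain", "sholdy", "choldy", "chodaly"]
pairs = attested_pairs(corpus)

print(deletion_is_attested("fachyas", 4, pairs))  # drop the y: is 'ha' attested?
print("hv" in pairs)                              # fachvas would need 'hv'
```

With this tiny word list the first check passes because 'ha' occurs in shain, and 'hv' is absent.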

And the ledger will work for any language. I could load it up with emojis. It constrains the choices made when deciding which letter CAN come after a specific letter. That does not mean it can reproduce a language.

I have a fully built Scribe 1 ledger as a JSON file on the repo. Have a look.

(17-05-2026, 05:52 PM)Dunsel Wrote: Can you give me the EVA for that word? Searching for ddss shows nothing in either Takahashi or ZL.

In ZL it's:

<fRos.14,@L0> <!2:11>[d:?]dsschx

However looking at it I'd say it's more like ddssShx. In the copy+mutate theory how did this word come to be?

[attachment=15623]

(17-05-2026, 05:52 PM)Dunsel Wrote: The reason is that the transcriptions I used identify it as a splat. I specifically strip every splat word out of my working corpus because there are at least as many ways to define splats as there are splats.

What is the definition of a splat here?
