Ok, so the lack of comments so far tells me that my first post was either way too heavy on research-paper mode and too light on explanation, or I have stunned everyone into silence with my brilliance. I strongly suspect it’s the former.
So here’s the simpler version.
I think I’ve found strong evidence for a copy/mutate system inside the Voynich. This overlaps with a lot of Torsten Timm’s work, but I think it goes further. Most copy/mutate theories look only at nearby previous pages. There is definitely evidence for that. But in a real production environment, that would mean the scribe constantly flipping pages back and forth to use them as references.
What I’m seeing instead is evidence that the copying and mutation operated at the SHEET level. In other words, the scribe could have had one or two sheets propped up nearby and repeatedly pulled words from them while writing a new page. The page being written then becomes an additional local source. So the workflow becomes:
- copy a word from a source sheet
- slightly modify it
- copy it again
- mutate it again
- reuse words already written on the current page
- repeat hundreds of times
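As a rough illustration of that loop, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names `mutate` and `write_page`, the `p_mutate` probability, the glyph list, and the idea that a mutation is one random glyph substitution, insertion, or deletion. It has no validity check, so it only shows the copy/mutate/reuse cycle and is not the actual generator.

```python
import random

# Illustrative glyph list only; not a claim about the real glyph inventory.
GLYPHS = "acdehiklnoqrsty"

def mutate(word, rng):
    """Apply one random single-glyph edit: substitute, insert, or delete."""
    op = rng.choice(["substitute", "insert", "delete"])
    i = rng.randrange(len(word))
    if op == "substitute":
        # May pick the same glyph again, in which case the word is unchanged.
        return word[:i] + rng.choice(GLYPHS) + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(GLYPHS) + word[i:]
    # Delete, but never shrink a word below one glyph.
    return word[:i] + word[i + 1:] if len(word) > 1 else word

def write_page(source_words, n_words, rng, p_mutate=0.7):
    """Fill a page by copying from the source sheet(s) and from the page itself."""
    page = []
    for _ in range(n_words):
        pool = list(source_words) + page   # the current page is also a source
        word = rng.choice(pool)            # copy a word
        if rng.random() < p_mutate:
            word = mutate(word, rng)       # slightly modify it
        page.append(word)
    return page

# Usage: seed words taken from the output further down in this post.
rng = random.Random(1)
print(write_page(["sholdy", "shaiin", "chodaly"], 10, rng))
```

Because every new word goes back into the pool, later words on the page tend to be small variations of earlier ones, which is the same-page pattern the analysis looks for.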
The interesting part is what happens when you analyze the results.
Here’s the output for one page from the Zandbergen/Landini transcription.
FOLIO:
QUIRE(S): 3 (82)
SHEET(S): 4 (82)
SCRIBE(S): 1 (82)
HAND(S): A1 (82)
CURRIER LANGUAGE(S): A (82)
TOTAL WORD INSTANCES LEN>=3: 75
UNIQUE TOKENS LEN>=3: 60
CORE TOKENS TESTED: 24
SAME-FOLIO ED1 DERIVED TOKENS: 51
PREEXISTING SOURCE CANDIDATES SEARCHED: 2153
ALL PRIOR-FOLIO MATCHES FOUND: 251
SOURCE-SHEET COVER
CORE TOKENS WITH AT LEAST ONE PRIOR-FOLIO MATCH: 24
EXACT SMALLEST SHEET COUNT FOUND: 4
COVER METHOD: exact full cover within max_sheets=6
SELECTED SHEETS:
quire 1, sheet 1: covers 17 core tokens; adds 17
quire 2, sheet 2: covers 17 core tokens; adds 4
quire 1, sheet 4: covers 15 core tokens; adds 2
quire 1, sheet 3: covers 14 core tokens; adds 1
COVERED CORE TOKENS / COVERABLE CORE TOKENS: 24 / 24
COVERED CORE TOKENS / TOTAL CORE TOKENS: 24 / 24
UNCOVERED COVERABLE CORE TOKENS: 0
RETAINED SOURCE SHEETS AFTER CORE PRUNING
q1s1
q2s2
SHEET CLASSIFICATION
q1s1: +17 [CORE]
q2s2: +4 [SECONDARY]
q1s4: +2 [RESIDUE]
q1s3: +1 [RESIDUE]
RESIDUE TOKENS (NEWLY ADDED BY RESIDUE SHEETS)
choldy | stripped choldy | f20v:1:8 -> cpholdy | stripped choldy | f4r:8:5 | q1s4 | ED0
shain | stripped shain | f20v:8:1 -> shain | stripped shain | f4r:3:7 | q1s4 | ED0
choraly | stripped choraly | f20v:8:2 -> chodaly | stripped chodaly | f3v:6:4 | q1s3 | ED1
RESIDUE CORE-RECHECK
RESOLVED ED1: choldy | stripped choldy | f20v:1:8 -> sholdy | stripped sholdy | f1r:1:9 | q1s1
RESOLVED ED1: shain | stripped shain | f20v:8:1 -> shaiin | stripped shaiin | f1r:22:2 | q1s1
RESOLVED ED2: choraly | stripped choraly | f20v:8:2 -> cthoary | stripped choary | f1r:3:6 | q1s1
RESIDUE SUMMARY
total: 3
ED0: 0
ED1: 2
ED2: 1
unresolved: 0
UNRESOLVED RESIDUE DIAGNOSTICS
none
The page contains:
- 75 word instances of length 3 or more
- 60 unique words among them
- 51 same-page ED1 derivations (ED1 = edit distance 1: one glyph added, removed, or changed)
The analyzer first removes the obvious same-page ED1 mutations to isolate the “core” vocabulary of the page.
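One possible reading of that step, sketched in Python. The exact rule the analyzer uses is not spelled out here, so this is an assumption: walk the page in order, and treat a word as "derived" if it is at edit distance 1 from a different word seen earlier on the same page; everything else is "core". The names `edit_distance` and `split_core_and_derived` are made up for the example, and it is not expected to reproduce the counts in the output above.

```python
def edit_distance(a, b):
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def split_core_and_derived(page_words, min_len=3):
    """Split unique page words into core words and same-page ED1 derivations."""
    core, derived, seen = [], [], []
    for w in page_words:
        if len(w) < min_len or w in seen:
            continue  # skip short words and repeats
        if any(edit_distance(w, earlier) == 1 for earlier in seen):
            derived.append(w)   # one edit away from an earlier word
        else:
            core.append(w)      # nothing on the page so far explains it
        seen.append(w)
    return core, derived

# Usage with word pairs that appear in the output above.
print(edit_distance("choldy", "sholdy"))   # 1
print(edit_distance("choraly", "choary"))  # 2
print(split_core_and_derived(["sholdy", "choldy", "shain", "shaiin", "ol"]))
```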
It then searches the earlier manuscript for possible source matches and tries reducing those matches down into the smallest possible set of source sheets.
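The "smallest possible set of source sheets" is a set-cover problem. The output above describes the method as an exact full cover within `max_sheets=6`, so a brute-force search over sheet combinations of increasing size is a plausible sketch of it. The function names, the `matches` dictionary layout, and the toy data below are assumptions for illustration, not the analyzer's real code or data.

```python
from itertools import combinations

def smallest_sheet_cover(matches, max_sheets=6):
    """matches maps a sheet id to the set of core tokens matched on that sheet.

    Returns the first smallest combination of sheets that covers every
    coverable token, or None if no combination within max_sheets does.
    """
    coverable = set().union(*matches.values()) if matches else set()
    if not coverable:
        return []
    sheets = sorted(matches)
    for k in range(1, max_sheets + 1):          # try 1 sheet, then 2, ...
        for combo in combinations(sheets, k):
            covered = set().union(*(matches[s] for s in combo))
            if covered == coverable:
                return list(combo)
    return None

def marginal_gains(matches, selected):
    """Report (sheet, tokens covered, tokens newly added), biggest sheet first."""
    seen, report = set(), []
    for s in sorted(selected, key=lambda s: -len(matches[s])):
        new = matches[s] - seen
        report.append((s, len(matches[s]), len(new)))
        seen |= new
    return report

# Toy data only: four invented tokens spread over three sheets.
toy = {"q1s1": {"t1", "t2", "t3"}, "q2s2": {"t3", "t4"}, "q1s4": {"t4"}}
cover = smallest_sheet_cover(toy)
print(cover)
print(marginal_gains(toy, cover))
```

Brute force is exponential in the number of sheets, which is fine for a cap of 6 but would need a greedy fallback for larger caps.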
Result:
- 17 of the 24 core tokens trace back to quire 1, sheet 1 (q1s1)
- 4 more come from quire 2, sheet 2, and the last 3 from two other quire 1 sheets (q1s4 and q1s3)
- and even those 3 leftover “residue” words collapse back to q1s1 at ED1 or ED2
So despite the page looking diverse on the surface, most of the vocabulary ecology reduces back into a very small packet of source material centered around q1s1/f1r.
The manuscript may not have been generated from a giant hidden plaintext or complex cipher system at all. It may instead have been built recursively from a small rolling ecology of existing words, copied and slightly mutated over time while constrained by a simple glyph-adjacency ledger.
And that brings up part 2: the ledger.
It is basically a Voynich word validator.
Very simplified, it has 4 columns:
- the Voynich glyph
- allowed prefix followers
- allowed midfix followers
- allowed suffix followers
The ledger is built by looking at all the words on Scribe 1 pages and recording where glyphs are allowed to occur and what tends to follow them.
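A minimal sketch of how such a ledger could be built, in Python. Several things here are assumptions: each transcription character is treated as one glyph, the first transition in a word counts as "prefix", the last as "suffix", and anything in between as "midfix" (a two-glyph word is treated as prefix only). The names `slot` and `build_ledger` are invented for the example. Counts are kept so that common and rare transitions can be told apart.

```python
from collections import Counter, defaultdict

def slot(i, n_transitions):
    """Name the position of transition i among n_transitions in a word."""
    if i == 0:
        return "prefix"
    if i == n_transitions - 1:
        return "suffix"
    return "midfix"

def build_ledger(words):
    """ledger[glyph][slot][follower] = number of times that transition was seen."""
    ledger = defaultdict(
        lambda: {"prefix": Counter(), "midfix": Counter(), "suffix": Counter()}
    )
    for w in words:
        n = len(w) - 1                      # a word of k glyphs has k-1 transitions
        for i in range(n):
            ledger[w[i]][slot(i, n)][w[i + 1]] += 1
    return ledger

# Usage with two words from the output above.
ledger = build_ledger(["shain", "shaiin"])
for glyph in sorted(ledger):
    print(glyph, {name: dict(c) for name, c in ledger[glyph].items() if c})
```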
[attachment=15621]
So if you wanted to create or validate a word:
Start with F.
The ledger says A is a valid prefix follower.
Then A allows C as a midfix.
C allows H.
H allows Y.
Y allows S as a suffix.
F → a → c → h → y → s
You just created a legal Voynich-style word.
For a mutation, you change the word and walk the ledger again. Both
F → a → c → h → y → a → s
and
F → a → c → h → a → s
are valid words, because every step is still an allowed transition.
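The walkthrough above can be checked mechanically. This Python sketch uses a tiny hand-written ledger that contains only the transitions named in the example, plus the three (y → a, h → a, a → s) that the two mutated words need in order to be valid. The ledger contents, the lowercase glyphs, and the names `TOY_LEDGER`, `slot`, and `is_legal` are all assumptions for illustration; the real ledger is much larger.

```python
# Toy ledger: glyph -> slot -> set of allowed followers.
TOY_LEDGER = {
    "f": {"prefix": {"a"}},
    "a": {"midfix": {"c"}, "suffix": {"s"}},
    "c": {"midfix": {"h"}},
    "h": {"midfix": {"y", "a"}},
    "y": {"midfix": {"a"}, "suffix": {"s"}},
}

def slot(i, n_transitions):
    """First transition is prefix, last is suffix, the rest are midfix."""
    if i == 0:
        return "prefix"
    if i == n_transitions - 1:
        return "suffix"
    return "midfix"

def is_legal(word, ledger):
    """True if every glyph-to-glyph step in the word is allowed in its slot."""
    n = len(word) - 1
    if n < 1:
        return False                      # need at least one transition
    for i in range(n):
        allowed = ledger.get(word[i], {}).get(slot(i, n), set())
        if word[i + 1] not in allowed:
            return False
    return True

print(is_legal("fachys", TOY_LEDGER))   # True
print(is_legal("fachyas", TOY_LEDGER))  # True
print(is_legal("fachas", TOY_LEDGER))   # True
print(is_legal("fachs", TOY_LEDGER))    # False: h has no suffix follower here
```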
The real ledger is more complicated because each follower also has a weight attached to it. Some transitions are common, some are rare. That is how my generator validates mutations when it creates them: copy a word from a source sheet, mutate one glyph, check whether the result is still legal according to the ledger, and if it is, the new word survives. There is obviously more going on than this, but that is the basic mechanism.
From what I’m seeing so far, most Scribe 1 pages appear reducible to copy/mutate behavior from a single dominant source sheet, occasionally two, and only rarely three.
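To make the weighted copy/mutate/validate step concrete, here is a small Python sketch. It is not the actual generator. The ledger counts are invented, the word weight is assumed to be the product of its transition counts (zero if any step is missing from the ledger), and instead of trying random edits it enumerates every single-glyph edit and picks among the legal ones in proportion to their weight. All names (`LEDGER`, `weight`, `single_edits`, `legal_mutations`, `copy_and_mutate`) are hypothetical.

```python
import random

# Hypothetical weighted ledger: LEDGER[glyph][slot][follower] = count (invented).
LEDGER = {
    "f": {"prefix": {"a": 5}},
    "a": {"midfix": {"c": 4}, "suffix": {"s": 2}},
    "c": {"midfix": {"h": 9}},
    "h": {"midfix": {"y": 3, "a": 1}},
    "y": {"midfix": {"a": 1}, "suffix": {"s": 6}},
}

def slot(i, n_transitions):
    """First transition is prefix, last is suffix, the rest are midfix."""
    if i == 0:
        return "prefix"
    if i == n_transitions - 1:
        return "suffix"
    return "midfix"

def weight(word, ledger):
    """Product of transition counts; 0 means some step is not in the ledger."""
    n = len(word) - 1
    if n < 1:
        return 0
    total = 1
    for i in range(n):
        total *= ledger.get(word[i], {}).get(slot(i, n), {}).get(word[i + 1], 0)
    return total

def single_edits(word, alphabet):
    """Every word reachable by one glyph insertion, deletion, or substitution."""
    edits = set()
    for i in range(len(word) + 1):
        for g in alphabet:
            edits.add(word[:i] + g + word[i:])          # insertion
    for i in range(len(word)):
        edits.add(word[:i] + word[i + 1:])              # deletion
        for g in alphabet:
            edits.add(word[:i] + g + word[i + 1:])      # substitution
    edits.discard(word)
    return edits

def legal_mutations(word, ledger):
    """Single-glyph edits of word that the ledger still accepts."""
    alphabet = set(ledger)
    for slots in ledger.values():
        for followers in slots.values():
            alphabet.update(followers)                  # add follower glyphs
    return sorted(w for w in single_edits(word, alphabet) if weight(w, ledger) > 0)

def copy_and_mutate(word, ledger, rng):
    """Copy a word, then return one legal mutation, favoring common transitions."""
    legal = legal_mutations(word, ledger)
    if not legal:
        return word                                     # no legal edit: plain copy
    return rng.choices(legal, weights=[weight(w, ledger) for w in legal], k=1)[0]

print(legal_mutations("fachys", LEDGER))
print(copy_and_mutate("fachys", LEDGER, random.Random(0)))
```

With this toy ledger, the only legal one-glyph mutations of `fachys` are the two words from the walkthrough, and the one built from more common transitions is picked more often.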
So that’s the theory in a nutshell:
Copy/mutate using sheets as the source while constrained by a glyph-adjacency ledger.