The Voynich Ninja

Pages: 1 2 3

Dear all,

I have made a template-driven Voynich text generator which passes the statistical tests, including my own 4 signature tests, which are quite demanding. Would it be worth sharing, or are template generators already well described and known? I couldn't find much literature except the groundbreaking work by Stolfi, but I don't know if he made an actual generator. Thanks for your input.

I just tried out a text generator.
Why don't you just give us a detailed overview of your project?

OK, here is an overview:

1. Grapheme normalization
Each Voynich word is first normalized:

Replace ch → C
Replace the gallows letters (t, k, p, f) → G

Then map characters into a reduced alphabet:

C,G stay unchanged
y → Y
q → Q
vowels → V
all other letters → X

This produces a skeleton string encoding coarse word shape.

Example for word "chekcheor":
Step-by-step:

ch → C (twice)
k → G
vowels (e, e, o) → V
r → X

Resulting skeleton: chekcheor → CVGCVVX
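A minimal Python sketch of step 1, for anyone who wants to reproduce the skeletons. The vowel set {a, e, i, o} is an assumption (the post only shows e and o becoming V), the sh → ch rule is taken from the author's later reply in this thread, the cth → tch rule is not modelled, and the function name `skeleton` is hypothetical.

```python
# Sketch of step 1 (grapheme normalization). Not the author's code.
VOWELS = set("aeio")  # ASSUMPTION: the post does not list the vowels
GALLOWS = "tkpf"


def skeleton(word: str) -> str:
    """Map an EVA word to its skeleton over the alphabet C, G, Y, Q, V, X."""
    w = word.replace("sh", "ch")  # later in the thread: sh is treated as ch
    w = w.replace("ch", "C")      # the bigraph ch becomes one symbol
    for g in GALLOWS:             # gallows letters collapse to G
        w = w.replace(g, "G")
    out = []
    for c in w:
        if c in "CG":
            out.append(c)         # C and G stay unchanged
        elif c == "y":
            out.append("Y")
        elif c == "q":
            out.append("Q")
        elif c in VOWELS:
            out.append("V")
        else:
            out.append("X")       # all other letters
    return "".join(out)


print(skeleton("chekcheor"))  # CVGCVVX
```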

2. Frame extraction from skeletons
Each skeleton is split into:

ONSET: substring before the first vowel (V)
CODA: substring after the last vowel
The central vowel region is discarded

This yields a frame: frame = (ONSET, CODA)

Example using skeleton CVGCVVX:

Structure:

first V at position 1 (counting from 0)
last V at position 5

So:
ONSET = C
CODA = X

Frame = (C, X)
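Step 2 as a sketch, using the 0-indexed positions from the example. The post does not say how a skeleton without any V is framed; treating the whole skeleton as the onset is an assumption, and `frame` is a hypothetical name.

```python
def frame(skel: str) -> tuple[str, str]:
    """Split a skeleton into (ONSET, CODA), discarding the vowel region."""
    first = skel.find("V")
    if first == -1:
        # ASSUMPTION: no vowel, so the whole skeleton is used as the onset
        return (skel, "")
    last = skel.rfind("V")
    return (skel[:first], skel[last + 1:])


print(frame("CVGCVVX"))  # ('C', 'X')
```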

3. Frame induction from corpus
All observed (ONSET, CODA) pairs are extracted from the corpus:

frames are defined as distinct observed pairs
frames are ranked by frequency
the most frequent frames form the model’s state space
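Step 3 reduces to counting and ranking. The cutoff `top_n` is a hypothetical parameter: the post does not say how many frames make up the state space, and the frame list below is toy data.

```python
from collections import Counter


def induce_frames(frames, top_n):
    """Rank distinct (ONSET, CODA) pairs by frequency and keep the top_n."""
    counts = Counter(frames)
    # most_common sorts by count, keeping first-seen order among ties
    return [fr for fr, _ in counts.most_common(top_n)]


corpus_frames = [("C", "X"), ("Q", "X"), ("C", "X"), ("", "X"),
                 ("C", "Y"), ("Q", "X"), ("C", "X")]  # toy data
state_space = induce_frames(corpus_frames, top_n=2)
print(state_space)  # [('C', 'X'), ('Q', 'X')]
```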

4. Template association with frames
Each corpus word has:

a skeleton (full structural pattern)
a derived frame (via ONSET/CODA split)

Thus:

each frame is associated with a set of observed skeleton templates
this association is induced statistically from the corpus

Example for frame (C, X)

Observed templates:
CV
CVX
CVGCVVX (chekcheor)
...
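Step 4 can be sketched as a table from each frame to the counts of the skeleton templates observed with it. The `frame` helper repeats the step 2 split so the block runs on its own; the skeletons in the usage line are toy input, not corpus data.

```python
from collections import Counter, defaultdict


def frame(skel):
    """(ONSET, CODA) split from step 2; vowel-less skeletons become all onset."""
    first = skel.find("V")
    if first == -1:
        return (skel, "")
    last = skel.rfind("V")
    return (skel[:first], skel[last + 1:])


def templates_by_frame(skeletons):
    """Map each frame to a Counter of the templates observed with it."""
    table = defaultdict(Counter)
    for s in skeletons:
        table[frame(s)][s] += 1
    return table


table = templates_by_frame(["CVX", "CVGCVVX", "CVX", "QVGVX", "CVY"])
print(dict(table[("C", "X")]))  # {'CVX': 2, 'CVGCVVX': 1}
```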

5. Markov model over frames
A Hidden Markov Model is trained where:

hidden states = frames
transitions = empirical frame-to-frame transitions observed in lines
additional biases exist for:
- line start states
- line end states
- line length distribution
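A sketch of the training in step 5. Since every training word's frame can be computed directly, the frame sequence of each line is observed during training, so the transition, line-start, line-end and line-length distributions can be estimated by plain counting and normalization. Whether the author fits the model this way or with an HMM procedure such as Baum-Welch is not stated; the input format (a list of lines, each a list of (ONSET, CODA) frames) is also an assumption.

```python
from collections import Counter, defaultdict


def normalize(counts):
    """Turn a Counter into a {key: probability} dict."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def train_markov(lines):
    """Estimate line-start, line-end, line-length and transition distributions."""
    start, end, lengths = Counter(), Counter(), Counter()
    trans = defaultdict(Counter)
    for line in lines:
        if not line:
            continue
        start[line[0]] += 1
        end[line[-1]] += 1
        lengths[len(line)] += 1
        for a, b in zip(line, line[1:]):  # adjacent frames within a line
            trans[a][b] += 1
    return {
        "start": normalize(start),
        "end": normalize(end),
        "length": normalize(lengths),
        "trans": {a: normalize(c) for a, c in trans.items()},
    }


CX, EX, QX = ("C", "X"), ("", "X"), ("Q", "X")  # toy frames
model = train_markov([[CX, EX, CX], [QX, CX]])
print(model["trans"][CX])  # {('', 'X'): 1.0}
```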

6. Frame sequence generation
At generation time:

an initial frame is sampled from the Markov chain (using the line-start biases)
subsequent frames are generated via transition probabilities

Example generated path: (C, X) → (V, X) → (V, X) → (C, X)
Each step is chosen by Markov probabilities.
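Step 6 could look like this: draw a line length, draw the first frame from the line-start distribution, then walk the transitions. The `model` dictionary layout is an assumption (it matches a simple counting-based estimate), the line-end bias is not modelled, and the toy probabilities are made up for illustration.

```python
import random


def sample(dist, rng):
    """Draw one key from a {key: probability} dict."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_line(model, rng):
    """Sample a line length, a start frame, then follow transitions."""
    n = sample(model["length"], rng)
    path = [sample(model["start"], rng)]
    while len(path) < n:
        successors = model["trans"].get(path[-1])
        if not successors:  # no observed continuation: stop early
            break
        path.append(sample(successors, rng))
    return path


CX, EX = ("C", "X"), ("", "X")  # toy frames and made-up probabilities
model = {
    "start": {CX: 1.0},
    "length": {4: 1.0},
    "trans": {CX: {EX: 0.5, CX: 0.5}, EX: {CX: 1.0}},
}
line = generate_line(model, random.Random(0))
print(line)
```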

7. Template emission conditioned on position
Each frame emits a template according to position-dependent empirical distributions:

initial position distribution
middle position distribution
final position distribution

These distributions are learned from observed corpus frequencies of templates within each frame context.
The emission selects existing templates only.
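A sketch of step 7: templates are counted per (frame, position) and sampled in proportion to their counts, so only attested templates can be emitted. Treating a one-word line as "initial" and backing off to the frame's templates from any position when a (frame, position) cell is empty are both assumptions; the training lines are toy data.

```python
import random
from collections import Counter, defaultdict

POSITIONS = ("initial", "middle", "final")


def position(i, n):
    """Label slot i of an n-word line; a one-word line counts as initial (assumption)."""
    if i == 0:
        return "initial"
    return "final" if i == n - 1 else "middle"


def train_emissions(lines):
    """Count templates per (frame, position) from lines of (frame, template) pairs."""
    table = defaultdict(Counter)
    for line in lines:
        for i, (fr, tpl) in enumerate(line):
            table[(fr, position(i, len(line)))][tpl] += 1
    return table


def emit(table, fr, pos, rng):
    """Pick an attested template for (frame, position), weighted by its count."""
    counts = table.get((fr, pos))
    if not counts:  # assumption: back off to the frame's templates at any position
        counts = Counter()
        for p in POSITIONS:
            counts.update(table.get((fr, p), Counter()))
    templates = list(counts)
    return rng.choices(templates, weights=[counts[t] for t in templates], k=1)[0]


CX, QX = ("C", "X"), ("Q", "X")  # toy training lines
table = train_emissions([[(CX, "CVX"), (CX, "CVGCVVX"), (CX, "CVX")],
                         [(CX, "CVX"), (QX, "QVGVX")]])
rng = random.Random(0)
print(emit(table, CX, "middle", rng))  # CVGCVVX (the only middle template)
```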

8. Surface word realization (lexical sampling)
Each emitted template indexes a bucket of attested corpus words:

the final output word is sampled from this set
no new words are constructed at this stage
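Step 8 as a sketch: attested words are bucketed by skeleton and the output word is drawn from the bucket of the emitted template. Keeping repeated tokens (so frequent words are drawn more often) rather than a set of distinct words is an assumption, and `skeleton` here is a compact restatement of step 1 with an assumed vowel set {a, e, i, o}.

```python
import random
from collections import defaultdict


def skeleton(word):
    """Compact restatement of step 1 (vowel set {a, e, i, o} is assumed)."""
    w = word.replace("sh", "ch").replace("ch", "C")
    for g in "tkpf":
        w = w.replace(g, "G")
    keep = {"C": "C", "G": "G", "y": "Y", "q": "Q"}
    return "".join(keep.get(c, "V" if c in "aeio" else "X") for c in w)


def build_buckets(words):
    """Group attested corpus words (tokens, repeats kept) by skeleton template."""
    buckets = defaultdict(list)
    for w in words:
        buckets[skeleton(w)].append(w)
    return buckets


def realize(template, buckets, rng):
    """Draw an existing word for the template; no new words are built."""
    return rng.choice(buckets[template])


buckets = build_buckets(["chol", "chor", "chol", "qokol", "chekcheor"])  # toy corpus
rng = random.Random(0)
print(realize("CVX", buckets, rng))  # prints chol or chor
```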

The generator is trained separately on Currier A and Currier B, producing different texts for these two languages.

Example (Currier A):
lchdy qokol olo chekcheor chol
tchor chol qoaiin qoteol cho dy
tchory
chopchal chody tos kcheey ainy chaiin
ydar cho cholkol cheeykeem
kcho dair dar daiin chok chy daiin
alam
toleechal daiin chol saiin
oldal cheeky chol chotchy keol chan okar cholfy
daiin chory daiiin ches aiin ykeol
choly dor choo chody aiin olchy qoty
ykeey al otoldy saiin choek chos chor
ol
dar daiin damo chol alaiinom okoldy daiin
daiin okaraiin ols ldy ykoaiin dain cheokeey
okeeor cheos qokchey qokod chetchy oeeeb
chkor tchor chod otchol chaiin chain cho
otcheey pykchy okoldg ytchom otoldy ol
lchal choky dar cheky chor ykaiin dal
qotchy qokeol qotomody chear chey
kochor olkor chol chor chor chaiin kchy
schey qopchy kchol olchor olaiin chokchol oty cheodar
dchokchy chotchy daiin dal

This isn't News. Thread moved.

(05-06-2026, 10:10 AM) Labyrinthinesecurity wrote:
Example (Currier A):

No sh? Did you convert both sh and ch to C for training, then convert C back to ch?

(05-06-2026, 11:49 AM) nablator wrote:
(05-06-2026, 10:10 AM) Labyrinthinesecurity wrote:
Example (Currier A):

No sh? Did you convert both sh and ch to C for training, then convert C back to ch?

You have sharp eyes :)

sh is treated as ch, and cth as tch (same for f, p, k), for reasons to be explained in a dedicated information-theory thread.

I made a pseudo-Voynich generator some time ago. No idea how it compares to yours, but it reproduced the character and bigram distributions; it reproduced character, bigram and word entropies, the word-length distribution over the text (Zipf's law) and the binomial word-length distribution over the dictionary. It produced a very similar dictionary, with just slightly fewer hapax legomena (and with some words which I found offensive, the most frequent being 'po'). It was just a Markov chain generator, with about 1200 states and ~8000 transitions. In the end I decided it was not a very interesting approach.

Sample text:

Quote:shey cheol otey kir okeedy tail shyy olshedy cheeodam oror al oty eeey odaiin loiiin qo okshey lchedy oty chol oteeody chory chol shedefy oteas cheodl cthochs shey qokeeody teedy oain sheey odchedy ykeeo okeo otal opcheold sheckhy shedy qokeedy atam shedy oteol lodaiin cheykches ykeedy or okeedy chol qol cheodaiin choldain otedy okeotos okchor rarod ykaiin rol osheeo dal otal qokaiin dar otoky oychey ctholechey olkaiidy okaiin oteeey cheey lsheo opchedy ar okar cheey chol cthol qokeeo chal okar otshedy y lkchokchol chcfhor chy otal chor chcthy okeeody kchos kaldy rchos qoky char qokeey oekeo chekary ykaiin al okeodaiin lkaiin otain chy qokaiin qod chey dain yair qokain qopchdy dal chcth orearary kchodaiin olky ctharoly shedy kos osary okeedy otaiin psho chokchey qoky chedy cheor sal dalsheody tol ail ol ykeeody otar chykchy otar chedy fchedy dar lshedy chedaiin qokaldy qoor poldaiin dy alo qockhey chedy okaiir cheocphey qokshdy qoteeedy r chqeos o chocfhor aky chean otain sar okeeody shey or darar ctheol chcthy qolkal otalor okeaiin cheodaiin or daiin shey aiin chol om shedydy shedy dolkeeody qokchedy daiin cheey sodaiin aiin oeeshy qokeey kolram shokocfhy saiir shar s cthy cheey ol opchd tol ol arorchy qocheoty dchor or lolkar ol sar qokeedy okedy qotdaiin okedy daiin otey cholpchedy y daiir dykeey sheealy chor chey pchocthy otykechy y or aiiin qofol shody daiin otaiin qotedy chytchy daiin daiir qotaiin qokeey chor dar qokaiin lkain dlshedy ykeed okeeey shedy checkhy aiin ar chedaiin keched chol roldalol kol rom chan lkeedy qokain aiin okeey oteedy qokeeey otair otody otaiin odal yteol okeod ar okedy qokeedy okeody doltol otaiin otolchey shokey qokeedy yky ctholkary qokal okeokeokeody qokar
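To compare generators on the entropy measures listed above, character, bigram and word entropies can be computed as below. This is a generic Shannon-entropy sketch, not Mauro's code; counting bigrams only inside words (ignoring spaces) is an assumption.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def text_entropies(text):
    """Character, within-word bigram and word entropies of a space-separated text."""
    words = text.split()
    chars = Counter(ch for w in words for ch in w)
    bigrams = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    return {
        "char": entropy(chars),
        "bigram": entropy(bigrams),
        "word": entropy(Counter(words)),
    }


print(text_entropies("shey cheol otey kir okeedy"))
```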

(05-06-2026, 10:10 AM) Labyrinthinesecurity wrote:
lchdy qokol olo chekcheor chol

I think the number of rare or non-attested glyph combinations here is a bit high:
pykchy (I don't think 'pykc' exists in the MS)
cheeykeem (I don't think ykeem is in the MS)
toleechal (appears once as toleeshal in the MS)
oeeeb (appears once)
qotomody (appears once)

I don't know what the idea behind this generator is, but if it was supposed to produce believable Voynichese, it needs some tweaking.
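The kind of check above (is 'pykc' attested?) can be automated by counting within-word character n-grams in a reference transliteration and flagging the n-grams of a generated word that occur at most `max_count` times. The reference list below is a toy stand-in for a real transliteration, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count within-word character n-grams over a reference word list."""
    return Counter(w[i:i + n] for w in words for i in range(len(w) - n + 1))


def rare_ngrams(word, ref_counts, n, max_count=0):
    """List the word's n-grams seen at most max_count times in the reference."""
    grams = [word[i:i + n] for i in range(len(word) - n + 1)]
    return [g for g in grams if ref_counts[g] <= max_count]


reference = ["qokchy", "ykchy", "pchey", "pykeey"]  # toy stand-in for a transliteration
ref4 = ngram_counts(reference, 4)
print(rare_ngrams("pykchy", ref4, 4))  # ['pykc']
```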

(05-06-2026, 12:30 PM) Mauro wrote:
I made a pseudo-Voynich generator some time ago. No idea how it compares to yours, but it reproduced the character and bigram distributions; it reproduced character, bigram and word entropies, the word-length distribution over the text (Zipf's law) and the binomial word-length distribution over the dictionary. It produced a very similar dictionary, with just slightly fewer hapax legomena (and with some words which I found offensive, the most frequent being 'po'). It was just a Markov chain generator, with about 1200 states and ~8000 transitions. In the end I decided it was not a very interesting approach.

Interesting! Was it Currier A or B? In any case, it fails 2 of the 4 signatures:

FOUR-SIGNATURE EVALUATION
───────────────────────────────────────────────────────

Sig1 E->S: 66.2% [66.2, 66.2]
VMS ref: 64.3% [59.1, 68.6]
Criterion: range [50, 79]%

Sig2 Bilateral (full corpus):
Generator: Se=0, Ee=0 → NO
VMS ref: Se=3, Ee=7 → YES
Criterion: must have bilateral (ref has it)
(chunks: gen=0% ref=90%)

Sig3 MI: 2.0376 [2.0376, 2.0376]
VMS ref: 0.4980 [0.4759, 0.5241]
Criterion: range [0.249, 0.996]
Quality: overshoot (4.1x) (4.09x reference)

Sig4 Shape: Zipfian R²=0.918 CV=0.97
VMS ref: Intermediate R²=0.805 CV=1.53
Criterion: non-Plateau (ref=Intermediate)

───────────────────────────────────────────────────────
VERDICT (calibrated to Currier B):
───────────────────────────────────────────────────────
E->S : ✓ PASS [range [50, 79]%]
Bilat : ✗ FAIL [must have bilateral (ref has it)]
MI : ✗ FAIL [range [0.249, 0.996]]
Shape : ✓ PASS [non-Plateau (ref=Intermediate)]
───────────────────────────────────────────────────────
Joint (adaptive): 2/4

Reference values:
VMS Currier B: E->S=64.3% MI=0.4980 Bilateral=YES Intermediate

★ 2/4 signatures passed
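The Sig4 line reports a "Zipfian R²". One common way to obtain such a figure, assumed here because the thread does not give the author's formula, is the R² of a least-squares line through log frequency against log rank. The CV value and the Plateau/Intermediate/Zipfian classification are not reproduced.

```python
import math
from collections import Counter


def zipf_r2(words):
    """R^2 of a least-squares fit of log(frequency) against log(rank)."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if n < 2 or syy == 0:
        raise ValueError("need at least two word types with differing frequencies")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # for a one-variable linear fit, R^2 equals the squared correlation
    return sxy * sxy / (sxx * syy)


words = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3  # frequencies 12/rank
print(round(zipf_r2(words), 6))  # 1.0
```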

Would it be possible for your generator to output its text in IVTFF format? If it could generate 20 pages, about 7000 words in total, I would be better able to compare it with quire 13.


Labyrinthinesecurity

bi3mw

Labyrinthinesecurity

Koen G

nablator

Labyrinthinesecurity

Mauro

oshfdk

Labyrinthinesecurity

dashstofsk