The Voynich Ninja - A One-Page Ledger Method for Generating Voynich-Like Text

(22-05-2026, 05:07 PM)Dunsel Wrote: No, I meeeeeeannnnnnnnnnnn youu youu youu youu youu need to youu need to make surrreeee it doesn't look stupid.

Does "qokeedy.qokeedy.qokedy.qokedy.qokeedy.ldy" look smart? What about "qokeedy.qotedy.qokeedy.qokeedy.qokeey.s,aiin.al"? Or "pShdy.ofchdy.qokedy.qoteedy.qokedy.qoltedy.qotedy.oky"?

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

(22-05-2026, 01:31 PM)Dunsel Wrote: our estimate for the theoretical space pretty accurate. I calculated 203 possible variants. But your ~10 alternatives is underestimating.

Fair enough. When oversimplifying things, they become oversimplistic...

How many variants will cover for the most frequent 95%?
I guess the top ten won't do it, but it will be relatively close.

(22-05-2026, 01:31 PM)Dunsel Wrote: And they're not evenly distributed.

Very distinctly uneven. This is also important, though it will be hard to say more than that this indicates the existence of rules.

(22-05-2026, 10:40 PM)oshfdk Wrote:
(22-05-2026, 05:07 PM)Dunsel Wrote: No, I meeeeeeannnnnnnnnnnn youu youu youu youu youu need to youu need to make surrreeee it doesn't look stupid.

Does "qokeedy.qokeedy.qokedy.qokedy.qokeedy.ldy" look smart? What about "qokeedy.qotedy.qokeedy.qokeedy.qokeey.s,aiin.al"? Or "pShdy.ofchdy.qokedy.qoteedy.qokedy.qoltedy.qotedy.oky"?

Alcohol has this way of making you forget the rules, doesn't it? Having a Moosehead and some Jager so I speak from experience.

I incorporated "don't look stupid" rules based on Scribe 1. I didn't say that all the Voynich scribes subscribed to the same rules. You copy and paste 30,000 words and you might get a tad thirsty. Plus, those are <ed>, so that has to be Scribe 2+, and they did not play by Scribe 1 rules. Scribe 1 seems to have been much more alcohol-tolerant.

(22-05-2026, 10:47 PM)DG97EEB Wrote: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

You cheated Ed. You used capitals! That's a bunch of *cough* bull!

(22-05-2026, 11:56 PM)ReneZ Wrote:
(22-05-2026, 01:31 PM)Dunsel Wrote: our estimate for the theoretical space pretty accurate. I calculated 203 possible variants. But your ~10 alternatives is underestimating.

Fair enough. When oversimplifying things, they become oversimplistic...

How many variants will cover for the most frequent 95%?
I guess the top ten won't do it, but it will be relatively close.

(22-05-2026, 01:31 PM)Dunsel Wrote: And they're not evenly distributed.

Very distinctly uneven. This is also important, though it will be hard to say more than that this indicates the existence of rules.

For just chedy and its variants?

Transcription	ED1 variants	Total ED1-neighbor tokens	Variants needed for 95%
Takahashi	54	1835	19
Zandbergen/Landini	53	1884	19

Coverage by the top 10 variants of chedy:

Transcription	Top 10 coverage
Takahashi	83.8%
Zandbergen/Landini	83.9%
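
For anyone who wants to reproduce numbers like these, here is a minimal sketch of how the counts could be computed. It is not the script used above. The function names (`is_ed1`, `variants_for_coverage`, `top_k_coverage`) and the toy word counts are made up for illustration, and it assumes the target word itself is not counted among its ED1 neighbours.

```python
from collections import Counter

def is_ed1(a, b):
    """True if a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    # Lengths differ by one: deleting one character of the longer word must give the shorter.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def ranked_neighbours(counts, target):
    """ED1 neighbours of target as (word, count) pairs, most frequent first."""
    found = [(w, n) for w, n in counts.items() if is_ed1(w, target)]
    return sorted(found, key=lambda item: item[1], reverse=True)

def variants_for_coverage(counts, target, threshold=0.95):
    """Return (number of ED1 variants, total neighbour tokens, variants needed for threshold)."""
    neighbours = ranked_neighbours(counts, target)
    total = sum(n for _, n in neighbours)
    running = 0
    for rank, (_, n) in enumerate(neighbours, start=1):
        running += n
        if running >= threshold * total:
            return len(neighbours), total, rank
    return len(neighbours), total, 0  # no neighbours at all

def top_k_coverage(counts, target, k=10):
    """Fraction of neighbour tokens covered by the k most frequent variants."""
    neighbours = ranked_neighbours(counts, target)
    total = sum(n for _, n in neighbours)
    return sum(n for _, n in neighbours[:k]) / total if total else 0.0

# Toy counts, invented for the example (not taken from any transcription).
toy = Counter({"chedy": 50, "shedy": 40, "chey": 30, "cheedy": 20,
               "chdy": 6, "chedo": 4, "daiin": 100, "chol": 10})
print(variants_for_coverage(toy, "chedy"))
print(top_k_coverage(toy, "chedy", k=2))
```

With a real transcription, `counts` would be a `Counter` over all word tokens, and the three returned values correspond to the three numeric columns of the first table.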

After running the test to answer your question, I decided to run a much bigger test on the entire corpus without specifying a family. I haven't run this test before so this kinda amazes me.

Transcription	Vocabulary	ED1 Components	Largest ED1 Component	% Vocabulary in Largest	% Tokens in Largest
Takahashi	6813	700	6077	89.2%	97.8%
Zandbergen/Landini	7604	976	6589	86.7%	97.3%

Almost the entire Voynich running text belongs to one enormous ED1 mutation network. The manuscript does not explore all theoretical mutations equally. But the vocabulary remains mutation-connected almost everywhere.
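
A minimal sketch of how ED1 components like these could be computed, under stated assumptions: it is not the script that produced the table, the names `ed1_components` and `largest_component_share` are hypothetical, and the toy word list is invented. It links two forms when one substitution, insertion, or deletion turns one into the other, then merges linked forms with a union-find structure.

```python
from collections import Counter, defaultdict

def ed1_components(counts):
    """Group word forms into connected components of the ED1 graph, largest first."""
    parent = {w: w for w in counts}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps the trees shallow
            w = parent[w]
        return w

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_with_mask = {}
    for w in counts:
        for i in range(len(w)):
            # Substitution: two forms sharing a mask differ only at the masked position.
            mask = w[:i] + "\0" + w[i + 1:]
            if mask in first_with_mask:
                union(w, first_with_mask[mask])
            else:
                first_with_mask[mask] = w
            # Insertion/deletion: dropping one character gives another attested form.
            shorter = w[:i] + w[i + 1:]
            if shorter in counts:
                union(w, shorter)

    groups = defaultdict(list)
    for w in counts:
        groups[find(w)].append(w)
    return sorted(groups.values(), key=len, reverse=True)

def largest_component_share(counts):
    """Return (component count, largest size, % vocabulary, % tokens in largest)."""
    comps = ed1_components(counts)
    largest = comps[0]
    vocab_pct = 100 * len(largest) / len(counts)
    token_pct = 100 * sum(counts[w] for w in largest) / sum(counts.values())
    return len(comps), len(largest), vocab_pct, token_pct

# Toy token list, invented for the example.
toy = Counter("chedy shedy chey chol shol chol daiin dain qokeedy".split())
print(largest_component_share(toy))
```

The mask trick avoids comparing every pair of forms, so a vocabulary of a few thousand words stays fast. Singletons count as components of size one here, which is an assumption about how the component totals above were counted.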

(23-05-2026, 01:49 AM)Dunsel Wrote: Scribe 1 seems to have been much more alchohol tolerant.

I'm not sure hand 1 is much better at not looking stupid:

<f15v.5,+P0> otchor.chor.chor.ytchor.cthy.s
<f42r.13,+P0> qopor.shol.shot.shol.shol.daiin.dain.s.<->cheam
<f42r.21,+P0> shol.chol.chol.shol.{ct}oiin.{c'o}s.odan
<f44r.9,+P0> otchol.ol,dchckhy.qoky.qotchy.qokchy.qokyd
<f47r.7,+P0> schesy.kchor.cthaiin.chol.chol.chol.chor.{ck@191;h}ey
<f54r.10,+P0> tor.ol.dol.or.chol.chol.ckhol.okol.oky.<->ytchor.ol,koldy

(22-05-2026, 11:56 PM)ReneZ Wrote: How many variants will cover for the most frequent 95%?

And I had Codex put together a Python file, based on my previous test, that gives all of this a visual representation.

Here's the chedy network of ED1.

[attachment=15699]

Here's the daiin ED1 network.

[attachment=15700]

And here's the big network. The group off to the right is, I believe, a chckhy group, and I think it has only a weak link to the big network. I'm not sure yet what the isolated one on the left is.

[attachment=15701]

And here's the report the file produced. I ran it using Zandbergen/Landini.

Total vocabulary: 7582

Total tokens: 36105
Number of ED1 components: 976
Largest component size: 6567
Percent vocabulary in largest component: 86.61%
Percent tokens in largest component: 97.16%
Top 20 largest components by forms and token coverage:
1: 6567 forms ( 86.61%), 35080 tokens ( 97.16%), top: daiin, ol, chedy, aiin, shedy, chol, ar, or, chey, dar
2: 8 forms ( 0.11%), 8 tokens ( 0.02%), top: ckheckhy, ckhockhy, ckhocthy, cpheckhy, cphecthy, cpheocthy, cphocthy, cphoithy
3: 3 forms ( 0.04%), 3 tokens ( 0.01%), top: aiinod, aiinos, aiios
4: 3 forms ( 0.04%), 3 tokens ( 0.01%), top: oraiinam, otaiikam, otaiinam
5: 3 forms ( 0.04%), 3 tokens ( 0.01%), top: otolpchy, stolpchy, tolpchy
6: 3 forms ( 0.04%), 3 tokens ( 0.01%), top: pdsairy, polairy, posairy
7: 3 forms ( 0.04%), 3 tokens ( 0.01%), top: qotokody, qotomody, shotokody
8: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: chedaiphy, chekaiphy
9: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: cheedals, cheedls
10: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: cheoeees, cheoiees
11: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: choteosam, qoteosam
12: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: ctharad, ctharal
13: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: dchodees, fchodees
14: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: deeaiir, seeaiir
15: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: eeesal, eeesaly
16: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: lshodair, tshodair
17: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: ockhdar, ockhydar
18: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: okaifhhy, okaifhy
19: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: okairody, orairody
20: 2 forms ( 0.03%), 2 tokens ( 0.01%), top: okeearam, okeedaram
Outputs written with prefix: C:\Users\rod\Documents\Voynich\New Generator 2\mappings_ZLZB_ed1_network

And, I ran the same test on my generator output.

Total vocabulary: 5477

Total tokens: 23717
Number of ED1 components: 31
Largest component size: 5356
Percent vocabulary in largest component: 97.79%
Percent tokens in largest component: 98.72%
Top 20 largest components by forms and token coverage:
1: 5356 forms ( 97.79%), 23414 tokens ( 98.72%), top: chol, or, cthol, shol, chey, shey, cthey, shoy, chor, cfhol
2: 45 forms ( 0.82%), 122 tokens ( 0.51%), top: cphodaiils, cfhodaiils, ckhodaiils, chodaiils, cphadoiil, cphadaiil, cphdaiils, cphodaiil, cphodails, cthodaiils
3: 25 forms ( 0.46%), 90 tokens ( 0.38%), top: dlocta, dlocka, dlocty, dlocha, locta, ddlqocta, ddocka, dlcta, dloctas, dlqocta
4: 7 forms ( 0.13%), 16 tokens ( 0.07%), top: fsholrcho, sholrcho, fsholecho, fsolrcho, psholrcho, psholrco, solrcho
5: 4 forms ( 0.07%), 8 tokens ( 0.03%), top: foeochdor, doeochdor, foeochdon, foeochhor
6: 4 forms ( 0.07%), 6 tokens ( 0.03%), top: ssoepy, sesoepy, sseeky, ssoeky
7: 3 forms ( 0.05%), 11 tokens ( 0.05%), top: csteiin, csteion, csteiein
8: 3 forms ( 0.05%), 8 tokens ( 0.03%), top: ksheoldas, psheoldas, fsheoldas
9: 3 forms ( 0.05%), 4 tokens ( 0.02%), top: steddr, steddd, stedor
10: 2 forms ( 0.04%), 4 tokens ( 0.02%), top: cchadoiil, cchardoiil
11: 2 forms ( 0.04%), 3 tokens ( 0.01%), top: koldam, poldam
12: 2 forms ( 0.04%), 2 tokens ( 0.01%), top: kshoche, tshoche
13: 2 forms ( 0.04%), 5 tokens ( 0.02%), top: psdol, pasdol
14: 2 forms ( 0.04%), 2 tokens ( 0.01%), top: pshaiiram, rshaiiram
15: 1 forms ( 0.02%), 1 tokens ( 0.00%), top: ckochy
16: 1 forms ( 0.02%), 2 tokens ( 0.01%), top: cshofainy
17: 1 forms ( 0.02%), 1 tokens ( 0.00%), top: csokey
18: 1 forms ( 0.02%), 1 tokens ( 0.00%), top: dcpyr
19: 1 forms ( 0.02%), 1 tokens ( 0.00%), top: kolsheeo
20: 1 forms ( 0.02%), 2 tokens ( 0.01%), top: ohcthaiin

And Bram Stoker's Dracula for comparison. Note that the largest ED1 component holds only 33% of the vocabulary, compared with about 87% for the Voynich text and about 98% for the generated text.

Total vocabulary: 9246

Total tokens: 154418
Number of ED1 components: 4980
Largest component size: 3058
Percent vocabulary in largest component: 33.07%
Percent tokens in largest component: 81.62%
Top 20 largest components by forms and token coverage:
1: 3058 forms ( 33.07%), 126042 tokens ( 81.62%), top: the, and, to, of, he, in, that, it, was, as
2: 20 forms ( 0.22%), 725 tokens ( 0.47%), top: though, through, thought, brought, thoughts, caught, rough, ought, sought, wrought
3: 16 forms ( 0.17%), 91 tokens ( 0.06%), top: stopped, stepped, happen, lapped, slipped, happed, happens, mapped, napped, slapped
4: 11 forms ( 0.12%), 35 tokens ( 0.02%), top: bending, winding, bidding, finding, sending, winning, ending, binding, minding, blinding
5: 10 forms ( 0.11%), 25 tokens ( 0.02%), top: bringing, sinking, singing, cringing, bringin, clanging, clanking, clinging, ringing, wringing
6: 10 forms ( 0.11%), 88 tokens ( 0.06%), top: getting, sitting, setting, ittin, letting, settling, fitting, gettin, sittin, spitting
7: 10 forms ( 0.11%), 366 tokens ( 0.24%), top: helsing, telling, helping, rolling, tellin, lolling, tolling, yelling, yelpin, yelping
8: 9 forms ( 0.10%), 31 tokens ( 0.02%), top: rising, raising, hiding, riding, aiding, adding, aiming, padding, praising
9: 9 forms ( 0.10%), 70 tokens ( 0.05%), top: castle, bottle, battle, castles, cattle, battles, bottles, rattle, rattled
10: 9 forms ( 0.10%), 42 tokens ( 0.03%), top: breath, wreath, breathe, wrath, wreaths, breathes, wreathed, breadth, breathed
11: 7 forms ( 0.08%), 36 tokens ( 0.02%), top: fierce, piece, pieces, pierced, apiece, fiercer, pierce
12: 7 forms ( 0.08%), 109 tokens ( 0.07%), top: looking, lookin, licking, booming, mocking, locking, looming
13: 7 forms ( 0.08%), 42 tokens ( 0.03%), top: falling, willing, calling, killing, callin, chilling, filling
14: 7 forms ( 0.08%), 38 tokens ( 0.02%), top: handed, landed, candle, handle, candles, handled, handles
15: 7 forms ( 0.08%), 30 tokens ( 0.02%), top: putting, cutting, shutting, cuttin, cuttings, jotting, jutting
16: 7 forms ( 0.08%), 9 tokens ( 0.01%), top: depite, despite, deity, depity, deputy, despise, despises
17: 7 forms ( 0.08%), 11 tokens ( 0.01%), top: shipping, drooping, dropping, dipping, dripping, shopping, tripping
18: 7 forms ( 0.08%), 41 tokens ( 0.03%), top: edge, edges, pledged, pledge, ledge, ledger, sledge
19: 7 forms ( 0.08%), 11 tokens ( 0.01%), top: humble, fumbled, humbly, mumbled, stumbled, tumble, tumbled
20: 6 forms ( 0.06%), 21 tokens ( 0.01%), top: slightly, brightly, lightly, rightly, nightly, tightly

Now, to be completely honest, here's where my generator is not getting it right. It may be over-regularizing morphological patterns. Plus the low ED1 component count in my generated text is kinda expected. The generator tries to not create new words out of thin air. As a result, the combinatorial rules are so regular that almost any token can be nudged into any other through small edits, which the real Voynich doesn't quite do. Again, I didn't create the generator to pass this test but, with a bit of refinement, I think it can.

My generator

[attachment=15707]

Dracula

[attachment=15708]

I have csv files to go with this if anyone wants to dig deeper, just let me know.

(23-05-2026, 01:37 PM)Dunsel Wrote: And I had codex put together a python file using my previous test that gives this all in visual representation.

Nice!

Could you please explain those graphs a bit more? How should we interpret the lengths of the lines and the sizes of the nodes?

All the best, --stolfi

(23-05-2026, 02:25 PM)Jorge_Stolfi Wrote:
(23-05-2026, 01:37 PM)Dunsel Wrote: And I had codex put together a python file using my previous test that gives this all in visual representation.

Nice!

Could you please explain those graphs a bit more? How should we interpret the lengths of the lines and the sizes of the nodes?

All the best, --stolfi

The node sizes are proportional to word frequency. Larger nodes are words that occur more often. The lines represent ED1 relationships. Two nodes are connected if one form can be transformed into the other by a single insertion, deletion, or substitution. The actual physical lengths of the lines are not meaningful by themselves. The graph layout uses a spring-force algorithm that tries to pull highly connected regions together while pushing weakly connected regions apart. So clusters that appear close together are generally more densely interconnected through ED1 relationships, while detached clusters have relatively few connections to the rest of the network.

And I'll admit, that's an AI description of the chart, and I worked with Codex to come up with a matplotlib chart that looked readable. The first chart it tried looked like a giant cat hairball, with everything in a circle and lines going everywhere. But the data is coming from one of my mappings files or a Gutenberg text.

So, English fractures into many disconnected morphological islands. My generated text is more centrally clustered than English but still has the outlying islands. Voynich instead forms one overwhelmingly dominant connected mutation network. Essentially, my generator is getting some of it right without trying, but not all of it. I suspect I haven't figured out enough scribal 'habits' yet. But I think this shows that the basic method the generator is using is 'plausible', which was my main goal.

Screenshot of the Python file so you can tell it's not completely AI slop.

[attachment=15716]