The Voynich Ninja - A One-Page Ledger Method for Generating Voynich-Like Text

Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

(21-05-2026, 03:48 PM)Jorge_Stolfi Wrote: he surely would have produced a very different book

Most likely he did not write the whole manuscript before trying to find a buyer for it. The scenario I can imagine is that the sections were written as separate pieces of work, with each piece following on from the success of the previous piece. Then sometime after, and perhaps by someone else, the sections were bound into one book.

Also, once his efforts started to be rewarded, perhaps he felt he could relax his standards and worry less about the accuracy of the visual content.

I tried to say something about this before. See

[link]

(21-05-2026, 07:01 PM)dashstofsk Wrote: Most likely he did not write the whole manuscript before trying to find a buyer for it. The scenario I can imagine is that the sections were written as separate pieces of work, with each piece following on from the success of the previous piece. Then sometime after, and perhaps by someone else, the sections were bound into one book.

Also, once his efforts started to be rewarded, perhaps he felt he could relax his standards and worry less about the accuracy of the visual content.

I tried to say something about this before. See

[link]

If you look at the chart I posted earlier here: [link]

Your theory could be true. Scribe 1 wrote most of the herbal section. Every page not written by Scribe 1 is on a separate sheet that usually wraps the outside of a quire. Scribe 1 also wrote most of the Balneo section, but not all of it; again, the Scribe 3 sheets account for the pages not written by Scribe 1. So it's entirely possible that the book started out as just the herbal and some Balneo.

But it's not just the conventional sections. Lisa Fagan Davis identifies multiple scribes working on the same sections and sometimes on the same sheet. In her paper she states that she does not believe the multiple hands reflect temporal variation, i.e., one scribe writing at different times. Her belief is that it was multiple scribes, perhaps in a workshop. So you'd have to account for that as well.

(21-05-2026, 04:36 PM)oshfdk Wrote: I suppose even without physically looking at past pages, many people have very good visual memory, and even spontaneous meaningless writing may easily repeat past patterns, so in a sense all kinds of pattern following include a copy+mutate component.

You mean that the Scribe would not be given a detailed algorithm, but only a fuzzy instruction "copy for a while, mutate some words, then start copying from some other place, and repeat until you got enough text. Every now and then, break a paragraph."

The output produced by a Scribe with such vague instructions would turn out to be both a lot more erratic and a lot more repetitive than what we see in the VMS. The interval between restarts, the frequency of mutations, and especially the kinds of mutations would vary from page to page. As the Scribe got tired, he would probably shorten the restart interval, so that he would not need to look far back for the text to copy. His mutations would not preserve the structure of words; he might generate many words with two or more gallows, with three d's, etc., and then copy those words over and over.

All the best, --stolfi

(21-05-2026, 10:39 PM)Jorge_Stolfi Wrote:
(21-05-2026, 04:36 PM)oshfdk Wrote: I suppose even without physically looking at past pages, many people have very good visual memory, and even spontaneous meaningless writing may easily repeat past patterns, so in a sense all kinds of pattern following include a copy+mutate component.

You mean that the Scribe would not be given a detailed algorithm, but only a fuzzy instruction "copy for a while, mutate some words, then start copying from some other place, and repeat until you got enough text. Every now and then, break a paragraph."

I'm not sure you quoted the right part. But if you did, in the quoted section I was just saying that generic copy+mutate is indistinguishable from having good visual memory and copying past patterns from memory. I then argued that copy+mutate research makes sense only if there was some actual set of rules; otherwise it's very hard to prove what exactly was going on.

I also wrote previously about the possibility of a visual approach to text generation: treating the text as a purely visual pattern of strokes and following it without any instructions at all. Maybe that is what you are referring to?

(21-05-2026, 01:17 PM)Dunsel Wrote:
(20-05-2026, 11:37 PM)ReneZ Wrote: Let alone why a great majority of possible modifications are never made.

No, the manuscript never explored every possible word combination. A realistic copy-mutate system would stay conservative, reusing active word families and nearby variants instead of wandering randomly through all legal forms.

Why would a realistic copy-mutate system stay conservative? There is nothing that dictates this.
The rules observed over tens of thousands of words are numerous and quite complex, and they are indeed rules.

This is, in a way, backwards logic.
We see that the word variations are very strict. Therefore, if the text was generated by modifying previous words, it would have to have followed strict rules. That is the correct direction of the logic.
There is no reason to assume that there would be very strict rules (which are then broken somewhat gradually).

EDIT:
Let's do some rough counts.
The word chedy could be considered to have four characters.
Limiting to edit distance 1:
Each of these could be changed into another, leading to 4 times, say, 20 options.
Each of these could be deleted, leading to 4 more.
A new character could be added in each of 5 slots, so 5 times 20 more.
6 pairs could be swapped (not sure if that counts as edit distance 1).
We are close to 200 alternatives.
Possibly 10 exist.
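The rough count can be checked by brute force. The Python sketch below enumerates the distinct strings one edit away from a word given as a tuple of glyphs, so that ch counts as one character. The 20-glyph alphabet is an assumption made for illustration; other glyph inventories give other totals. Swapping any two glyphs is included as an option because the post leaves open whether it counts (strictly, only adjacent transpositions are distance 1, and only under Damerau-Levenshtein).

```python
from itertools import combinations

# Assumed 20-glyph inventory (EVA-style, ch and sh as single glyphs); illustrative only.
ALPHABET = ("ch", "sh", "e", "d", "y", "o", "a", "i", "l", "r",
            "s", "n", "m", "q", "k", "t", "p", "f", "g", "x")
CHEDY = ("ch", "e", "d", "y")

def ed1_variants(word, alphabet, swaps=True):
    """Distinct glyph tuples reachable from `word` by one substitution,
    deletion or insertion, plus (optionally) one swap of any two glyphs."""
    n = len(word)
    out = set()
    for i in range(n):
        out.add(word[:i] + word[i + 1:])                  # deletion
        for g in alphabet:
            if g != word[i]:
                out.add(word[:i] + (g,) + word[i + 1:])   # substitution
    for i in range(n + 1):
        for g in alphabet:
            out.add(word[:i] + (g,) + word[i:])           # insertion
    if swaps:
        for i, j in combinations(range(n), 2):
            w = list(word)
            w[i], w[j] = w[j], w[i]
            out.add(tuple(w))                             # swap (not strict ED1)
    out.discard(word)
    return out

print(len(ed1_variants(CHEDY, ALPHABET)), len(ed1_variants(CHEDY, ALPHABET, swaps=False)))
```

With this alphabet the count comes to 182 (176 without swaps), a little under the naive 190, because some insertions coincide: putting e just before or just after the existing e yields the same word.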

We can consider two alternative methods for creating meaningless text using word permutations.

Method A:
first, a vocabulary is set up using word patterns and their variations
then, a text is composed by somewhat arbitrarily picking words from this vocabulary/dictionary

Method B:
a text is generated by creating new words from previous ones 'on the fly'
then, the resulting vocabulary is the collection of all these words

It should be clear that the very limited set of allowed permutations fits method A much better than method B.
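To make the distinction concrete, here is a toy Python sketch of both methods. Everything in it is an illustrative assumption: the prefix/core/suffix slots stand in for whatever word patterns a Method A author might use, and Method B here applies unconstrained single-letter edits to a recent word, i.e., the version without extra rules. Under Method A every token fits the patterns by construction; under this naive Method B, forms outside the patterns tend to appear quickly unless something filters them.

```python
import random

# Illustrative word-pattern slots (hypothetical, not a model of Voynichese).
PREFIXES = ["", "q", "ch", "sh", "o"]
CORES = ["o", "e", "ee", "a"]
SUFFIXES = ["dy", "y", "iin", "l", "r"]
LETTERS = "qchsoeaydilnr"  # letters available to the naive mutator

def pattern_vocabulary():
    """Method A, step 1: set up the vocabulary from the word patterns."""
    return {p + c + s for p in PREFIXES for c in CORES for s in SUFFIXES}

def method_a(n, rng):
    """Method A, step 2: compose text by picking words from the vocabulary."""
    vocab = sorted(pattern_vocabulary())
    return [rng.choice(vocab) for _ in range(n)]

def method_b(n, seed_word, rng, window=20):
    """Method B: create each word on the fly by one letter edit of a recent word."""
    text = [seed_word]
    while len(text) < n:
        w = rng.choice(text[-window:])
        i = rng.randrange(len(w))
        op = rng.choice(("sub", "ins", "del"))
        if op == "sub":
            w = w[:i] + rng.choice(LETTERS) + w[i + 1:]   # substitute a letter
        elif op == "ins":
            w = w[:i] + rng.choice(LETTERS) + w[i:]       # insert a letter
        elif len(w) > 1:
            w = w[:i] + w[i + 1:]                         # delete a letter
        text.append(w)
    return text

def off_pattern(tokens):
    """Word types that do not fit any prefix+core+suffix pattern."""
    return set(tokens) - pattern_vocabulary()

rng = random.Random(1)
print(len(off_pattern(method_a(500, rng))), len(off_pattern(method_b(500, "chedy", rng))))
```

The first number is always 0; the second depends on the seed. A conservative Method B would need some additional filter deciding which edits are acceptable, which is the crux of the disagreement in this thread.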

(21-05-2026, 09:12 PM)Dunsel Wrote: ... doesn't believe that the multiple scribes was a temporal variation
... it was multiple scribes in perhaps a workshop.

Anyone intending to create a hoax manuscript would have done it in secret, not in some manuscript workshop factory. Under the hoax hypothesis the writer was not bound by any contract of work, was under no obligation to do it quickly, or indeed to do any of it at all. He wrote as much or as little as he wanted, whenever he wanted, perhaps only in his spare time.

Writing section by section over a long period of time is plausible and can explain the different 'topics', handwriting styles, language clusters, and quality of drawings.

(21-05-2026, 06:33 PM)dashstofsk Wrote:
(21-05-2026, 03:48 PM)Jorge_Stolfi Wrote: why did he make Voynichese so unlike European languages?
A fabricated manuscript in a European alphabet would be less likely to succeed.

Okay for the invented script, but I am referring to the other features of the language -- like the short words with complex structure, and the repetitiousness. Features that Europeans at the time would not have seen in any language, known or unknown, including languages with "unknown" scripts like Ethiopian, Armenian, Georgian, Sanskrit... Those features would have made people think of nonsensical gibberish -- as they still do now.

Quote: The scenario I can imagine is that perhaps some similar manuscript in the Arabic script was sold to someone who did not understand the text but who had the luxury and self importance to want to buy and hold such a manuscript, a rarity because it was not in a European language, and who was happy to pay a premium for it. [...] Perhaps this purchase then came to the attention of some opportunist who then had the idea to ‘invent’ a manuscript, claim it to be a work from some distant land, with the intention of offering it to some similar person of importance for a similar premium.

But that is an imagined scenario. Is it likely to have happened?

There was no lack of manuscripts in foreign languages and foreign scripts circulating in Europe at the time. That hypothetical collector of unreadable manuscripts would have quickly gone bankrupt...

All the best, --stolfi

(21-05-2026, 05:13 PM)Dunsel Wrote: This is the closest chart I have to a less abrupt transition and you may remember it from my work on <ed>. It's showing the switch between scribe 1 and <ho> and scribe 2 and <ed>. And yes, the pages are reordered but, the background is shaded according to Currier A and B. So there is a gradual change but...

Thanks for the chart. But suppose that there are two sets of pages: set A, where ed is never used and the frequency of ho varies randomly and uniformly between 0.05 and 0.55; and set B, where the frequency of ho varies randomly and uniformly between 0.05 and 0.10, while that of ed varies randomly and uniformly between 0.00 and 0.40.

If you plot the frequencies with the pages in some random order, but with all A pages first then the B pages, what you will see is just that: two different random distributions.

But if you sort the pages by decreasing frequency of ho or of the ratio ho/ed, the plot would look just like your plot. You will see a gradual transition from (ho = 0.55, ed = 0.00) to (ho = 0.05, ed = 0.40). A transition that does not exist in the data, and was created wholly by the sorting.
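The effect is easy to reproduce. Below is a minimal Python sketch of the thought experiment, using the frequency ranges from the post; the page counts (100 A and 100 B pages), the seed, and the 10-page moving average are arbitrary choices. Neither group has any internal trend, yet after sorting by decreasing ho/ed (pages without ed sort first, ordered by ho) the smoothed ho curve falls steadily and the smoothed ed curve climbs: a "gradual transition" produced purely by the ordering.

```python
import random
from dataclasses import dataclass

@dataclass
class Page:
    group: str  # "A" or "B"
    ho: float   # relative frequency of <ho> on the page
    ed: float   # relative frequency of <ed> on the page

def simulate_pages(n_a=100, n_b=100, seed=0):
    """Two groups drawn as in the thought experiment: no trend inside either."""
    rng = random.Random(seed)
    a = [Page("A", rng.uniform(0.05, 0.55), 0.0) for _ in range(n_a)]
    b = [Page("B", rng.uniform(0.05, 0.10), rng.uniform(0.00, 0.40)) for _ in range(n_b)]
    return a + b

def ho_ed_ratio(p):
    # Pages without <ed> get an infinite ratio, so they sort first.
    return p.ho / p.ed if p.ed > 0 else float("inf")

def sort_by_ratio(pages):
    """Sort by decreasing ho/ed, breaking ties (the A pages) by decreasing ho."""
    return sorted(pages, key=lambda p: (ho_ed_ratio(p), p.ho), reverse=True)

def smooth(xs, k=10):
    """Simple moving average, like a trend line on a plot."""
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]

if __name__ == "__main__":
    ordered = sort_by_ratio(simulate_pages())
    for name in ("ho", "ed"):
        curve = smooth([getattr(p, name) for p in ordered])
        print(name, [round(v, 2) for v in curve[::19]])
```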

All the best, --stolfi

(22-05-2026, 11:48 AM)Jorge_Stolfi Wrote: But that is an imagined scenario

Imagined.

But the fact is that Arabic scripts were around at that time and might have been the inspiration for the hoax.

(22-05-2026, 12:03 AM)ReneZ Wrote: Why would a realistic copy-mutate system stay conservative? There is nothing that dictates this.
The rules observed over tens of thousands of words are numerous and quite complex, and they are indeed rules.

Code:
on of beein fascinatng tht etz

come s been hep on pto been

dof tanswer pitc talfing hnooked onv talking

at beein he handsom sound

oin itc we otf yit

matpers opf iot oin he wel

pof he lnoo slep talking hye pipwth hyey talfing

onu heo s njoo on

fyor talfing of soce he

on s tof hem caymet tot of

nto no sall yof her sernness

he talking her he i golad dicussion

direaction dof ato opflc ree

That is output from my generator, using 150 words from Dracula as the seed to create an English ledger. Note: I never intended to create a pseudo-English generator, so it's FAR from perfect, as in WAY too heavy on three-letter words. But the method is the same. The generator is NOT exploring all possible permutations equally. It is staying trapped inside a conservative local ecology built from nearby forms. That is why:

talking → talfing
discussion → dicussion
fascinating → fascinatng
handsome → handsom

appear repeatedly while thousands of other theoretical edits never appear.

The system is trying to maintain textual plausibility and family continuity, not maximize combinatorial exploration. The Voynich appears to behave similarly. The existence of hundreds of theoretical edit possibilities does not mean a human production process would actually use them.

(22-05-2026, 12:03 AM)ReneZ Wrote: We see that the word variations are very strict. Therefore, if the text was generated by modifying previous words, it would have to have followed strict rules. That is the correct direction of the logic.
There is no reason to assume that there would be very strict rules (which are then broken somewhat gradually).

I disagree that strict behavior necessarily implies a huge explicit rule system. Strong local reinforcement alone can produce highly constrained output over long runs.

The important distinction is between all theoretically possible mutations and the tiny subset that preserves plausibility.

Most mutations simply stop “looking right” relative to nearby forms, so they are naturally filtered out without the scribe consciously following hundreds of formal rules.
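The kind of local reinforcement described here can be sketched in a few lines of Python. This is not Dunsel's generator, whose ledger is not specified in the thread; the LEDGER table of accepted glyph substitutions, the 30-word look-back window, the mutation probability, and the "1 + times already written" weighting are all hypothetical choices. Words are glyph tuples, so ch and ee count as single units. Because only ledger-approved substitutions are ever proposed, every output form stays within a small closure (18 forms reachable from chedy with this table), and the weighting tends to favour variants that have already been written.

```python
import random
from collections import Counter

# Hypothetical ledger: for each glyph, the replacements the scribe will accept.
LEDGER = {
    "ch": ["sh"], "sh": ["ch"],
    "e": ["ee", "o"], "ee": ["e"], "o": ["e"],
    "d": ["k", "t"], "k": ["d"], "t": ["d"],
    "y": [],
}
CHEDY = ("ch", "e", "d", "y")

def candidates(word):
    """Every word one ledger-approved substitution away from `word`."""
    return [word[:i] + (r,) + word[i + 1:]
            for i, g in enumerate(word) for r in LEDGER.get(g, [])]

def generate(seed_words, n, window=30, p_mutate=0.3, rng_seed=0):
    """Copy a recent word; sometimes replace it with a ledger variant,
    weighting each variant by 1 + the number of times it was already written."""
    rng = random.Random(rng_seed)
    text = list(seed_words)
    counts = Counter(text)
    while len(text) < n:
        word = rng.choice(text[-window:])
        cands = candidates(word)
        if cands and rng.random() < p_mutate:
            word = rng.choices(cands, weights=[1 + counts[c] for c in cands])[0]
        text.append(word)
        counts[word] += 1
    return text

if __name__ == "__main__":
    run = generate([CHEDY], 2000)
    for word, count in Counter(run).most_common(5):
        print("".join(word), count)
```

The exact ranking of favourite variants depends on `rng_seed`.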

(22-05-2026, 12:03 AM)ReneZ Wrote: EDIT:
Let's do some rough counts.
The word chedy could be considered to have four characters.
Limiting to edit distance 1:
Each of these could be changed into another, leading to 4 times, say, 20 options.
Each of these could be deleted, leading to 4 more.
A new character could be added in each of 5 slots, so 5 times 20 more.
6 pairs could be swapped (not sure if that counts as edit distance 1).
We are close to 200 alternatives.
Possibly 10 exist.

We can consider two alternative methods for creating meaningless text using word permutations.

Your estimate of the theoretical space is pretty accurate; I calculated 203 possible variants. But your ~10 existing alternatives is an underestimate: I found 54 ED1 variants in the Takahashi transcription and 53 in yours. Here are the top variants, treating ch as a single character:

Variant	Count
shedy	434
chey	351
cheey	183
chdy	145
lchedy	115
chody	92
cheody	90
cheky	64
cheedy	59
kedy	47
tedy	40
pchedy	35
tchedy	33
dchedy	28
chety	22
kchedy	22
ched	18
ychedy	11
rchedy	10
ochedy	10

And they're not evenly distributed. What I'm seeing is conservative reinforcement of specific variations. So why does the manuscript repeatedly collapse into narrow local families instead of exploring the full mutation space? To me, that is exactly what a constrained copy/mutate process predicts. The scribe is not asking “what are all possible edits?” The scribe is asking, consciously or not, “what still looks like it belongs to this family of words?” That question naturally favors a few reinforced variants and leaves most theoretical variants unused, and I think that's exactly what the table shows.
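A table like the one above can be produced with a few lines of Python. The sketch below is an assumption about method, not Dunsel's actual script: it splits EVA words into glyphs (ch, sh and the pedestalled gallows cth, ckh, cph, cfh as single glyphs; the post only confirms this for ch), computes plain Levenshtein distance over glyph tuples (so a transposition counts as 2), and counts tokens at distance exactly 1 from the target. Loading a transcription into a token list is left out.

```python
from collections import Counter

# Multi-letter EVA sequences treated as one glyph (an assumption beyond "ch").
MULTI = ("cth", "ckh", "cph", "cfh", "ch", "sh")

def glyphs(word):
    """Split an EVA word into glyphs, trying multi-letter sequences first."""
    out, i = [], 0
    while i < len(word):
        for m in MULTI:
            if word.startswith(m, i):
                out.append(m)
                i += len(m)
                break
        else:
            out.append(word[i])
            i += 1
    return tuple(out)

def levenshtein(a, b):
    """Classic dynamic-programming edit distance over glyph tuples."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def ed1_table(tokens, target):
    """Count tokens whose glyph sequence is exactly one edit away from target."""
    t = glyphs(target)
    counts = Counter(tok for tok in tokens if levenshtein(glyphs(tok), t) == 1)
    return counts.most_common()
```

For example, `ed1_table(tokens, "chedy")[:20]` on a full token list gives the top twenty ED1 variants with their counts.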

As for your methods A and B, I do not think they are as distinct as you suggest. A conservative copy/mutate process naturally creates a working vocabulary over time through local reinforcement. The vocabulary does not need to be fully precomputed in advance. Your edit-count argument assumes all theoretical ED1 mutations are equally attractive choices. They are not. Most mutations immediately break local family continuity and stop looking plausible. IF the goal were to create something that looked like a language, combinatorial freedom would explode it into gibberish.

Here are my Voynich generator's stats for a 100-page run:

8248 tokens
2560 vocabulary items
1601 hapax
0 ledger-invalid forms

The Voynich comparison sample contained:

8519 tokens
2436 vocabulary items
1642 hapax

So despite the enormous theoretical mutation space, the system did not explode into every possible word, because it was constrained by the ledger. It also reproduced a vocabulary size and hapax rate comparable to the Voynich sample without being specifically coded to do so. It repeatedly collapsed into a narrow, active ecology of locally reinforced forms.

That behavior fits Method B without being instructed to do so.
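For anyone wanting to compare runs the same way, these three counts are straightforward to compute from a token list. The Python sketch below is a generic illustration, not the script used for the figures above. It also turns the reported counts into two ratios (type/token ratio, and the share of the vocabulary that occurs only once), which makes the two slightly different sample sizes easier to compare directly.

```python
from collections import Counter

def vocab_stats(tokens):
    """Token count, vocabulary size (distinct word types) and hapax count."""
    freq = Counter(tokens)
    hapax = sum(1 for c in freq.values() if c == 1)
    return {"tokens": len(tokens), "vocabulary": len(freq), "hapax": hapax}

def ratios(stats):
    """Type/token ratio and share of vocabulary items that occur exactly once."""
    return {
        "type_token": stats["vocabulary"] / stats["tokens"],
        "hapax_share": stats["hapax"] / stats["vocabulary"],
    }

# Counts as reported in the post.
GENERATOR = {"tokens": 8248, "vocabulary": 2560, "hapax": 1601}
VOYNICH = {"tokens": 8519, "vocabulary": 2436, "hapax": 1642}

for name, s in (("generator", GENERATOR), ("Voynich sample", VOYNICH)):
    r = ratios(s)
    print(f"{name}: type/token {r['type_token']:.3f}, hapax share {r['hapax_share']:.3f}")
```

With the counts reported above, this prints a type/token ratio of 0.310 for the generator run versus 0.286 for the Voynich sample, and a hapax share of 0.625 versus 0.674.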
