The Voynich Ninja - A One-Page Ledger Method for Generating Voynich-Like Text

(19-05-2026, 12:09 PM)tavie Wrote: How do we know this?

We do not know it. I am just suggesting it. I seem to recall someone saying that it shows in places where the writer had to refill the pen with ink and would then write for ~10 words before the pen started to run dry again. That suggests about 10 words of continuous writing. Had there been a long pause, the writer would have put the pen back into the pot and the writing afterwards would have started dark again.

But the main point I wanted to highlight was that the writer was not following any sort of algorithm, was not consulting code tables or ledgers or throwing dice. I believe people are thinking too hard. The method is probably more simple.

(19-05-2026, 04:39 PM)dashstofsk Wrote: But the main point I wanted to highlight was that the writer was not following any sort of algorithm, was not consulting code tables or ledgers or throwing dice. I believe people are thinking too hard. The method is probably more simple.

Oh, you are very likely correct. Just like we don't need to consult ledgers or toss dice to write English, the Voynich scribes probably didn't either once they became fluent in the system. After enough repetition, writing becomes procedural memory and you stop consciously thinking about rules and simply write. That is really the point of the ledger model. Not that the scribe sat there actively consulting a table for every token, but that the underlying structure had to be learned somehow. Children learning English don't carry spelling charts forever either, but those constraints still exist beneath the writing process. In the Voynich case, the system is actually fairly compact: no capitalization or punctuation, four gallows, and a constrained glyph inventory. Once internalized, it likely flowed naturally.

When I describe a ledger, I'm not necessarily describing the moment-to-moment writing process of an experienced scribe. I'm describing the minimal structural framework required to teach, constrain, and reproduce the system consistently. After enough practice, the ledger probably disappears into muscle memory. So a learned single-sheet ledger system like the one I'm suggesting follows the flow of writing that the paleography suggests. Now what could be more simple than the way you just wrote that reply?

(19-05-2026, 05:02 PM)Dunsel Wrote: ... scribes probably didn't either once they became fluent in the system. After enough repetition, writing becomes procedural memory and you stop consciously thinking ...

That is what I think also.

Also, it is highly probable that the sections of the manuscript were written at different times. And because it is also highly probable that the writer was using the alphabet nowhere else, the gaps of time might have been sufficient for the writer to lose his momentum, to lose just enough fluency that he had to re-adapt to writing anew. The language in each new section would then become slightly different, giving us the separate language clusters we can see, the most prominent being A and B.

(19-05-2026, 04:39 PM)dashstofsk Wrote: But the main point I wanted to highlight was that the writer was not following any sort of algorithm, was not consulting code tables or ledgers or throwing dice. I believe people are thinking too hard. The method is probably more simple.

Let us reverse the argument and start from the absolute simplest action: copy the word you just wrote. So after writing "chol" you would write "chol" again. But to write a whole book this way would be too obvious.

Therefore instead of writing "chol" again you modify it slightly and write "shol". This is better but still obviously repetitive.

Second complication: don't always copy the last word; copy a different word already in your field of view. This results in "chol.shol.daiin". Now it looks varied. But "daiin" came from somewhere visible, maybe two lines up. Next you modify "daiin" to "dain". This results in text like "shol.shot.shol.shol.daiin.dain" on [link]. Then you look elsewhere and copy another previously written word.

That's self-citation. Nothing more. Three steps from the absolute simplest possible action:

1. Copy → too repetitive
2. Copy and modify → still too repetitive
3. Copy and modify from varying visible sources → looks like language

Each step adds exactly enough complexity to avoid the previous step's problem. No rules, no tables, no algorithms, no dice. The scribe just needs to not repeat himself too obviously — and the minimum effort to achieve that is self-citation.
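
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as anyone in this thread has specified it: the substitution table in `modify`, the `window` of "visible" words, and the `p_modify` probability are all hypothetical, and a seeded pseudo-random generator stands in for the scribe's attention.

```python
import random

def modify(word, rng):
    """Step 2: change one glyph group into a similar one.

    The swap table is a made-up example, not a claim about the manuscript.
    """
    swaps = {"ch": "sh", "sh": "ch", "ol": "or", "or": "ol",
             "aiin": "ain", "ain": "aiin"}
    candidates = [(old, new) for old, new in swaps.items() if old in word]
    if not candidates:
        return word  # nothing to change; fall back to a plain copy
    old, new = rng.choice(candidates)
    return word.replace(old, new, 1)

def self_cite(seed_words, n_words, window=20, p_modify=0.5, rng=None):
    """Step 3: copy (and sometimes modify) a word from the recent 'field of view'."""
    rng = rng or random.Random(0)
    text = list(seed_words)
    while len(text) < n_words:
        source = rng.choice(text[-window:])  # any of the last `window` words
        if rng.random() < p_modify:
            text.append(modify(source, rng))
        else:
            text.append(source)              # step 1: plain copy
    return text

print(".".join(self_cite(["chol", "daiin"], 12)))
```

Setting `window=1` and `p_modify=0` reduces this to step 1 (pure repetition), which makes it easy to see what each added complication contributes.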

Note: "shol.shot.shol.shol.daiin.dain" on line 13 is not even the most striking example on folio f42r.

Look at lines 20-21:
<f42r.P3.20;H> shol.chol.shoky.okol.sho.chol.chol.chal-
<f42r.P3.21;H> shol.chol.chol.shol.ctoiin.sos.odan-

(19-05-2026, 09:01 PM)Torsten Wrote: That's self-citation. Nothing more. Three steps from the absolute simplest possible action:

3. Copy and modify from varying visible sources → looks like language

No rules, no tables, no algorithms, no dice.

Except that

You need a seed text (a "visible source") to start the process.
The modify step must respect the complex word structure.
You need dice to choose which word to copy and how to modify it.
The word distribution must be invariant under the modify step.

Without the last precaution, in particular, the word distribution will evolve along the document in ways that we simply don't see.
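
One hedged way to make the drift objection testable is to compare the word-frequency distribution of an early stretch of text with that of a late stretch. The sketch below uses total variation distance, which is a choice made here for illustration and not a measure named in this thread; `total_variation` is a hypothetical helper and assumes both word lists are non-empty.

```python
from collections import Counter

def total_variation(words_a, words_b):
    """Total variation distance between two word-frequency distributions.

    0.0 means identical relative frequencies, 1.0 means no shared words.
    Assumes both lists are non-empty.
    """
    count_a, count_b = Counter(words_a), Counter(words_b)
    n_a, n_b = len(words_a), len(words_b)
    vocab = set(count_a) | set(count_b)
    return 0.5 * sum(abs(count_a[w] / n_a - count_b[w] / n_b) for w in vocab)

# Toy example: the second half has drifted away from "chol" toward "shol".
early = ["chol", "chol", "daiin", "chol"]
late = ["shol", "shol", "daiin", "shol"]
print(total_variation(early, late))
```

Applied to the first and second halves of a generated text, a value that keeps growing with text length would indicate the kind of drift described above.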

Your proposed gibberish generation method is anything but simple. It is as complex as the Voynichese "language", with all its statistical and structural peculiarities.

All the best, --stolfi

(19-05-2026, 10:03 PM)Jorge_Stolfi Wrote:
You need a seed text (a "visible source") to start the process. (1)

The modify step must respect the complex word structure. (2)

You need dice to choose which word to copy and how to modify it. (3)

The word distribution must be invariant under the modify step. (4)

I think we can tie 1 and 2 together: if there is some template that all words should conform to (something like the curve-line system, or one of your grammars), then 1 could be just generating a set of random seed words according to the template (this seems easy; I did something like it in [link] in this thread), and 2 is just making sure modifications still conform to the template.
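
A minimal sketch of that idea, assuming a made-up three-slot template (prefix, core, suffix). The slot contents below are illustrative placeholders, not the curve-line system or any published word grammar; `random_word` covers point 1 and `conforms` covers point 2.

```python
import random

# Hypothetical three-slot template; the slot contents are illustrative only.
PREFIXES = ["", "o", "qo", "ch", "sh", "d"]
CORES = ["k", "t", "ol", "a", "e", "ke"]
SUFFIXES = ["y", "dy", "aiin", "ain", "ol", "or"]

def random_word(rng):
    """Point 1: build a seed word by filling each slot at random."""
    return rng.choice(PREFIXES) + rng.choice(CORES) + rng.choice(SUFFIXES)

def conforms(word):
    """Point 2: check whether a word can be split into prefix + core + suffix."""
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for core in CORES:
            if rest.startswith(core) and rest[len(core):] in SUFFIXES:
                return True
    return False

rng = random.Random(0)
seeds = [random_word(rng) for _ in range(10)]
print(seeds)
```

A modification step would then be accepted only if `conforms` returns true for the modified word, and retried or skipped otherwise.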

4 is harder to explain, but maybe we could assume that the template is restrictive enough to guide random modifications in cycles, avoiding the drift.

3 is the hardest for me. I don't understand how to replicate dashstofsk's [link] without some good source of randomness.

(19-05-2026, 07:53 PM)dashstofsk Wrote: Also, it is highly probable that the sections of the manuscript were written at different times. And because it is also highly probable that the writer was using the alphabet nowhere else, the gaps of time might have been sufficient for the writer to lose his momentum, to lose just enough fluency that he had to re-adapt to writing anew. The language in each new section would then become slightly different, giving us the separate language clusters we can see, the most prominent being A and B.

You Sir spurred a brain fart. I have been stupid. I was taking the scribes in the order of the manuscript. I've done some preliminary tests but I'm beginning to think that Scribe 3 was actually the second scribe. Not Scribe 2. I believe this would ease that abrupt transition I keep seeing between Currier A and Currier B.

comparison	Pearson correlation	cosine similarity
Scribe 1 vs Scribe 2	0.571	0.611
Scribe 1 vs Scribe 3	0.735	0.763
Scribe 2 vs Scribe 3	0.943	0.948

(19-05-2026, 10:03 PM)Jorge_Stolfi Wrote: Your proposed gibberish generation method is anything but simple. It is as complex as the Voynichese "language", with all its statistical and structural peculiarities.

Indeed Voynichese is as complex as Voynichese in the same way as English prose is as complex as English prose. However, the complexity of the output doesn't tell you about the complexity of the mechanism. A snowflake is complex — the mechanism that produces it (water molecules crystallizing) is simple. The VMS has complex statistical properties — the mechanism that produces them (copy a visible word and modify it) is as simple or as complex as human visual attention and human pattern recognition are. The structure emerges from the repetition of copying over 37,000 words, not from complex rules.

Anyway, the question isn't whether self-citation is complex — it's whether it can explain the complexity of the Voynich text and if it's simpler than the alternatives.

(19-05-2026, 10:03 PM)Jorge_Stolfi Wrote: Except that
You need a seed text (a "visible source") to start the process.

The modify step must respect the complex word structure.

You need dice to choose which word to copy and how to modify it.

The word distribution must be invariant under the modify step.

Without the last precaution, in particular, the word distribution will evolve along the document in ways that we simply don't see.

Your proposed gibberish generation method is anything but simple. It is as complex as the Voynichese "language", with all its statistical and structural peculiarities.

All the best, --stolfi

I agree with most of this, but I think the conclusion is too strong.

Yes, a visible source is needed. That is exactly the point: the manuscript itself repeatedly behaves as if nearby source material matters. Copy-modify from visible sources is not being denied; it is the mechanism being tested. When you first learned your native language, you needed a seed. Every 'language' does. If the Voynich is ever solved, there will be a seed involved.

The only real disagreement is over what the modifier has to know. I do not think the scribe needed a full grammar, a cipher table, or a stochastic model of Voynichese. They only needed to avoid producing forms that obviously violate the local word structure. That can be done with a very small set of adjacency habits: which glyphs can follow which glyphs in prefix, middle, and ending positions; i.e. my ledger proposal. Whether that habit was learned mentally or written down as a small crib is secondary.
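
To make "a very small set of adjacency habits" concrete, here is a hedged sketch. It assumes each character of a transliterated word is one glyph and that "prefix", "middle", and "ending" refer to the first, interior, and last adjacent pair in a word; both are simplifications chosen for illustration, and `learn_ledger` and `allowed` are hypothetical names.

```python
from collections import defaultdict

def transitions(word):
    """Yield (position, left, right) for each adjacent pair of characters."""
    last = len(word) - 2
    for i in range(len(word) - 1):
        if i == 0:
            pos = "prefix"
        elif i == last:
            pos = "ending"
        else:
            pos = "middle"
        yield pos, word[i], word[i + 1]

def learn_ledger(words):
    """Record which glyph may follow which, separately for each position."""
    ledger = defaultdict(set)
    for w in words:
        for pos, left, right in transitions(w):
            ledger[(pos, left)].add(right)
    return ledger

def allowed(word, ledger):
    """True if every adjacent pair in the word appears in the ledger.

    Single-character words have no pairs and are accepted trivially.
    """
    return all(right in ledger.get((pos, left), ())
               for pos, left, right in transitions(word))

ledger = learn_ledger(["chol", "shol", "daiin"])
print(allowed("dain", ledger), allowed("lohc", ledger))
```

Note that "dain" passes even though it was never in the training words: every one of its adjacent pairs already occurs, in the same position, in "daiin". That is the sense in which a small table can license new forms without a full grammar.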

The “dice” point is also stronger in a computer simulation than in a manuscript-production model. A program needs explicit random choices because it has no human attention, no fatigue, no visual preference, and no habit. A scribe does not need literal dice to choose a nearby word, copy part of it, extend it, or vary it. Human choice supplies the irregularity that a program has to fake with randomization.

The distribution problem is the serious objection. But I do not think the Voynich requires invariant distribution page after page. In fact, the scribal and Currier divisions argue against perfect invariance. What we see is bounded drift: strong local continuity, abrupt regime changes, and then stabilization within a new regime. That is exactly what I would expect if different operators inherited the method but not the same internal weighting of forms.

For example, Scribe 1 and Scribe 3 correlate more closely with each other in internal-bigram profile than Scribe 1 and Scribe 2 do. Here is a refined and expanded version of the table from my previous post:

comparison	Pearson	cosine
S1 all vs S3 all	0.692	0.723
S1 all vs S2 all	0.532	0.575
S2 all vs S3 all	0.943	0.948
S1 herbal vs S3 late herbal	0.691	0.723
S1 herbal vs S2 herbal	0.639	0.675
S2 herbal vs S3 late herbal	0.732	0.765
S1 pharma vs S3 pharma	0.645	0.683
S1 herbal vs S3 recipes	0.615	0.651
S2 herbal vs S3 recipes	0.946	0.951

That is not “anything goes” drift. It suggests continuity of method with different weighting. Scribe 2 is much more sharply reweighted toward "ed" bigram type forms, while Scribe 3 appears closer to Scribe 1 in some respects, especially when separated by section.
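
For readers who want to reproduce this kind of comparison, the sketch below shows one way to compute Pearson correlation and cosine similarity between two bigram profiles. It is an assumption-laden illustration: it treats every character of a transliterated word as one glyph and counts all within-word adjacent pairs, which may not match how the figures in the table were produced.

```python
import math
from collections import Counter

def bigram_profile(words):
    """Count adjacent character pairs inside each word (never across words)."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - 1):
            counts[w[i:i + 2]] += 1
    return counts

def _aligned(p, q):
    """Turn two profiles into vectors over the union of their bigrams."""
    keys = sorted(set(p) | set(q))
    return [p[k] for k in keys], [q[k] for k in keys]

def cosine(p, q):
    x, y = _aligned(p, q)
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else float("nan")

def pearson(p, q):
    x, y = _aligned(p, q)
    if not x:
        return float("nan")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

a = bigram_profile(["chol", "chor", "shol"])
b = bigram_profile(["chedy", "shedy", "chol"])
print(round(pearson(a, b), 3), round(cosine(a, b), 3))
```

Pearson is cosine similarity applied to mean-centered vectors, so on sparse count vectors like these the two measures can be expected to move together.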

So I would phrase my position this way: self-citation alone is underspecified. But self-citation constrained by local glyph-adjacency habits and visible-source copying is not nearly as complex as the Voynich language itself. It is a small production method that can produce complex-looking output because the visible source already carries much of the structure forward.

Does this negate the fact that the Voynich may have meaning? I don't think so. It may well be some mnemonic system or even Asian as you suspect. But, if this method I'm demonstrating does work well enough, then it may give some insights into how that meaning was encoded.

(19-05-2026, 11:01 PM)Dunsel Wrote: That is not “anything goes” drift. It suggests continuity of method with different weighting. Scribe 2 is much more sharply reweighted toward "ed" bigram type forms, while Scribe 3 appears closer to Scribe 1 in some respects, especially when separated by section.

I would suggest that you distinguish for S2 between Herbal B and Quire 13. Both are attributed to Scribe 2 by Davis, but they behave very differently:

Section	Davis's scribe	Word count	<ed> rate
Herbal A	Scribe 1	7,257	0.23%
Pharma A	Scribe 1	2,529	0.67%
Quires 9–12 (Astro / Cosmo)	Scribe 4	2,691	9.55%
Herbal B	Scribe 2	2,695	17.03%
Quire 20 (Stars B)	Scribe 3	10,683	19.40%
Quire 13 (Biological B)	Scribe 2	6,915	27.84%

The progression Herbal B → Quire 20 → Quire 13 is monotonic, with the <ed> rate increasing by a factor of 1.6× from Herbal B to Quire 13. Davis's scribal partition cuts orthogonally across this gradient: Herbal B and Quire 13 are both attributed to Scribe 2 yet differ by 10.8 percentage points in <ed> rate, while Quire 20 is attributed to Scribe 3 yet its <ed> rate (19.4%) sits between the two same-scribe sections.
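
The arithmetic in the paragraph above can be checked directly from the table. The sketch also includes a hypothetical `ed_rate` helper; it assumes the rate means the share of word tokens containing the sequence "ed", which is a guess at the definition and is not stated in the post.

```python
def ed_rate(words):
    """Share of word tokens containing 'ed' (assumed definition)."""
    return sum("ed" in w for w in words) / len(words)

# <ed> rates in percent, copied from the table above.
rates = {"Herbal B": 17.03, "Quire 20": 19.40, "Quire 13": 27.84}

ratio = rates["Quire 13"] / rates["Herbal B"]   # factor from Herbal B to Quire 13
gap = rates["Quire 13"] - rates["Herbal B"]     # difference in percentage points
order = ["Herbal B", "Quire 20", "Quire 13"]
monotonic = all(rates[a] < rates[b] for a, b in zip(order, order[1:]))

print(f"ratio = {ratio:.2f}, gap = {gap:.2f} points, monotonic = {monotonic}")
print(ed_rate(["chedy", "chol", "shedy", "daiin"]))
```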

If you compute the correlation between S2 Herbal B and S2 Quire 13, I expect it will be lower than your S2 vs S3 figures (Pearson 0.943, cosine 0.948), despite being the same "scribe." That would mean the within-scribe variation exceeds the between-scribe variation, which is hard to reconcile with a multiple-scribe model but follows naturally from a continuous evolutionary gradient.
