The Voynich Ninja - A One-Page Ledger Method for Generating Voynich-Like Text


(27-05-2026, 01:50 PM)rikforto Wrote:
(27-05-2026, 12:29 PM)Dunsel Wrote: You keep moving into paleography, and that's not what I'm doing here. I'm not a paleographer and would be a fool to claim such.

I think trying to generate text without a theory that accounts for the [link], either implicitly or explicitly, is going to stand out to people who have taken the time to understand the script as missing key details. I appreciate that most of us are laypersons, and the experts talk about the problems of "silo-ing", but the text does not appear to be wholly independent of the paleography. It is a fair criticism to say that your one-page ledger doesn't address core features of the text. To be sure, I don't think you have to adopt the CLS wholesale---I have some quibbles with how he treats EVA <l>, for instance---and Cham was not the first to observe the phenomenon, nor was his statement definitive. Likewise, there might be other ways to approach the issues raised by the CLS without relying on it specifically. However, the basic paradigm, that the first half of words have symbols based on EVA <e> and the second on EVA <i>, seems to hold. Your ledger system fails to capture these features and, to my eye, that looks quite far off the text. I don't think it's much of a defense against these criticisms to say your approach is incomplete; if anything, that is a recognition that they have a lot of merit.

I looked over that CLS and again, that's paleography. Not my bailiwick. But here's what I see in it. He is saying that Voynich glyphs are made of component strokes and are constrained. That makes copying, mutation, word families and word constraints more plausible, not less. CLS could very well fit under a ledger model as a lower level constraint system. Does my generator violate CLS? Yes. Does that mean the constraint system of my ledger is fundamentally wrong? No. It may mean that, if CLS is correct, then it's not taking the lower level constraint system of CLS into account. CLS is zoomed way in at the character creation level. I am zoomed way out at the production level.

Furthermore, he jumps to some really dubious conclusions: "Since the Voynich Manuscript’s text does not seem to fit a natural language in these tests, nor is it random, then it must be artificial, in which case there is no reason for CLS not to fit." That's a logical fallacy called a false dichotomy: because something doesn't fit description A, it must be description B. He's concluding that there is no option C or D or any other. He doesn't test against a shorthand or a mnemonic structure or various cryptographic structures like Naibbe. Furthermore, I saw no stroke-by-stroke comparison to an actual period manuscript. Claiming the Voynich scribe had stroke habits without showing that other manuscripts lack them is a huge gaping hole.

So, if my ledger doesn't conform to CLS, well, that's because CLS is not an established fact in my opinion.

Is his theory right or wrong? I don't know. There aren't enough facts there for me to make a decision. Do I feel obligated to include it in order to prove my point? Nope, not yet anyway.

Edit: The best way to describe this is, I am artistically impaired. Data mining? I got that. Show me the numbers.

(27-05-2026, 03:10 PM)Dunsel Wrote: Does my generator violate CLS? Yes. Does that mean the constraint system of my ledger is fundamentally wrong? No. It may mean that, if CLS is correct, then it's not taking the lower level constraint system of CLS into account.

Crucially, however, your ledger, the one we are supposedly talking about, does not address these kinds of "lower-level" constraints. And that quite simply raises questions about the applicability of these findings to the VMS.

I also think you're selling the CLS short. These aren't "stroke habits", but a fundamental observation about how letterform and letter order correlate. With some exceptions---and a good deal of the paper is spent defining those exceptions---letters with a base of e precede letters with a base of i. This kind of ordering of letters is utterly atypical. Cham arguably could have done a better job linking this to the bigram entropy findings, which have amply shown that period manuscripts did not order letters like this, but there are few writing systems where letters from one half of the alphabet show up in the former part of the word and the remainder in the latter, and certainly not in European corpora, and I think we can extend his write-up some charity on that score. Even if you think it's just a "stroke habit", it's fair to say a good ledger should account for it, and a fair criticism to note it doesn't.

His conclusion doesn't much matter to the point here, which is that your ledger does not respect the letter-ordering phenomenon. A ledger that does not substantially reproduce letter order is failing to capture one of the more striking parts of Voynichese text.
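The ordering claim discussed here (glyphs built on EVA <e> tend to come before glyphs built on EVA <i>) can be tested against any word list, real or generated. The following is a minimal sketch under loud assumptions: the two glyph classes `E_BASED` and `I_BASED` are placeholders invented for illustration, not the classes or exceptions defined in the CLS paper, and the function names are hypothetical.

```python
# Placeholder glyph classes for illustration only. They are NOT taken from
# the CLS paper and would have to be replaced with its actual tables
# (including its exceptions) before drawing any conclusion.
E_BASED = set("echs")    # assumed glyphs built on the <e> stroke
I_BASED = set("inrlm")   # assumed glyphs built on the <i> stroke


def violates_order(word: str) -> bool:
    """Return True if an assumed e-based glyph follows an assumed i-based one."""
    seen_i = False
    for glyph in word:
        if glyph in I_BASED:
            seen_i = True
        elif glyph in E_BASED and seen_i:
            return True
    return False


def violation_rate(words) -> float:
    """Fraction of words that break the assumed e-before-i ordering."""
    words = list(words)
    if not words:
        return 0.0
    return sum(violates_order(w) for w in words) / len(words)
```

Running `violation_rate` on a transliteration and on generator output with the same class tables would give one number to compare, which may be more useful than arguing about whether the output "looks" right.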

(27-05-2026, 03:54 PM)rikforto Wrote: Crucially, however, your ledger, the one we are supposedly talking about, does not address these kinds of "lower-level" constraints. And that quite simply raises questions about the applicability of these findings to the VMS.

I also think you're selling the CLS short. These aren't "stroke habits", but a fundamental observation about how letterform and letter order correlate. With some exceptions---and a good deal of the paper is spent defining those exceptions---letters with a base of e precede letters with a base of i. This kind of ordering of letters is utterly atypical. Cham arguably could have done a better job linking this to the bigram entropy findings, which have amply shown that period manuscripts did not order letters like this, but there are few writing systems where letters from one half of the alphabet show up in the former part of the word and the remainder in the latter, and certainly not in European corpora, and I think we can extend his write-up some charity on that score. Even if you think it's just a "stroke habit", it's fair to say a good ledger should account for it, and a fair criticism to note it doesn't.

His conclusion doesn't much matter to the point here, which is that your ledger does not respect the letter-ordering phenomenon. A ledger that does not substantially reproduce letter order is failing to capture one of the more striking parts of Voynichese text.

In order to map the basic statistics of the Voynich, those lower level constraints are mostly irrelevant. If I want to produce a word length distribution, I don't need to know which hand the scribe held the quill in. IF I were telling you that I can exactly reproduce the Voynich, then I better have some pretty amazing details that may even go down to the stroke level. I am not saying that at all.

I don't think I'm selling anything short. There are issues with his interpretation and in the world of science, charity is not something to be handed out. Yes, he did a lot of good work on the Voynich and he's providing tables that are images, but he's not providing the data. None of the code he used or created is available for me to independently come to the same conclusions. I can't run any of those tests. I'd have to guess. His solution for describing his math is to point me to a Wikipedia page. Now, I am not saying he's wrong. What I'm saying is, that to convince me that it's something I need to consider for my ledger, he needs to provide much better proof.

And I have done a lot of work on the Voynich. But I'm not using logical fallacies to try to convince you of what I'm suggesting. Instead, I'm giving everyone access to the exact data and code I used to produce the results. Download it, test it, and if you don't like the results, there are tons of knobs to turn to see if you can get better ones. Still not happy? Shove my code into Codex and create your own generator. You think CLS is valid? Fine, shove some code into my generator to emulate it and see what happens. If you manage to produce perfect Voynich, great! I'll be happy with that result. But until he produces some code that explains how those words are put together, I'm not going to try to guess at his methods.

And my ledger does honor LEGAL letter ordering based on what's in the Voynich. No, it is not perfect. I am not trying to produce a "striking" result. If I had a striking result, I wouldn't be on here, I'd be talking to a publisher. What I'm trying to produce is a plausible STATISTICAL result.

In my paper and in this forum I believe I have further developed the work of Timm & Schinner and provided explanations for the following:

Dense local edit-distance connectivity

Why so many Voynich words differ by only one mutation (ED1).
Why families cluster locally.

Copy/mutate production behavior

Words being derived from nearby prior words rather than independently invented.
Local propagation of forms through insertion/deletion/substitution.

Restricted glyph adjacency

Why some glyph combinations are common and others effectively impossible.
Legal transition structure within tokens.

Word family ecology

Why forms like daiin/daiir/dair/etc. behave as mutation neighborhoods.
Persistence and expansion of lexical cores.

Positional behavior

Different behaviors at line start, paragraph start, internal positions, etc.
Gallows concentration patterns.

Gallows distribution

Why gallows cluster at paragraph and line starts.
Why they behave differently from ordinary glyphs.

Sheet-level locality

Why source relationships collapse strongly at sheet/quire level rather than purely page-to-page.

Two-sheet / three-sheet source packet behavior

Why many Scribe 1 pages reduce to a very small dominant source pool.

Local lexical continuity

Why neighboring folios share mutation neighborhoods and recurring cores.

Currier A vs Currier B (Scribe 1 vs Scribe 2+) regime separation

Different lexical environments and mutation ecologies between early and later manuscript regions.

Currier-like regime drift

Statistical shifts across manuscript regions without requiring a language change.

Word length distributions

Approximate Voynich-like token length behavior.

Zipf-like statistical structure

Non-random frequency falloff emerging from recursive reuse and mutation.

Vocabulary development

How a copy/mutate system alone can generate a vocabulary size comparable to the Voynich.

Hapax generation

How a copy/mutate system can generate hapax token counts comparable to the Voynich.

High repetition without exact monotony

Why the text is repetitive but not trivially repetitive.

Human-feasible manuscript production

A practical workflow a medieval scribe could actually execute repeatedly with limited tools and memory requirements.

Mutation residue

Why some forms appear isolated or weakly connected after many mutation generations.

Cross-page persistence of lexical cores

Why certain high-frequency forms remain stable over long spans.

Emergence of pseudo-language structure

How language-like statistics can emerge without underlying semantic language encoding.

Why the manuscript resists simple random models

The text is structured, but the structure may arise from constrained generation rather than natural language.

Why Voynichese can look internally coherent

Recursive reuse naturally creates apparent grammatical consistency.

Why new forms rarely become completely illegal

Mutation constrained by adjacency legality prevents explosive randomness.

Why the manuscript feels “self-referential”

Because production continually feeds on prior output.

How a small seed can bootstrap a large corpus

Recursive expansion from limited initial material.

Why generated text can resemble Voynich statistically without semantic decoding

Statistical resemblance does not require translation or plaintext recovery.

However, because my paper, my posts and my generator do not even TRY to explain:

Exact CLS ordering behavior
Stroke-level paleography
Scribal motor habits
Full glyph-class asymmetry
Exact entropy profile reproduction
Semantic meaning
Encoding/decoding of plaintext
Perfect Voynich reproduction
Every positional phenomenon
Every rare glyph behavior
Exact Currier separation
Illustration-text relationships

...some responses have declared the work a failure.

I did not set out to produce a perfect reconstruction of the Voynich Manuscript. I set out to investigate whether a constrained copy/mutate system with limited working rules could reproduce a reasonable proportion of the manuscript's statistical and structural behavior in a very human doable format. At this point I believe the answer to that question is yes, since much of the criticism now being directed at this work concerns aspects of the Voynich which the current model was never intended to reproduce.
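For readers who want a concrete picture of what "constrained copy/mutate" means, here is a minimal sketch. It is not the posted generator: the function names, the `window` and `p_mutate` parameters, and the use of seed bigrams as the legality test are all assumptions made for illustration. Each new word is copied from nearby prior output and may receive one insertion, deletion, or substitution, which is kept only if every adjacent glyph pair already occurs in the seed.

```python
import random


def legal_bigrams(words):
    """Collect every adjacent glyph pair seen in the seed, including word edges."""
    pairs = set()
    for w in words:
        padded = "^" + w + "$"
        pairs.update(zip(padded, padded[1:]))
    return pairs


def is_legal(word, pairs):
    """A word is legal if it is non-empty and uses only glyph pairs seen in the seed."""
    padded = "^" + word + "$"
    return len(word) > 0 and all(p in pairs for p in zip(padded, padded[1:]))


def mutate(word, alphabet, rng):
    """Apply at most one edit: a random insertion, deletion, or substitution."""
    op = rng.choice(["ins", "del", "sub"])
    i = rng.randrange(len(word) + (1 if op == "ins" else 0))
    glyph = rng.choice(alphabet)
    if op == "ins":
        return word[:i] + glyph + word[i:]
    if op == "del":
        return word[:i] + word[i + 1:]
    return word[:i] + glyph + word[i + 1:]


def generate(seed, n_tokens, window=20, p_mutate=0.5, rng_seed=0, max_tries=50):
    """Grow a text by copying recent words and mutating some of them legally."""
    rng = random.Random(rng_seed)
    pairs = legal_bigrams(seed)
    alphabet = sorted({g for w in seed for g in w})
    text = list(seed)
    while len(text) < len(seed) + n_tokens:
        source = rng.choice(text[-window:])   # copy from nearby prior output
        word = source
        if rng.random() < p_mutate:
            for _ in range(max_tries):        # retry until a legal mutation is found
                candidate = mutate(source, alphabet, rng)
                if is_legal(candidate, pairs):
                    word = candidate
                    break
        text.append(word)
    return text[len(seed):]
```

For example, `generate(["daiin", "dair", "chol", "chedy", "qokeey", "otol"], 200)` returns 200 tokens, all of which pass `is_legal`. Whether output like this matches the manuscript's numbers is a separate question that has to be answered by measurement.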

It seems that no matter how many times I state that this is a statistical model, I keep getting dragged into debates about why its visual appeal is lacking.

Therefore, unless future criticism addresses the actual statistical and generative scope of this work - namely the copy/mutate + ledger model and constrained copy/mutate generation in general - rather than demanding full visual, paleographic, or semantic reconstruction, I am unlikely to spend much more time responding to those objections.

While I do greatly appreciate feedback, I am primarily interested in criticism directed at the actual claims and scope of the model, rather than features that are well beyond that scope. I also welcome suggestions on how the system could be improved in the future to better reproduce the manuscript visually, but I will not consider the current model a failure simply because it does not yet do so.

I played around with your “Ledger Generator” for a bit. Specifically, I parsed my 485 tables on [link] into JSON and converted them to your format. The whole thing is (obviously) syllable-based. Does the result look “Voynich-like” enough, or not?

============================================================
PAGE 48
============================================================

otaiin sheckeey otey chinal okeey okachey ytol cholshkaiin
chinal olkydy okeey oteydy chearain qoty olkydy cholshkaiin
qokear qokal chcphedy aiildy aiildy cholshkaiin oteolkeeody chdy
ykaiin sheydy sheeodain shoiin shody qofchdaiin qokedy ytal
qotey shofshedy ytedy cheodchy daiin olol olkydy chinal
chedyar ykchy okchdy sheydy choraiin cheekey olkeshey qotchkeey
chcpheor olol okcheody qokainal otedeey sheorol otey qoorchy
qokainal ololal ytain shckhchy choal qoeol chosaroshol choal
oteochedy olor chinal okeoldy daky sheydy okeolar cheoikhy
qopchedal sheeydy cheeytal chty qokedyol olfsheoral cheoetey shepchedy
chokody shechy shedy shoheaiin okeodal aiir choky sheaiin
sholtchey olkeal otey chcfhy qotir otedy qockhal

============================================================
PAGE 49
============================================================

shiinol sheody shekody okiin okeeodar qokeey chcphhdy chky
shockhhy shiinol ytaiin oteol oteedchey okar qokchol cheey
otolal qokeo cheey ykeedy qoeol otcheedaiin sheolkchy cheeydy
otolor aiirody qokeedyol chpchy okaiinol okalaiin daiir okchy
qokadyol sheeoky aiirody qoteeedy oleeolar qokainal qoty otal
cheol oteeschey okeeol oteedain otoaiin okar choor shoor
oteedy oteeo chodaal charal qokeol ykedy shoky cheey
chedy qoteol qool shiinol cheor sheckeey sheain dalalody
sheckhdy oleedar okedy chotoey oldyol olcphy otshaiin olol
otechdy cheoor daiinal qopdaiin qoty chpsheedy cheor okedy
chedy otedy otolopaiin okachey ykolpshy qoksheedy choty qokeeydy
sholkeedy oteeydy okeal chedyol chocfhy ytol chokchey

(27-05-2026, 06:24 PM)Dunsel Wrote: In my paper and in this forum I believe I have further developed the work of Timm & Schinner and provided explanations for the following:

I think you may have missed my last reply because it was at the bottom of a page, but I'd also like to say here a bit about why I personally don't find this research direction interesting (I'm talking about my perspective only) in the context of Timm & Schinner.

There are two different aspects of Timm & Schinner work for me:

1) The finding that some of the features of the manuscript can be explained by the process of copying and mutating past word forms.

2) Attempts to build a text-generator that produces Voynich-like text via copying and mutating.

I find the first one very interesting, especially when considering the manuscript as a ciphertext.

Copy and mutate can be an essential optimization when using certain cipher types for longer texts. For example, consider a homophonic substitution with nulls. If there is a need to repeat a plaintext phrase or word combination used previously, it can be both easier and more secure to just copy it over from already prepared ciphertext while changing a few homophonic assignments and rearranging/replacing the nulls. This way you don't have to focus on remembering all character assignments and at the same time you make sure that the ciphertext in two locations differs significantly enough, to avoid creating two identical pieces of ciphertext that could compromise the cipher. If your draft is the ciphertext written directly under the plaintext, finding already encoded words and word combinations and copying them over with homophonic/null adjustments is arguably the fastest way to encode large texts. A substantial portion of text can be enciphered this way. If I wanted to encipher this post, each time I had to write "ciphertext" I would just look up the previous place where I wrote "ciphertext", then I would change a couple of letters to other homophones, add/discard a null or two and write it down. Because of this possible scenario, I have a lot of interest in autocitation research that focuses on individual, potentially verifiable examples of autocitation.
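The copy-with-adjustments scenario above can be sketched in a few lines. Everything in this block is a toy assumption: the `KEY` table, the `NULLS` list, and the function names are invented for illustration and are not a proposed key for the manuscript. The point is only that a re-used piece of ciphertext can differ visibly from its source while still deciphering to the same plaintext.

```python
import random

# Toy homophonic key, invented for illustration: each plaintext letter
# maps to one or more cipher symbols, and NULLS carry no meaning.
KEY = {
    "c": ["11", "12"], "i": ["21", "22"], "p": ["31", "32"],
    "h": ["41", "42"], "e": ["51", "52", "53"], "r": ["61", "62"],
    "t": ["71", "72"], "x": ["81"],
}
NULLS = ["00", "99"]
REVERSE = {sym: letter for letter, syms in KEY.items() for sym in syms}


def encipher(plain, rng, p_null=0.2):
    """Encipher from scratch: pick a homophone per letter, sprinkle in nulls."""
    out = []
    for letter in plain:
        if rng.random() < p_null:
            out.append(rng.choice(NULLS))
        out.append(rng.choice(KEY[letter]))
    return out


def decipher(symbols):
    """Drop nulls and map every remaining symbol back to its letter."""
    return "".join(REVERSE[s] for s in symbols if s not in NULLS)


def copy_and_adjust(previous, rng, p_swap=0.5, p_null=0.2):
    """Re-use earlier ciphertext: swap some homophones and redo the nulls."""
    out = []
    for sym in previous:
        if sym in NULLS:
            continue                                  # discard the old nulls
        if rng.random() < p_null:
            out.append(rng.choice(NULLS))             # place fresh nulls
        if rng.random() < p_swap:
            sym = rng.choice(KEY[REVERSE[sym]])       # may re-pick the same homophone
        out.append(sym)
    return out
```

With `rng = random.Random(1)`, `first = encipher("ciphertext", rng)` and `second = copy_and_adjust(first, rng)` both decipher to "ciphertext", and the second is derived from the first by substitutions, insertions, and deletions, which is the same kind of edit relationship the autocitation work looks for.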

However, I consider the second part, complete text generation by copy and mutate as the primary method, to be a dead end, for the reasons I explained in my previous posts: even if the manuscript is meaningless and not a cipher and was indeed generated using copy and mutate, it's overwhelmingly likely there will never be a conclusive way of proving this or finding the correct set of rules.

(28-05-2026, 01:13 PM)bi3mw Wrote: I played around with your “Ledger Generator” for a bit. Specifically, I parsed my 485 tables on [link] into JSON and converted them to your format. The whole thing is (obviously) syllable-based. Does the result look “Voynich-like” enough, or not?

I have stated that I am artistically impaired. Does it look Voynich-like to me? Yes, but I have learned that my eyes can easily fool me when it comes to identifying Voynich. Does it statistically match the Voynich? I can almost instantly see the lack of short and 1-character words and of some long words. My first guess is that the length distribution would be off, which will likely throw any Zipf curve off. And I can easily see you're using Scribe 2 and Scribe 1. You have both <ed> and <ho> on those pages. Look at my next reply to oshfdk. You'll see I tried the same syllable approach without even knowing about your work, and the immediate issues it had.

One thing I will suggest if you plan to keep digging is to limit your work initially to Scribe 1 or Currier A pages only. Scribe 2+ uses the same underlying system, but the combinations are different enough to throw off any data that examines the whole Voynich or specific sections. If you haven't already, look at the posts I made here about <ed>. Links to those posts are in the OP.

(28-05-2026, 01:56 PM)oshfdk Wrote: I think you may have missed my last reply because it was at the bottom of a page, but I'd also like to say here a bit about why I personally don't find this research direction interesting (I'm talking about my perspective only) in the context of Timm & Schinner.

I very well may have missed that post and my apologies for doing so.

There is nothing in my work that says it can't be a cypher. I have mentioned in another reply that a whole production of gibberish is very unsatisfying. I still have this underlying hope/belief that it does contain actual content. My working theory is that it's mnemonic, which you could say is a form of cypher. The point of the generator is to explore the copy/mutate possibility: to locate the sources for these pages, the parent words, and offer a possibility. If the sheet/quire source method I think I've discovered leads to a decipherment, I'm all in and I truly hope it helps you in that quest. Let me know what I can do to further that. And you are correct, it may very well be a dead end. But, in my opinion, if you don't know the neighborhood, the best way to find out if that street is a cul-de-sac is to actually drive down that street and look. And so far, I am aware of methods that can produce Voynich-looking text that fails statistically. I'm trying to approach from the other end: to succeed statistically and THEN match visually.

Also, you piqued my curiosity with your bigram smashing generator idea. I actually pursued that yesterday. I had Codex create a small machine learning script that would run through my data and, instead of bigrams/trigrams, it tried to break words into syllables and create a set of weighted tables. The thought there was, a scribe would understand pronunciation and syllables rather than linguistic terms like bigram, n-gram, etc. And I've always had this idea that the Voynich can be pronounced. I then rearranged your generator idea using those ML results. Here's what I got.

============================================================
PAGE 97
============================================================

chckhol qotchody okchees chal chos shody ykchckhey otshes
chody okchees oteoky chckheey ypshor oteodaiin ypshor choty
chees shor chckhol chochor chopchor shey choschochor chees
chopchol qokeeor chal chody qokeor qotchaiin qokchar cheees
cheeaiin chees shey chopydaiin chody otar okor ypshody
yshody chopchos ykchcthey shokydaiin ykchey cheeey chos chees
chckhor choky qokchol ykchey chos shody yshor chochor
otol qokchaiin qokchor shody chcthod qotchaiin qokeeoky cheeaiin
qokchol chos chochor qocthod chopchol shey qokcheey ykeees
qokchor chochor qotody okchy chodalchy shey otaiin chody
chckhey okchody chckheol chody chal chckhey shoteodaiin qotchaiin
shotar chcthol yfod chcthol chal yckhey qotchody

============================================================
PAGE 98
============================================================

ykchey yfod qotchaiin chcthaiin chckhey shoteeaiin otol qotchaiin
qokeey chcthody chckhey shes chody choty chopydaiin chckhor
shodan otchodalchy chcthor chees chocthy shey chopchey qokeey
chcthol shey ykchopchey okshor chokchol otchodalchy chopor okchal
chockhy chodalchy ykchopchey otchockhy shod chcthal chocthy qokeeodaiin
qokody chopydaiin cheees chckhey chos ykam qotchody shey
qochckhol okshor chockhody shotaiin ykeees chckhoal cheeaiin otchoty
qokodalchy chody otchoty shod cheees okeeaiin shor qokchaiin
chees qockhol chal chody ykam chees chcthaiin ykchal
qokeor qotchody qochckhol qotchaiin choty qotody chal qocthey
cheeey qokcheey qocthey chckheey yckhey qokchor chckhody qokchaiin
shor chees qokchor chckhey chopydaiin qokeaiin shod

That uses no seed page, just "syllables". It does use a type of ledger with a lot of weighting. So you can see, it does LOOK a lot more like Voynich than my generator. However, the statistics were WAY off. In particular, the vocabulary size and hapax count. I could likely have continued working on it and gotten those numbers closer to Voynich, but, it would have required some pretty serious memory juggling to produce it unless... they had a method for pronouncing Voynich words which we will likely never prove.
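To make the "weighted syllable tables" idea concrete, here is a minimal sketch of such a generator. It is an assumption-laden toy, not the script described above: the three-slot start/middle/end layout, the `TABLES` weights, and the function names are invented, with only the chunk spellings borrowed from the chunk list in this post.

```python
import random

# Hypothetical weighted tables. The chunk spellings come from the chunk list
# in this post; the slot layout and the weights are invented for illustration.
TABLES = {
    "start": {"ch": 5, "sh": 3, "qo": 3, "ok": 2, "ot": 2, "y": 1},
    "middle": {"ckh": 2, "cth": 1, "ee": 2, "o": 3, "kch": 1, "": 3},
    "end": {"ey": 3, "ody": 2, "or": 2, "ol": 2, "aiin": 2, "al": 1, "es": 1},
}


def pick(table, rng):
    """Draw one chunk from a table with probability proportional to its weight."""
    chunks = list(table)
    weights = [table[c] for c in chunks]
    return rng.choices(chunks, weights=weights, k=1)[0]


def make_word(rng):
    """Build a word by concatenating one chunk from each slot in order."""
    return "".join(pick(TABLES[slot], rng) for slot in ("start", "middle", "end"))


def make_page(rng, lines=12, words_per_line=8):
    """Return a page as a list of text lines."""
    return [
        " ".join(make_word(rng) for _ in range(words_per_line))
        for _ in range(lines)
    ]
```

A design consequence worth noting: with fixed tables, this sketch can never produce more than 6 × 6 × 7 = 252 distinct words, no matter how many pages it generates. That kind of hard ceiling is one plausible reason a pure table-driven approach undershoots vocabulary size and hapax counts, whereas a copy/mutate step keeps introducing new forms.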

Code:
============================================================
GLOBAL STATISTICS
============================================================

Tokens                : 9500
Types                 : 379
TTR                   : 0.0399
Hapax                 : 59
Chunk uses            : 24337
Chunk types used      : 271

Word length distribution

4     1830
5     1239
6     1622
7     2159
8     1520
9     791
10    211
11    127
12    1

Top 30 words

chody         383
shey          358
chckhey       310
cheees        298
chos          293
shor          271
chees         211
qokchor       209
chal          168
chckhol       165
qokeey        160
shody         156
choty         147
chckhody      144
shod          124
otol          118
cheeaiin      109
qotchaiin     108
chcthod       104
chodalchy     103
shes          102
chcthey       100
chockhy       99
cheeey        99
qokchaiin     90
qokchol       89
okor          85
qokcheey      84
qokeeor       80
chopchol      78

Top 30 chunks used

ch        3333
sh        1735
qo        1705
ey        1406
ckh       1056
ody       926
or        920
ok        793
ee        779
ot        726
cho       722
cth       687
ol        604
kch       591
aiin      570
es        562
y         448
ke        414
od        376
al        360
yk        349
os        315
chy       274
tch       259
ees       249
op        249
eey       203
eo        191
ar        186
ty        185

Top 30 character bigrams

ch    6311
ho    4805
he    2686
ee    2417
ok    2375
qo    1967
sh    1960
od    1924
ey    1923
hc    1590
ot    1453
or    1345
kc    1329
ck    1324
kh    1324
dy    1142
ai    951
es    928
ol    875
in    866
ct    851
th    851
ha    840
ii    835
ke    759
oc    673
al    603
eo    529
hy    525
da    460

Top 30 character trigrams

cho    2659
hod    1602
chc    1590
kch    1329
ckh    1324
hey    1293
qok    1231
che    1214
ody    1142
sho    1113
hee    1095
okc    1061
hor    995
hck    926
cth    851
aii    834
iin    759
ees    753
she    745
oke    719
hct    664
eee    661
kho    626
hol    579
khe    568
kee    566
eey    547
cha    522
qot    428
otc    419


(28-05-2026, 03:21 PM)Dunsel Wrote: I have stated that I am artistically impaired. Does it look Voynich like to me? Yes. Does it statistically match the Voynich?

I'm not sure exactly which statistic you're thinking of, but you can check it yourself.

(28-05-2026, 05:09 PM)bi3mw Wrote:
(28-05-2026, 03:21 PM)Dunsel Wrote: I have stated that I am artistically impaired. Does it look Voynich like to me? Yes. Does it statistically match the Voynich?

I'm not sure exactly which statistic you're thinking of, but you can check it yourself.

Well, it would take a longer run and a comparison to the Voynich. Already, word length distribution is going to be off if the rest of the pages look like those. The Voynich has a vocabulary of known unique words. Any generator is going to have to create a similar vocabulary. Hapax tokens. The Voynich has a LOT of words that only occur once anywhere. You'd have to match that rough count compared to the Voynich. And, here's the big one. If you create 10,000 pages of Voynich text, does it collapse into Markov chain nonsense? That alone isn't a failure of the generator but, if it remains reasonably stable even generating that many pages, then it becomes a production system that doesn't collapse. Those are the basic numbers. Then you have other things to consider like gallows usage. Some words start with a gallows, some have internal gallows. Words with initial gallows tend to start pages and "paragraphs." Are you matching those numbers closely? Voynich has just tons of possible statistics. A generator needs to match as many as possible and hopefully... it does it through emergent behavior and not by being forced. Bigram/Trigram counts. Do yours match the Voynich "mostly"?
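The basic numbers listed above (token count, vocabulary size, hapax count, word length distribution, bigram and trigram counts) are cheap to compute for any generated text, so the comparison does not have to be done by eye. The following is a minimal sketch; `text_stats` and its dictionary keys are hypothetical names chosen for illustration, and the input is assumed to be a list of whitespace-separated EVA tokens. TTR here is simply types divided by tokens, which agrees with the run posted earlier (379 / 9500 ≈ 0.0399).

```python
from collections import Counter


def text_stats(tokens, top=5):
    """Compute basic comparison statistics for a list of word tokens."""
    tokens = list(tokens)
    types = Counter(tokens)
    lengths = Counter(len(t) for t in tokens)
    # Character n-grams are counted within words only, never across spaces.
    bigrams = Counter(t[i:i + 2] for t in tokens for i in range(len(t) - 1))
    trigrams = Counter(t[i:i + 3] for t in tokens for i in range(len(t) - 2))
    return {
        "tokens": len(tokens),
        "types": len(types),
        "ttr": len(types) / len(tokens) if tokens else 0.0,
        "hapax": sum(1 for count in types.values() if count == 1),
        "length_dist": dict(sorted(lengths.items())),
        "top_bigrams": bigrams.most_common(top),
        "top_trigrams": trigrams.most_common(top),
    }
```

Calling `text_stats(generated_text.split())` and `text_stats(voynich_text.split())` on samples of the same size gives two dictionaries that can be compared field by field. Sample size matters because type and hapax counts both grow with text length, so a 9500-token run should be compared with a 9500-token slice of the transliteration rather than with the whole manuscript.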
