The Voynich Ninja

A One-Page Ledger Method for Generating Voynich-Like Text
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
(26-05-2026, 11:57 PM) oshfdk Wrote: Let me make it more specific and yet still manageable to create manually:

Excellent!  Now we're collaborating.  Here's the problem.

Your simple little three-step idea just exploded into a whole pack of rules.

It started as: make 20 or so words, some combinations, and random long words.

Now we're at: ranked weights, bigram starts, trigram chaining, final suffix constraints, fallbacks, and a 20-word rolling memory.
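To make the rule count concrete, here is a minimal Python sketch of just two of those rules: ranked weights and a 20-word rolling memory. The word list, the weights, and the `repeat_penalty` value are hypothetical placeholders, not values from either generator, and the memory is assumed here to down-weight recently used words; the proposal under discussion may use it differently (for example, to favor near-repeats).

```python
import random
from collections import deque

# Hypothetical ranked word list (word, weight); illustrative only, not from either generator.
COMMON_WORDS = [("daiin", 10), ("chol", 8), ("chedy", 7), ("qokain", 6),
                ("shey", 5), ("ol", 5), ("otedy", 4), ("cthol", 3)]

def pick_word(rng, memory, repeat_penalty=0.25):
    """Choose a word by rank weight; words still in the rolling memory get their weight scaled down."""
    words = [w for w, _ in COMMON_WORDS]
    weights = [wt * (repeat_penalty if w in memory else 1.0) for w, wt in COMMON_WORDS]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(n_words, seed=0, memory_size=20):
    rng = random.Random(seed)
    memory = deque(maxlen=memory_size)  # appending past maxlen silently drops the oldest word
    out = []
    for _ in range(n_words):
        word = pick_word(rng, memory)
        memory.append(word)
        out.append(word)
    return out

print(" ".join(generate(8, seed=1)))
```

Each further rule in the list (bigram starts, trigram chaining, final-suffix constraints, fallbacks) would add another table or parameter of this kind.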

And my system would be too cumbersome for a 15th-century scribe? Do you see my point? You've been opposed to this idea of using the ledger, and you've been trying to say it's much simpler than that. And it may well be. But so far, my ledger and "don't look stupid" rules are about as simple as it gets.

Can the Voynich be generated with the method you have? It may be entirely possible. Could a 15th-century scribe use your method? Possibly. But if so, we'd be relying on the same things you discounted my method for relying on:
  • local visual resemblance
  • copy/modify behavior
  • constrained production
  • family resemblance

Oh, and you suggested a trigram table?  Welcome to my world.  Let's just call it a ledger.
(27-05-2026, 12:16 AM) Dunsel Wrote: And my system would be too cumbersome for a 15th century scribe?  Do you see my point?  You've been opposed to this idea I have of using the ledger and you've been trying to say it's much simpler than that.  And it may well be. But so far, my ledger and "don't look stupid rules" are about as simple as it gets.

First of all, the result produced by your system looks much farther from Voynichese visually. It's immediately clear that this is not Voynichese (qkamamaiiiin? oltheeo?), while my example might even be used to fool somebody.

Dunsel Wrote: daiin oroksy qoeeey qkamamaiiiin oroksy otedy shey saiin
daiin or qokain daiin rotain dain or cthol
oltheeo chol daiin cthey qokal otedy ol cthol
chedy cheaiiky daiin shey qokain qokair dair cpholsho

vs

oshfdk Wrote: oteedy chedary qoldaly cholky qoky saiiral chedy cpholky
chdykaiin chedary ol daiin chdoroldal otey opdorolky ockhol
ykair otey shedal chedaiin doroldal dorolkchol sheey qokain
orolky opdor daiin qokal oteedor oteedykaiin qoky daiin

Also, I don't think my rule set is very complex: the trigram part can be implemented on the table-lookup principles that were used in the XV century and appear a lot in manuscripts. The selection process is obvious too.
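A trigram table of this kind can be built by tallying, for each two-character context, which characters follow it in a sample of words; generating then becomes a table read. A minimal sketch, assuming a tiny sample of common EVA words and `^`/`$` padding to mark word start and end (both are choices made for this illustration, not part of oshfdk's code):

```python
from collections import defaultdict, Counter

def build_trigram_table(words):
    """Map each 2-character context to a Counter of the characters that follow it.
    '^^' pads the word start and '$' marks the word end."""
    table = defaultdict(Counter)
    for w in words:
        padded = "^^" + w + "$"
        for i in range(len(padded) - 2):
            table[padded[i:i + 2]][padded[i + 2]] += 1
    return table

sample = ["daiin", "dain", "qokain", "qokaiin", "chol", "cthol", "shol"]
table = build_trigram_table(sample)
print(dict(table["ai"]))  # {'i': 2, 'n': 2}
```

Looking up a context such as `"ai"` returns the permitted next characters with their counts, which is the same shape of data as a column in a lookup table.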

But most importantly, I'm not arguing here which Voynichese-like generation system is better, because I personally believe that Voynichese is a cipher and no generator of any kind was used to produce it. I'm just showing that ledgers and copy+mutate are not the only options, so I'm not sure what exactly they add to possible explanations of the manuscript.

Edit: to clarify my last point, it would be perfect if anyone could prove that copy+mutate was used to create the manuscript. Proving that copy+mutate+some vague artistic rules can be used to create the manuscript seems a bit pointless, because we could just as well say that the manuscript was created on a hunch, with no clear process, by the sheer brainpower of a person with good visual memory and pattern-following skills.
(27-05-2026, 12:04 AM) bi3mw Wrote: I was thinking more along the lines of an extension for “random.” Just for a quick suggestion, I decided to ask the AI for the first time. The basic idea seems viable, but of course the accuracy of the weighting would need to be verified.

Sorry, I was a little slow in posting this. - Edit: Where exactly can I find “Ledger_Scribe1.json”?

Look in my OP. There's a link to the GitHub repo. You can find everything there, including the ledger, my generator, and tons of other files and data I've compiled.
As for your 'extension':

You just added weights to glyphs for initial, medial, and terminal characters, which is exactly what my ledger does, with an added column: I have initial, post-initial, medial, and terminal. I found that added column pretty handy for creating the initial bigram of a word without coding in bigrams themselves.
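The four-column idea can be sketched as a table of glyph weights per word position. The glyphs and weights below are invented placeholders, not the contents of Ledger_Scribe1.json, and this sketch deliberately leaves out any rule about which glyph may follow which.

```python
import random

# Hypothetical positional ledger: column -> {glyph: weight}. Placeholder values only.
LEDGER = {
    "initial":      {"q": 5, "d": 4, "o": 4, "ch": 5, "sh": 3},
    "post_initial": {"o": 6, "a": 4, "e": 3, "k": 2},
    "medial":       {"k": 4, "t": 3, "e": 3, "ai": 3, "ol": 2},
    "terminal":     {"y": 6, "in": 5, "l": 4, "r": 3},
}

def weighted(rng, column):
    """Draw one glyph from a ledger column according to its weights."""
    glyphs = list(LEDGER[column])
    return rng.choices(glyphs, weights=[LEDGER[column][g] for g in glyphs], k=1)[0]

def make_word(rng, n_medial):
    """initial + post-initial + n_medial medial glyphs + terminal."""
    parts = [weighted(rng, "initial"), weighted(rng, "post_initial")]
    parts += [weighted(rng, "medial") for _ in range(n_medial)]
    parts.append(weighted(rng, "terminal"))
    return "".join(parts)

rng = random.Random(42)
print([make_word(rng, rng.randint(0, 2)) for _ in range(6)])
```

Because the columns are independent here, it can emit sequences such as "qa"; constraining the post-initial column by the initial glyph is what turns it into the word-opening bigram described above.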

You added word-shape templates, which is what my "don't look stupid" rules do: preserve the consonant/vowel shape of words.
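A shape-preservation rule can be sketched by reducing each word to a consonant/vowel template and accepting a candidate only if its template already occurs in a reference list. Treating EVA a, e, i, o, y as "vowel-like" is an assumption made purely for illustration; nobody knows whether the script has vowels in that sense.

```python
VOWELS = set("aeioy")  # assumption: EVA a, e, i, o, y treated as "vowel-like"

def shape(word):
    """Collapse a word to a C/V template, merging runs of the same class."""
    out = []
    for ch in word:
        c = "V" if ch in VOWELS else "C"
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

reference = ["daiin", "chol", "chedy", "qokain", "shey", "otedy"]
allowed_shapes = {shape(w) for w in reference}

def looks_right(candidate):
    """Accept a candidate only if its C/V template is already attested."""
    return shape(candidate) in allowed_shapes

print(shape("daiin"), looks_right("dain"), looks_right("qkamamaiiiin"))  # CVC True False
```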

I ran your version, and it doesn't look horrible. But you're essentially rediscovering exactly what my paper and this thread have been pushing: positional legality constraints.

Now, to digress about your use of AI.

[link]

Ok, I'm done on that topic.  Here's some output from your code. 

============================================================

PAGE 51
============================================================

otedy ol otedy ol aeolel cnohlnhaieny otedy snoorosn
cnohlnhaieny oaoltiid dair cnohlnhaieny ol chedy chol chedy
dain cthey daiin dair saiin daiin chol daiin
qedaetinokty or chol qokeedy qokedy chol qokair fnaklnkan
chol daiin cnohlnhaieny saiin daiin srsllatkr ckoealkis or
chol aeerhhlian dain chor cthol shol ol qotedy
qokain stkly teleoesieiel shey aeerhhlian or qokal careooter
trhoem qokeedy qokedy chdy shey qokain qokair slkaon
saiin qokair dair okain ol chol qokeedy qokedy
stlohkam ol chedy qokain tnhtkr cthey daiin saiin
daiin ol chedy dain ol chedy aiin qokain
daiin shdhohiion qotedy qokain qhkleiaood ohsehsd chol
(27-05-2026, 12:30 AM) oshfdk Wrote: First of all, the result produced by your system looks much farther from Voynichese visually. It's immediately clear that this is not Voynichese (qkamamaiiiin? oltheeo?), while my example might even be used to fool somebody.

You are correct. That's not a Voynich word. Does it need to be? In Takahashi in particular, there are numerous long words that look like two words were jammed together. I was even modelling that at one point. So, if the generator is still modelling those words (I did remove that code), then let's call it a Takahashi statistical anomaly. Is my generator perfect? Hell no. It still needs work. The question is: is it plausible?

(27-05-2026, 12:30 AM) oshfdk Wrote: Edit: to clarify my last point, it would be perfect if anyone could prove that copy+mutate was used to create the manuscript. Proving that copy+mutate+some vague artistic rules can be used to create the manuscript seems a bit pointless, because we could just as well say that the manuscript was created on a hunch, with no clear process, by the sheer brainpower of a person with good visual memory and pattern-following skills.

You are correct, it's pointless if that isn't how the Voynich was created. If we knew that answer, this forum would be pointless. Until then, if you're going to keep an open mind about this and use the scientific method, every possibility needs to be explored. Do you know how many current theories there are to explain dark matter? It's utterly mind-boggling. Will one of those theories lead someone down the path to the actual solution? Entirely possible. I'm just adding paint to a path that already existed. Whatever you think the Voynich is, my purpose is to make you think harder. Did I succeed?

Edit: One thing for you to consider: just as your generator can import other languages, my ledger works for others as well. From a deciphering standpoint, you may find some value in it.
(27-05-2026, 01:03 AM) Dunsel Wrote: Now, to digress about your use of AI.

I don't really understand your problem, or rather, your rambling statement about AI in general. What tool you use to quickly generate code (I usually write mine myself without any tools) is completely irrelevant as long as you indicate what was used and how. It's also advisable to critically review the code for potential weaknesses (as has been done). Whether the code is any good in the end is something everyone can check for themselves, and if the results from different sources point in the same direction, all the better, right? There’s no need to fear the “ban hammer” here, unless an entire theory is “externally generated” and then just thrown out there.
(27-05-2026, 03:25 AM) bi3mw Wrote:
(27-05-2026, 01:03 AM) Dunsel Wrote: Now, to digress about your use of AI.

I don't really understand your problem, or rather, your rambling statement about AI in general. What tool you use to quickly generate code (I usually write mine myself without any tools) is completely irrelevant as long as you indicate what was used and how. It's also advisable to critically review the code for potential weaknesses (as has been done). Whether the code is any good in the end is something everyone can check for themselves, and if the results from different sources point in the same direction, all the better, right? There’s no need to fear the “ban hammer” here, unless an entire theory is “externally generated” and then just thrown out there.

Agreed. You just seemed a bit concerned when you said you'd used it for the first time. Trust me when I say you are not the only one. If I'm generating code, I'll use Codex in VS Code, then load it up in Thonny or run it from the CLI. I can follow Python but don't have the experience to write it manually and get the syntax correct. That, and Codex can build a UI in Tkinter, which would take me weeks. If I'm writing code, I use NetBeans (PHP) or Lazarus (Pascal). I also goof off with C3. I will use ChatGPT to, as it puts it, "surgically repair code." Quite often, Codex will get the code working but screws up UIs, especially in PHP when HTML and CSS get thrown in. GPT is pretty decent at fixing code. If you go to the link in my signature, that's a WordPress plugin that Codex created and GPT patched up.
(27-05-2026, 01:19 AM) Dunsel Wrote: You are correct. That's not a Voynich word. Does it need to be? In Takahashi in particular, there are numerous long words that look like two words were jammed together. I was even modelling that at one point. So, if the generator is still modelling those words (I did remove that code), then let's call it a Takahashi statistical anomaly. Is my generator perfect? Hell no. It still needs work. The question is: is it plausible?

It doesn't have to be a known Voynich word; however, certain combinations of characters, while physically possible, appear rarely or not at all in the manuscript, because it isn't even easy to define what they should look like. How would you write oltheeo? Where would the crossbar of h attach? I know of only one instance of what is transcribed as th with no crossbar over t, from [link] (I've added the image below), and if you check it out, it looks like ete to me. Maybe it was cth, but the crossbar went missing. The combination lth looks highly strange to me visually. Again, there is nothing wrong with any particular case of this; the MS has plenty of weird characters and one-off combinations, but on average they are rare, and usually you have to scan half a page to find something weird. With the generator you created, these happen on almost every line. I've underlined the words that have very rare combinations (the second line is clean, but even there we have rot, an atypical combination that appears only ~3 times in the whole MS).

daiin oroksy qoeeey qkamamaiiiin oroksy otedy shey saiin
daiin or qokain daiin rotain dain or cthol
oltheeo chol daiin cthey qokal otedy ol cthol
chedy cheaiiky daiin shey qokain qokair dair cpholsho

[attachment=15787]
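Claims like "rot appears only ~3 times in the whole MS" are easy to check mechanically once a transcription is available. A minimal sketch, assuming the transcription has already been reduced to a plain list of EVA words (parsing a transcription file is not shown). The `corpus` here is a toy stand-in, so the flags it produces say nothing about the real manuscript, and the `threshold` of 3 is an arbitrary choice.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count character n-grams inside words (never across word boundaries)."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts

def flag_rare(candidate, counts, n, threshold=3):
    """Return the n-grams of `candidate` seen fewer than `threshold` times in the corpus."""
    grams = {candidate[i:i + n] for i in range(len(candidate) - n + 1)}
    return sorted(g for g in grams if counts[g] < threshold)

corpus = ["daiin", "chol", "cthol", "otedy", "qokain", "shey", "chedy", "rotain"]
tri = ngram_counts(corpus, 3)
print(flag_rare("oltheeo", tri, 3))  # ['eeo', 'hee', 'lth', 'olt', 'the']
```

Run over a full transcription and a page of generated output, this gives a per-line count of rare combinations that can be compared with the "half a page per oddity" rate described above.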

(27-05-2026, 01:19 AM) Dunsel Wrote: You are correct, it's pointless if that isn't how the Voynich was created. If we knew that answer, this forum would be pointless. Until then, if you're going to keep an open mind about this and use the scientific method, every possibility needs to be explored. Do you know how many current theories there are to explain dark matter? It's utterly mind-boggling. Will one of those theories lead someone down the path to the actual solution? Entirely possible. I'm just adding paint to a path that already existed. Whatever you think the Voynich is, my purpose is to make you think harder. Did I succeed?

No, and this is exactly the problem. I don't understand how your approach in its present state gives any advantage over the existing theories.

If copy+mutate+ledger by itself produced plausible Voynichese, and plausible Voynichese only, it would be a very thought-provoking result, calling for some explanation and further analysis. However, to produce plausible Voynichese according to the method you present, one needs copy+mutate+ledger+

Dunsel Wrote: Creating your own book, you would decide what SHOULD be included. Does this word look right? Am I using words that kinda look like other words <...>? Am I creating a chol family with this word? Am I creating a member of the daiin family with this word?

Why then use copy+mutate+ledger at all? The quote above essentially describes an artistic approach to creating plausible-looking writing in an unknown language: thinking about different word families, thinking about whether words look natural. The quote already explains the whole process very well, even after I removed the bit about "copy and mutate". An artist will normally look at the parts of the work already created to make sure the whole thing looks consistent (does this word look right?) and will build on top of what has already been done (am I creating a chol family?); this is just one very natural part of any creative process. There is no need to single it out, put the rest in parentheses, and get a theory like "Van Gogh's Starry Night was created by copying and mutating existing shapes (under some personal constraints with some artistic post-processing attached)".
(27-05-2026, 10:25 AM) oshfdk Wrote: ...
Why then use copy+mutate+ledger at all?

You keep moving into paleography, and that's not what I'm doing here. I'm not a paleographer and would be a fool to claim such. I'm not trying to reconstruct exact medieval pen mechanics or determine whether a specific glyph join should be read as ete, cth, or something else. My work is computational and structural. Data analysis. That's what I do.

Your definition of plausible and mine are obviously very different. You keep bringing up paleography, which is a visual discipline. I work with measurable structure and production behavior. You see plausible as having the same visual appeal. I see plausible as having the same statistical behavior.

If you or someone else wants to improve the paleographic realism of generated output, that's a separate layer of refinement. But pointing out a rare glyph combination in draft generator output does not invalidate the underlying production model any more than a malformed brush stroke invalidates the existence of painting techniques.

I am asking a simple question. Can a constrained generation process reproduce the same large-scale statistical behavior we see in the Voynich manuscript without assuming there's a hidden language underneath it?
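One way to make "the same large-scale statistical behavior" testable is to compute a few summary measures for real and generated text and compare them. The sketch below uses the word-length distribution (compared by total variation distance) and the type/token ratio; these measures are chosen for illustration and are not necessarily the ones used in the paper mentioned in this thread.

```python
from collections import Counter

def length_distribution(words):
    """Relative frequency of each word length."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def type_token_ratio(words):
    """Distinct words divided by total words."""
    return len(set(words)) / len(words)

def compare(real, generated):
    return {
        "length_tvd": total_variation(length_distribution(real), length_distribution(generated)),
        "ttr_real": type_token_ratio(real),
        "ttr_generated": type_token_ratio(generated),
    }

real = "daiin chol chedy qokain shey daiin ol chol".split()
gen = "daiin cheaiiky qokain oroksy daiin shey ol cthol".split()
print(compare(real, gen))
```

On a few lines of text these numbers are noise; they only become meaningful over thousands of words, and the type/token ratio in particular depends on sample size, so both samples should be the same length.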

And to answer your question:

You criticize my use of a ledger, but then in the code you produced you created this:

COMMON_WORD_WEIGHTS
COMMON_COMBINATION_WEIGHTS
WORD_START_BIGRAMS
BIGRAM_OPTIONS
TRIGRAM_OPTIONS
NONFINAL_TRIGRAMS
FINAL_BIGRAMS
FINAL_TRIGRAMS
WORD_FINAL_SUFFIXES

That is a continuation table. It is doing a ledger walk at the bigram/trigram level instead of the character-adjacency level that mine does.

And your long-word generation is not just random invention. This line:

word += random.choice(allowed_trigrams)[2]

takes the current ending, finds an allowed trigram continuation, and appends the next character. That is constrained copy/mutation behavior.
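The quoted line appends the last character of a trigram chosen from those whose first two characters match the current ending. A self-contained sketch of that step, using a small hypothetical trigram list; `allowed_trigrams` mirrors the name in the quoted line, but how it is filtered here is an assumption, not the actual `TRIGRAM_OPTIONS` logic.

```python
import random

# Hypothetical set of permitted trigrams (not the actual TRIGRAM_OPTIONS table).
TRIGRAMS = ["qok", "oka", "kai", "ain", "aii", "iin", "oke", "kee", "eed", "edy", "ked"]

def extend(word, rng, max_steps=6):
    """Grow `word` one character at a time using trigrams that continue its last two characters."""
    for _ in range(max_steps):
        allowed_trigrams = [t for t in TRIGRAMS if t[:2] == word[-2:]]
        if not allowed_trigrams:  # dead end: no trigram continues this ending
            break
        word += rng.choice(allowed_trigrams)[2]
    return word

rng = random.Random(3)
print([extend("qo", rng) for _ in range(4)])
```

Starting from "qo", this table can only reach qokain, qokaiin, qokedy, or qokeedy: every appended character is licensed by an attested three-character window.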

So I will ask the same question back: why use copy + bigram/trigram mutation at all?

Because unconstrained random generation collapses. I... and now YOU... have just proved that a constrained production system works better than rolling dice.
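The difference between rolling dice and constrained production can be made measurable by generating words both ways and counting how many contain a bigram never seen in a reference list. Everything below (the reference words, the alphabet derived from them, the word length) is a toy assumption; the numbers it prints illustrate the method, not a result about the manuscript.

```python
import random

reference = ["daiin", "chol", "chedy", "qokain", "shey", "otedy", "qokeedy", "cthol", "okaiin"]
seen_bigrams = {w[i:i + 2] for w in reference for i in range(len(w) - 1)}
alphabet = sorted(set("".join(reference)))

# Bigram transition lists learned from the reference words (sorted for reproducibility).
follow = {}
for a, b in sorted(seen_bigrams):
    follow.setdefault(a, []).append(b)

def unconstrained(rng, length):
    """Uniformly random letters from the same alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))

def constrained(rng, length):
    """Start with an attested initial letter, then only take attested bigram steps."""
    word = rng.choice([w[0] for w in reference])
    while len(word) < length and word[-1] in follow:
        word += rng.choice(follow[word[-1]])
    return word

def illegal_rate(words):
    """Fraction of words containing at least one unattested bigram."""
    bad = sum(any(w[i:i + 2] not in seen_bigrams for i in range(len(w) - 1)) for w in words)
    return bad / len(words)

rng = random.Random(0)
print("random:", illegal_rate([unconstrained(rng, 5) for _ in range(1000)]))
print("constrained:", illegal_rate([constrained(rng, 5) for _ in range(1000)]))
```

The constrained rate is 0.0 by construction; the informative number is how close to 1.0 the random rate gets, because with only about 21 attested bigrams out of 196 possible ones almost every random five-letter word breaks at least one.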
(27-05-2026, 12:29 PM) Dunsel Wrote: You keep moving into paleography, and that's not what I'm doing here. I'm not a paleographer and would be a fool to claim such.

I think trying to generate text without a theory that accounts for the [link], either implicitly or explicitly, is going to stand out as missing key details to people who have taken the time to understand the script. I appreciate that most of us are laypersons, and the experts talk about the problems of "silo-ing", but the text does not appear to be wholly independent of the paleography. It is a fair criticism to say that your one-page ledger doesn't address core features of the text. To be sure, I don't think you have to adopt the CLS wholesale (I have some quibbles with how Cham treats EVA <l>, for instance), and he was not the first to observe the phenomenon, nor was his statement definitive. Likewise, there might be other ways to approach the issues raised by the CLS without relying on it specifically. However, the basic paradigm, that the first half of words has symbols based on EVA <e> and the second half symbols based on EVA <i>, seems to hold. Your ledger system fails to capture these features and, to my eye, that looks quite far off the text. I don't think saying your approach is incomplete is much of a defense against these criticisms; it is more a recognition that they have a lot of merit.
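The basic paradigm mentioned above (EVA <e>-based symbols in the first part of a word, EVA <i>-based symbols in the second) can be turned into a rough check that a generator's output could be run through. Reducing each series to the single EVA letters `e` and `i` is a simplification assumed here; the actual CLS groups more glyphs than that.

```python
def e_before_i(word):
    """True if no EVA 'e' occurs after the first EVA 'i' (crude e-series/i-series ordering check)."""
    first_i = word.find("i")
    return first_i == -1 or "e" not in word[first_i:]

def ordering_rate(words):
    """Fraction of words containing both letters that respect the e-before-i order (None if no such words)."""
    both = [w for w in words if "e" in w and "i" in w]
    if not both:
        return None
    return sum(e_before_i(w) for w in both) / len(both)

print(e_before_i("cheaiin"), e_before_i("aiiche"))  # True False
print(ordering_rate(["qokeedy", "cheaiiky", "daiin", "aiiche"]))  # 0.5
```

Comparing this rate between a transcription and a generator's output gives one number for how well the output follows the ordering, though a full test would use the complete glyph groupings.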
(27-05-2026, 12:29 PM) Dunsel Wrote: So I will ask the same question back: why use copy + bigram/trigram mutation at all?

Why not? When formalizing intuitive processes, you have to employ some algorithms. I was just mimicking the process I used, mostly effortlessly, to write random Voynichese in [link]. I think I was just writing using the usual sequences and character combinations.

But I think I still failed to make my point clear, let me try again.

It is easy to show that various forms of copy and mutation can build something a bit like the Voynich Manuscript. Or the Naibbe cipher can be used to build something a bit like the Voynich Manuscript. Or just [link] can build something a bit like the Voynich Manuscript. The problem is that none of these options will reproduce all the known features of the Voynich manuscript without adding a lot of complications. By showing that method A can produce output that, in some aspects, closely resembles the output of the unknown method B used by the actual author(s), we can't say anything about how close A and B are. I think the very fact that your method, the Naibbe cipher, my formalized method, and my intuitive method all produce something that resembles the manuscript makes this very clear.

Only if the authors of the manuscript used some formal process to generate the text can we recreate a very close copy of some substantial portion of the manuscript and show that method B must have been the same or almost the same as method A. Unless this is the case, all rule-based text-generation research has a definite limit: it can list all possible ways of making something like the Voynich manuscript, but it can't, in principle, prove that any of these methods was actually used.

Do you think this is correct or not?