The Voynich Ninja

Full Version: A One-Page Ledger Method for Generating Voynich-Like Text
(26-05-2026, 09:08 AM)ReneZ Wrote: My main point was that the great majority of possible changes are (apparently) forbidden.
The existence of a very large set of relatively strict rules strongly suggests that there is still something else going on. It is not just a matter of creating meaningless words based on small changes to previous words.

The rules are non-trivial too. A relatively simple (potential) change from e to a is allowed in some contexts but not in others.
One can introduce an f intruding in a ch, before a ch, but not before one or two e's.

Etc, etc.

The context-dependent constraints you describe ("e to a allowed in some contexts but not others") are the glyph-design rules Schwerdtfeger documented in 2008 (see Timm & Schinner 2020, p. 10). They follow from the stroke-level structure of the writing system. They're not mysterious: they're the visual grammar of how strokes combine.

But more importantly: you can't argue that a modification is "forbidden" based on its absence from an incomplete text. Folios are missing from multiple quires. The most common word in Currier B — "chedy" — would look like an exception if we only had Currier A pages. Every "forbidden" combination you identify might exist on a missing folio or simply reflect a path the scribe didn't take on the folios that survive.

I prefer to analyze the text the scribe actually wrote rather than speculate about the texts he didn't write — since that number is endless.
 

(26-05-2026, 02:43 PM)oshfdk Wrote:
(26-05-2026, 02:02 PM)Dunsel Wrote: A small legality system plus local copy-and-mutate behavior operating over a visible working set.

Either I misunderstand the ledger, or it produces a huge number of unattested words, ones that would be expected if using the proposed copy+mutate method. For example, dain is a common word, so its simple mutations should be common, correct?

The following are simple mutations of dain that seem to pass the ledger test, yet never appear in the whole manuscript, as far as I know:

dein (doesn't appear in the MS)
daon (doesn't appear in the MS)
diin (absent as a word, otaldiin appears once)
gain (doesn't appear in the MS)
main (doesn't appear in the MS)
daiy (doesn't appear in the MS)

and I think I can generate 10+ more.

Edit: I removed two examples from the list, dais and daid. I probably skipped over them when testing; they do appear at least once each. In any case, dain appears more than 400 times in the MS, and 'y' is a very common word-ending character. daiy is valid according to the ledger, so I would expect dozens if not a hundred daiy's in the MS. Why aren't there any?
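
For anyone who wants to repeat this kind of check, here is a minimal sketch of it. It is not the ledger itself: the glyph inventory `GLYPHS` and the tiny `ATTESTED` set are assumptions made for the illustration (the set holds only the forms this post says do appear), and a real test would load a full transliteration and apply the ledger's own legality rules.

```python
# Hypothetical glyph inventory (assumption, not the ledger's actual alphabet).
GLYPHS = "adeginmosy"

# Forms this post reports as attested at least once. Everything else is
# treated as unattested purely for the sake of the illustration.
ATTESTED = {"dain", "dais", "daid"}


def single_substitutions(word, glyphs=GLYPHS):
    """Return every form that differs from `word` in exactly one position."""
    variants = set()
    for i, original in enumerate(word):
        for g in glyphs:
            if g != original:
                variants.add(word[:i] + g + word[i + 1:])
    return variants


variants = single_substitutions("dain")
unattested = sorted(v for v in variants if v not in ATTESTED)
print(len(variants), "one-glyph variants,", len(unattested), "not in ATTESTED")
```

With this toy alphabet the six forms listed above (dein, daon, diin, gain, main, daiy) all come out as one-glyph variants of dain that are missing from the attested set.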

No, you understand the ledger just fine.  If you were creating your own Voynich, all of those are legal words and could create your seeds.  But I'm going to bastardize a Jeff Goldblum quote, "Just because you could, doesn't mean you should." 

If you wanted to create a book of gibberish but have it look like a real language, you wouldn't create every form possible. You looked at those words and realized the generator COULD produce them. But the real Voynich doesn't, because the scribe had to decide if they SHOULD create them and chose not to. Creating your own book, you would decide what SHOULD be included. Does this word look right? Am I using words that kinda look like other words I've already copied and mutated? Am I creating a chol family with this word? Am I creating a member of the daiin family with this word?

The key word to all of this is constrained.

There are many works of art I look at and think that's just garbage.  Splatter on a canvas.  Others see form, pattern.  When I look at a Rembrandt, I see huge amounts of constraint in the choice of colors, what brush strokes.  If this method I'm proposing is correct, you have to look at it like it's art. Why didn't the scribe choose that word? What constraint kept them from making dein? Was that choice intentional?

Once you look at what they did create instead of what they COULD have created, you begin to realize that that constraint is why it still looks like a language but isn't.  And, if I'm right, that constraint has been VERY effective. Wouldn't you say?
(26-05-2026, 04:12 AM)Torsten Wrote: No, there are far more words than just these four. They exist in a multidimensional network of dozens of related forms. The scribe doesn't choose between "otedy" and "oteedy" in isolation; he chooses among the entire visible pool of similar words:

Sorry, I don't see how that answers the objection below.  If anything, it makes it stronger:

(21-05-2026, 03:29 PM)Jorge_Stolfi Wrote: If, after a suitable warm-up period, the words otedy and oteedy are equally frequent (as shown), and the mutation process can create ytedy from otedy, it should also create yteedy from oteedy. Then ytedy and yteedy should be equally frequent too. But their ratio is only 1:6.

As I see it, the only ways your model would create the above counts are (1) the mutation of the prefix o->y is sensitive to whether the suffix is edy or eedy, or vice-versa, or (2) the seed text had those four words in those approximate skewed ratios (maybe no ytedy at all), and the mutation rules cannot create enough ytedy from otedy or from yteedy to raise the ytedy:yteedy ratio above 1:6.  Isn't that so?

Quote:Here is a larger sample for the ok-/k-/t-/ot- prefix group alone (not even including the y- or qo- variants):

Granted, compared to the patterns seen in typical texts in European languages, and even in Mandarin with tones, that table is more "Cartesian" ("closer to rank 1", with independent choices and/or mutations for prefix and suffix).  But it still has significant deviations from that model, which says that all the rows should be multiples of each other.  

For example,

  okain (144)  okar (129)
  otain (96)   otar (141)
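
To make the deviation concrete, here is a small sketch that computes what those four cells would look like if prefix and suffix were chosen independently (the "rank 1" model). The counts are the ones quoted above; the function name and the ok/ot plus ain/ar split are just my labels for the illustration.

```python
# Observed counts quoted in the post above.
observed = {
    ("ok", "ain"): 144, ("ok", "ar"): 129,
    ("ot", "ain"): 96,  ("ot", "ar"): 141,
}


def expected_under_independence(table):
    """Expected cell counts if prefix and suffix were chosen independently."""
    total = sum(table.values())
    row, col = {}, {}
    for (prefix, suffix), n in table.items():
        row[prefix] = row.get(prefix, 0) + n
        col[suffix] = col.get(suffix, 0) + n
    # Rank-1 model: each cell is (row total * column total) / grand total.
    return {(p, s): row[p] * col[s] / total for (p, s) in table}


expected = expected_under_independence(observed)
for key, n in observed.items():
    print("".join(key), "observed", n, "expected", round(expected[key], 1))
```

Under independence the model expects about 128 okain and 145 okar, against 144 and 129 observed, and about 112 otain and 125 otar, against 96 and 141 observed.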

Quote:There is no 1:1 relationship between two similar Voynich words. Each word exists in a network of variants.

Sorry again, I don't know what this means.

Quote:But within each prefix group, the ratios aren't uniform either — "okeey" (177) is more frequent than "okey" (63), "okaiin" (212) is more frequent than "okain" (144), reflecting which forms the scribe happened to use as sources more often.

Are you saying that the choice of the new starting point, whenever the "source pointer" for copying is reset, is also sensitive to the word at that point? That is, the Scribe was more likely to restart the copying at an "okeey" than at an "okey"?

Quote:Some cells are empty — "kaiir" (—), "tail" (—), "tam" (—). Not because a rule forbids them but because the scribe never happened to produce them.

That is the same as saying that the reason "tam" never occurred is that "tam" never occurred.

Take that "tam" as another example

  tam (0)   otam (47)
  tol (48)  otol (86)

Why didn't the mutation step ever produce a single "tam" from "otam" (or from any other nearby word, like tar (43) or kam (9))?

In fact, how come the frequency distribution of prefixes (or of suffixes) is so uneven, and so stable? The mutation steps should gradually smooth out the distribution to some eigenvector of the transition matrix A (where A[i,j] is the probability that prefix i mutates to prefix j). That matrix must be highly skewed to produce such an uneven distribution...
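
The eigenvector argument can be illustrated with a short power iteration. This is only a sketch: both 3x3 matrices below are made-up numbers chosen for the example, not estimates from the manuscript, and the three states stand for three unnamed prefixes.

```python
def stationary_distribution(A, start, steps=1000):
    """Iterate pi <- pi * A for a row-stochastic matrix A (rows sum to 1)."""
    n = len(A)
    pi = list(start)
    for _ in range(steps):
        pi = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical mutation probabilities (assumption). A[i][j] is the chance
# that prefix i turns into prefix j in one copy-and-mutate step.
symmetric = [
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.10, 0.80],
]
skewed = [
    [0.90, 0.05, 0.05],
    [0.10, 0.85, 0.05],
    [0.10, 0.10, 0.80],
]

start = [1.0, 0.0, 0.0]  # begin with only the first prefix present
flat = stationary_distribution(symmetric, start)
uneven = stationary_distribution(skewed, start)
print([round(p, 3) for p in flat], [round(p, 3) for p in uneven])
```

With symmetric mutation rates the distribution flattens to one third each, whatever the starting point. The asymmetric matrix settles at 0.5, 0.3, 0.2 instead, so an uneven but stable prefix distribution requires the mutation probabilities themselves to be asymmetric.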

Quote:That is what I mean by "frequent words are more likely to be selected as copying templates, generating more variants." Not that each word generates its variants at equal rates — but that the entire network of similar words feeds back on itself, with frequent forms generating more variants and rare forms generating fewer.

But that is the point. Once "otam" and "tol" and "tar" have become frequent, what is stopping them from mutating into "tam"?

And you still haven't answered the oldest objection.  The proposed algorithm for generating Voynichese-like gibberish text starts with
  Step 1. Generate a Voynichese-like gibberish seed text.
  Step 2. ... 
See the problem there?

All the best, --stolfi
(26-05-2026, 03:22 PM)Dunsel Wrote: There are many works of art I look at and think that's just garbage.  Splatter on a canvas.  Others see form, pattern.  When I look at a Rembrandt, I see huge amounts of constraint in the choice of colors, what brush strokes.  If this method I'm proposing is correct, you have to look at it like it's art. Why didn't the scribe choose that word? What constraint kept them from making dein? Was that choice intentional?

Once you look at what they did create instead of what they COULD have created, you begin to realize that that constraint is why it still looks like a language but isn't.  And, if I'm right, that constraint has been VERY effective. Wouldn't you say?

Yes, treating it as an art form works nicely, but then there is no reason to invoke copy+mutate at all. This is just automatic writing based on visual patterns, like a special type of doodling.

Either there is a set of a few reasonably simple rules that anyone could use to reliably produce Voynich manuscript-like text, or there is little explanatory benefit in listing these rules. I mean, if they don't explain the process well enough, then no-one can prove this was actually the method used to create the manuscript. A possibility, yes, maybe. A necessity, not at all.

I created a piece of plausible looking Voynichese earlier in this thread without using copy+mutate, but just using my intuition of what Voynichese looks like. Copy+mutate should at least do better to have some advantage over "intuitive doodling/art-like writing" explanation. I think so far all proposed reasonably simple copy+mutate rule sets if followed exactly would produce a text that many people on this forum would immediately recognize as implausible Voynichese.
(26-05-2026, 03:45 PM)Jorge_Stolfi Wrote: ...

You are arguing in circles. Each response restates the same objection in a different form: "why doesn't mutation X produce word Y?" And each time, the answer is the same: because the scribe is a human making contingent choices, not an algorithm with uniform transition probabilities.
(26-05-2026, 03:07 PM)Torsten Wrote: I prefer to analyze the text the scribe actually wrote rather than speculate about the texts he didn't write, since that number is endless.

I will politely disagree with this statement. A model does not just have to explain what the scribe wrote. It also has to explain why the text did not fall apart into garbage.  A lot of objections in these discussions are really variations of “why don’t we see this?” or “why didn’t the scribe do that?” They are questions about constraint.

In my own experiments, most copy/mutate systems fail pretty quickly unless the constraints are tight enough. They drift, bloat, repeat badly, or start producing obvious nonsense. And, given enough generations, even my generator has issues. That failure is informative. It tells us the Voynich text is not just “anything goes” mutation. So I think the unwritten text matters too. If a proposed method could easily generate endless bad Voynichese, then we need to explain why the manuscript consistently avoided doing exactly that.
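
A toy version of that comparison, for illustration only. This is not my generator and not the ledger: `GLYPHS`, `is_legal`, and the four seed words are stand-ins I made up for the sketch, and the legality rule (length 3 to 7, ends in n, l, r, or y, no triple glyph) is far cruder than anything real.

```python
import random

GLYPHS = "acdehiklnorsty"  # hypothetical glyph inventory (assumption)


def is_legal(word):
    """Toy legality test (an assumption, NOT the ledger's actual rules)."""
    if not 3 <= len(word) <= 7:
        return False
    if word[-1] not in "nlry":
        return False
    # Reject any run of three identical glyphs.
    return all(not (a == b == c) for a, b, c in zip(word, word[1:], word[2:]))


def mutate(word, rng):
    """Apply one random substitution, insertion, or deletion."""
    op = rng.choice(("substitute", "insert", "delete"))
    if op == "insert" or not word:
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(GLYPHS) + word[i:]
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    return word[:i] + rng.choice(GLYPHS) + word[i + 1:]


def generate(seeds, steps, rng, legal=None):
    """Copy a random earlier word and mutate it; keep the mutation only if legal."""
    pool = list(seeds)
    for _ in range(steps):
        source = rng.choice(pool)
        candidate = mutate(source, rng)
        if legal is not None and not legal(candidate):
            candidate = source  # rejected mutation: fall back to a plain copy
        pool.append(candidate)
    return pool


def illegal_fraction(words):
    return sum(not is_legal(w) for w in words) / len(words)


SEEDS = ["daiin", "chol", "chedy", "okain"]
free = generate(SEEDS, 500, random.Random(1))
filtered = generate(SEEDS, 500, random.Random(1), legal=is_legal)
print(illegal_fraction(free), illegal_fraction(filtered))
```

The filtered run stays at zero illegal forms by construction, because every rejected mutation is replaced by a copy of an already legal word. The unfiltered run is free to drift, which is the failure mode described above.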

If we don't explore the possibilities of what wasn't created then we'll never have acceptance for what is created.
(26-05-2026, 03:45 PM)Jorge_Stolfi Wrote: But that is the point. Once "otam" and "tol" and "tar" have become frequent, what is stopping them from mutating into "tam"?

And you still haven't answered the oldest objection.  The proposed algorithm for generating Voynichese-like gibberish text starts with
  Step 1. Generate a Voynichese-like gibberish seed text.
  Step 2. ... 
See the problem there?

All the best, --stolfi

That’s exactly the point. Nothing stops otam, tol, or tar from becoming tam unless the system has constraints that block or disfavor that move. That is why “copy and mutate” alone is not enough.

My argument is not that any mutation process will work. Most of them do not. They collapse into bad forms very quickly. The useful question is: what constraints are necessary to keep the text inside the observed Voynich range?

On the seed objection: yes, if the method requires a fully Voynich-like seed text, that is circular. But my claim is narrower. The seed does not have to be a finished explanation of the whole manuscript. It only has to be a starting sample plus a legality system. After that, the question becomes whether the process can expand from that starting point without collapsing.  So I agree with the objection if the claim is “Voynichese comes from already-existing Voynichese.” That proves nothing. But if a small seed plus constraints can generate stable output while weak or unconstrained versions fail, then we have learned something about the production mechanism.

And, if you look at f1r, it seeds itself.
(26-05-2026, 04:06 PM)Torsten Wrote: Each response restates the same objection in a different form: "why doesn't mutation X produce word Y?" And each time, the answer is the same: because the scribe is a human making contingent choices, not an algorithm with uniform transition probabilities.

So you are saying that the scribe is not doing copy-and-mutate, nor choosing the prefix and suffix independently of each other, but is choosing each word with a frequency that depends idiosyncratically on the whole word?  So that it may often use "otam", "otol", and "tol", but never "tam"?

That sounds pretty much like the process of writing a meaningful text...

And what about the "seed text" objection?

All the best, --stolfi
oshfdk Wrote: I think so far all proposed reasonably simple copy+mutate rule sets if followed exactly would produce a text that many people on this forum would immediately recognize as implausible Voynichese.

I think the Voynich is implausible. If it is copy+mutate, the best you can hope for is statistically comparable. Everyone hopes for an exact explanation or reproduction. What if there isn't one?

Try my challenge in a previous reply. See if you can make statistically correct Voynich with it.
(26-05-2026, 04:37 PM)Jorge_Stolfi Wrote: That sounds pretty much like the process of writing a meaningful text...

Amazing, isn't it? That you could take a process for creating meaningful text and instead create gibberish that has fooled people for 600 years!