(22-05-2026, 12:03 AM) ReneZ Wrote: Why would a realistic copy-mutate system stay conservative? There is nothing that dictates this.
The rules observed over tens of thousands of words are quite complex, and they are indeed rules.
Code:
on of beein fascinatng tht etz
come s been hep on pto been
dof tanswer pitc talfing hnooked onv talking
at beein he handsom sound
oin itc we otf yit
matpers opf iot oin he wel
pof he lnoo slep talking hye pipwth hyey talfing
onu heo s njoo on
fyor talfing of soce he
on s tof hem caymet tot of
nto no sall yof her sernness
he talking her he i golad dicussion
direaction dof ato opflc ree
That is output from my generator, using 150 words from Dracula as the seed to build an English ledger. Note: I never intended to create a pseudo-English generator, so it's FAR from perfect (it's WAY too heavy on three-letter words). But the method is the same. The generator is NOT exploring all possible permutations equally. It stays trapped inside a conservative local ecology built from nearby forms. That is why:
- talking → talfing
- discussion → dicussion
- fascinating → fascinatng
- handsome → handsom
appear repeatedly while thousands of other theoretical edits never appear.
The system is trying to maintain textual plausibility and family continuity, not maximize combinatorial exploration. The Voynich appears to behave similarly. The existence of hundreds of theoretical edit possibilities does not mean a human production process would actually use them.
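The behaviour described above can be illustrated with a minimal Python sketch of a copy/mutate loop. This is not the generator that produced the output above. The window size, the mutation probability, and treating the ledger as a plain set of allowed word forms are hypothetical simplifications. Each new word is copied from a short window of recent words and occasionally given one random edit, and that edit is kept only if the ledger already contains the result.

```python
import random


def one_edit(word, alphabet, rng):
    """Apply one random substitution, deletion, or insertion to `word`."""
    op = rng.choice(("sub", "del", "ins"))
    slots = len(word) + 1 if op == "ins" else len(word)
    i = rng.randrange(slots)
    if op == "sub":
        return word[:i] + rng.choice(alphabet) + word[i + 1:]
    if op == "del":
        return word[:i] + word[i + 1:]
    return word[:i] + rng.choice(alphabet) + word[i:]


def generate(seed_words, ledger, n_tokens, window=20, p_mutate=0.3, seed=0):
    """Hypothetical copy/mutate loop. Assumes seed_words is non-empty and
    contains no empty strings; the ledger is just a set of allowed forms."""
    rng = random.Random(seed)
    alphabet = sorted({c for w in ledger for c in w})
    text = list(seed_words)
    for _ in range(n_tokens):
        # Copying from the recent window is the reinforcement step: forms
        # that were just used are the ones most likely to be used again.
        word = rng.choice(text[-window:])
        if rng.random() < p_mutate:
            candidate = one_edit(word, alphabet, rng)
            # Most random edits are rejected here, so the output stays inside
            # a small family instead of wandering over the whole edit space.
            if candidate and candidate in ledger:
                word = candidate
        text.append(word)
    return text[len(seed_words):]


ledger = {"chedy", "shedy", "chey", "cheey", "chdy", "qokedy", "qokeedy"}
out = generate(["chedy", "qokedy"], ledger, n_tokens=300)
```

Because every accepted edit has to land inside the ledger, the output never leaves it, even though `one_edit` can propose any letter at any position.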
(22-05-2026, 12:03 AM) ReneZ Wrote: We see that the word variations are very strict. Therefore, if the text was generated by modifying previous words, it would have to have followed strict rules. That is the correct direction of the logic.
There is no reason to assume that there would be very strict rules (which are then broken somewhat gradually).
I disagree that strict behavior necessarily implies a huge explicit rule system. Strong local reinforcement alone can produce highly constrained output over long runs.
The important distinction is between:
- all theoretically possible mutations, and
- the tiny subset of them that preserves plausibility.
Most mutations simply stop “looking right” relative to nearby forms, so they are naturally filtered out without the scribe consciously following hundreds of formal rules.
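One way to picture "looking right relative to nearby forms" without any explicit rule list is a purely local check. The Python sketch below is a hypothetical stand-in, not anyone's actual method: it enumerates every one-edit mutation of a word and keeps only those whose adjacent letter pairs (with `^` and `$` marking the word edges) already occur in a handful of nearby words. It works on plain EVA letters, so ch counts as two letters here, unlike in the table further down.

```python
def bigrams(word):
    """Adjacent letter pairs, with ^ and $ marking the word edges."""
    padded = "^" + word + "$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def ed1(word, alphabet):
    """Every distinct string one edit away from `word`."""
    out = set()
    for i in range(len(word) + 1):
        for c in alphabet:
            out.add(word[:i] + c + word[i:])  # insertion
    for i in range(len(word)):
        out.add(word[:i] + word[i + 1:])  # deletion
        for c in alphabet:
            out.add(word[:i] + c + word[i + 1:])  # substitution
    for i in range(len(word) - 1):
        out.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])  # adjacent swap
    out.discard(word)  # an edit that changes nothing is not a mutation
    return out


def plausible_mutations(word, nearby):
    """Keep only edits whose letter pairs all occur somewhere in `nearby`."""
    seen = set().union(*(bigrams(w) for w in nearby))
    alphabet = sorted(set("".join(nearby)))
    return {m for m in ed1(word, alphabet) if m and bigrams(m) <= seen}


nearby = ["chedy", "shedy", "chey", "cheey", "qokedy"]
kept = plausible_mutations("chedy", nearby)
```

Comparing `len(kept)` with the size of the full `ed1` set shows how much a purely local check discards. Forms like "shedy" and "chey" survive, while something like "cyedy" is dropped because no nearby word contains "cy".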
(22-05-2026, 12:03 AM) ReneZ Wrote: EDIT:
Let's do some rough counts.
The word chedy could be considered to have four characters.
Limiting to edit distance 1:
Each of these could be changed into another, leading to 4 times, say, 20 options.
Each of these could be deleted, leading to 4 more.
A new character could be added in each of 5 slots, so 5 times 20 more.
6 pairs could be swapped (not sure if that counts as edit distance 1).
We are close to 200 alternatives.
Possibly 10 exist.
We can consider two alternative methods for creating a meaningless text using word permutations.
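The rough count above can be checked with a short Python sketch. The 20-glyph alphabet is an assumption: any 20 glyphs that include ch, e, d and y give the same totals. `rough_count` follows ReneZ's tally (4 × 20 substitutions, 4 deletions, 5 × 20 insertions, 6 pairs swapped). `ed1_neighbours` instead counts distinct resulting words, allowing swaps only between adjacent glyphs. Both function names are hypothetical.

```python
from math import comb

# Hypothetical 20-glyph EVA alphabet, with ch, sh and two bench gallows as single glyphs.
ALPHABET = ["ch", "sh", "ckh", "cth", "e", "d", "y", "o", "a", "i",
            "n", "r", "l", "s", "k", "t", "p", "f", "q", "m"]


def rough_count(n, a, swaps):
    """Back-of-envelope tally for a word of n glyphs over an alphabet of a glyphs."""
    substitutions = n * a
    deletions = n
    insertions = (n + 1) * a
    return substitutions + deletions + insertions + swaps


def ed1_neighbours(word, alphabet):
    """Distinct glyph sequences one substitution, deletion, insertion,
    or adjacent swap away from `word` (a sequence of glyphs)."""
    w = tuple(word)
    out = set()
    for i in range(len(w) + 1):
        for g in alphabet:
            out.add(w[:i] + (g,) + w[i:])
    for i in range(len(w)):
        out.add(w[:i] + w[i + 1:])
        for g in alphabet:
            out.add(w[:i] + (g,) + w[i + 1:])
    for i in range(len(w) - 1):
        out.add(w[:i] + (w[i + 1], w[i]) + w[i + 2:])
    out.discard(w)  # drops the word itself, e.g. a glyph "substituted" by itself
    return out


chedy = ("ch", "e", "d", "y")
print(rough_count(len(chedy), len(ALPHABET), comb(len(chedy), 2)))  # 190
print(len(ed1_neighbours(chedy, ALPHABET)))  # 179
```

The raw tally gives 190, in line with "close to 200". Counting distinct words gives 179 instead, for three reasons. Substituting a glyph with itself changes nothing. Inserting a copy of a glyph just before or just after it yields the same word. Only 3 of the 6 glyph pairs are adjacent. Different counting conventions, such as the 203 below, can therefore land on somewhat different totals.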
Your estimate of the theoretical space is pretty accurate: I calculated 203 possible variants. But your figure of ~10 existing alternatives is an underestimate. I found 54 ED1 variants in the Takahashi transcription and 53 in yours. Here are the top variants, treating ch as a single character:
| Variant | Count |
| --- | --- |
| shedy | 434 |
| chey | 351 |
| cheey | 183 |
| chdy | 145 |
| lchedy | 115 |
| chody | 92 |
| cheody | 90 |
| cheky | 64 |
| cheedy | 59 |
| kedy | 47 |
| tedy | 40 |
| pchedy | 35 |
| tchedy | 33 |
| dchedy | 28 |
| chety | 22 |
| kchedy | 22 |
| ched | 18 |
| ychedy | 11 |
| rchedy | 10 |
| ochedy | 10 |
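Counts like the ones in the table can be gathered with a short Python sketch. Its assumptions are as follows. The transcription is already a list of EVA word tokens. The hypothetical `tokenize` treats ch, sh and the bench gallows (ckh, cth, cph, cfh) as single glyphs. "One edit" means one substitution, deletion, insertion, or swap of two adjacent glyphs, which is the optimal string alignment distance. This illustrates the idea and is not necessarily the exact procedure behind the 54/53 figures. Whether swaps count, and which glyphs are merged, will shift the totals.

```python
from collections import Counter

# Hypothetical tokenisation: these EVA sequences are treated as single glyphs.
MULTI_GLYPHS = ("ckh", "cth", "cph", "cfh", "ch", "sh")


def tokenize(word):
    """Split an EVA word into glyphs, trying multi-letter glyphs first."""
    glyphs, i = [], 0
    while i < len(word):
        for m in MULTI_GLYPHS:
            if word.startswith(m, i):
                glyphs.append(m)
                i += len(m)
                break
        else:
            glyphs.append(word[i])
            i += 1
    return tuple(glyphs)


def osa_distance(a, b):
    """Optimal string alignment distance between two glyph sequences."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def ed1_variants(tokens, target):
    """Frequency of every word type exactly one glyph edit away from `target`."""
    goal = tokenize(target)
    counts = Counter(tokens)
    return Counter({w: c for w, c in counts.items()
                    if osa_distance(tokenize(w), goal) == 1})


# Toy token list for illustration only, not real transcription data.
sample = ["chedy", "shedy", "shedy", "chey", "qokeedy", "daiin", "lchedy", "chedy"]
variants = ed1_variants(sample, "chedy")
```

On a real transcription, `variants.most_common(20)` would give a ranked list in the same shape as the table.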
And they're not evenly distributed. What I'm seeing is conservative reinforcement of specific variations. So why does the manuscript repeatedly collapse into narrow local families instead of exploring the full mutation space? To me, that is exactly what a constrained copy/mutate process predicts. The scribe is not asking “what are all possible edits?” The scribe is asking, consciously or not, “what still looks like it belongs to this family of words?” I think that's exactly what the table shows: that question naturally favors a few reinforced variants and leaves most theoretical variants unused.
As for your methods A and B, I do not think they are as distinct as you suggest. A conservative copy/mutate process naturally creates a working vocabulary over time through local reinforcement. The vocabulary does not need to be fully precomputed in advance. Your edit-count argument assumes all theoretical ED1 mutations are equally attractive choices. They are not. Most mutations immediately break local family continuity and stop looking plausible. IF the goal were to create something that looked like a language, combinatorial freedom would explode it into gibberish.
Here are my Voynich generator's stats for a 100-page run:
- 8248 tokens
- 2560 vocabulary items
- 1601 hapax
- 0 ledger-invalid forms
The Voynich comparison sample contained:
- 8519 tokens
- 2436 vocabulary items
- 1642 hapax
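Figures like these are simple to reproduce from any token list. Below is a minimal Python sketch, assuming the text has already been split into word tokens. "Hapax" here means a word type that occurs exactly once, and `corpus_stats` is a hypothetical helper name.

```python
from collections import Counter


def corpus_stats(tokens):
    """Token count, vocabulary size (distinct types) and hapax count."""
    counts = Counter(tokens)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": len(tokens),
        "vocabulary": len(counts),
        "hapax": hapax,
        # Share of the vocabulary that occurs only once.
        "hapax_share": hapax / len(counts) if counts else 0.0,
    }


stats = corpus_stats(["daiin", "chedy", "daiin", "shedy", "qokeey"])
```

For the two runs listed above, the hapax share works out to 1601/2560 ≈ 0.63 for the generator and 1642/2436 ≈ 0.67 for the Voynich sample. Vocabulary and hapax counts both grow with sample size, so comparing samples of similar length (8248 vs 8519 tokens here) matters.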
So despite the enormous theoretical mutation space, the system did not explode into every possible word, because it was constrained by the ledger. It also reproduced vocabulary and hapax counts comparable to the Voynich sample without being specifically coded to do so. It repeatedly collapsed into a narrow, active ecology of locally reinforced forms.
That behavior fits Method B without being instructed to do so.