A One-Page Ledger Method for Generating Voynich-Like Text

RE: A One-Page Ledger Method for Generating Voynich-Like Text - oshfdk - 21-05-2026

(21-05-2026, 01:17 PM)Dunsel Wrote: No, the manuscript never explored every possible word combination. A realistic copy-mutate system would stay conservative, reusing active word families and nearby variants instead of wandering randomly through all legal forms.

And how was this achieved? For example, chol and chedy are two very common words, and olchedy appears more than 70 times. What exactly happens when the scribes attempt to merge chol and chedy into cholchedy? What stops them? Why does cholchedy occur only twice and qolchedy 11 times, even though chol is almost 3 times as popular as qol?
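
The size of the gap can be put in numbers with a quick back-of-the-envelope calculation. This is only a sketch: it assumes that merged forms would appear in proportion to how common the first word is, and it rounds "almost 3 times" to exactly 3.

```python
# Counts quoted in the post above.
qolchedy_count = 11
cholchedy_count = 2

# Assumption: "almost 3 times as popular" rounded to 3.0 for illustration.
chol_to_qol_ratio = 3.0

# If X + chedy were formed in proportion to the frequency of X,
# cholchedy should be about chol_to_qol_ratio times as common as qolchedy.
expected_cholchedy = qolchedy_count * chol_to_qol_ratio
shortfall = expected_cholchedy / cholchedy_count

print(f"expected cholchedy: about {expected_cholchedy:.0f}, observed: {cholchedy_count}")
print(f"observed count is about {shortfall:.1f} times lower than proportional merging predicts")
```

Under that assumption a proportional merging rule is off by more than an order of magnitude for this one pair, which is the discrepancy the question is pointing at.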

RE: A One-Page Ledger Method for Generating Voynich-Like Text - Jorge_Stolfi - 21-05-2026

(21-05-2026, 08:43 AM)Torsten Wrote: The frequency-connectivity correlation arises through a feedback loop inherent in the copying process. Frequent words are more likely to be selected as copying templates, generating more variants; the existence of more variants increases the probability that members of that word family are selected in subsequent copying events. This self-reinforcing cycle ensures that the most frequently used words accumulate the most similar neighbors, precisely the pattern you document.

Sorry, I don't understand this argument.

Take for example
56 otedy    56 oteedy
 2 ytedy    12 yteedy
If, after a suitable warm-up period, the words otedy and oteedy are equally frequent (as shown), and the mutation process can create ytedy from otedy, it should also create yteedy from oteedy. Then ytedy and yteedy should be equally frequent too. But their ratio is only 1:6.
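
The expectation behind this argument can be written out as a short calculation. The sketch below assumes a suffix-blind o->y mutation, so that the y-words should split between edy and eedy in the same proportion as the o-words; it ignores warm-up effects and sampling noise and is not a significance test.

```python
# Token counts quoted in the post above.
o_counts = {"edy": 56, "eedy": 56}   # otedy, oteedy
y_counts = {"edy": 2, "eedy": 12}    # ytedy, yteedy

o_total = sum(o_counts.values())
y_total = sum(y_counts.values())

# Assumption: the o->y mutation does not depend on the suffix, so the
# y-tokens should be distributed over the suffixes like the o-tokens.
expected_y = {suffix: y_total * o_counts[suffix] / o_total for suffix in o_counts}

for suffix in o_counts:
    word = "yt" + suffix
    print(f"{word:7s} observed {y_counts[suffix]:2d}   expected {expected_y[suffix]:.1f}")
```

With only 14 y-tokens in total the numbers are small, so the comparison is suggestive rather than conclusive.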

As I see it, the only ways your model would create the above counts are (1) the mutation of the prefix o->y is sensitive to whether the suffix is edy or eedy, or vice-versa, or (2) the seed text had those four words in those approximate skewed ratios (maybe no ytedy at all), and the mutation rules cannot create enough ytedy from otedy or from yteedy to raise the ytedy:yteedy ratio above 1:6. Isn't that so?

If (1) is the case, then the method is even more complicated than it seemed at first.

If (2) is the case, then the method must be relying a lot more on the seed text being "Voynichese-like". Which essentially replaces the question "how could the Author have generated the VMS text" with "how could the Author have generated a seed text with the same vocabulary and word frequencies as the VMS text".

No?

All the best, --stolfi

RE: A One-Page Ledger Method for Generating Voynich-Like Text - Jorge_Stolfi - 21-05-2026

(19-05-2026, 07:32 AM)dashstofsk Wrote: This is my hypothesis also. The writer wanted to make people believe that this was a rare piece from some distant land where they used a strange alphabet. In order to create the deception he had to give the manuscript a semblance of genuineness, give it some definite structure. Otherwise unstructured random writing would have been quickly dismissed as a fraud.

The hoax/fraud theories have several problems.

First, even if we assume that the Author used old vellum, the VMS must have been written before 1600. That was decades before probability theory was invented, and centuries before it was applied to languages. The Author would not have even imagined what the phrase "like natural language" meant. Even computing character frequencies would have been a hard and unusual task. So how come the output of his gibberish generation method ended up having all the properties of natural languages that we see in Voynichese -- lexicon size, Zipf-like word frequencies, 10 bits of word entropy, etc?

And the same applies to the intended targets of the fraud. How could they tell that the text was not "like a natural language"? So then why would the Author worry about that?

On the other hand, if his intent was to make it look "natural", why did he make Voynichese so unlike European languages? Even today, when we know that natural languages can be very different from European ones, people will look at those very short words with complex structure, the absence of recognizable articles and prepositions, the repetitiousness -- and immediately think "gibberish" and "fraud".

Moreover, people rarely imagine things beyond their experience. An invented language will usually be similar to the languages that the Author knows, even if he tries hard to make it "exotic". If all the languages that someone knows have polysyllabic words, articles, prepositions, inflections -- his invented language will quite probably have them too. Edward Kelley, the con man who ruined the life of John Dee, invented an "Enochian language" that was supposedly used by angels in Heaven. Kelley's "Enochian" turned out to be quite similar to Greek, Hebrew, and other languages that he was familiar with. Thus, if the Voynichese language had been invented by a Medieval European scholar, it should look like that, too.

And then there is the question, why would the Author create a book like this? Whether the intent was to sell the book or to impress, he surely would have produced a very different book -- with more figures and less text, with figures suggesting more valuable secrets like turning lead into gold, curing someone from an arrow shot straight through the head, restoring the youth of an elderly person, getting twelve nymphs into one's bed...

Why would he waste time with the ~25 pages of Quire 20, with boring-looking text and no figures?

Why would he have so many plants in the herbal sections? If the "selling point" of the book was that the plants were from a distant land and unknown in Europe, it would not make much difference if the book had 50 or 30 unknown plants, instead of 130+.

Quote:For me the actual method is not really important. It is enough to show evidence that the manuscript is artificial, constructed and meaningless.

The method is important if it implies knowledge that was not generally available at the time, or that it was too complicated and cumbersome to use.

All the best, --stolfi

RE: A One-Page Ledger Method for Generating Voynich-Like Text - Dunsel - 21-05-2026

(21-05-2026, 03:15 PM)oshfdk Wrote:
(21-05-2026, 01:17 PM)Dunsel Wrote: No, the manuscript never explored every possible word combination. A realistic copy-mutate system would stay conservative, reusing active word families and nearby variants instead of wandering randomly through all legal forms.

And how was this achieved? For example, chol and chedy are two very common words, and olchedy appears more than 70 times. What exactly happens when the scribes attempt to merge chol and chedy into cholchedy? What stops them? Why does cholchedy occur only twice and qolchedy 11 times, even though chol is almost 3 times as popular as qol?

I believe Timm's work suggests merging words. Mine does not. My model is much more conservative and is based primarily on simple edit-distance-one mutations accumulating over time.
Furthermore, you're looking at Scribe 2 and the ed bigram. My own work so far has focused mainly on Scribe 1. I do not yet have a satisfactory answer for how the <ho>-heavy regime shifted into the later <ed>-heavy regime.

But to try to answer your question: the system is not exploring all legal combinations equally. It is heavily path-dependent. The scribes are mostly copying and lightly mutating whatever word families are currently active on the sheet. So even though chol and chedy are both common, that does not make cholchedy automatically probable just because it is technically legal.
These longer forms seem to develop their own local ecology. Once a form like qolchedy becomes established in a section or family cluster, nearby mutations tend to reinforce that branch rather than generating every other possible combination. The manuscript behaves much more like a conservative drifting network than a system sampling all possibilities.
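
To make the idea of a conservative, path-dependent process concrete, here is a toy simulation. It is a minimal sketch and not the actual ledger method: the letter set, seed words, window size, and mutation rate are all assumptions picked for illustration. Each new token is copied from a short window of recent tokens and occasionally altered by a single edit, and two words are never merged in one step.

```python
import random
from collections import Counter

ALPHABET = "acdehiklnoqrsty"  # assumption: a small EVA-style letter set

def mutate_once(word, rng):
    """Apply at most one random edit: substitute, insert, or delete a letter."""
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    # Substitution; also the fallback for one-letter words that cannot shrink.
    # It may redraw the same letter, leaving the word unchanged.
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(ALPHABET) + word[i + 1:]

def copy_mutate(seed_words, n_tokens, window, mutation_rate, rng):
    """Grow a text by copying from the last `window` tokens, sometimes mutating."""
    text = list(seed_words)
    while len(text) < n_tokens:
        template = rng.choice(text[-window:])  # only locally active forms
        if rng.random() < mutation_rate:
            template = mutate_once(template, rng)
        text.append(template)
    return text

rng = random.Random(0)  # fixed seed so the run is repeatable
text = copy_mutate(["chol", "chedy", "qol", "daiin"], 3000, 50, 0.1, rng)
vocab = Counter(text)
print(len(vocab), "distinct words in", len(text), "tokens")
print("cholchedy generated:", "cholchedy" in vocab)
```

In a process like this, a long form such as cholchedy can only be reached through a chain of single edits from forms that happen to be in the window, so the popularity of chol and chedy separately does not by itself make the combined form likely.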

Quite simply, in human terms: does the word “look right” within the current local family of forms being copied?

RE: A One-Page Ledger Method for Generating Voynich-Like Text - Jorge_Stolfi - 21-05-2026

(21-05-2026, 02:12 PM)Dunsel Wrote: I reworked one of my comparison tests to operate at the sheet level. What this appears to show is that the manuscript is not behaving like a collection of unrelated pages. Certain sheets are much more similar to specific other sheets than to the manuscript as a whole, especially in Quire 13 and Quire 20.

I won't comment on the other quires, but the numbers for Quire 20 are just what is expected from (1) vocabulary and word frequencies determined by topic (that is, high correlation within the quire, low correlation with other quires) and (2) text consisting of a collection of independent paragraphs whose order depends only slightly on their contents (that is, about the same correlation between any two sheets within Quire 20).
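
For readers who want to try a sheet-level comparison themselves, here is a minimal sketch. The test actually used is not described in this thread, so cosine similarity between word-frequency vectors is an assumption swapped in for illustration, and the three "sheets" are made-up toy token lists rather than real transliteration data.

```python
import math
from collections import Counter

def cosine_similarity(words_a, words_b):
    """Cosine similarity between the word-frequency vectors of two token lists."""
    fa, fb = Counter(words_a), Counter(words_b)
    dot = sum(fa[w] * fb[w] for w in fa.keys() & fb.keys())
    norm_a = math.sqrt(sum(n * n for n in fa.values()))
    norm_b = math.sqrt(sum(n * n for n in fb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sheet is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical toy sheets; real input would be tokenized transliterations.
sheets = {
    "sheet_A": "chedy qokeedy shedy chedy qokain".split(),
    "sheet_B": "chedy shedy qokeedy qokeedy otedy".split(),
    "sheet_C": "chol daiin chor daiin shol".split(),
}

names = sorted(sheets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(cosine_similarity(sheets[a], sheets[b]), 3))
```

A topic-driven text would show up as uniformly high values for pairs inside one quire and low values across quires, while a gradual copy-and-modify drift would show similarity falling off with distance between sheets.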

Quote:That is consistent with what I would expect from a copy-and-modify process, where words and word families are repeatedly reused and gradually altered over time, creating clusters of closely related sheets instead of uniformly random text.

On the contrary, the copy-and-modify process should create a gradual transition across quires and sections.

Maybe there is a reordering of the sheets that will make the transitions between sections less abrupt. But that would be an artifact of the reordering, not a real gradual change.

All the best, --stolfi

RE: A One-Page Ledger Method for Generating Voynich-Like Text - oshfdk - 21-05-2026

(21-05-2026, 04:01 PM)Dunsel Wrote: Quite simply, in human terms: does the word “look right” within the current local family of forms being copied?

The whole idea of the copy-mutate system, if I understood @Torsten correctly, is to formalize the process, to be able to make testable predictions, etc. Otherwise we can just say "scribes were writing this way just because they were asked to write something like this or just because they felt it looked right" and that's it. Essentially, after using this escape hatch the copy-mutate system becomes the same as the glossolalia hypothesis, automatic writing and similar. I suppose that even without physically looking at past pages, many people have very good visual memory, and even spontaneous meaningless writing may easily repeat past patterns, so in a sense all kinds of pattern following include a copy+mutate part.

I see two possibilities for where copy+mutate research leads. Either some well-defined, rule-based, reasonably simple copy+mutate system successfully explains most statistical properties of the text, which would mean that the scribes actually used something similar to this precise rule-based system. Or the scribes mostly relied on their intuition, visual patterns and imagination, and even if the process included looking back and reusing past patterns, there will be no way of identifying the process exactly, in which case we can't really learn much from copy+mutate formalizations. Is this correct?

RE: A One-Page Ledger Method for Generating Voynich-Like Text - Dunsel - 21-05-2026

(21-05-2026, 03:48 PM)Jorge_Stolfi Wrote: The hoax/fraud theories have several problems.

“They didn’t know statistics.” - They did not need formal probability theory. Repetitive human copy-mutate behavior can naturally generate language-like statistical structure without mathematical understanding.
“Why fool people with language-like properties if nobody could measure them?” - Medieval cryptographers absolutely knew what ordinary ciphers looked like, and Voynichese does not resemble a normal substitution cipher or shorthand system. The goal may have been plausibility without easy detection as either plain language or conventional cipher.
“Why make it unlike European languages?” - Partially true. Again, plausibility without ease of detection may have been the goal. The system may have intentionally avoided looking too much like a known European language while still maintaining internal textual consistency.
“Invented languages usually resemble known languages.” - Agreed, and this is actually a strong objection to the ‘fully invented spoken language’ theory. Voynichese may be better explained as a generative text system than as a consciously designed language like Enochian.
“Why make such a huge, labor-intensive manuscript?” - Labor-intensive is relative. The illustrations are fairly simplistic and even the coloring appears hastily applied compared to true luxury manuscripts. If the author had an efficient copy-mutate method, the manuscript may have been producible within a few months — a reasonable investment for possible long-term patronage.
“Why so much boring text like Quire 20?” - Plausibility. A long dense text section makes the manuscript feel more like a serious scholarly work than a collection of decorative illustrations.
“Why so many herbal pages?” - Again, plausibility. A large encyclopedic structure increases perceived authenticity and makes the work appear systematic and authoritative rather than small and improvised.

Again, I am not saying it IS a hoax. But I will say it's plausible. People love the mysterious. It's one of the reasons we're all here. Putting Dee and Kelley aside, Johannes Kepler was a BRILLIANT astronomer during the Renaissance who studied planetary motion. But to pay the bills he resorted to astrology. He was even skeptical of parts of it, yet still practiced it professionally because patronage systems rewarded it. He cast horoscopes for patrons and nobility because mathematics and astronomy alone generally did not provide a stable income.

Here's the simple answer. Look at the lengths to which people TODAY go in order to scam others out of money. I cannot believe that the 15th century was any different in that regard. And that alone tells me that a hoax is VERY plausible. A copy/mutate method could absolutely have accomplished that goal. And if this is the system, then it worked spectacularly. Six hundred years later, we're still arguing about it... and in many ways, still buying into it.

RE: A One-Page Ledger Method for Generating Voynich-Like Text - Dunsel - 21-05-2026

(21-05-2026, 04:36 PM)oshfdk Wrote:
(21-05-2026, 04:01 PM)Dunsel Wrote: Quite simply, in human terms: does the word “look right” within the current local family of forms being copied?

The whole idea of the copy-mutate system, if I understood @Torsten correctly, is to formalize the process, to be able to make testable predictions, etc.

I see two possibilities for where copy+mutate research leads. Either some well-defined, rule-based, reasonably simple copy+mutate system successfully explains most statistical properties of the text, which would mean that the scribes actually used something similar to this precise rule-based system. Or the scribes mostly relied on their intuition, visual patterns and imagination, and even if the process included looking back and reusing past patterns, there will be no way of identifying the process exactly, in which case we can't really learn much from copy+mutate formalizations. Is this correct?

Correct. The Voynich demands an explanation. Any explanation without empirical evidence is not an explanation. Go look at the theories and solutions threads or YouTube. You'll find PLENTY of explanations that lack testable proof. The scientific method is at play here. If you're going to make bold statements about the Voynich, you need to back them up with evidence that anyone can examine and get the same results from. The testable predictions. So even if it is copy/mutate, there was likely a system that prevented drift, and that should be testable. Which is what I think I've found.

If it is copy/mutate then perhaps we could identify the process exactly. We could likely never reproduce the Voynich exactly and even if we could reproduce it, it would get tossed out pretty quick as being TOO good.

And we can learn a lot by looking at how the text was put together. This is a mystery that has lasted for centuries and has even been tackled by 'super computers' and learning models. If we can figure out the method, I believe the biggest gains will be not just in linguistics and how languages are detected, but in human psychology and how our eyeballs managed to fool some of us into thinking it was a coloring book dropped by an alien baby whose parents were on vacation to Earth.

RE: A One-Page Ledger Method for Generating Voynich-Like Text - Dunsel - 21-05-2026

(21-05-2026, 04:01 PM)Jorge_Stolfi Wrote: On the contrary, the copy-and-modify process should create a gradual transition across quires and sections.

Maybe there is a reordering of the sheets that will make the transitions between sections less abrupt. But that would be an artifact of the reordering, not a real gradual change.

All the best, --stolfi

This is the closest chart I have to a less abrupt transition, and you may remember it from my work on <ed>. It shows the switch from Scribe 1 and <ho> to Scribe 2 and <ed>. And yes, the pages are reordered, but the background is shaded according to Currier A and B. So there is a gradual change but...

This is not proof in itself of that change but right now, it's the one piece of evidence I can put forward.

[Attached chart: output (4).png]

RE: A One-Page Ledger Method for Generating Voynich-Like Text - dashstofsk - 21-05-2026

(21-05-2026, 03:48 PM)Jorge_Stolfi Wrote: why did he make Voynichese so unlike European languages?

Easy.

The scenario I can imagine is that perhaps some similar manuscript in the Arabic script was sold to someone who did not understand the text but who had the luxury and self-importance to want to buy and hold such a manuscript, a rarity because it was not in a European language, and who was happy to pay a premium for it. Perhaps this purchase then came to the attention of some opportunist who had the idea to ‘invent’ a manuscript and claim it to be a work from some distant land, with the intention of offering it to some similar person of importance for a similar premium.

A fabricated manuscript in a European alphabet would be less likely to succeed. People would try to 'read' the manuscript by pronouncing the words using European letter sounds, and the nonsensical babble would make them suspicious. After all, the Latin alphabet was widespread. But with a foreign alphabet there would be uncertainty about the correct pronunciation, and so the manuscript would be less likely to be dismissed as a fraud.