
A One-Page Ledger Method for Generating Voynich-Like Text


Dunsel > 22-05-2026, 01:47 PM

(22-05-2026, 09:56 AM)dashstofsk Wrote: Anyone intending to create a hoax manuscript would have done it in secret and not in some manuscript workshop factory. Under the hoax hypothesis the writer was not bound by any contract of work, was under no obligation to do it quickly, or indeed to do any of it at all. He wrote as much or as little as he wanted, whenever he wanted. Perhaps also only being able to do it in his spare time.

Section by section over a length of time is reasonable and can explain the different 'topics', handwriting styles, language clusters, quality of drawings.

That is one possibility. Let me offer you another.

There are entire workshops running right now that are designed with the single purpose of scamming people out of money. It would be a huge surprise if you haven't gotten those phone calls or seen those "your computer is infected, call us now" popups.

Imagine a small group of 'scribes', and I use that word loosely, who got together to create a hoax. One they could simply sell without it being traced back to them. Imagine this workshop thinking they could create collections of these books to sell. As Jorge pointed out, there were books of every language floating around at that time. But, they were written in a language. Rudolph II was famous for his interest in the occult. Though he's later in the book's proposed history, his supposed purchase of it implies its value. One small group to write the book and one charming salesman to promote it. At that time, a few dozen ducats was an annual wage. 600 ducats? Around $500,000 in US currency today. Now, if you were this small group and you could slap a book like the Voynich together in a month or two (and let's be honest, the Voynich is no monastic marvel), would it be worth it? Even just one book, one sale? Would you do it now? Watch YouTube. Plenty are trying to do that as we speak.

Dunsel > 22-05-2026, 01:57 PM
(22-05-2026, 12:10 PM)Jorge_Stolfi Wrote:
(21-05-2026, 05:13 PM)Dunsel Wrote: This is the closest chart I have to a less abrupt transition and you may remember it from my work on <ed>. It's showing the switch between scribe 1 and <ho> and scribe 2 and <ed>. And yes, the pages are reordered, but the background is shaded according to Currier A and B. So there is a gradual change but...

Thanks for the chart. But suppose that there are two sets of pages: Set A where ed is never used and the frequency of ho varies randomly and uniformly between 0.05 and 0.55, and set B where the frequency of ho varies randomly and uniformly between 0.05 and 0.10, while that of ed varies randomly and uniformly between 0.00 and 0.40.

If you plot the frequencies with the pages in some random order, but with all A pages first then the B pages, what you will see is just that: two different random distributions.

But if you sort the pages by decreasing frequency of ho or of the ratio ho/ed, the plot would look just like your plot. You will see a gradual transition from (ho = 0.55, ed = 0.00) to (ho = 0.05, ed = 0.40). A transition that does not exist in the data, and was created wholly by the sorting.

All the best, --stolfi

That is fair criticism of the visualization itself, and yes, sorting can absolutely create the appearance of continuity between otherwise separate distributions.
But my argument is not based solely on the shape of that chart. The important point is that the <ho> and <ed> regimes also correlate strongly with other manuscript structures:
- Currier A/B,
- Davis scribes,
- section clustering,
- vocabulary ecology,
- and source-page behavior in the analyzer.
Notice how Currier A is heavy on the left of the chart where <ho> dominates and Currier B is on the right side where <ed> dominates. That chart is not just measuring <ed> and <ho>. The same separation keeps reappearing through multiple unrelated measurements.

So I agree the chart alone cannot prove gradual transition. But I also do not think the broader pattern reduces to a visualization artifact created by sorting.
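
Stolfi's thought experiment quoted above can be tried out numerically. The following is only a minimal sketch under stated assumptions: the page counts (50 per set), the seed, and the helper names `simulate_pages`, `sort_by_ho`, and `mean_abs_step` are invented for illustration and are not taken from anyone's analyzer. It draws the two independent uniform distributions he describes and compares how "smooth" the ho curve looks before and after sorting.

```python
import random

def simulate_pages(n_a=50, n_b=50, seed=0):
    """Return (label, ho, ed) tuples for two independent page sets."""
    rng = random.Random(seed)
    pages = []
    for _ in range(n_a):
        # Set A: ed is never used, ho is uniform on [0.05, 0.55].
        pages.append(("A", rng.uniform(0.05, 0.55), 0.0))
    for _ in range(n_b):
        # Set B: ho is uniform on [0.05, 0.10], ed is uniform on [0.00, 0.40].
        pages.append(("B", rng.uniform(0.05, 0.10), rng.uniform(0.0, 0.40)))
    return pages

def sort_by_ho(pages):
    """Reorder pages by decreasing ho frequency, as in the sorted plot."""
    return sorted(pages, key=lambda p: p[1], reverse=True)

def mean_abs_step(values):
    """Average absolute jump between consecutive values (smaller = smoother)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)

pages = simulate_pages()
unsorted_ho = [p[1] for p in pages]
sorted_ho = [p[1] for p in sort_by_ho(pages)]

print("mean step, A-then-B order:", round(mean_abs_step(unsorted_ho), 4))
print("mean step, sorted by ho:  ", round(mean_abs_step(sorted_ho), 4))
```

The sorted sequence is monotonic by construction, so its page-to-page steps are much smaller than in the unsorted order even though no transition was put into the data. That only illustrates the sorting effect itself; it says nothing about whether the correlations with Currier A/B or the Davis scribes are real.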

oshfdk > 22-05-2026, 02:45 PM

(22-05-2026, 01:31 PM)Dunsel Wrote: The system is trying to maintain textual plausibility and family continuity, not maximize combinatorial exploration. The Voynich appears to behave similarly. The existence of hundreds of theoretical edit possibilities does not mean a human production process would actually use them.

I'm not sure I understand how human production would achieve this. It is trivial on a computer that can process the whole text up to the present moment, compute counts, check presence, etc. Humans can only physically look at past text and notice some words that are present there.

Could you implement the generation algorithm with the following restriction:

When generating a new word, the only available source of information is up to 20 word tokens from the past text, chosen by any algorithm (just the preceding 20 word tokens, 20 random word tokens, pairs of adjacent tokens, it's up to you). The full list of previously generated words is not accessible, no statistics of the past text are accessible, and no presence checks (has this word ever occurred in the text before?) are allowed, because these appear unrealistic for a human scribe. I think 20 is quite generous here. So, the function that generates a new word can use any kind of rules and some reasonably sized internal immutable data that the scribes could learn by heart or have as a reminder in front of them, but the only state input is a list of 20 word tokens. I think this much more realistically represents a real human being attempting to continue a text using a past sample and copy+mutate.
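
To make the restriction concrete, here is a minimal sketch of what such a function could look like. Everything in it is an assumption made for illustration: the glyph alphabet `GLYPHS`, the copy probability `p_copy=0.3`, the choice of "the preceding 20 tokens" as the window, and the seed words. It is not Dunsel's generator. The point is only that `next_word` sees nothing except the window and fixed, immutable parameters.

```python
import random

GLYPHS = "acdehiklnoqrsty"  # hypothetical glyph alphabet, for illustration only
MAX_WINDOW = 20             # the budget proposed in the post above

def mutate_ed1(word, rng):
    """Apply one random insertion, deletion, or substitution (edit distance <= 1)."""
    op = rng.choice(["insert", "delete", "substitute"])
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(GLYPHS) + word[i:]
    # Substitution (also the fallback when a one-glyph word cannot be shortened).
    i = rng.randrange(len(word))
    return word[:i] + rng.choice(GLYPHS) + word[i + 1:]

def next_word(window, rng, p_copy=0.3):
    """Produce one word using ONLY the given window of at most 20 tokens."""
    if not 1 <= len(window) <= MAX_WINDOW:
        raise ValueError("window must hold between 1 and 20 tokens")
    source = rng.choice(window)      # pick a source token from the window
    if rng.random() < p_copy:
        return source                # copy it unchanged
    return mutate_ed1(source, rng)   # or mutate it by one edit

def generate(seed_words, n_words, seed=0):
    """Driver loop: it stores the output, but passes only the last 20 tokens on."""
    rng = random.Random(seed)
    text = list(seed_words)
    for _ in range(n_words):
        text.append(next_word(text[-MAX_WINDOW:], rng))
    return text

sample = generate(["daiin", "chol", "shedy"], 30)
print(" ".join(sample))
```

Line and page position, which come up later in the thread, could be passed in as extra arguments without breaking the restriction, since they are not statistics of the past text.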

Jorge_Stolfi > 22-05-2026, 03:00 PM

(21-05-2026, 04:49 PM)Dunsel Wrote: “They didn’t know statistics.” - They did not need formal probability theory. Repetitive human copy-mutate behavior can naturally generate language-like statistical structure without mathematical understanding (again, [link unavailable]).

Quote: “Why fool people with language-like properties if nobody could measure them?” - Medieval cryptographers absolutely knew what ordinary ciphers looked like, and Voynichese does not resemble a normal substitution cipher or shorthand system. The goal may have been plausibility without easy detection as either plain language or conventional cipher.

AFAIK 1420-Medieval is quite different than 1520-Medieval on this point. Cryptography became a "science" on its own only around 1500.

But that does not matter much. Was the VMS meant to look like cipher, or an exotic natural language in the plain?

If it was meant to look like cipher, there would be no need to worry about word frequencies, word structures, etc. Any random sequence of glyphs would do.

If it was meant to look like an exotic natural language in the plain, why did the Author bother to make the hidden statistics seem natural, while making the most visible features -- the complex and rigid word structure and the repetitiousness -- look so un-natural?

Quote:People love the mysterious. It's one of the reasons we're all here.

But the VMS did not look particularly valuable for most of its documented history. Baresch was somewhat obsessed with it, but none of the people around him mentioned the VMS, until they were prompted by Baresch to write to Kircher about it. Then Kircher apparently did nothing with it, and for ~200 years it sat in the library of the Collegio Romano / PUG, where it did not seem to have attracted any special attention (except perhaps some ketchup stains). It became famous in the 20th century only because Wilfrid tried to sell it as a Bacon original.

Quote:Look at the lengths to which people TODAY go to in order to scam others out of money. I cannot believe that the 15th century was any different in that regard.

That is true, and surely there were many fraudulent manuscripts being made and sold back then. The now-called [link unavailable] could be considered an instance of that.

But forgeries have a logic. To make the investment and risk worthwhile, the forger will try to maximize the attractiveness of the work to potential buyers. A painting forger will try to produce a fake that looks like a Rembrandt, a Pollock, a Van Gogh, or at least the work of some minor known painter; not the work of a pensioner who just watched a couple of Bob Ross videos. A book forger will go for Hitler's diaries, not for the ledger of a lumber shop in Peoria. A document forger will try the map of Vinland, not a sketch of the street plan of a suburb of Kotěhůlky.

And the VMS does not look at all like that. Its 240 pages are 240 missed opportunities to make the work more appealing to the alleged targets of the scam -- whether they were rich nobles, scholars, physicians, alchemists...

All the best, --stolfi

dashstofsk > 22-05-2026, 03:15 PM

(22-05-2026, 01:47 PM)Dunsel Wrote: ... this workshop thinking they could create collections of these books to sell.
... One small group to write the book and one charming salesman to promote it.

Somehow I cannot see that the trade in books and manuscripts was as quick and easy as it is today. I cannot see that there would have been a big market for such manuscripts. Not enough to justify a team of opportunists occupying themselves full time on the sharp practice.

I very strongly believe that the VMS is a collection of individual pieces of work, written section by section over a length of time; the reward from each section providing the enthusiasm to write the next one. It seems so logical. It just would have been too risky to try to write the whole manuscript not knowing if you would be able to find a buyer for it.

Dunsel > 22-05-2026, 03:38 PM
(22-05-2026, 02:45 PM)oshfdk Wrote: I'm not sure I understand how would human production achieve this? It is trivial on a computer that can process the whole text up to the present moment, compute counts, check presence, etc. Humans can only physically look at past text and notice some words that are present there.

I think you're underestimating what a human can do and overestimating what a computer can do. As I mentioned before, I've had the sheet source theory lined out for months. It's taken that long and 3600+ lines of Python code to even come close to modelling human behavior without cheating. When I missspell a word, you can spot it easily. A computer needs lookup tables and algorithms. Humans have built-in pattern recognition systems that are not easy to put into mathematical terms.

Let's look at some of the things I've had to consider and would also have to be considered in your model.
- Current line position - Where are we on the line? Do we need a gallows token here?
- Current page position - Do we need to create a new paragraph? Does this need to be a short line to precede a paragraph on the next line?
- Selected source token - Which of those 20 words are we going to choose? Or, are we going to choose a word on the page we just wrote?
- Selected mutation type - Are we going to copy the word or mutate it? We need a probability.
- Copy vs mutate probability - What percentage of words are copied and which are modified?
- Glyph insertion/deletion weights - When a word is chosen to modify, what letter can be deleted? What letter can be inserted or swapped for another?
- Length-preservation bias - When we mutate, do we need to keep the same word length or can we shorten or lengthen it?
- Repeat-nearby-family bias - Do we need to make this word look like another word we just wrote down?
- Avoid-too-many-new-forms bias - Have we used a word and its mutation too many times already? Do we need to find a new word?
- Proposed token - Ok, we have chosen the word and we're ready to write it.
- Rejection reason - Does the word look stupid? Does it have 4 vowels? 5 consonants? Have we repeated the word 4 times? Does it look like a possible word with vowels and consonants? Accept or reject and start over.
With all of those rules, you could look at the page and know almost instantly how to proceed. Software needs numbers. Weights, probabilities. So could your proposed method be modelled? Yes. Would this be easy? Oh, I can say for certain it would not.

Now, that being said, I have looked at other models. One that often proves just as valid as my sheet model, and sometimes more so, is the previous few pages model.

Here's a table from data in an earlier attempt to locate the 'parents' of words. You can easily see that most words can be sourced from the same page. Then, just looking a few pages back, you can find sources for other words. But I rejected this method for two reasons. It would require the scribe to keep flipping sheets over to look at the back side to see what word to copy or mutate. And ED2 (edit distance 2) can change 50% of the letters in a 4-letter word, which seemed unrealistic to me and bordered on making up results to fit the theory. That's when I came up with the sheet model, which is much more human-doable and, for the most part, uses ED1 and copy.

Total tokens on f20r: 78

Source Window   Copy  ED1  ED2  Any Match
Same page         11   25   25         61
1 page back        3    4    5         12
2 pages back       0    0    2          2
3 pages back       0    0    0          0
4 pages back       1    0    0          1
5 pages back       0    0    1          1
Far lookback       0    1    0          1
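
For readers who want to see how a table like this might be tallied, here is a hedged sketch. The post does not describe how the analyzer actually assigns a 'parent', so the rules below are assumptions: ED1 and ED2 are read as Levenshtein edit distance 1 and 2, only tokens written earlier on the same page count as same-page sources, and the nearest window that yields any match wins. The function names and the toy tokens are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two words (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def best_match(token, candidates):
    """Return 'Copy', 'ED1', 'ED2', or None for the closest candidate."""
    if not candidates:
        return None
    closest = min(edit_distance(token, c) for c in candidates)
    return {0: "Copy", 1: "ED1", 2: "ED2"}.get(closest)

def classify_token(index, page_tokens, earlier_pages, max_back=5):
    """Return (pages_back, kind); pages_back 0 means the same page.

    earlier_pages[0] is the page written just before, earlier_pages[1]
    the one before that, and so on. Returns (None, None) if nothing
    within max_back pages is within edit distance 2.
    """
    token = page_tokens[index]
    kind = best_match(token, page_tokens[:index])
    if kind:
        return (0, kind)
    for back, page in enumerate(earlier_pages[:max_back], start=1):
        kind = best_match(token, page)
        if kind:
            return (back, kind)
    return (None, None)

page = ["chol", "chor", "shedy"]
earlier = [["qokeedy"], ["shey"]]
for i, tok in enumerate(page):
    print(tok, classify_token(i, page, earlier))
```

Counting the returned pairs over every token on a page would give one row per window and one column per match type, in the same layout as the table above.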

So, are there other explanations and could yours be one of them? Definitely. Am I going to spend another 3 months modelling it? No offense but uh, no. I would need to see some concrete numbers before even considering it as a possibility.

Dunsel > 22-05-2026, 04:18 PM

(22-05-2026, 03:15 PM)dashstofsk Wrote:
(22-05-2026, 01:47 PM)Dunsel Wrote: ... this workshop thinking they could create collections of these books to sell.
... One small group to write the book and one charming salesman to promote it.

Somehow I cannot see that the trade in books and manuscripts was as quick and easy as it is today. I cannot see that there would have been a big market for such manuscripts. Not enough to justify a team of opportunists occupying themselves full time on the sharp practice.

I very strongly believe that the VMS is a collection of individual pieces of work, written section by section over a length of time; the reward from each section providing the enthusiasm to write the next one. It seems so logical. It just would have been too risky to try to write the whole manuscript not knowing if you would be able to find a buyer for it.

Oh I think there was plenty of justification. The late medieval and early Renaissance world already had an active manuscript trade, universities, wealthy collectors, courts, monasteries, and professional workshops. Finding a buyer for an unusual manuscript was not some unimaginable task. And the Voynich itself does not appear to be an ultra-luxury production. The parchment is fairly ordinary, there is no gold leaf, and if current theories about the original format are correct, it may not even have been formally bound at first, but simply loose quires wrapped in leather.

So imagine you already have the method. You commission a workshop to produce it. The scribes do not even need to understand the content, only the production process (which could explain Currier A and B). Perhaps you handle the illustrations yourself to reduce costs further. Even if production and travel together consumed months, a successful sale to a wealthy collector or court could still make the venture very worthwhile.


I'm not saying you're wrong and I'm right. I'm just saying there are other possibilities that I see as just as likely.

oshfdk > 22-05-2026, 04:20 PM
(22-05-2026, 03:38 PM)Dunsel Wrote: Let's look at some of the things I've had to consider and would also have to be considered in your model.
- Current line position - Where are we on the line? Do we need a gallows token here?
- Current page position - Do we need to create a new paragraph? Does this need to be a short line to precede a paragraph on the next line?
- Selected source token - Which of those 20 words are we going to choose? Or, are we going to choose a word on the page we just wrote?
- Selected mutation type - Are we going to copy the word or mutate it? We need a probability.
- Copy vs mutate probability - What percentage of words are copied and which are modified?
- Glyph insertion/deletion weights - When a word is chosen to modify, what letter can be deleted? What letter can be inserted or swapped for another.
- Length-preservation bias - When we mutate, do we need to keep the same word length or can we shorten or lengthen it?
- Repeat-nearby-family bias - Do we need to make this word look like another word we just wrote down?
- Avoid-too-many-new-forms bias - Have we used a word and it's mutation too many times already. Do we need to find a new word?
- Proposed token - Ok, we have chosen the word and we're ready to write it.
- Rejection reason - Does the word look stupid? Does it have 4 vowels? 5 consonants? Have we repeated the word 4 times? Does it look like a possible word with vowels and consonants? Accept or reject and start over.
All of those rules, you could look at the page and know almost instantly how to proceed. Software needs numbers. Weights, probabilities. So could your proposed method be modelled? Yes. Would this be easy? Oh, I can say for certain it would not.

Ok, we can add the line/page relative/absolute position, this sounds reasonable.

As for the rest of the list, mostly it's ok, except the generic "Does the word look stupid?" and also "Have we used a word and it's mutation too many times already." - absolutely not. I don't think we can expect the scribes to keep the tally of all the words they have written. If some word occurs 2-3 times in the sample of 20 tokens, it can be excluded of course, otherwise it's unreasonable to expect the scribe to keep the count of all the tokens used.

Other than that, the rest of the things you mentioned can be easily implemented in the function, as the restriction allowed for reasonable size of immutable data, like a table of probabilities. If the choice function includes the past 4 tokens, the repetitions can be checked of course, but these should definitely come from the budget of 20 tokens. We can't expect the scribe to scan the manuscript for any extended time before writing each word.

If the method doesn't realistically model the limitations that a human scribe would face, I don't think it can be used as an argument for or against anything related to the actual MS. There are way too many possible methods to generate something similar to Voynichese.

Dunsel > 22-05-2026, 05:02 PM

(22-05-2026, 03:00 PM)Jorge_Stolfi Wrote: But forgeries have a logic. To make the investment and risk worthwhile, the forger will try to maximize the attractiveness of the work to potential buyers. A painting forger will try to produce a fake that looks like a Rembrandt, a Pollock, a Van Gogh, or at least the work of some minor known painter; not the work of a pensioner who just watched a couple of Bob Ross videos....

And the VMS does not look at all like that. Its 240 pages are 240 missed opportunities to make the work more appealing to the alleged targets of the scam -- whether they were rich nobles, scholars, physicians, alchemists...

Jorge, you need to surface that sense of humor more often. You had me laughing at the Bob Ross analogy.

And, I could come up with some plausible but weak arguments debating your position but, honestly, I think you are correct. I have for many months had the internal argument with myself that nobody writes 30,000+ words of just gibberish, even as a hoax. I still have this hope, like I think most people have, that it has some meaning. But right now, I'm simply going where the numbers are leading me. And everything I keep seeing or discovering just adds to the copy/mutate theory. I'm secretly hoping that there is some mnemonic buried inside of it and I'm still actively searching for that evidence. So far, it's been typical Voynich elusive.

But until I or someone else can come up with a better solution, this one seems to work. It is not the explanation I was hoping for. And I've had to "adapt" my theories about why it exists and how it came into existence to match what I'm finding. So, is the hoax theory plausible? I'd say yes. But, as you have pointed out, it's weakly plausible and fails to explain a lot.

Thanks for bashing me over the head with Bob.

But, you've inspired me with a new theory. The Voynich is just a, "happy little accident."

Dunsel > 22-05-2026, 05:07 PM
(22-05-2026, 04:20 PM)oshfdk Wrote:
(22-05-2026, 03:38 PM)Dunsel Wrote: Let's look at some of the things I've had to consider and would also have to be considered in your model.

Current line position - Where are we on the line? Do we need a gallows token here?

Current page position - Do we need to create a new paragraph? Does this need to be a short line to precede a paragraph on the next line?

Selected source token - Which of those 20 words are we going to choose? Or, are we going to choose a word on the page we just wrote?

Selected mutation type - Are we going to copy the word or mutate it? We need a probability.

Copy vs mutate probability - What percentage of words are copied and which are modified?

Glyph insertion/deletion weights - When a word is chosen to modify, what letter can be deleted? What letter can be inserted or swapped for another.

Length-preservation bias - When we mutate, do we need to keep the same word length or can we shorten or lengthen it?

Repeat-nearby-family bias - Do we need to make this word look like another word we just wrote down?

Avoid-too-many-new-forms bias - Have we used a word and it's mutation too many times already. Do we need to find a new word?

Proposed token - Ok, we have chosen the word and we're ready to write it.

Rejection reason - Does the word look stupid? Does it have 4 vowels? 5 consonants? Have we repeated the word 4 times? Does it look like a possible word with vowels and consonants? Accept or reject and start over.

All of those rules, you could look at the page and know almost instantly how to proceed. Software needs numbers. Weights, probabilities. So could your proposed method be modelled? Yes. Would this be easy? Oh, I can say for certain it would not.

Ok, we can add the line/page relative/absolute position, this sounds reasonable.

As for the rest of the list, mostly it's ok, except the generic "Does the word look stupid?" and also "Have we used a word and it's mutation too many times already." - absolutely not. I don't think we can expect the scribes to keep the tally of all the words they have written. If some word occurs 2-3 times in the sample of 20 tokens, it can be excluded of course, otherwise it's unreasonable to expect the scribe to keep the count of all the tokens used.

No, I meeeeeeannnnnnnnnnnn youu youu youu youu youu need to youu need to make surrreeee it doesn't look stupid. That is a 'method' I have of filtering out bad words. And trust me, if you don't do that, you will get garbage. My method is a 5-word moving window. It looks at the last 5 words to make sure it's not producing too many duplicate words. It looks at consonant and vowel positions in the words to make sure too many letters, vowels, or consonants are not being repeated. The Voynich is surprisingly pronounceable because it does have letters that act and look like vowels and consonants. Again, these are all human-doable. You can't just tell a piece of code to not make these words look stupid. It needs numbers.
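
A filter of the kind described above might look roughly like the sketch below. It is not Dunsel's code. The set of vowel-like glyphs, the three thresholds, and the reading of "4 vowels" and "5 consonants" as consecutive runs are all assumptions chosen for illustration, loosely following the "Rejection reason" item quoted earlier. Note that it only ever looks at the last 5 words, so it stays inside the kind of token budget discussed in this thread.

```python
VOWEL_LIKE = set("aeioy")  # assumption: which glyphs behave like vowels
WINDOW = 5                 # the 5-word moving window from the post above

def longest_run(word, vowels):
    """Length of the longest run of vowel-like (or consonant-like) glyphs."""
    best = run = 0
    for glyph in word:
        if (glyph in VOWEL_LIKE) == vowels:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

def rejection_reason(candidate, recent_words,
                     max_repeats=3, max_vowel_run=3, max_consonant_run=4):
    """Return a reason string if the candidate should be rejected, else None.

    Only the last WINDOW words are inspected; no global tallies are kept.
    All thresholds are hypothetical defaults.
    """
    window = recent_words[-WINDOW:]
    if window.count(candidate) >= max_repeats:
        return "too many repeats in window"
    if longest_run(candidate, vowels=True) > max_vowel_run:
        return "vowel run too long"
    if longest_run(candidate, vowels=False) > max_consonant_run:
        return "consonant run too long"
    return None

recent = ["daiin", "chol", "daiin", "daiin", "shedy"]
for word in ["daiin", "chor", "qoaiiy", "chckhd"]:
    print(word, "->", rejection_reason(word, recent) or "accept")
```

A generator would call `rejection_reason` on each proposed token and, on rejection, go back and pick a new source word or mutation.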