The Voynich Ninja

Pages: 1 2 3 4 5 6 7 8 9 10

(24-08-2025, 12:05 PM)dexdex Wrote: As far as I can tell, for the self-citation you only require a couple of rules:
[...]
I think that is the conceit behind the algorithm, and it is certainly not a complex process.

It seems simple in the articles by Timm & Schinner but when you look at the code of the generator:

[link not shown] has the "rules" definitions: similarElements, combinableLigature, allowedFollowerGlyphs, allowedInitialFollowerGlyphs, allowedPrecursorGlyphs, finalGlyphReplacements, selfIngroupGlyphReplacements, selfFinalGlyphReplacements, etc.

The way the generator uses these rules is also quite complex: [link not shown] (1300 lines)

The result is far from convincing Voynichese, so more rules would be required to make the generated text indistinguishable from actual VM lines at a glance. The question remains: why would anyone choose to do it the hard way, by following a large set of (mostly useless) rules, instead of doing something much simpler that is more algorithmic and less rule-based? Mostly useless because natural languages are a lot less restrictive: there is no need for so many word-building rules if the aim is only to imitate natural languages.

I am not discounting the possibility, however, that the rules can be expressed in a more elegant and compact way. In a board game like Go the patterns are emergent, not hard-coded into the rules.

(23-08-2025, 07:51 AM)magnesium Wrote: Why go through all the trouble of making a complex cipher if you could sell the book just as easily with it saying nothing at all?

'Just as easily to say nothing'? A book that said nothing would not be taken seriously. Also, it would not be easy. If the author had written the book in the local language that everyone could read, he would have had to spend time making the text logical, readable, informative and grammatically correct, with no spelling mistakes. The text would need to flow as an easy narrative and have something meaningful to say. Probably everyone who has written academic papers, books, a website, or even posts for this forum knows that composing text is not always easy. You sometimes need to consult dictionaries and thesauri in search of just the right word, and debate with yourself whether to include this or that sentence and where it should go.

If the author of the VMS had some sort of method for constructing artificial text that looked as if it were in some foreign and unknown language, and was fluent in the use of that method, then it might have been easier to do it that way.

And this is the scenario that I consider to be plausible:

The author is not a skilled writer. He is no Charles Dickens, who could write page after page at long sittings without correction. He was possibly unfamiliar with the secret sciences that seem to be the subject of the VMS, and would have been unable to write about them with authority. His nescience would have been immediately obvious and his book would have been immediately dismissed.

Not being able to write convincingly, he had the inspiration to create a fabricated narrative in a bogus alphabet and claim that it was a product of some distant undiscovered land. No need for any grammatical correctness or to give the text meaning. He could just bash the words out as they came. This would actually be more rewarding, since a book from some unknown land would be viewed with curiosity and could command a higher premium.

(24-08-2025, 12:25 PM)nablator Wrote: It seems simple in the articles by Timm & Schinner but when you look at the code of the generator: [...]

As far as I can tell, similarElements + combinableLigature + GlyphReplacements encode adding individual strokes. The functions look more complex because they have to encode that fact in a transcription scheme that doesn't fully support this.

The line finals/line beginnings are there to support the hypothesis that typically line endings are copied from previous line endings, certainly something that would be a simple choice and very convenient for a scribe.

My point is that looking at the generator code is misleading, because a large part of this code is 'encoding' the simple rules into a rote algorithm for a computing machine to perform. They are, however, very natural for humans. To borrow your Go example, which is a perfect one: there are essentially three rules to the game of Go (the rule of turn order, black then white; the rule of liberty counting; the rule of ko), but a program that allows playing Go will have to explicitly do the liberty counting in loops, encode the board size, encode the current state of the game, verify its correctness, display it, etc. Most of these a human does at a glance; that is why Go eluded computers for so long, and why Go-playing programs, even though they are orders of magnitude stronger than the best players, still have blind spots that wouldn't fool an amateur after their second lesson (they can't read long ladders the way humans do, and this fact can be exploited to beat them by playing a very stupid game that fools the computer).
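To make the Go comparison concrete, here is a minimal sketch of what "liberty counting in loops" looks like once it has to be spelled out for a machine. The board encoding (a list of strings with 'B', 'W' and '.') and the function name are assumptions made for this illustration only; they are not taken from any particular Go program.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def count_liberties(board: List[str], row: int, col: int) -> int:
    """Count the liberties of the group containing the stone at (row, col).

    board is a list of equal-length strings: 'B', 'W', or '.' for empty.
    """
    colour = board[row][col]
    if colour == ".":
        raise ValueError("no stone at the given point")
    n_rows, n_cols = len(board), len(board[0])
    group: Set[Point] = {(row, col)}
    liberties: Set[Point] = set()
    stack = [(row, col)]
    # Flood fill over same-coloured neighbours, collecting adjacent empty points.
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                continue
            if board[nr][nc] == ".":
                liberties.add((nr, nc))
            elif board[nr][nc] == colour and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))
    return len(liberties)


# Toy position: a two-stone white group that is almost surrounded.
BOARD = [
    ".B...",
    "BWB..",
    ".WB..",
    ".B...",
    ".....",
]
```

A human sees at a glance that the white group has one breathing space left; the program needs a stack, two sets and bounds checks to reach the same answer.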

I am assuming that the description in the Timm article is accurate, and I have no reason to think otherwise. While the example .txt files are quite different from Voynichese, that is to be expected because they are made by a random process that grows in wildly different directions depending on the initial seeds; still, on many key statistical features used to support the meaningfulness hypothesis, a significant proportion of the parameter space is hard to distinguish statistically from the VMS.

So, it is plausible that this scheme can generate a gibberish-but-looks-non-gibberish text with somewhat similar characteristics. Allowing for some scribal freedom (the pages that look like the first column was made first, for instance, might just be a scribe getting bored and doing something weird), I think it's a pretty convincing argument that a simple process, devisable and executable by someone in the 15th century, does a pretty good job of emulating Voynichese. That doesn't mean it's how it was done, or even close, or that the Voynich manuscript necessarily contains no meaningful text -- but its plausibility has to be acknowledged.

(24-08-2025, 04:05 PM)dashstofsk Wrote: And this is the scenario that I consider to be plausible:

The author is not a skilled writer. He is no Charles Dickens, who could write page after page at long sittings without correction. He was possibly unfamiliar with the secret sciences that seem to be the subject of the VMS, and would have been unable to write about them with authority. His nescience would have been immediately obvious and his book would have been immediately dismissed.

Not being able to write convincingly, he had the inspiration to create a fabricated narrative in a bogus alphabet and claim that it was a product of some distant undiscovered land. No need for any grammatical correctness or to give the text meaning. He could just bash the words out as they came. This would actually be more rewarding, since a book from some unknown land would be viewed with curiosity and could command a higher premium.

Indeed, this is plausible. And in the context of a 'quack doctor prop' theory, you don't need the craftsmanship to be that great -- you just need something that suggests vast knowledge and will pass cursory inspection. The VMS's usefulness for that is arguably incredible: we still don't know if it contains meaningful text or not, and despite crappy illustrations and no discernible use, it is viewed with curiosity centuries later. It would be a fantastic prop; less so as a book for sale, since why would you buy a book you can't read? (Of course, anything can happen, or it might be a misguided attempt at a book sale hoax, etc., but my point is just that I find those explanations far less attractive than the quack doctor hypothesis.)

The process to generate it has to be simple enough, but self-copying is certainly plausible.

Big manuscripts, especially illustrated ones, were expensive. It would be entirely understandable that someone who could not afford them would create one to raise his/her status as a practitioner of some more or less secret art (alchemy, herbalism, astrology). More or less because, of course, there were books in circulation, but each practitioner had their own recipes and kept the good stuff private.

You say I can't have my own book of secrets? Hold my beer. The whole family participated (some of them had terrible handwriting) and the little nephew (age 8-10) did the coloring.

(24-08-2025, 06:17 PM)nablator Wrote: Big manuscripts, especially illustrated ones, were expensive. It would be entirely understandable that someone who could not afford them would create one to raise his/her status as a practitioner of some more or less secret art (alchemy, herbalism, astrology). More or less because, of course, there were books in circulation, but each practitioner had their own recipes and kept the good stuff private.

You say I can't have my own book of secrets? Hold my beer. The whole family participated (some of them had terrible handwriting) and the little nephew (age 8-10) did the coloring.

Precisely. Maybe not that extreme with the nephew :D - the scribe(s) had to have access to a pretty legit library and know how to scribe, as well as have the means to get the supplies. But a quack with a few of his unscrupulous student friends at some monastic school? Why not?

(24-08-2025, 12:05 PM)dexdex Wrote: As far as I can tell, for the self-citation you only require a couple of rules:
Create new words by either:
1) taking a previous word and changing a glyph to a similar glyph
2) adding a prefix from a list of prefixes if it isn't in the word
3) concatenating existing words
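The three quoted rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the quote, not the Timm & Schinner generator: the glyph-similarity table, the prefix list and all function names are hypothetical, and rule 3 has no length limit here, so words can grow without bound.

```python
import random

# Hypothetical tables, chosen only for illustration (EVA-like letters).
SIMILAR = {"o": "a", "a": "o", "k": "t", "t": "k", "r": "l", "l": "r"}
PREFIXES = ["qo", "ch", "sh"]


def mutate(word, rng):
    """Rule 1: change one glyph to a similar glyph, if any glyph qualifies."""
    positions = [i for i, glyph in enumerate(word) if glyph in SIMILAR]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + SIMILAR[word[i]] + word[i + 1:]


def add_prefix(word, rng):
    """Rule 2: add a prefix from the list if it isn't already in the word."""
    prefix = rng.choice(PREFIXES)
    return word if prefix in word else prefix + word


def generate(seed_words, n_words, rng):
    """Grow a text by repeatedly copying earlier words and modifying them."""
    text = list(seed_words)
    while len(text) < n_words:
        rule = rng.choice(("mutate", "prefix", "concat"))
        source = rng.choice(text)
        if rule == "mutate":
            new_word = mutate(source, rng)
        elif rule == "prefix":
            new_word = add_prefix(source, rng)
        else:
            # Rule 3: concatenate two existing words.
            new_word = source + rng.choice(text)
        text.append(new_word)
    return text


if __name__ == "__main__":
    print(" ".join(generate(["daiin", "chol", "okeey"], 30, random.Random(0))))
```

Running it with different seed words or random seeds gives a feel for how quickly the output drifts away from the initial line.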

These rules do not seem enough to explain the rather complex structure of the VMS words, and the irregularities in word and next-word distributions. Unless one starts with a seed text that is, say, 10'000 words long, with the proper word structure and frequency irregularities...

The concatenation of two Voynichese words is generally not a valid Voynichese word, unless the first one happens to be a prefix that is empty in the second one. In particular, there are practically no words with two or more gallows (and the handful of words that do occur may be just cases of omitted word space).
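The gallows objection is easy to check mechanically on any transcription. The sketch below counts the EVA gallows glyphs (k, t, p, f) per word and compares a word list with the list obtained by joining adjacent words. The sample words are illustrative EVA-style tokens, not counts from the manuscript, and the function names are assumptions made for this example.

```python
GALLOWS = set("ktpf")  # EVA gallows glyphs


def gallows_count(word):
    """Number of gallows glyphs in one EVA-transcribed word."""
    return sum(1 for glyph in word if glyph in GALLOWS)


def multi_gallows_fraction(words):
    """Fraction of words that contain two or more gallows."""
    if not words:
        return 0.0
    return sum(gallows_count(w) >= 2 for w in words) / len(words)


def concatenated_pairs(words):
    """Join each word with its successor, as rule 3 would."""
    return [a + b for a, b in zip(words, words[1:])]


# Illustrative tokens only; a real test would read a full transcription.
SAMPLE = ["qokeedy", "otaiin", "chedy", "okaiin"]

if __name__ == "__main__":
    print(multi_gallows_fraction(SAMPLE))
    print(multi_gallows_fraction(concatenated_pairs(SAMPLE)))
```

If unrestricted concatenation were a frequent rule, the second fraction would be expected to show up in the generated text as well, which is the mismatch described above.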

All the best, --jorge

(24-08-2025, 06:17 PM)nablator Wrote: Big manuscripts, especially illustrated ones, were expensive. It would be entirely understandable that someone who could not afford them would create one to raise his/her status as a practitioner of some more or less secret art (alchemy, herbalism, astrology). More or less because, of course, there were books in circulation, but each practitioner had their own recipes and kept the good stuff private.

You say I can't have my own book of secrets? Hold my beer. The whole family participated (some of them had terrible handwriting) and the little nephew (age 8-10) did the coloring.

Makes sense, but I hope you're not going to try to sell it, pretending that it is Salomon's books of wisdom ;-)

Interesting thought about f49v: the column of numbers on the folio corresponds to lines and starts at line 2. The meaning of these has been debated, and I don't think there's consensus - finding info on them is much harder.

Anyway, if we had a method where the first line is a 'seed', isn't it conceivable that these markings are related and that's why they start at line 2? Perhaps, when explaining the system to other scribes, the author made a page where the patterns he uses are more clearly evident? And this is why the initials look separated (we know line beginnings have a different glyph distribution) and weird.

What is the consensus around these numbers?

(24-08-2025, 12:25 PM)nablator Wrote: It seems simple in the articles by Timm & Schinner but when you look at the code of the generator: [...]

I think it would be interesting to sweep across a wider parameter space for this generator: randomly vary the code’s various threshold parameters, sweep across a wide range of initializing lines, and re-run a given “seed” (initializing line + set of threshold parameters) multiple times to build up a broader representative population of generated texts. Are all these variations equally convincing, or are some more convincing than others? If some are, how does that relate to the algorithm or initializing line?
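A sweep of this kind could be organised roughly as follows. This is a sketch under loud assumptions: `toy_generator` is a stand-in written for this post, not the real generator; `mutation_rate` is a hypothetical threshold parameter; and the type-token ratio is just one placeholder statistic where a real study would plug in the measures used to compare against the VMS.

```python
import itertools
import random
import statistics

# Hypothetical glyph-similarity table, for illustration only.
SIMILAR = {"o": "a", "a": "o", "k": "t", "t": "k", "r": "l", "l": "r"}


def toy_generator(seed_line, mutation_rate, n_words, rng):
    """Stand-in generator: copy an earlier word, sometimes alter one glyph."""
    text = list(seed_line)
    while len(text) < n_words:
        word = rng.choice(text)
        if rng.random() < mutation_rate:
            i = rng.randrange(len(word))
            word = word[:i] + SIMILAR.get(word[i], word[i]) + word[i + 1:]
        text.append(word)
    return text


def type_token_ratio(words):
    """Distinct words divided by total words (placeholder statistic)."""
    return len(set(words)) / len(words)


def sweep(seed_lines, mutation_rates, reruns=5, n_words=200):
    """Return {(seed index, rate): (mean, stdev)} over repeated runs."""
    results = {}
    grid = itertools.product(enumerate(seed_lines), mutation_rates)
    for (seed_idx, seed_line), rate in grid:
        scores = []
        for run in range(reruns):
            # A string seed keeps every (seed line, rate, run) reproducible.
            rng = random.Random(f"{seed_idx}-{rate}-{run}")
            text = toy_generator(seed_line, rate, n_words, rng)
            scores.append(type_token_ratio(text))
        results[(seed_idx, rate)] = (statistics.mean(scores),
                                     statistics.stdev(scores))
    return results


if __name__ == "__main__":
    seeds = [["daiin", "chol"], ["okeey", "qokal"]]
    for key, (mean, spread) in sweep(seeds, [0.1, 0.5, 0.9]).items():
        print(key, round(mean, 3), round(spread, 3))
```

The spread across re-runs of the same cell indicates how much of the variation comes from chance alone, and the differences between cells indicate how much comes from the parameters or the initializing line.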

(25-08-2025, 01:45 PM)magnesium Wrote: I think it would be interesting to sweep across a wider parameter space for this generator: randomly vary the code’s various threshold parameters, sweep across a wide range of initializing lines, and re-run a given “seed” (initializing line + set of threshold parameters) multiple times to build up a broader representative population of generated texts. Are all these variations equally convincing, or are some more convincing than others? If some are, how does that relate to the algorithm or initializing line?

In my opinion the seed string does not matter much, if at all: what actually creates the text are the rules, not the seed string, whose effect should peter out quickly. Indeed I'd bet (but I did not try that and can't be sure) that starting from a null string will create a text with the same statistical properties as starting from a proper seed string.

One thing that could be interesting is to compare the copy&modify algorithm with a version of it that drops the copy&modify part but keeps the same generation rules. The idea is to generate one word at a time, always starting from a null string and applying the rules a random (small) number of times for each generated word. One also needs a mechanism for creating 'separable' words: at each step, generate two words and decide with a certain probability whether to apply the 'join two words' rule; otherwise keep only the first word. This could help separate the effect of the copy&modify mechanism from the effect of the rules.
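A rough sketch of that control experiment, with every word built from a null string and no copying from earlier text, might look like this. The rule set here (add a prefix, append a suffix, swap a similar glyph) and all tables and names are hypothetical stand-ins; the real comparison would reuse the generator's own rules.

```python
import random

# Hypothetical tables, for illustration only.
SIMILAR = {"o": "a", "a": "o", "k": "t", "t": "k", "r": "l", "l": "r"}
PREFIXES = ["qo", "ch", "sh"]
SUFFIXES = ["y", "dy", "in", "ol"]


def apply_random_rule(word, rng):
    """Apply one randomly chosen word-building rule."""
    rule = rng.choice(("prefix", "suffix", "swap"))
    if rule == "prefix":
        prefix = rng.choice(PREFIXES)
        return word if prefix in word else prefix + word
    if rule == "suffix":
        return word + rng.choice(SUFFIXES)
    positions = [i for i, glyph in enumerate(word) if glyph in SIMILAR]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + SIMILAR[word[i]] + word[i + 1:]


def word_from_null(rng, max_steps=4):
    """Start from the empty string and apply the rules a few times."""
    word = ""
    for _ in range(rng.randint(1, max_steps)):
        word = apply_random_rule(word, rng)
    return word


def generate_without_copy(n_words, join_probability, rng):
    """Generate words independently; join two with a given probability."""
    text = []
    while len(text) < n_words:
        first, second = word_from_null(rng), word_from_null(rng)
        if not first:
            continue  # only swaps were drawn, so nothing was built
        joined = rng.random() < join_probability
        text.append(first + second if joined else first)
    return text


if __name__ == "__main__":
    print(" ".join(generate_without_copy(30, 0.1, random.Random(1))))
```

Feeding this output and the copy&modify output through the same statistics would show which properties come from the rules alone and which need the copying step.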


nablator

dashstofsk

dexdex

nablator

dexdex

Jorge_Stolfi

ReneZ

dexdex

magnesium

Mauro