[Movie] This Famous Medieval Book May Be a Hoax

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: News (https://www.voynich.ninja/forum-25.html)
+--- Thread: [Movie] This Famous Medieval Book May Be a Hoax (/thread-4761.html)

RE: This Famous Medieval Book May Be a Hoax - magnesium - 24-06-2025

I will be presenting more on this at the upcoming Voynich Manuscript Day, but many of the properties that Torsten Timm and Jorge Stolfi discuss within this thread can be reliably achieved via a substitution cipher of the right construction encrypting a natural language like Latin. However, in my opinion, the biggest outstanding challenge the ciphertext hypothesis faces is finding a mechanism that reliably generates and/or preserves the text's long-range correlations during encryption.

There are multiple hypothetical ways in which these correlations could be induced. For example, Matlach, Janečková, and Dostál (2022) suggest that simple habit and regular line-by-line reuse of glyphs within a proposed steganographic cipher could induce correlations over long ranges. If the cipher itself permits glyph or word reuse, then correlations become easier to achieve.

It's also worth noting that the construction of the VMS may itself help induce these correlations. The VMS is a series of quires made, in turn, of nested bifolia. If the plaintext contents of the VMS were roughly laid out ahead of time, and encryption then (unusually) proceeded bifolio by bifolio rather than in sequential, narrative page order, then the first and last pages of a given quire would be encrypted closer together in time than the first and second pages of the quire. If patterns of non-random word reuse were established during the encryption of a given bifolio, that could mean that there is non-random correlation in word choice between the first and last pages of a quire, potentially thousands of words apart, and likewise for the four pages within a given quire that belong to the same bifolio.
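To make the page geometry concrete, here is a minimal sketch of which pages share a bifolio in a quire of nested bifolia. The function names and the left/right convention (sheet laid flat, pages numbered 1..4n in reading order, left-to-right script) are assumptions for illustration, not anything established about how the VMS was actually produced.

```python
def bifolio_sides(n_bifolia, k):
    """Pages on the two sides of bifolio k (0 = outermost) in a quire
    of n_bifolia nested bifolia. Pages are numbered 1..4n in reading
    order. Each side is returned as (left half, right half)."""
    if not 0 <= k < n_bifolia:
        raise ValueError("k must be in range(n_bifolia)")
    last = 4 * n_bifolia
    outer = (last - 2 * k, 2 * k + 1)      # e.g. pages 16 and 1
    inner = (2 * k + 2, last - 2 * k - 1)  # e.g. pages 2 and 15
    return outer, inner


def sheet_order(n_bifolia):
    """Page order if each sheet is written one side at a time,
    starting with the outermost bifolio (an assumed order)."""
    order = []
    for k in range(n_bifolia):
        for side in bifolio_sides(n_bifolia, k):
            order.extend(side)
    return order


print(bifolio_sides(4, 0))  # outermost bifolio of a 16-page quire
print(sheet_order(2))
```

In a 16-page quire, pages 1 and 16 sit on the same side of the outermost sheet even though they are 15 pages apart in reading order, which is the situation described above.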

All that said, if the VMS is a ciphertext, its observed long-range correlations would be much easier to explain if the plaintext itself had meaningful long-range correlations, even if the encryption procedure itself can induce them on its own.

RE: This Famous Medieval Book May Be a Hoax - magnesium - 24-06-2025

(24-06-2025, 04:03 PM)Mauro Wrote:
(24-06-2025, 02:10 PM)bi3mw Wrote:
(24-06-2025, 01:56 PM)Mauro Wrote: How did the scribe maintain the coherence of word structure using a 'copy-and-modify' mechanism across so many pages,... ?

I think the answer is already in the question: not at all.

That may be so, admittedly.

I have a question for Torsten, if I'm allowed to ask. If I understood the "A possible generating algorithm of the Voynich manuscript" paper correctly, you wrote software which implements your 'copy-and-modify' procedure, defined some parameters (i.e. which characters are 'similar' according to your rule #1), seeded it with some Voynichese sentence and had it write a pseudo-Voynich text which shows concordance with the real VMS in a number of important statistics (which is a good thing, of course!). May I ask whether you calculated the percentage of hapax legomena your method produced? And is it possible to have a link to one of the pseudo-Voynich texts? I've always been curious to see what it actually looks like.

I'm not Torsten, but I can provide an answer. Timm and Schinner's GitHub supplementary materials for their 2019-2020 paper include a sample generated text more than 10,830 tokens long.

Of this generated text's 2,228 unique word types, 1,202 of them (~54%) are hapax legomena.
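For anyone who wants to repeat the count on another text, a minimal sketch follows. The function name `hapax_stats` is hypothetical, and it assumes the text is already split into whitespace-separated word tokens; the last line only rechecks the arithmetic on the two figures quoted above.

```python
from collections import Counter


def hapax_stats(tokens):
    """Return (number of word types, number of hapax legomena,
    hapax share of types) for a list of word tokens."""
    counts = Counter(tokens)
    n_types = len(counts)
    n_hapax = sum(1 for c in counts.values() if c == 1)
    return n_types, n_hapax, n_hapax / n_types


# Toy input, not Voynichese data.
print(hapax_stats("a b a c d d e".split()))

# Recheck of the quoted figures: 1,202 hapax out of 2,228 types.
print(round(100 * 1202 / 2228))
```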

RE: This Famous Medieval Book May Be a Hoax - cvetkakocj@rogers.com - 24-06-2025

I suppose it is much easier to say the VM text is nonsense than to bother learning a very complicated language spoken in Northern Italy/Southern Austria, written down for the first time in the VM. Because of the nature of the language, Timm's word-generating technique can generate a lot of words that are found in the VM, but it cannot generate the correct grammatical and inflectional forms the VM contains. All of Timm's other objections can be explained:
Extreme repetition: When the root words mainly consist of one syllable, repetition cannot be avoided. There are many other reasons for repetition or seeming repetition.

Low conditional entropy: Proper transliteration and transcription will increase the conditional entropy, as will adjustments from phonetic to standardized written language and an expanded vocabulary.

Binomial word-length distribution: When the most common pattern of a root word is consonant-vowel or consonant-vowel-consonant, and words are formed by changing or adding a letter, there will be a lot of repetition. Also, a limited number of prefixes for forming perfective verbs increases the chance that the word-initial is EVA qo or o. Repetition of suffixes is to be expected in a highly inflectional language.

Lack of clear word order or repeated phrases: Not all languages have a fixed word order. The Slovenian language has a most flexible word order. Regular phrases like English I give, I know, I go are indicated with a common suffix -m and an unwritten pronoun. Articles, like the gift, a day, das Buch, ein Buch, are not used, because the underlying language does not use them.

Systematic shifts from Currier A to Currier B: There are no two languages; the seeming difference is caused by subject matter that favors some words over others, and by writing mood. In English, Language A would have more 'I + verb' phrases; in Voynichese, it has -m suffixes and a dropped pronoun. In English, Language B would have more 'you + verb' phrases; in Voynichese, -dy suffixes and a dropped pronoun.

Context-dependent self-similarity: A language that uses different suffixes for conjugation and declension displays a lot of similar words that differ by only one or two letters. Also, a language in which words are generated by adding a letter or two as a prefix, or by adding a letter to an existing word, will show a lot of similar words.

Deep correlation between frequency, similarity, and spatial proximity: The oral language the peasants were using was way more simplified and centered on the root words. With the progress of literacy, the vocabulary was expanded.

A single network of similar word forms: When a single-syllable root can have ten or more different inflectional forms, this looks like a network of similar words. And when another word that differs by only one letter can also generate ten or more inflectional forms, this increases the similarity.

Random walk-like statistical behavior and long-range correlations: In isocolonic writing, like Balkan Slavic in the 15th century, long-range correlation is normal.

Lines function as structural units: Isocolonic writing is a kind of rhetorical prose, consisting of a main sentence and several lines (colons) related to the same subject, of similar or different length, as long as each line contains the same number of stresses.

Absence of semantic categories (e.g., nouns vs. verbs): Without knowing the alphabet, the language, or the correct grammar, it is impossible to know which word is a noun, verb, pronoun, etc., particularly in languages that do not use articles.

No identifiable function words: Many one- and two-letter Voynich words are function words, and many more function words are written together with the next word in so-called word blocks, which were in use in Slavic languages up to the 16th century, while in the rest of Europe such words were written separately by the 13th century. Not all languages use personal pronouns, since they are implied by the suffixes.

No evidence of word roots: In highly inflectional languages, particularly when written phonetically, many words overlap or cause sound changes, so that it is often difficult to determine the root unless you know which part of the word is a prefix or suffix, and how the word evolved.

No corrections or deleted sequences: An experienced writer or scribe could write without corrections.

Line-endings fit precisely into available space: This is questionable in the longer lines. When the words are short, it is easy to fit the whole word at the end of the line and continue on the next line. In poetry, the lines are usually short enough

RE: This Famous Medieval Book May Be a Hoax - Jorge_Stolfi - 24-06-2025

(24-06-2025, 01:44 PM)tavie Wrote: On systematic shifts between A and B: The differences between Currier A and B [...] are surely far too extensive to ascribe to only subject matter.

Why would you say so? Would the word frequencies of an English article on Greek Philosophy and a technical manual of analytic chemistry be less different than those of, say, Herbal and Recipes?

Quote:(or indeed Lisa's top three scribes)

The Scribes issue is too complicated to discuss here. Anyway it is about alleged differences in handwriting, not word statistics.

Quote:Word types are different at line start than elsewhere in the line. It is more than simply adding a new initial glyph onto a word type for whatever reason. Both middle glyphs and final glyphs can be different.

Word types are different at line end than elsewhere in the line. It is more than simply a final glyph cluster being abbreviated into final m. We often see different initials as well. Glyphs pop up in common line end words in ways that aren't common in mid-line words.

Song lyrics typically capitalize the first letter of each line and mostly omit any punctuation that should appear at the end of each line (for reasons that actually make musical sense). Line breaks are somewhat constrained by the song's rhythm, and there are reasons to keep lines short, but otherwise the "scribe" is free to select them to fit the available text width.

I am not saying that any part of the VMS is poetry or songs, or that some glyphs (other than one-leg gallows) are capitals or punctuation. Just showing an example of how a meaningful plaintext in natural language can have different character statistics at start, middle, and end of lines.

Quote:And I cannot think of any natural-language reason for how scribes avoid certain glyphs appearing under other ones.

What do you mean? Would this not be a consequence of other more "natural" features?

Quote:Word types are different at paragraph start, at the line end of top row, and importantly in the middle of top row.

Consider this hypothetical text:

Xeroxcopyta voynichensis, also called Xenophobia maxima: good for urticaria, emacs pinkie, coffee addiction. Take 3 oz daily as tea or poultice morning, lunch, and bedtime. Collect leaves in summer. Grows on mountains and hills.
Smilax ferox, which in the Isle of Manx is called Quux: good for urticaria, memory parity errors, software rot. Take 1 oz/day ground in a capsule at bedtime. Collect seeds in summer or fall. Grows on hills.
Narnia botox, apud Xenophon, not found here in Manx: good for urticaria, ingrown toenail, gluten addiction, tomatophobia. Drink 2 oz per day as tea at bedtime and breakfast until tongue turns green. Pick the leaves in the fall under full moon. Grows in valleys and hills.

Can you see how natural-language text can have different word "types" and even different glyph frequencies in different parts of each paragraph?

By the way: when doing this type of analysis, one should work with a single homogeneous section of the VMS. Different sections clearly have different paragraph semantic structures. Stuffing all together into any statistics algorithm will produce results that will be hard to interpret.

Quote:I cannot see how this is like a natural language. There are two overlapping questions that I take from it: 1. How can these patterns be compatible with meaning and 2. How were these patterns produced? [...] The meaningless hypothesis has a big advantage in that the first question is settled.

Quite the opposite. One big problem of the meaningless hypothesis is that the generating "algorithm" would have to be extremely complicated to produce all the variation we see in the VMS -- from section to section, over the span of paragraphs and lines, in labels and circular text, ... Whereas such variations are quite possible, and indeed expected, under the natural language hypothesis.

RE: This Famous Medieval Book May Be a Hoax - dashstofsk - 24-06-2025

(24-06-2025, 04:08 PM)magnesium Wrote: the first and last pages of a given quire would be encrypted closer in time together than the first and second pages of the quire.

Yes, there is evidence that suggests that the majority of the manuscript sections were written sheet-by-sheet and not in book page order. The authors could have first written both halves of a side then turned the sheet over and written both halves of the other side. Then when they had enough sheets they could have stacked them in any order, any orientation and stitched them together. This would imply that the text of the manuscript is not in any narrative order, which strengthens the artificial construction hypothesis.

Here is some extra evidence to support this further.

Pages [link] and [link] are low in the word ol. These pages are the left and right halves of one side of a sheet. Later in the quire, pages [link] and [link] are high in ol. These pages are also the two halves of one side. So it looks as though [link] and [link] were written in one sitting and [link] and [link] together in another. The author seemed to favour ol on one occasion and not on the other. The distribution of ol is definitely not regular in BioB2, which is a bit odd given that this is the most frequent word in BioB2 and could be expected to have a useful function in every piece of writing. (By running 1000 simulations that randomly place the 290 occurrences of ol within the pages of the quire, I was able to use the techniques of hypothesis testing to show that the words are not regularly distributed.)
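A simulation test of this kind can be sketched as follows. This is not the exact procedure used above: the dispersion statistic, the function names, and every number in the example (page sizes and per-page counts) are hypothetical placeholders, chosen only so that the counts sum to 290.

```python
import random
from collections import Counter


def dispersion(counts, expected):
    """Chi-square-style distance between counts and expectations."""
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))


def monte_carlo_p_value(observed, page_sizes, n_sims=1000, seed=0):
    """Share of simulations whose dispersion is at least as large as
    the observed one, when each occurrence lands on a page with
    probability proportional to that page's word count."""
    rng = random.Random(seed)
    total = sum(observed)
    size_sum = sum(page_sizes)
    expected = [total * s / size_sum for s in page_sizes]
    obs_stat = dispersion(observed, expected)
    pages = range(len(page_sizes))
    hits = 0
    for _ in range(n_sims):
        placed = Counter(rng.choices(pages, weights=page_sizes, k=total))
        sim = [placed.get(p, 0) for p in pages]
        if dispersion(sim, expected) >= obs_stat:
            hits += 1
    return hits / n_sims


# Hypothetical quire of 8 equally long pages; counts sum to 290.
page_sizes = [400] * 8
observed = [10, 12, 60, 58, 35, 40, 38, 37]
print(monte_carlo_p_value(observed, page_sizes))
```

A small p-value says only that the word is unevenly spread relative to page length; it does not by itself say why.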

Something similar is also happening with words in BioB2 with the prefix qoke. [link] and [link] have few such words and are both on one side. The author chose not to use this prefix much on this occasion. Again, these words are not regularly distributed.

It all seems to suggest that the author was choosing to vary the construction method at each sitting, wasn't following any particular standard. Hence no formal encryption method. No natural language with correct spelling.

RE: This Famous Medieval Book May Be a Hoax - RobGea - 24-06-2025

Nevermind

RE: This Famous Medieval Book May Be a Hoax - dashstofsk - 24-06-2025

(24-06-2025, 07:36 PM)Jorge_Stolfi Wrote: One big problem of the meaningless hypothesis is that the generating "algorithm" would have to be extremely complicated to produce all the variation we see in the VMS --

I think not so. A lot of it could be due to the author giving himself a free hand to write what he liked and not to stick rigidly to a formal method. I said something about this in an earlier post.

The manuscript is probably full of 'happy accidents' that give it many of its nice anomalies.

RE: This Famous Medieval Book May Be a Hoax - tavie - 24-06-2025

[Edited/removed because I am just repeating stuff from other posts which won't help]

(24-06-2025, 07:36 PM)Jorge_Stolfi Wrote:
Quote:And I cannot think of any natural-language reason for how scribes avoid certain glyphs appearing under other ones.
What do you mean? Would this not be a consequence of other more "natural" features?

How would it be? I'm writing an article on this so it would be interesting to hear of a natural reason. It seems very artificial to me.

RE: This Famous Medieval Book May Be a Hoax - Mauro - 25-06-2025

(24-06-2025, 04:28 PM)magnesium Wrote:
(24-06-2025, 04:03 PM)Mauro Wrote:
(24-06-2025, 02:10 PM)bi3mw Wrote:
(24-06-2025, 01:56 PM)Mauro Wrote: How did the scribe maintain the coherence of word structure using a 'copy-and-modify' mechanism across so many pages,... ?

I think the answer is already in the question: not at all.

That may be so, admittedly.

I have thought a little about this and I think the objection stands (there are two objections, actually). I'll briefly give some examples below. Writing a full analysis would be a very long task; I might attempt it, but at the moment this is what I have. But first I want to say that if you had asked me one year ago what the VMS is and how it was written, I would have blissfully answered: "Probably meaningless, generated with some algorithm (i.e. with dice and a rulebook, or directly in the mind of the scribe), probably along the lines of Torsten's paper". It was a sad day when the objection(s) came to my mind, because I had to revert to a gloomy "I don't know at all, it could be anything". So if I am wrong, I will be happy!

I focus on rule #3 of Torsten's paper "A possible generating algorithm of the Voynich manuscript" (the easiest rule to analyze because it basically needs just one parameter to be implemented: the probability of choosing rule #3 among the possible rules):

Quote:(3) Combine two source words to create a new word. As an example, the two words <chol> and <daiin> combine to <choldaiin> or <cholaiin>.

The first objection is that I'd expect the frequency of a combined word A+B to be proportional to the frequency of A multiplied by the frequency of B. For instance, 'daiin' and 'ol' are the most frequent words (Rf1a-n transcription, whole text), so I'd expect 'oldaiin' and 'daiinol' to be the most frequent composite words. But this does not happen: 'daiinol' has only one occurrence, 'oldaiin' has 6. I can explain 'daiinol' easily to myself by adding a parameter, that is to say a rule such as 'n almost always terminates a word', so rule #3 is only very rarely allowed to proceed when the word being 'copied-and-modified' ends with 'n'. But 'oldaiin' is harder to explain: why 6 occurrences, when 'chodaiin' gets 47, and 'cho' is much rarer than 'ol' (*)? It seems parameters (rules) are needed here too, and to accommodate all the cases like this, which are too improbable to be due to chance alone, one would need a lot of parameters. Or equivalently, with a human scribe creating words in his mind instead of a computer, the mental picture the scribe has of the structure of Voynich words must be quite complicated.
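The expectation behind this objection can be written down as a small sketch. The compound counts (6, 1, 47) are the ones quoted above; the base-word counts are hypothetical placeholders that only respect the stated ordering ('daiin' and 'ol' frequent, 'cho' much rarer), and the function name is made up for illustration.

```python
def expected_compound_shares(base_counts, compounds):
    """Share of each compound A+B if its frequency were proportional
    to freq(A) * freq(B), normalised over the listed compounds."""
    total = sum(base_counts.values())
    weights = {
        a + b: (base_counts[a] / total) * (base_counts[b] / total)
        for a, b in compounds
    }
    norm = sum(weights.values())
    return {word: w / norm for word, w in weights.items()}


# Hypothetical base-word counts (placeholders, not measured values).
base_counts = {"daiin": 800, "ol": 500, "cho": 70}
compounds = [("ol", "daiin"), ("daiin", "ol"), ("cho", "daiin")]

# Compound counts as quoted in the post.
observed = {"oldaiin": 6, "daiinol": 1, "chodaiin": 47}
obs_total = sum(observed.values())

expected = expected_compound_shares(base_counts, compounds)
for word in observed:
    print(word, round(expected[word], 3), round(observed[word] / obs_total, 3))
```

With any base counts in which 'ol' is far more frequent than 'cho', the product rule predicts more 'oldaiin' than 'chodaiin', the reverse of the quoted counts.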

The second objection is that this behaviour seems to carry across the whole manuscript: notwithstanding all the statistical variations between the different sections of the VMS (which means many of the 'rules' actually do change with time, and is one of the things Torsten's SelfCitation explains rather neatly), 'chodaiin' is always much more represented vs. 'oldaiin' than would be expected. I think it's improbable that a 'mental rule' like this can remain in place when the scribe is freely applying modifications to generate lots of meaningless words and text.

Herbal A: 'chodaiin' is 12.5x more frequent than 'oldaiin'
Herbal B: the only case where 'oldaiin' is more frequent (2x), but it's still very much underrepresented ('ol' is the 10th most frequent word, 'cho' never appears)
Stars B: there are 6 'chodaiin', but zero 'oldaiin'
Astronomical: zero of each
Cosmological: one 'chodaiin', zero 'oldaiin'
Pharma A: equal number of 'chodaiin' and 'oldaiin' (5 each)
Stars A: one 'chodaiin', zero 'oldaiin'
Balneological B: one 'chodaiin', zero 'oldaiin'

(*) And also 'chod' and 'aiin' are rarer than 'ol' and 'daiin'. One could reach 'chodaiin' from 'chol' (which is frequent) + 'daiin', but this would not remove the problem, because one still needs to add a rule to get rid of the 'l' (and with a certain probability too: 'choldaiin', with the 'l', occurs 6 times).

RE: This Famous Medieval Book May Be a Hoax - Letieum - 25-06-2025

A little thought in passing:

While reading the various posts, I get the impression that some people sometimes oppose the statements “this is the product of a generative algorithm” and “this is a cypher”.

I think this is a mistake: if you have a functional method for generating meaningless pseudo-language text, you can easily turn it into a fairly effective cyphering method.

For example, considering a Torsten-like method, at some point you take a base word and you have to choose a modification from a list (delete, add, modify a glyph, etc.). This choice is then a good place to encode meaning:
* If I add a glyph, it encodes such a letter / syllable / word / …
* If I delete a glyph, it encodes this,
* Etc.
(This is just an illustration, very incomplete, of the idea.)
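As a toy illustration of that idea, the sketch below hides one symbol (0, 1 or 2) in each output word through the choice of modification. Everything in it is hypothetical: the glyph inventory, the seed words, the three operations, and the simplification that each word is derived from a fixed, cycling seed list shared by writer and reader rather than from earlier text.

```python
# Hypothetical glyph inventory and seed words (the shared "key").
GLYPHS = "odyaklrs"
SEEDS = ["chol", "daiin", "qokedy", "shedy"]

ADD, DELETE, MODIFY = 0, 1, 2


def _apply(word, op, step):
    """Apply one modification; the operation carries the hidden symbol."""
    glyph = GLYPHS[step % len(GLYPHS)]
    if op == ADD:
        return word + glyph  # one glyph longer
    if op == DELETE:
        return word[:-1]     # one glyph shorter
    # MODIFY: swap the first glyph for a different one, same length.
    if glyph == word[0]:
        glyph = GLYPHS[(step + 1) % len(GLYPHS)]
    return glyph + word[1:]


def encode(symbols):
    """Turn a sequence of symbols in {0, 1, 2} into pseudo-words."""
    return [_apply(SEEDS[i % len(SEEDS)], s, i) for i, s in enumerate(symbols)]


def decode(words):
    """Recover the symbols from the length change against the seed."""
    by_length_change = {1: ADD, -1: DELETE, 0: MODIFY}
    return [
        by_length_change[len(w) - len(SEEDS[i % len(SEEDS)])]
        for i, w in enumerate(words)
    ]


message = [0, 1, 2, 2, 1, 0]
words = encode(message)
print(words)
print(decode(words) == message)
```

The output looks like modified copies of a few base words, yet a reader who knows the seed list recovers the message exactly.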

In short, showing elements that support the idea that “this is the product of a generative algorithm” is always interesting, but it tells us nothing about the meaningless vs. cipher question.