The Challenge of Analyzing a Dynamic Text

The Voynich Ninja (https://www.voynich.ninja), Forum: Voynich Research > Analysis of the text, Thread 5376

RE: The Challenge of Analyzing a Dynamic Text - Torsten - 22-02-2026

(22-02-2026, 02:15 PM) JoJo_Jost Wrote: One point on this:

The observations on the development of vocabulary in the VMS are real and valuable, but I consider the conclusions drawn from them to be questionable. Having recently studied medieval German manuscripts intensively – Ortloff von Baierland, the Breslau Pharmacopoeia, Admonter Bartholomäus – I have also noticed such deviations within individual texts by individual authors. Spelling changes over the course of a manuscript. The frequency of words shifts dramatically depending on what is being described. The similarity between early and late sections decreases significantly, even in a simple medical text. All this does not require complex systems or even a self-citation process – it is simply what happens when a person writes a long text over weeks or months, possibly copying from different sources, changing topics or gradually changing their writing habits. The assumption that a “static system” should provide identical statistics on every page sets a bar that no real medieval manuscript could reach.

Thank you for this thoughtful critique grounded in concrete manuscript experience. You are right that spelling variation, vocabulary shifts, and evolving scribal habits are normal features of extended medieval text production. The paper does not dispute this.

However, the Voynich Manuscript does not merely show variation — it shows a specific combination of structural properties that medieval manuscripts do not exhibit. In the Voynich text, 84.67% of all word types form a single connected network through single-glyph edits. High-frequency words systematically have more graphically similar neighbors, while words appearing only once are isolated in the network. On more than half of all pages, two of the three most frequent words differ by only a single glyph. Vocabulary evolves in strict directional chains with intermediate forms (<chol> -> <cheol> -> <cheo> -> <chey> -> <chedy>), and words from early sections persist in late sections but never the reverse.
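To make the connectivity measure concrete, here is a minimal Python sketch of one way such a figure could be computed. It links every pair of word types that differ by one substitution, insertion, or deletion, then reports the share of types in the largest connected component, plus each type's frequency next to its neighbour count. This is an illustration under stated assumptions, not the paper's code: it treats every EVA character as one glyph (a real analysis may count ch, sh or cth as single glyphs), the function names are hypothetical, and the toy tokens are the chain quoted above plus two unrelated words.

```python
from collections import Counter, defaultdict, deque

def edit1_graph(tokens):
    """Link word types that differ by one substitution, insertion or deletion.

    Assumption: every character is one glyph (EVA 'ch' counts as two here).
    """
    types = set(tokens)
    adj = {w: set() for w in types}
    # (position, word with that position removed) -> words sharing that key
    groups = defaultdict(list)
    for w in types:
        for i in range(len(w)):
            shorter = w[:i] + w[i + 1:]
            if shorter in types:  # w equals 'shorter' plus one inserted glyph
                adj[w].add(shorter)
                adj[shorter].add(w)
            groups[(i, shorter)].append(w)
    # Equal-length words that match once position i is removed differ only at i.
    for group in groups.values():
        for a in group:
            for b in group:
                if a != b:
                    adj[a].add(b)
    return adj

def largest_component_share(adj):
    """Fraction of word types in the biggest connected component (BFS)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best / len(adj) if adj else 0.0

def frequency_and_degree(tokens):
    """(token frequency, number of one-edit neighbours, word), most frequent first."""
    counts = Counter(tokens)
    adj = edit1_graph(counts)
    return sorted(((counts[w], len(adj[w]), w) for w in counts), reverse=True)

tokens = "chol cheol cheo chey chedy qokaiin daiin chol chey chol".split()
print(largest_component_share(edit1_graph(tokens)))  # 5 of the 7 types are linked
print(frequency_and_degree(tokens)[:3])
```

Grouping by "word minus one position" avoids comparing every pair of types, which matters for a vocabulary of several thousand types.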

None of these properties characterize natural language texts, however much their spelling varies. In Ortloff von Baierland the word "und" may be extremely frequent, but it does not generate dozens of single-edit variants clustering on the same pages. Function words are distributed throughout the text regardless of topic — unlike Voynich, where the most frequent tokens can change from page to page. The paper does not claim that a real manuscript should produce identical statistics on every page; it argues that this particular constellation of properties observed in the Voynich text requires a different explanation than normal scribal variation.
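The page-level claim (two of a page's three most frequent words differing by a single glyph) can be checked with a short routine. This is a minimal sketch, assuming each page is already a list of EVA tokens and each character counts as one glyph. Ties in frequency are broken by first occurrence, which is how Python's `Counter.most_common` orders equal counts. The function names are hypothetical, and the two pages below are toy data rather than real folios.

```python
from collections import Counter
from itertools import combinations

def is_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def page_has_close_top3(page_tokens):
    """Do at least two of the page's three most frequent words differ by one glyph?"""
    top3 = [w for w, _ in Counter(page_tokens).most_common(3)]
    return any(is_one_edit(a, b) for a, b in combinations(top3, 2))

def share_of_pages(pages):
    """Fraction of pages (lists of tokens) on which the top-3 check succeeds."""
    flags = [page_has_close_top3(p) for p in pages]
    return sum(flags) / len(flags)

pages = [
    ["daiin", "daiin", "dain", "ol", "ol"],          # toy page 1
    ["qokeey", "qokeey", "shol", "ar", "ar", "dy"],  # toy page 2
]
print(share_of_pages(pages))  # 0.5
```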

RE: The Challenge of Analyzing a Dynamic Text - Torsten - 22-02-2026

(22-02-2026, 03:13 PM) Bernd Wrote: One question - sorry if I missed this but where would you put the zodiac pages in your reconstructed order?

This is the order I suggest:

Code:
Herbal in Currier A (Quire 1-8, 17)
Pharma in Currier A (Quire 15 + 19)
Astro (Quire 9)
Cosmo/Zodiac (Quire 10, 11, 12, 14)
Herbal in Currier B (Quire 4-8, 17)
Stars in Currier B (Quire 20)
Biological/Balneological in Currier B (Quire 13)

This is the current quire order of the manuscript:

Code:
Quire 1 Herbal A
Quire 2 Herbal A
Quire 3 Herbal A
Quire 4 Herbal A, Herbal B
Quire 5 Herbal A, Herbal B
Quire 6 Herbal A, Herbal B
Quire 7 Herbal A, Herbal B
Quire 8 Herbal A, Herbal B
Quire 9 Astro
Quire 10 Cosmo
Quire 11 Cosmo
Quire 12 Cosmo
Quire 13 Bio in Currier B
Quire 14 Cosmo
Quire 15 Pharma in Currier A
Quire 16 Missing
Quire 17 Herbal A, Herbal B
Quire 18 Missing
Quire 19 Pharma in Currier A
Quire 20 Stars

RE: The Challenge of Analyzing a Dynamic Text - JoJo_Jost - 22-02-2026

(22-02-2026, 03:49 PM) Torsten Wrote: Vocabulary evolves in strict directional chains with intermediate forms (<chol> -> <cheol> -> <cheo> -> <chey> -> <chedy>), and words from early sections persist in late sections but never the reverse. [...]
In Ortloff von Baierland the word "und" may be extremely frequent, but it does not generate dozens of single-edit variants clustering on the same pages.

Thanks for your reply. Actually, I wasn't concerned with your entire thesis, just that one specific point. ;)

But just for the record, words like this exist in Bavarian texts too. In the Ortloff von Baierland:

‘vnd’ (and, 1,942×) has 20 neighbours at edit distance 1 (vn, vnnd, vns, vno, und, vnde, ...), and such variants even occur together within a single sentence:

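Example: Ist aber ter harnn weiß vnd dunn vnnd das kleyn kornlein als d' sand an te potem seind (written with u: und dunn unnd). Translation: "But if the urine is white and thin, and the small grains at the bottom are like sand".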

‘sol’ (443×) → ‘solt’ (216×) → ‘soltu’ (43×) (shall). These are not ‘stages of evolution’; they are simply inflected forms of the same verb: ‘soll’, ‘sollte/sollst’, ‘sollst du’.

‘kalte’ (cold) has 7 neighbours: kelte, kalt, kalter, kaltes, kalten, kaltem, kalts — pure inflectional forms.
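Neighbour counts like these can be reproduced by generating every string one edit away from a word and intersecting that set with the text's vocabulary. This is a minimal sketch, assuming plain character-level edits over the letters that occur in the vocabulary. The vocabulary below contains only the forms quoted in this post, so it finds 6 neighbours for ‘vnd’ rather than the 20 reported for the full text. The function names are hypothetical.

```python
def edits1(word, alphabet):
    """All strings one deletion, substitution or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutions = {left + c + right[1:]
                     for left, right in splits if right
                     for c in alphabet if c != right[0]}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | substitutions | inserts

def neighbours(word, vocabulary):
    """Vocabulary entries at edit distance exactly 1 from word, sorted."""
    vocab = set(vocabulary)
    alphabet = {c for w in vocab for c in w}  # letters absent from vocab cannot help
    return sorted(edits1(word, alphabet) & vocab)

# Only the forms quoted in the post, not the full Ortloff vocabulary.
vocab = ["vnd", "vn", "vnnd", "vns", "vno", "und", "vnde",
         "kalte", "kelte", "kalt", "kalter", "kaltes", "kalten", "kaltem", "kalts",
         "sol", "solt", "soltu"]
print(neighbours("vnd", vocab))         # ['und', 'vn', 'vnde', 'vnnd', 'vno', 'vns']
print(len(neighbours("kalte", vocab)))  # 7
```

Note that the same one-edit test lumps together spelling variants (vnd/und), inflections (kalt/kalte) and contractions (solt/soltu), which is exactly the ambiguity discussed above.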

However, spatial clustering is one of the points where you are actually describing a phenomenon that is less pronounced in normal texts. Whether this excludes a language depends on the encoding: an absorption encoding that merges articles and prepositions with the following word could, of course, create clusters of similar-looking words in the same line, simply because the same grammatical context produces the same prefixes. But that would still have to be proven, of course ;)
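The absorption idea can be illustrated with a toy transform that fuses selected function words onto the following word. This is a hypothetical sketch of the mechanism described above, not a proposed decipherment. The phrase is constructed for illustration (not quoted from Ortloff), and `absorb` and the other helpers are invented names. The three fused tokens end up pairwise one substitution apart, because they share the content word and differ only in the absorbed article.

```python
from itertools import combinations

def absorb(tokens, absorbable):
    """Fuse each absorbable function word onto the following token (toy encoding)."""
    out, pending = [], ""
    for tok in tokens:
        if tok in absorbable:
            pending += tok  # hold it until a content word arrives
        else:
            out.append(pending + tok)
            pending = ""
    if pending:  # function word(s) left over at the very end
        out.append(pending)
    return out

def is_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def one_edit_pairs(tokens):
    """Distinct type pairs in tokens that are one edit apart, in sorted order."""
    types = sorted(set(tokens))
    return [(a, b) for a, b in combinations(types, 2) if is_one_edit(a, b)]

line = "der harn vnd dem harn vnd den harn".split()
fused = absorb(line, {"der", "dem", "den"})
print(fused)  # ['derharn', 'vnd', 'demharn', 'vnd', 'denharn']
print(one_edit_pairs(fused))
```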

RE: The Challenge of Analyzing a Dynamic Text - dexdex - 23-02-2026

(22-02-2026, 02:34 PM) Torsten Wrote:
(22-02-2026, 01:56 PM) rikforto Wrote: I tend to think we spend too much time "psychologizing" the Voynich composer, but for once I would like to hear your thoughts on that. Do you have a sense of why someone would make a text this way? I get it is ultimately unfalsifiable and I don't think your paper---which I think is one of the better attempts I've seen to grapple with the word families---lives or dies on it, but I just cannot get my head around why someone would create this artifact with autocitation and drawings.

You're right that the question of motivation is ultimately speculative, and I appreciate that you recognize the paper doesn't depend on answering it. Still, I think the question "why would someone create this?" becomes less puzzling once we separate two things that often get conflated: the motivation for creating the manuscript as an object, and the method by which the text was generated.

On the method, there is no mystery at all. The Gaskell and Bowern (2022) experiment shows that self-citation is not a strategy someone consciously adopts — it is what inevitably happens when a person produces extended pseudo-text. Participants asked to generate just 100 words of meaningless text spontaneously began copying and modifying their own output, simply because inventing genuinely new forms is cognitively exhausting (Bowern & Lindemann, 2021). The Voynich Manuscript contains roughly 38,000 tokens. At that scale, self-citation isn't a choice — it's a cognitive default. Whatever the scribe's original intention, the text generation process would have converged on copying and modifying previously written words simply because of the volume required. D'Imperio (1978) already observed that someone with the intention of producing dummy text "would naturally tend to repeat parts of neighboring strings with various small changes." The scribe need not have known they were doing it.
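For readers who want to experiment, here is a deliberately simplified Python sketch of a self-citation process: each new token copies a word from recently written text and, with some probability, changes one glyph. It is not the Timm and Schinner algorithm, whose published rules are more specific. The window size, edit probability, glyph inventory, seed words and function names are all hypothetical choices for illustration.

```python
import random

def random_single_edit(word, glyphs, rng):
    """Apply one random substitution, insertion or deletion (needs >= 2 glyphs)."""
    op = rng.choice(("substitute", "insert", "delete"))
    if op == "delete" and len(word) > 1:
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "substitute":
        i = rng.randrange(len(word))
        return word[:i] + rng.choice([g for g in glyphs if g != word[i]]) + word[i + 1:]
    # insertion (also used when a one-glyph word was picked for deletion)
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(glyphs) + word[i:]

def self_citation_text(seed_words, n_tokens, glyphs, window=50, p_edit=0.7, seed=0):
    """Grow a token list by copying recent words and sometimes altering one glyph.

    window, p_edit and glyphs are hypothetical parameters, not fitted values.
    """
    rng = random.Random(seed)
    text = list(seed_words)
    while len(text) < n_tokens:
        source = rng.choice(text[-window:])  # pick a recently written word
        if rng.random() < p_edit:
            source = random_single_edit(source, glyphs, rng)
        text.append(source)
    return text

text = self_citation_text(["chol", "daiin", "qokeedy"], 200, "acdehiklnoqrsty")
print(" ".join(text[:20]))
```

Because every new word is derived from a recent one, a process like this tends to produce local clusters of one-edit variants without any planning on the writer's part.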

On the motivation for the object itself, that question will most likely remain unanswerable unless additional historical documents surface. As Timm and Schinner (2020) note, "probably, the author was undergoing the substantial effort of creating the VMS in order to gain something. Not necessarily money for selling the book, although (because the algorithm can be executed by an experienced scribe almost as fast as writing down the text) the effort even for a 'classical fraud' now appears more reasonable. Perhaps it was about gaining reputation by possessing a mysterious book that no one would ever be able to decipher (simply because there really is nothing to decipher in it)" (p. 16).

Personally, I find the idea of a forgery intended for sale less convincing than the alternative: that the manuscript served to impress. Consider a physician who could claim that his superior knowledge derived from a mysterious book — one filled with illustrations of plants, astronomical diagrams, and human bodies, written in a learned-looking script that no one could read. In a fifteenth-century context, the possession of such a book would have conveyed authority and esoteric expertise. No one could verify what it "said," and that was precisely the point. The sheer craftsmanship — the vellum, the illustrations, the consistent calligraphic hand — would have reinforced the impression of a genuine scholarly work, while the impenetrable script would have discouraged anyone from questioning its content.

The illustrations, on this reading, don't require a separate explanation. They are part of making the object convincing as a medical or philosophical reference work. And the text doesn't need to encode information — it needs only to look as though it does.

I have written about this hypothesis separately.

I think it also neatly explains why someone would make a tome of such magnitude filled with gibberish: they started with just a few folios of herbal content, which is how the quack herbals of the day tended to look. Then it grew, as the scam became more and more successful. At some point, other sections were added to make it more mystical, but no real content was added.

The order you suggest certainly squares with this hypothesis: start like a dime-a-dozen herbal, with herbal and pharma stuff. Then someone mentions the zodiac and astronomy. But whoops! Now your herbal has too little herbalism, so you add more herbal stuff. And maybe someone suggests reproductive health and baths, so you draw a bunch of nymphs.

Whatever the exact motivations, I think you're bang on with how it was created and why. I do wonder about scribal hand differences, though...

RE: The Challenge of Analyzing a Dynamic Text - Jorge_Stolfi - 23-02-2026

(22-02-2026, 03:49 PM) Torsten Wrote: In the Voynich text, 84.67% of all word types form a single connected network through single-glyph edits.

Guess which other natural language has this property...

All the best, --stolfi

RE: The Challenge of Analyzing a Dynamic Text - JoJo_Jost - 23-02-2026

@Stolfi: The Ortloff von Baierland has 74.4% of its word types connected... without any cipher... is that what you mean? ;)

RE: The Challenge of Analyzing a Dynamic Text - rikforto - 23-02-2026

Hold on, everyone. Is the claim of network connectivity about the *language* or the *representation*?

This is something of a leading question, since I am fairly sure the answer is the latter; therefore, no matter how well the statistics fit, we can rule out any Romanized representation fairly easily, unless someone here wants to go to bat for simple substitution with the Latin alphabet. :D

We know it's neither Romanized Bavarian nor (I assume) Romanized Chinese. Bear in mind that Mandarin, conventionally written and analyzed, has an edit distance of exactly 1 between every pair of words, but the finding is different in Pinyin, different again in Bopomofo, and so on.
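A toy comparison shows how much the representation matters. The five words below (马, 米, 八, 好, 水) are written three ways: as characters, as toneless Pinyin, and as Pinyin with tone numbers. The script then measures the share of types in the largest single-edit component. The word list is an arbitrary illustrative sample, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))

def largest_share(words):
    """Share of distinct words in the largest single-edit component (union-find)."""
    types = sorted(set(words))
    parent = {w: w for w in types}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(types, 2):
        if is_one_edit(a, b):
            parent[find(a)] = find(b)
    sizes = Counter(find(w) for w in types)
    return max(sizes.values()) / len(types)

representations = {
    "hanzi": ["马", "米", "八", "好", "水"],
    "pinyin": ["ma", "mi", "ba", "hao", "shui"],
    "pinyin+tones": ["ma3", "mi3", "ba1", "hao3", "shui3"],
}
for name, words in representations.items():
    print(name, largest_share(words))  # 1.0, 0.6 and 0.4 respectively
```

Any two distinct single characters are one substitution apart, so the character form is trivially fully connected, while the alphabetic forms split apart.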

This observation bears on the conclusion of the paper at hand, though it's by no means Torsten's only line of evidence. The fact that both a Germanic text and a Sinitic one show similar lexical connectivity in certain representations would imply that high connectivity does not preclude the Voynich text being a comparably distributed representation of some language in the Voynich script.

(A subtlety here: it may not be the case that every language can be represented this way in the Voynich script, but I can't rule any of them out of hand.)

RE: The Challenge of Analyzing a Dynamic Text - nablator - 23-02-2026

(23-02-2026, 10:37 PM) rikforto Wrote: but a different finding in Pinyin

If "words" are the pinyin romanizations of Chinese, 100% of all word types form a single connected network through single-letter edits.


RE: The Challenge of Analyzing a Dynamic Text - rikforto - 23-02-2026

(23-02-2026, 11:20 PM) nablator Wrote:
(23-02-2026, 10:37 PM) rikforto Wrote: but a different finding in Pinyin

If "words" are the pinyin romanizations of Chinese, 100% of all word types form a single connected network through single-letter edits.


This highlights another distinction: Chinese words written in hanzi are all at edit distance 1 from each other, independent of the text. I might be wrong here, and frankly it's the abstraction from this point that I care about, but I think it should be possible to construct a text of some length in which all words are at least edit distance 2 apart. For the initials, you'd need a gap among the alveolars, and the finals are prolific enough that it should take a hot second for the text to converge on edit distance 1. You are absolutely correct about the lexicon. On reconsideration, Bopomofo might converge much faster.
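The sample-size point can be checked directly: compute the minimum Levenshtein distance over all pairs of distinct word types in a sample and watch it fall as the sample grows. This is a minimal sketch with hypothetical helper names and an arbitrary three-word toneless Pinyin sample. Adding a single closely related syllable is enough to bring the minimum down to 1.

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (unit-cost edits)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def min_pairwise_distance(words):
    """Smallest edit distance between any two distinct types in the sample."""
    types = sorted(set(words))
    if len(types) < 2:
        raise ValueError("need at least two distinct word types")
    return min(levenshtein(a, b) for a, b in combinations(types, 2))

sample = ["ma", "shui", "hao"]
print(min_pairwise_distance(sample))           # 2
print(min_pairwise_distance(sample + ["mi"]))  # 1
```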

Either way, bigger point: This is dependent on the language, the representation, and the sample in ways that mean you can't simply infer the language or lack thereof.

RE: The Challenge of Analyzing a Dynamic Text - nablator - 24-02-2026

(22-02-2026, 02:15 PM) JoJo_Jost Wrote: Having recently studied medieval German manuscripts intensively – Ortloff von Baierland, the Breslau Pharmacopoeia, Admonter Bartholomäus – I have also noticed such deviations within individual texts by individual authors. Spelling changes over the course of a manuscript. The frequency of words shifts dramatically depending on what is being described. The similarity between early and late sections decreases significantly, even in a simple medical text.

Can you please share a text that shows these interesting statistical shifts?

(23-02-2026, 09:51 PM) JoJo_Jost Wrote: The Ortloff von baierland has 74.4% connected...

If you post the text, I'll check whether it's a joke or not. :)