The Challenge of Analyzing a Dynamic Text

The Challenge of Analyzing a Dynamic Text - Torsten - 17-02-2026

The Challenge of Analyzing a Dynamic Text: Why the Voynich Manuscript Resists Systematic Analysis
by Torsten Timm

Abstract:

Quote: The Voynich Manuscript (MS 408) presents a unique analytical challenge that transcends conventional cryptanalytic approaches. This paper examines why standard analytical frameworks, whether linguistic, cryptographic, or statistical, consistently fail to produce definitive results. It argues that the manuscript's most fundamental property is its continuous evolution throughout the text, creating a dynamic system where local predictability coexists with dramatic large-scale transformation. By examining the manuscript through the lens of network analysis and developmental processes, the paper demonstrates that the text exhibits properties consistent with organic growth rather than rule-governed production. This perspective reconciles apparently contradictory observations and provides a framework for understanding why the manuscript has resisted systematic analysis for over a century.

The challenge of Analyzing a Dynamic Test.pdf (Size: 136.1 KB / Downloads: 27)

RE: The Challenge of Analyzing a Dynamic Text - Dunsel - 18-02-2026

A Brain > Markov Chain

RE: The Challenge of Analyzing a Dynamic Text - Typpi - 18-02-2026

There's not much information in the paper about how the model recreates conditional dependencies (e.g., word-to-word transition statistics, positional entropy). Do you have anything to add on that?

Also, have you ever done a systematic correlation between particular glyph clusters and illustration types?
Because that would be hard to reconcile with pure pseudo-text.

The linked study about people defaulting to copying takes place in a modern context, though, and I don't know how well it would carry over to a medieval scribe, but it is interesting nonetheless.

RE: The Challenge of Analyzing a Dynamic Text - Torsten - 18-02-2026

(18-02-2026, 04:24 AM)Typpi Wrote: There's not much information in the paper about how the model recreates conditional dependencies (e.g., word-to-word transition statistics, positional entropy). Do you have anything to add on that?

Also, have you ever done a systematic correlation between particular glyph clusters and illustration types?
Because that would be hard to reconcile with pure pseudo-text.

The linked study about people defaulting to copying takes place in a modern context, though, and I don't know how well it would carry over to a medieval scribe, but it is interesting nonetheless.

Thank you for your excellent questions.

To question 1) Positional entropy patterns:
The positional entropy patterns emerge for two reasons. The first is the permanent copying process, which leads to a feedback loop: "The rules to modify a source word normally don't affect the order of the glyphs. This is one reason for the observation that the words in the VMS share the same rigid word structure" (Timm & Schinner 2020, p. 10). And: "As for the actual selection process of source words, it is clear from the results of section 2 (as well as simply suggested by the scribe's convenience) that they are to be chosen at least from the same page. Since it is handy to copy a word from the same position some lines above, our implementation of the algorithm includes a mechanism that selects (with a given probability) even tokens from the previous line at the same writing position." [link]
The second is that "for a text created by self-citation it appears as a logical assumption that the scribe also used aesthetically motivated design rules for glyph selection, in order to harmonize the overall appearance of the text. These rules specify when two glyphs may follow one another" (Timm & Schinner 2020, p. 10). "Schwerdtfeger ([link]) described in 2008 the following four design rules for the VMS: (1) line-glyphs can follow line-glyphs or <a>; (2) curve-glyphs and <a> can follow curve-glyphs; (3) the <l>-glyph can be used as a curve-glyph or as a line-glyph; and (4) gallows glyphs count as curve glyphs." (Timm & Schinner 2020, p. 10)
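
The selection and modification steps quoted above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the published implementation: the substitution table `SIMILAR`, the probabilities `p_above` and `p_modify`, and all function names are hypothetical. It only shows how copying from the same position in the previous line, combined with order-preserving changes, keeps word structure rigid.

```python
import random

# Hypothetical substitution table (placeholder pairs, not the published rules).
SIMILAR = {"o": "a", "a": "o", "r": "l", "l": "r", "k": "t", "t": "k"}


def modify(word, rng):
    """Swap one glyph in place, so the order of the glyphs is preserved."""
    positions = [i for i, g in enumerate(word) if g in SIMILAR]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + SIMILAR[word[i]] + word[i + 1:]


def pick_source(lines, current, p_above, rng):
    """Pick a source word that is already on the page.

    With probability p_above, take the token at the same writing position
    in the previous line; otherwise take any token written so far.
    """
    pos = len(current)
    if pos < len(lines[-1]) and rng.random() < p_above:
        return lines[-1][pos]
    pool = [w for line in lines for w in line] + current
    return rng.choice(pool)


def generate_page(seed_line, n_lines, words_per_line,
                  p_above=0.5, p_modify=0.7, seed=0):
    """Grow a page from a non-empty seed line by repeated copy-and-modify."""
    rng = random.Random(seed)
    lines = [list(seed_line)]
    for _ in range(n_lines):
        current = []
        for _ in range(words_per_line):
            source = pick_source(lines, current, p_above, rng)
            word = modify(source, rng) if rng.random() < p_modify else source
            current.append(word)
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in generate_page(["chol", "chor", "daiin"], 4, 5):
        print(" ".join(line))
```

Because every new token is a copy or a one-glyph variant of an earlier token, the generated words inherit the structure of the seed words, which is the feedback effect described above.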

To question 2) Correlation between text and illustrations:
The shift from Currier A to B suggests that the order of sections was: Herbal in Currier A, Pharma in Currier A, Astro, Cosmo, Herbal in Currier B, Stars in Currier B, Biological in Currier B (see section "Shift from <chol>/<chor> to <chedy>/<shedy>" in [link]). This means that, with the exception of the pages sharing Herbal illustrations, the scribe wrote pages sharing the same type of illustration together.
In short, the two Herbal sections are devastating to the semantic correlation hypothesis. If vocabulary tracked illustration meaning, both Herbal sections should use similar words. Instead, they differ because they were written at different evolutionary stages. Topic modeling detects vocabulary differences between sections and misinterprets them as "topics" when they're actually just temporal snapshots of an evolving pseudo-text generation process.
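
One way to make the idea of "temporal snapshots" concrete is to score each section by how far it has moved from <chol>/<chor> to <chedy>/<shedy>. The sketch below is a deliberate simplification and an assumption on my part: it reduces the shift to a single ratio over those four words, and the function names are hypothetical. It is not the method used in the paper.

```python
from collections import Counter

A_MARKERS = {"chol", "chor"}    # typical for Currier A
B_MARKERS = {"chedy", "shedy"}  # typical for Currier B


def b_share(tokens):
    """Share of B markers among all marker tokens, or None if none occur."""
    counts = Counter(tokens)
    a = sum(counts[w] for w in A_MARKERS)
    b = sum(counts[w] for w in B_MARKERS)
    return None if a + b == 0 else b / (a + b)


def order_by_shift(sections):
    """Sort section names from most A-like to most B-like.

    `sections` maps a section name to its list of word tokens.
    Sections without any marker word are left out.
    """
    scored = {name: b_share(tokens) for name, tokens in sections.items()}
    usable = [name for name, score in scored.items() if score is not None]
    return sorted(usable, key=lambda name: scored[name])
```

Applied to a real transliteration, two sections with the same illustration type but very different scores would correspond to the Herbal A / Herbal B situation described above.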

To question 3) Modern vs. medieval context:
The Gaskell & Bowern experiment validates a universal cognitive constraint, not a culturally specific behavior. If so, medieval scribes would be even more likely to use copying strategies, since their primary professional activity was copying texts. The medieval context therefore strengthens rather than weakens the self-citation argument.

RE: The Challenge of Analyzing a Dynamic Text - Jorge_Stolfi - 18-02-2026

(18-02-2026, 08:07 AM)Torsten Wrote: In short, the two Herbal sections are devastating to the semantic correlation hypothesis. If vocabulary tracked illustration meaning, both Herbal sections should use similar words.

Unless the Author decided to revise his spelling system at some point. Or decided to write in Northwest Low Central Bavarian instead of North-Central High West Bavarian. Or he copied the text of each section from a different book, and some of those source books were in Elfdalian, some in Gutnish. Or ...

All the best, --stolfi

RE: The Challenge of Analyzing a Dynamic Text - Torsten - 18-02-2026

(18-02-2026, 09:37 AM)Jorge_Stolfi Wrote: Unless the Author decided to revise his spelling system at some point. Or decided to write in Northwest Low Central Bavarian instead of North-Central High West Bavarian. Or he copied the text of each section from a different book, and some of those source books were in Elfdalian, some in Gutnish. Or ...

All the best, --stolfi

These alternatives obviously cannot explain the single unified network for the Voynich "connecting 6796 out of 8026 words (=84.67 %)" [Timm & Schinner 2020, p. 4]. The network graphs show that all frequently used words are connected to each other in one network (see [link]).

Even more critical is the frequency-connectivity correlation: "High-frequency tokens also tend to have high numbers of similar words" [Timm & Schinner 2020, p. 6].

Both features are predicted by self-citation through a feedback loop: frequent words have a higher chance of being copied, creating more similar word variants. The existence of more similar words in the text increases the chance that more instances of that group of words are created. This self-reinforcing process explains why knowing the most frequent word in a particular section allows you to predict which bigrams are most frequently used there.
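
For readers who want to check both measurements on a transliteration of their own, here is a minimal sketch. It assumes that two word types count as "similar" when they differ by exactly one glyph substitution, insertion, or deletion; that definition and all function names are assumptions for illustration. The pairwise comparison is quadratic in the number of word types, so it is slow but workable for a vocabulary of a few thousand types.

```python
from collections import Counter, defaultdict


def within_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra glyph in b at position i


def build_network(tokens):
    """Link every pair of word types that are one edit apart."""
    types = sorted(set(tokens))
    neighbours = defaultdict(set)
    for i, a in enumerate(types):
        for b in types[i + 1:]:
            if within_one_edit(a, b):
                neighbours[a].add(b)
                neighbours[b].add(a)
    return types, neighbours


def largest_component_share(types, neighbours):
    """Fraction of word types inside the largest connected component."""
    seen, best = set(), 0
    for start in types:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best / len(types)


def frequency_and_degree(tokens):
    """(word, token frequency, number of similar types), most frequent first."""
    counts = Counter(tokens)
    _, neighbours = build_network(tokens)
    return [(w, c, len(neighbours[w])) for w, c in counts.most_common()]
```

The first function pair corresponds to the "one network" observation, and `frequency_and_degree` gives the raw numbers behind the frequency-connectivity correlation.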

RE: The Challenge of Analyzing a Dynamic Text - Rafal - 18-02-2026

Torsten, I generally like your "autocitation hypothesis".

But does this paper bring anything new?

I also wouldn't agree with "Single authorship". The handwriting on different pages feels distinct to me. Some scribes generate more text than others, and some have their own style, like using "ed" a lot (see: [link]). These styles were once labelled Currier A and Currier B.

Do you mean single authorship of:
- written text: it was all written in whole VM by one man
- designed text: it was written by one man in some draft and then could be copied by several scribes to VM
- the system: one man designed "the word generator" and then several people used it
?

RE: The Challenge of Analyzing a Dynamic Text - Jorge_Stolfi - 18-02-2026

(18-02-2026, 01:12 PM)Torsten Wrote: These alternatives obviously cannot explain the single unified network for the Voynich "connecting 6796 out of 8026 words (=84.67 %)" [Timm & Schinner 2020, p. 4]. The network graphs show all frequently used words are connected to each other in one network

Have you perchance tried your statistical analysis on any phonetic rendering of any East Asian monosyllabic language? Like Chinese in pinyin, Vietnamese, Tibetan, Thai...

All the best, --stolfi

RE: The Challenge of Analyzing a Dynamic Text - Torsten - 18-02-2026

(18-02-2026, 05:05 PM)Jorge_Stolfi Wrote: Have you perchance tried your statistical analysis on any phonetic rendering of any East Asian monosyllabic language? Like Chinese in pinyin, Vietnamese, Tibetan, Thai...

All the best, --stolfi

Yes, I have. See [link] and [link].

RE: The Challenge of Analyzing a Dynamic Text - kckluge - 18-02-2026

(18-02-2026, 08:07 AM)Torsten Wrote: To question 2) Correlation between text and illustrations:
The shift from Currier A to B suggests that the order of sections was: Herbal in Currier A, Pharma in Currier A, Astro, Cosmo, Herbal in Currier B, Stars in Currier B, Biological in Currier B (see section "Shift from <chol>/<chor> to <chedy>/<shedy>" in [link]). This means that, with the exception of the pages sharing Herbal illustrations, the scribe wrote pages sharing the same type of illustration together.

IIRC, you interpret the defining characteristics of Lisa Fagin Davis' scribal hands as reflecting changes in the writing of a supposed single creator over time. Herbal B is a thorn in the side of that theory given that the hands she identifies with Scribe 2 (who writes the Balneology section) and Scribe 3 (who writes the Starred paragraphs section) both occur in the Herbal B folios.

Also, any posited gradual transition from the A language dialect pages to the B language dialect pages requires interpreting the label text of the Zodiac folios -- and only the label text -- as the stepping stones between the two (if you didn't follow the thread, see the discussion in [link]). The non-label text on the Zodiac pages (with one exception) falls firmly on the A language side of what I've come to think of as "Currier gulch" in the bigram distribution space.
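
To make "bigram distribution space" concrete: each page can be turned into a vector of bigram frequencies, and pages can then be compared by a distance between those vectors. The sketch below is only one possible setup and rests on assumptions of mine: it uses glyph bigrams within words and the Jensen-Shannon divergence as the distance, neither of which is claimed to be what was used in the linked discussion.

```python
import math
from collections import Counter


def bigram_distribution(words):
    """Relative frequencies of glyph bigrams inside the given words."""
    counts = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(counts.values())
    return {bigram: c / total for bigram, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(x):
        return sum(v * math.log2(v / mix[k]) for k, v in x.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

With such a distance, a gap between the A and B pages would show up as pairwise distances that cluster into two groups with few intermediate values, whereas a smooth transition would fill in the range between them.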

Quote: In short, the two Herbal sections are devastating to the semantic correlation hypothesis. If vocabulary tracked illustration meaning, both Herbal sections should use similar words. Instead, they differ because they were written at different evolutionary stages. Topic modeling detects vocabulary differences between sections and misinterprets them as "topics" when they're actually just temporal snapshots of an evolving pseudo-text generation process.

I agree fully that the Herbal B dialect is a massive problem for efforts to attribute substantial differences in dialect vocabularies to nothing more than topical differences in the sections. To the extent that I remain agnostic at best about spaces being word separators, I share your skepticism about what topic modeling is actually measuring in the text.

It's unfortunate there is no preprint of the Yale work Lisa described in a recent talk to refer to for more detail, but if I understand correctly, what they are seeing in their Latent Semantic Analysis results is the same type of page-to-page transition in the Starred paragraph section that is found in texts with a narrative/argumentative arc (or that you would expect the self-citation method to show), but they are not seeing that behavior in the herbal section pages (or in the similar-genre texts they used as comparisons). That is not surprising, because there is no sustained narrative flow in something like an herbal or bestiary; it's just discrete descriptions of X, Y, & Z one after another. If that's the case, then it requires explanation in the context of the self-citation method: why should the LSA results differ between the sections?

I remain unconvinced that the various dialects reflect a smooth transition of a single process over time as opposed to discrete distributions with some degree of overlap in their tails, but I'll take a closer look at the material in the link you provided.