The Voynich Ninja
The Challenge of Analyzing a Dynamic Text - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: The Challenge of Analyzing a Dynamic Text (/thread-5376.html)


RE: The Challenge of Analyzing a Dynamic Text - Rafal - 18-02-2026

Quote: IIRC, you interpret the defining characteristics of Lisa Fagin Davis' scribal hands as reflecting changes in the writing of a supposed single creator over time.

Torsten, if I may ask, do you need this assumption for something? Does it explain something for you? Or is it just a loose observation: you feel it is so, but nothing really depends on it?

Personally I am close to the gibberish hypothesis but at the same time accept multiple scribes.

I have never got deep into the Currier A vs Currier B thing, but I believe others who say that there are Currier A pages, Currier B pages and in-between pages.

I can see a simple explanation for that.
The scribes use some system to create gibberish, but it leaves them some freedom. Each has his personal style: Scribe 1 writes in Currier A, Scribe 2 writes in Currier B.

Then one day a conversation happens:

Scribe 1: Man, I hate my job. So tired of inventing these crappy words that don't mean anything. Cheating people. I would really rather be a peasant ploughing a field or a beggar sitting at the church entrance.

Scribe 2: Hey, a bad day may happen to everyone. You want to look at my page? You could borrow some words from it. Get some inspiration.

Scribe 1: Thanks buddy! :)

This way Scribe 1 could include some words typical of Scribe 2 without ever fully changing his style. The result was the intermediate pages.


RE: The Challenge of Analyzing a Dynamic Text - JoJo_Jost - 18-02-2026

Indeed, if the text were written phonetically (not unusual for the 15th century) and two different scribes (e.g. Bavarian speakers from two different villages) spoke slightly different dialects, certain differences would emerge. I cannot say whether this would be reflected in these results, but it would at least be conceivable.


RE: The Challenge of Analyzing a Dynamic Text - Torsten - 18-02-2026

(18-02-2026, 07:53 PM)kckluge Wrote: IIRC, you interpret the defining characteristics of Lisa Fagin Davis' scribal hands as reflecting changes in the writing of a supposed single creator over time.

I don't do that. See my paper discussing Lisa Fagin Davis' scribal hands: [link].

(18-02-2026, 07:53 PM)kckluge Wrote: I agree fully that the Herbal B dialect is a massive problem for efforts to attribute substantial differences in dialect vocabularies to nothing more than topical differences in the sections. To the extent that I remain agnostic at best about spaces being word separators I share your skepticism about what applying topic modeling is actually measuring in the text.

It's unfortunate there is no preprint of the Yale work Lisa described in a recent talk to refer to for more detail, but if I understand correctly what they are seeing in their Latent Semantic Analysis results is the same type of transition page-to-page in the Starred paragraph section that is found in texts where there is a narrative/argumentative arc (or that you expect it to show from the self-citation method), but they are not seeing that behavior in the herbal section pages (or in the similar genre texts they used as comparisons). Which is not surprising because there is no sustained narrative flow in something like an herbal or bestiary, it's just discrete descriptions of X, Y, & Z one after another. If that's the case, then that requires explanation in the context of the self-citation method -- why should the LSA results differ between the sections?

I remain unconvinced that the various dialects reflect a smooth transition of a single process over time as opposed to discrete distributions with some degree of overlap in their tails, but I'll take a closer look at the material in the link you provided.

For my analysis see [link] and [link].

Or see [link]:
Quote: Interpreting normal texts as bit sequences yields deviations of little significance from a true (uncorrelated) random walk. For the VMS, this only holds on a small scale of approximately the average line length; beyond positive correlation build up: the presence/absence of a symbol appears to increase/decrease the tendency towards another occurrence
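
For readers who want to try this kind of test on a transliteration, here is a minimal sketch of a random-walk fluctuation check. It is not the procedure from the quoted paper: the step encoding (+1 where a chosen symbol occurs, -1 elsewhere), the function names, and the choice of window sizes are all assumptions made for illustration. For an uncorrelated sequence the fluctuation grows roughly as the square root of the window length (log-log slope near 0.5); a clearly larger slope would point to positive long-range correlation.

```python
import math


def symbol_walk(text, symbol):
    """Map text to walk steps: +1 where `symbol` occurs, -1 elsewhere (assumed encoding)."""
    return [1 if ch == symbol else -1 for ch in text]


def fluctuation(steps, window):
    """Standard deviation of the walk's displacement over all windows of this length."""
    # Prefix sums give the walk position after each step.
    pos = [0]
    for s in steps:
        pos.append(pos[-1] + s)
    disp = [pos[i + window] - pos[i] for i in range(len(pos) - window)]
    mean = sum(disp) / len(disp)
    var = sum((d - mean) ** 2 for d in disp) / len(disp)
    return math.sqrt(var)


def scaling_exponent(steps, windows):
    """Least-squares slope of log F(l) against log l.

    Assumes every window gives a non-zero fluctuation (log(0) is undefined).
    """
    xs = [math.log(w) for w in windows]
    ys = [math.log(fluctuation(steps, w)) for w in windows]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A call such as `scaling_exponent(symbol_walk(text, "o"), [4, 8, 16, 32, 64])` would give one exponent per chosen symbol; comparing window sizes below and above the average line length is the part that matters for the claim in the quote.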

See also my [link] from 2024, where I discuss the distribution of EVA-m:
Quote: This means that changes in bigram frequencies are quite normal for the Voynich text. Therefore other criteria would allow to partition the Voynich text further. For instance EVA-m could be used to distinguish between Currier-A folios with EVA-m (bifolio 3/6, bifolio 17/24 ...) and Currier-A folios without EVA-m (bifolio 2/7, bifolio 10/15, bifolio 20/21, f25r/f25v, ...). Or to distinguish between folios with EVA-m in line final position (Currier B) and folios without a preference for a certain line position (Currier-A).
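
The partition criterion in the quote can be checked with a simple count per folio. The sketch below is only an illustration: the input format (a dict mapping a folio name to its lines of EVA text) and the function name are assumptions, and the folio names and strings in the usage example are made up, not taken from a real transliteration.

```python
def eva_m_profile(folios, glyph="m"):
    """For each folio, count the glyph and how often it ends a line.

    `folios` maps a folio name to a list of transliterated lines
    (hypothetical input format).
    """
    profile = {}
    for name, lines in folios.items():
        total = sum(line.count(glyph) for line in lines)
        final = sum(1 for line in lines if line.rstrip().endswith(glyph))
        profile[name] = {
            "total": total,
            "line_final": final,
            # Share of occurrences that are line-final; None if the glyph is absent.
            "final_share": final / total if total else None,
        }
    return profile


# Made-up toy input, only to show the shape of the result.
toy = {
    "folio_x": ["daiin chol dam", "chor daiin"],
    "folio_y": ["chol daiin", "shol chor"],
    "folio_z": ["dam chol", "otam daiin chey"],
}
result = eva_m_profile(toy)
```

Folios with `total == 0` would fall in the "without EVA-m" group, and a high `final_share` would correspond to the line-final preference described for Currier B.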



RE: The Challenge of Analyzing a Dynamic Text - Torsten - 19-02-2026

(18-02-2026, 07:53 PM)kckluge Wrote: Also, any posited gradual transition from the A language dialect pages to the B language dialect pages requires interpreting the label text of the Zodiac folios -- and only the label text -- as the stepping stones between the two (if you didn't follow the thread, see the discussion in [link]). The non-label text on the Zodiac pages (with one exception) falls firmly on the A language side of what I've come to think of as "Currier gulch" in the bigram distribution space.

The observation that Zodiac label text exhibits more B-language characteristics while contemporaneous running text remains predominantly A-language is consistent with the self-citation hypothesis rather than contradicting it. Moreover, detailed analysis of label generation patterns provides empirical support for the copying mechanism.

Context-Dependent Copying Modes

The self-citation process operates differently depending on textual context. Label text is characterized by spatial constraints, isolation from continuous text, and reduced contextual dependencies, favoring selection of recently developed variants. Running text operates under continuous flow conditions where copying from nearby established paragraphs favors conservative template usage. During transitional evolutionary stages, the scribe had access to both A-vocabulary and emerging B-vocabulary. Writing context influenced which subset was preferentially selected, producing the observed distribution where labels exhibit more B-characteristics while running text remains more A-like.

Empirical Evidence from Label Patterns

Analysis reveals transparent copying patterns. On folio f70v2, consecutive labels demonstrate systematic character substitution:

otaral → otalar → otalam → dolaram → okaram → okaldal

Each label represents a minor modification of its predecessor through glyph substitution ([link], p. 7), indicating iterative copying rather than independent vocabulary selection.
Individual labels even exhibit internal character repetition inconsistent with natural language morphology: oteotey, oteoteotsho ([link]); okeoky, okeokeokeody ([link]). These patterns suggest mechanical character sequence copying rather than morphological construction.
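
One way to make "minor modification" and "internal repetition" measurable is sketched below. Both measures are assumptions chosen for illustration, not the method of the cited paper: plain Levenshtein distance between consecutive labels, and a helper that looks for a unit of at least two glyphs repeated back to back from the start of a label.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def chain_distances(labels):
    """Edit distance between each label and the one that follows it."""
    return [edit_distance(x, y) for x, y in zip(labels, labels[1:])]


def leading_repeat(word, min_len=2):
    """Return (unit, count) for the unit that repeats back to back from the
    start of `word` and covers the most glyphs, or None if nothing repeats."""
    best = None
    for size in range(min_len, len(word) // 2 + 1):
        unit = word[:size]
        count = 1
        while word.startswith(unit, count * size):
            count += 1
        if count > 1 and (best is None or count * size > best[1] * len(best[0])):
            best = (unit, count)
    return best


labels = ["otaral", "otalar", "otalam", "dolaram", "okaram", "okaldal"]
steps = chain_distances(labels)
repeats = {w: leading_repeat(w) for w in ["oteotey", "oteoteotsho", "okeoky", "okeokeokeody"]}
```

Note that `leading_repeat` only catches exact repeats, so a label like okeoky (oke followed by the near-copy oky) is not flagged; a fuzzier comparison of adjacent chunks would be needed for such cases.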

Cross-Section Label Duplication

Labels appear identically in both astronomical and herbal sections: okary, oky, otalam, okeoly, otaly, otoky, otaldy, otal, ykeody, okeody, okeos, otory, okody, oran (Timm, 2014, p. 9). 
Under meaningful text: This presents a semantic anomaly requiring astronomical objects to share nomenclature with botanical entities.
Under self-citation: This duplication is expected, as labels are generated by copying from recently written text regardless of semantic appropriateness. The observation that "these labels depend on each other" reflects copying dependencies, not semantic connections.

Cognitive Load and Copying Transparency

The heightened visibility of copying patterns likely reflects cognitive demands of circular text arrangement (Timm, 2014, p. 9). This cognitive load resulted in more mechanical copying with less modification, explaining why labels exhibit transparent sequential copying, increased internal repetition, and cross-section duplication.

Conclusion

The Zodiac label evidence supports continuous evolution. Labels exhibit more B-characteristics not because they represent a different linguistic system but because writing context favored recently developed variants. The transparent copying patterns, cross-section duplication, and internal character repetition provide direct empirical evidence for self-citation operating under spatial and cognitive constraints.


RE: The Challenge of Analyzing a Dynamic Text - kckluge - 20-02-2026

Torsten, thank you for your thoughtful responses. Whether or not I agree with you, I want to make sure I correctly understand your position and evidence.


RE: The Challenge of Analyzing a Dynamic Text - Torsten - 21-02-2026

I have updated the draft of my paper, "The Challenge of Analyzing a Dynamic Text: Why the Voynich Manuscript Resists Systematic Analysis".

Attachment: The_Challenge_of_Analyzing_a_Dynamic_Text.pdf (182.67 KB)

The revised version of this paper is not merely an editorial update but a substantially restructured argument. While the original version presented the network analysis, continuous evolution, and self-citation hypothesis as parallel observations, the revised version organizes them around a single thesis: that all major approaches to the Voynich Manuscript — cipher, natural language, and "two languages" — share an unexamined assumption that the text was produced by a static system, and that this assumption is the reason they fail.

The revised version adds three elements not present in the original. First, it explicitly addresses and refutes the most common alternative explanations for the vocabulary changes — spelling revision, dialect switching, and copying from multiple source texts — showing that all three predict discrete boundaries while the manuscript exhibits continuous gradients with intermediate forms. Second, it incorporates the contrast with natural language networks, demonstrating that even languages superficially resembling Voynichese (such as Vietnamese) produce fundamentally different network structures. Third, it engages with topic modeling results, noting that Latent Dirichlet Allocation allocates the entire manuscript to a single topic — a result difficult to reconcile with claims of semantic text-illustration correlation but expected for a text without semantic content.

The revised version also corrects several inaccuracies in the first draft, including the characterization of Voynichese entropy levels and the vocabulary comparison between the two Herbal sections, and provides more precise sourcing throughout.


RE: The Challenge of Analyzing a Dynamic Text - rikforto - 22-02-2026

I tend to think we spend too much time "psychologizing" the Voynich composer, but for once I would like to hear your thoughts on that. Do you have a sense of why someone would make a text this way? I get it is ultimately unfalsifiable and I don't think your paper---which I think is one of the better attempts I've seen to grapple with the word families---lives or dies on it, but I just cannot get my head around why someone would create this artifact with autocitation and drawings.


RE: The Challenge of Analyzing a Dynamic Text - JoJo_Jost - 22-02-2026

One point on this:

The observations on the development of vocabulary in the VMS are real and valuable, but I consider the conclusions drawn from them to be questionable. Having recently studied medieval German manuscripts intensively – Ortolf von Baierland, the Breslau Pharmacopoeia, Admonter Bartholomäus – I have also noticed such deviations within individual texts by individual authors. Spelling changes over the course of a manuscript. The frequency of words shifts dramatically depending on what is being described. The similarity between early and late sections decreases significantly, even in a simple medical text. All this does not require complex systems or even a self-citation process – it is simply what happens when a person writes a long text over weeks or months, possibly copying from different sources, changing topics or gradually changing their writing habits. The assumption that a “static system” should provide identical statistics on every page sets a bar that no real medieval manuscript could reach.


RE: The Challenge of Analyzing a Dynamic Text - Torsten - 22-02-2026

(22-02-2026, 01:56 PM)rikforto Wrote: I tend to think we spend too much time "psychologizing" the Voynich composer, but for once I would like to hear your thoughts on that. Do you have a sense of why someone would make a text this way? I get it is ultimately unfalsifiable and I don't think your paper---which I think is one of the better attempts I've seen to grapple with the word families---lives or dies on it, but I just cannot get my head around why someone would create this artifact with autocitation and drawings.

You're right that the question of motivation is ultimately speculative, and I appreciate that you recognize the paper doesn't depend on answering it. Still, I think the question "why would someone create this?" becomes less puzzling once we separate two things that often get conflated: the motivation for creating the manuscript as an object, and the method by which the text was generated.

On the method, there is no mystery at all. The Gaskell and Bowern (2022) experiment shows that self-citation is not a strategy someone consciously adopts — it is what inevitably happens when a person produces extended pseudo-text. Participants asked to generate just 100 words of meaningless text spontaneously began copying and modifying their own output, simply because inventing genuinely new forms is cognitively exhausting (Bowern & Lindemann, 2021). The Voynich Manuscript contains roughly 38,000 tokens. At that scale, self-citation isn't a choice — it's a cognitive default. Whatever the scribe's original intention, the text generation process would have converged on copying and modifying previously written words simply because of the volume required. D'Imperio (1978) already observed that someone with the intention of producing dummy text "would naturally tend to repeat parts of neighboring strings with various small changes." The scribe need not have known they were doing it.
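
To make the copy-and-modify idea concrete, here is a toy simulation. It is a deliberately simplified sketch and not the algorithm from Timm and Schinner (2020): the glyph alphabet, the window size, the modification probability, and the restriction to single-glyph substitutions are all assumptions chosen for illustration.

```python
import random

GLYPHS = "adehiklnorsty"  # hypothetical toy alphabet, not the EVA inventory


def mutate(word, rng):
    """Replace one glyph at a random position with a different glyph."""
    i = rng.randrange(len(word))
    choices = [g for g in GLYPHS if g != word[i]]
    return word[:i] + rng.choice(choices) + word[i + 1:]


def self_cite(seed_words, n_tokens, window=20, p_modify=0.5, seed=0):
    """Grow a token list by copying a recently written word, sometimes modified.

    window   -- how many of the most recent tokens can serve as a source
    p_modify -- probability that the copied word gets one glyph substituted
    """
    rng = random.Random(seed)
    text = list(seed_words)
    while len(text) < n_tokens:
        source = rng.choice(text[-window:])
        text.append(mutate(source, rng) if rng.random() < p_modify else source)
    return text


sample = self_cite(["daiin", "chol", "okaram"], 200)
```

Because every new token is drawn from the last `window` tokens, the vocabulary of such a text drifts gradually instead of switching at a boundary, which is the qualitative behaviour at issue in this thread.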

On the motivation for the object itself, that question will most likely remain unanswerable unless additional historical documents surface. As Timm and Schinner (2020) note, "probably, the author was undergoing the substantial effort of creating the VMS in order to gain something. Not necessarily money for selling the book, although (because the algorithm can be executed by an experienced scribe almost as fast as writing down the text) the effort even for a 'classical fraud' now appears more reasonable. Perhaps it was about gaining reputation by possessing a mysterious book that no one would ever be able to decipher (simply because there really is nothing to decipher in it)" (p. 16).

Personally, I find the idea of a forgery intended for sale less convincing than the alternative: that the manuscript served to impress. Consider a physician who could claim that his superior knowledge derived from a mysterious book — one filled with illustrations of plants, astronomical diagrams, and human bodies, written in a learned-looking script that no one could read. In a fifteenth-century context, the possession of such a book would have conveyed authority and esoteric expertise. No one could verify what it "said," and that was precisely the point. The sheer craftsmanship — the vellum, the illustrations, the consistent calligraphic hand — would have reinforced the impression of a genuine scholarly work, while the impenetrable script would have discouraged anyone from questioning its content.

The illustrations, on this reading, don't require a separate explanation. They are part of making the object convincing as a medical or philosophical reference work. And the text doesn't need to encode information — it needs only to look as though it does.


RE: The Challenge of Analyzing a Dynamic Text - Bernd - 22-02-2026

I cannot comment on the text, but I believe your conclusion is reflected in the imagery as well. There is a gradual change rather than discrete styles exclusively associated with scribal hands. This is obvious in faces, but also in plants. There is a relatively large gap between Herbal A and B plants, which suggests they were not created consecutively, but that Herbal B came significantly later. I still wonder why the artist felt the need to draw even more plants in a different style later. Maybe he just wanted to use the new, more abstract style he had developed.

See my thread for the correlation in faces between topics:
[link]

However, it's not that easy. Scribal hands do play a role in imagery, yet I believe they are rather the result of evolutionary stages than distinct persons.

One question - sorry if I missed this but where would you put the zodiac pages in your reconstructed order?