Why and how the text could be Bavarian


RE: Why and how the text could be Bavarian - JoJo_Jost - 11-05-2026

(11-05-2026, 01:41 PM)dashstofsk Wrote: I really cannot see much hope from the evidence you provided for the manuscript being Bavarian, or any other Germanic language or dialect.

Given Germany’s fragmentation into small kingdoms and duchies, and its many small rural dialects, German/Bairish is certainly a possibility. Since the marginalia are in Bairish, it is at least a plausible theory.

But it is not an artificial construct either; a system capable of reconstructing these extreme and diverse structures would be a terrible anachronism.

RE: Why and how the text could be Bavarian - nablator - 11-05-2026

(11-05-2026, 12:47 PM)JoJo_Jost Wrote: You see this time and again in manuscripts.

Several spellings could coexist without clusters on pages or sections.

RE: Why and how the text could be Bavarian - JoJo_Jost - 11-05-2026

sry nablator i dont understand ur answer... Is that a question or a confirmation?

RE: Why and how the text could be Bavarian - nablator - 11-05-2026

(11-05-2026, 03:13 PM)JoJo_Jost Wrote: sry nablator i dont understand ur answer... Is that a question or a confirmation?

Several spellings, no modern, fixed orthography, but no clusters of frequent pattern on pages/sections either, no drift. The lack of standard orthography doesn't explain them.

RE: Why and how the text could be Bavarian - JoJo_Jost - 12-05-2026

@ nablator

It's probably not that simple; it depends on the underlying cipher structure + the different text structures

RE: Why and how the text could be Bavarian - JoJo_Jost - 14-05-2026

I’d like to revisit the topic of spaces here - Feaster had already investigated this, but I’ve adapted and refined it a bit for my own purposes. I have 8 rules, and I’m referring to all pages, including labels, etc.

I also examined how including line ends changes the results.

[Image: space.png]

[Image: spaces2.png]

As a reminder: I perform these calculations using Claude Opus 7 and use ChatGPT 5.5 with a slightly modified setup for verification. I try to be extremely thorough, but just as humans make mistakes, I cannot guarantee that there are no errors.

RE: Why and how the text could be Bavarian - JoJo_Jost - 14-05-2026

Forbidden Spaces in the Voynich Manuscript

To gain a slightly better understanding of which bigrams, trigrams, etc., belong together, I asked myself which glyphs are never separated by spaces. No wonder - this pattern is just as regular as the one involving spaces, and of course it stems in part from it. Nevertheless, I found the analysis interesting - sorry for the very, very long image.

I analyzed the spacing behavior in the VMS at the level of atom pairs (adjacent atoms within lines). The corpus contains 168,004 such atom pairs, based on the EVA-Z3b.

All glyph/atom pairs (a, b) within lines were counted
Inclusion threshold: n >= 50 occurrences and P(space) < 5%
The qualifying pairs were grouped into structurally coherent families
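
For anyone who wants to reproduce the counting step, here is a minimal sketch of how it could be done. It is not my exact pipeline: it assumes a plain-text transliteration with one manuscript line per string and ordinary spaces between tokens, and it treats every single character as one atom (multi-character atoms such as ch or sh would need their own tokenizer). The function names are made up for this illustration.

```python
from collections import Counter

def space_stats(lines):
    """Count adjacent atom pairs within each line and how often a space separates them."""
    total = Counter()   # (a, b) -> all adjacencies, with or without a space between
    spaced = Counter()  # (a, b) -> adjacencies that have a space between a and b
    for line in lines:
        tokens = line.split()
        for i, tok in enumerate(tokens):
            # Pairs inside a token: no space between a and b.
            for a, b in zip(tok, tok[1:]):
                total[(a, b)] += 1
            # Pair across the space to the next token on the same line.
            if i + 1 < len(tokens):
                pair = (tok[-1], tokens[i + 1][0])
                total[pair] += 1
                spaced[pair] += 1
    return total, spaced

def strongly_bound(total, spaced, min_n=50, max_p=0.05):
    """Return {pair: P(space)} for pairs with n >= min_n and P(space) < max_p."""
    return {
        pair: spaced[pair] / n
        for pair, n in total.items()
        if n >= min_n and spaced[pair] / n < max_p
    }
```

Pairs are never counted across line breaks, because each line is processed on its own. Grouping the qualifying pairs into families is a manual step and is not covered by this sketch.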

Result:

118,165 out of 168,004 glyph neighborhoods are strongly bound (70.3% of the corpus)
These are distributed across only 18 families covering various structural roles:

Token-end building blocks (aiin, edy, ee, vowel+m)
ch-initial, sh-initial, simple gallow+filler, bench gallow+filler
Token-initial blocks (qo, o-initial, d-initial, p/f-initial)
Connecting elements (vowel+r, vowel+l, e-connector, h-connector)

What this means
Here, everyone can let their imagination run wild. What is clear, however, is that the structure is very rigid.

The really interesting question remains: how could one achieve such consistency?

Note: Many of the building blocks mentioned here (aiin, qo, edy, ee, che/she, gallow clusters) have, of course, already been documented in earlier works, such as Stolfi’s *Grammar of Voynichese* (2000), Feaster’s analyses, and various bigram studies. What this analysis adds is an explicit counterpoint to Feaster’s spacing rules: a systematic catalog of atom pairs where spaces are prohibited (P(space) < 5%), grouped into 18 structural families and quantified based on the corpus.

RE: Why and how the text could be Bavarian - rikforto - 14-05-2026

The sense I have is that the spacing rules are not essential for distinguishing minimal pairs, but were an important part of the scribe's toolkit for supporting that process. So we can argue if the transliteration should have a space on f82v where you see qoteytyqoky, but if there are firm rules about word boundaries, the scribe didn't need to worry about a clearly distinguished word space every time because a y-q pair is a split regardless.

It's fairly sophisticated, too! A mistake a lot of conlangers make (and I'm not claiming this is a conlang, but it's clearly a con-script) is not having enough of this supporting information

RE: Why and how the text could be Bavarian - JoJo_Jost - 14-05-2026

Yes, that’s a very sharp observation. And it would explain why not all the rules work so well. But I’m much more inclined to believe that the writer tried to create the impression that it’s a Latin text by using the spacing. If he wants to create that impression, it’s almost certain that it isn’t Latin. :D

Why else would there be such consistent spaces around the most recognizable Latin abbreviation patterns? “qo-” at the beginning of a word looks exactly like the “qu-/quod/quia” ligature. The 9-shaped glyph appears both at the beginning of a word (where it would suggest con-/com-) and at the end of a word (where it would suggest -us/-um). These are the two most recognizable Latin abbreviation positions on any 15th-century page.
Why else would there be spaces between the y at the end (us / um), the y at the beginning (con), and the qo at the beginning (quod, etc.)? Everyone will think: That’s Latin!

I find this very suspicious, especially since I can now say, based on many statistical analyses, that the probability that it is actually Latin is very low...

RE: Why and how the text could be Bavarian - JoJo_Jost - 15-05-2026

I decided to test how the VMS performs when treated as a pure stream of glyphs - that is, simply ignoring spaces and letting everything run through like a string of glyphs. Lines remain lines; I did not join across line breaks.

Then I counted the most frequent bigrams (overlapping) and looked at how quickly the cumulative coverage curve rises. (As far as I know, bigrams have mostly been studied at the token level so far; bigrams across spaces in particular, such as yq (yqo), have not been examined. Correct me if I’m wrong here.)
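
The counting itself is simple. Here is a minimal sketch, assuming one string per manuscript line, ordinary spaces between tokens, and one character per glyph; the function names are only for illustration.

```python
from collections import Counter

def bigram_counts(lines):
    """Overlapping bigram counts with spaces removed; lines are never joined."""
    counts = Counter()
    for line in lines:
        stream = "".join(line.split())  # drop spaces, keep the line as one unit
        counts.update(stream[i:i + 2] for i in range(len(stream) - 1))
    return counts

def coverage(counts, k):
    """Share of all bigram occurrences covered by the k most frequent bigram types."""
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total if total else 0.0

counts = bigram_counts(["ab ab", "ba"])
print(coverage(counts, 1))  # 0.5
```

In the toy example, "ab ab" becomes the stream "abab", so the bigram "ba" is counted across the former space, but nothing is counted between the end of the first line and the start of the second. Plotting coverage(counts, k) for k = 1, 2, 3, ... gives the curve.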

I did the same with MHD (Middle High German) and Latin. Then I also tried a strict Bavarian phonetic reduction of MHD, that is, merging vowels, shortening endings, and grouping consonants.
The result is this:

[Image: bigrammverteilung.png]

VMS is the blue line. MHD manuscript corpora (orange) and Standard Latin (green) lie far below it - meaning much flatter curves with much more widely distributed bigrams. I haven't tested it, but I would expect normally written Romance and Germanic manuscript languages to behave more like MHD and Latin than like the VMS - though that's just a hypothesis, of course.

But when you apply the hard Bavarian reduction to MHD (red), the curve almost overlaps with the VMS. The top 10 bigrams cover over 30% for both; for the top 100, both reach roughly 93%.

Does that mean the VMS is Bavarian? No - and that’s not what I’m trying to say. The test has a clear catch: due to the hard reduction, I lose bigram types. The VMS has 514, my Bavarian test only 222! So that’s too coarse. And the VMS has very few bigrams that occur very frequently, whereas normal languages are, logically, more widely distributed.

But it reveals something interesting: the typical “flat” curve of a standard written language like Middle High German or Latin doesn’t really match the VMS. However, as soon as you simplify the language the way it’s done in spoken Bavarian, you get suspiciously close to the VMS distribution. That said, I have to admit that the simplifications are a bit mechanical, because I don’t have a phonetically transcribed Bavarian corpus.
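
To make "mechanical" concrete: a reduction of this kind is essentially a short list of ordered substitution rules. The sketch below only shows the principle. The rules in it are hypothetical placeholders that I picked for illustration; they are not the rule set behind the plot and make no claim to being phonetically accurate.

```python
import re

# HYPOTHETICAL rules, for illustration only. Order matters: endings first,
# then consonant groups, then vowel merging.
RULES = [
    (r"en\b", "n"),   # shorten a common ending
    (r"[bp]", "p"),   # group similar consonants
    (r"[dt]", "t"),
    (r"[gk]", "k"),
    (r"[ei]+", "e"),  # merge front vowels (and runs of them)
    (r"[ou]+", "o"),  # merge back vowels (and runs of them)
]

def reduce_text(text):
    """Apply each substitution rule in order to the whole text."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text

def bigram_types(text):
    """Set of distinct overlapping bigrams, ignoring whitespace."""
    stream = "".join(text.split())
    return {stream[i:i + 2] for i in range(len(stream) - 1)}
```

Because several source letters collapse into one symbol, the number of distinct bigram types can only shrink under the merging rules, which is exactly the coarseness problem mentioned above. Comparing len(bigram_types(text)) before and after reduce_text shows how much is lost for a given rule set.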

Conclusion 1: What does that mean? Anyone who assumes plaintext encryption / monophonic encryption and takes a well-structured written language, such as the classical European languages, as a basis will never be able to replicate the VMS curve. So there must be some form of strong reduction or shortening in between. A possible Bavarian cipher would have less work to do to achieve this curve than a cipher that would first have to heavily reduce Middle High German or Latin itself.

Conclusion 2: A free homophonic cipher would normally produce a much flatter curve, simply because frequent plaintext units are distributed across several cipher symbols. In the VMS, it’s the other way around: the curve is highly concentrated. Therefore, homophony alone is less likely to explain the VMS curve. If homophony is involved, it must be constrained by some positional, structural, or rule-based mechanism.
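
The flattening effect is easy to see on a toy example. This is only a sketch under a simplifying assumption: instead of a truly free choice of homophones, each plaintext letter rotates through its cipher symbols in a fixed order, which keeps the example deterministic. The function names and the tiny table are made up for the illustration.

```python
from collections import Counter
from itertools import cycle

def top_k_coverage(stream, k):
    """Share of all bigram occurrences covered by the k most frequent bigram types."""
    counts = Counter(stream[i:i + 2] for i in range(len(stream) - 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

def homophonic(plain, table):
    """Encipher by rotating through each letter's homophones (assumed usage pattern)."""
    wheels = {ch: cycle(symbols) for ch, symbols in table.items()}
    return "".join(next(wheels[ch]) for ch in plain)

plain = "abababab"
cipher = homophonic(plain, {"a": "12", "b": "34"})  # two homophones per letter
print(cipher)
print(top_k_coverage(plain, 1), top_k_coverage(cipher, 1))
```

In the plaintext the most frequent bigram covers 4 of 7 occurrences; in the ciphertext the same share drops to 2 of 7, because the frequent plaintext bigram is split across several cipher bigrams. That is the opposite of the concentration seen in the VMS.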

The second question: Is this bigram level the level that can reveal the essence of VMS to us? In any case, many times more so than the glyph level. But I would not claim that bigrams are necessarily the final units of the system. They may simply be the first level at which the underlying structure becomes visible.

Possible Solutions: We might be dealing with a cipher heavily interspersed with null elements, which was typical for the time. Multiple null elements occurring frequently would inflate the graph in a way that matches the VMS. If we were to remove these from the text, the resulting curve would be somewhat flatter (but not like Latin / MHD), and there would be fewer bigrams.
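
Here is a toy sketch of the null effect. It assumes one fixed two-glyph null ("xy", a made-up placeholder) inserted at regular intervals, which is far more regular than any real cipher would be, and it assumes the null sequence never occurs in the underlying text.

```python
from collections import Counter

def top_k_coverage(stream, k):
    """Share of all bigram occurrences covered by the k most frequent bigram types."""
    counts = Counter(stream[i:i + 2] for i in range(len(stream) - 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

def insert_nulls(stream, null="xy", every=2):
    """Insert the null sequence between consecutive chunks of `every` glyphs."""
    chunks = [stream[i:i + every] for i in range(0, len(stream), every)]
    return null.join(chunks)

def strip_nulls(stream, null="xy"):
    """Remove the null sequence again (only safe if it never occurs in the text itself)."""
    return stream.replace(null, "")

text = "abcdefgh"
padded = insert_nulls(text)
print(padded)  # abxycdxyefxygh
print(top_k_coverage(text, 1), top_k_coverage(padded, 1))
```

Without nulls, every bigram in the toy text occurs once, so the top bigram covers 1 of 7 occurrences. With nulls, "xy" alone covers 3 of 13, and the padded text has more bigram types overall (11 instead of 7). Stripping the nulls reverses both effects.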

If one were then to shorten the phonetic Bavarian slightly less restrictively - which would certainly be closer to reality - that would lead to more bigrams there. And so the two curves might gently overlap again.

---
Along with the marginal notes, the word lengths were the reason I decided to examine the VMS to see if it was based on Bavarian. Now that I no longer assume the visible word lengths had anything to do with the actual word lengths, I was missing an important argument.

With this analysis, however, I have come a little closer again to the idea that the VMS might be Bavarian.