The Voynich Ninja

Full Version: Why and how the text could be Bavarian
So, since the "solution" suggested here turned out to be just an April Fool's joke after all, I will once again turn to the serious work on the VMS:

The cipher has improved further. I have added and described typical Bavarian features that explain why certain things were done the way they were.

Most importantly, I have now identified more than 50 words that are consistent with the cipher and also match their frequency of occurrence in Middle High German texts. This is no longer comparable to the one-off word identifications proposed by other solvers, who simply suggested words without checking them systematically in context.

Before I list all of this, two things are becoming increasingly clear:

1. The likelihood that the VMS is based on a Bavarian text continues to rise.
2. Taken together, the frequency analyses of the individual components are becoming difficult to dismiss. Any single match could still be coincidence, but in combination the pattern is increasingly hard to explain away as chance.

---------------

1.1  ABSORPTION PREFIXES

Function words are not written separately, but are absorbed as a prefix to the following content word.

o-  = absorbed article (der/die/das)

qo- = absorbed preposition (+article)

y-  = absorbed verb prefix (ge-/ver-/be-/er-/zer-/ent-)

s  = “and” (standalone or as a prefix/suffix)


1.2  CONSONANTS

GALLOWS (plosive classes, one glyph each):

  k  = g/k (velar)        10195x  6.70%

  t  = f/v/w (labial)      5907x  3.88%

  p  = b/p (bilabial)      701x  0.46%

  f  = b/p (variant)      240x  0.16%

BENCH GALLOWS (complex clusters, one glyph each):

  ckh = ng      900x    cph = pf    211x

  cth = ???    923x    cfh = ???    74x

  tsh = ???    177x



SINGLE CHARACTERS:

  ch = m        10177x  6.68%  (Ratio ch/m = 1.95x; ch≠n)

  sh = sch/s(voiceless)  4357x  2.86%  (Bavarian: initial s voiceless -> sh)

  d  = s(voiced)/z/t/d 6252x 4.11%  (in-syllable/end-syllable-s -> d; initial-s -> sh)

  r  = r          7599x  4.99%

  l  = l/n      10660x  7.00%


SPECIAL:

  pch = n (word-initial)    764x

  dy  = ch (ich-sound)      6964x  4.57%

1.3  VOWELS

  a  = Middle High German a      (open vowels; occasionally also o)

  e  = Middle High German i          (front vowel; occasionally also a)

  o  = Middle High German o, u      (rounded vowels)

  ai = Middle High German ei        (confirmed: ein->aiin, wein->tain)

  ee = Middle High German ie, uo, uee (Bavarian diphthongs that were NOT monophthongized!

      Standard German: ie->i, uo->u, uee->ue; in Bavarian they remain diphthongs)

      (confirmed: siech->sheedy, weiss->teed)

  y  = final -e      (word-final; n-apocope: -en -> -e -> -y)



Vowel spelling varies: von/uon -> tol AND tal (o vs a)



PART 2: BAVARIAN / UPPER GERMAN FEATURES


2.1  LOSS OF ‘h’

The writer does not pronounce ‘h’: har->ar, honig->onig, halse->aldy

h does not exist as a separate character in VMS.

2.2  FINAL HARDENING

b->p, d->t, g->k at the end of a word.

2.3  LOSS OF FINAL CONSONANTS

kalt->kal, als->al, nicht->pchedy (final -t is dropped)

2.4  n-VARIATION AND n-APOCOPE

n is encoded in DIFFERENT ways depending on its position:

  n -> pch (word-initial: nicht->pchedy, nun->pchol, nim->pchy)

  n -> l  (most common variant: von->tol, pfund->cphol, munde->choldy, getan->kedal)

  n -> n  (after a diphthong: ein->aiin, wein->tain)

  n -> y  (word-final: wann->taly, dann->daly, man->chey, magen->cheky)

  n -> dropped (abbreviation: nim->pchy)

  n -> NOT ch!  (ch = m, frequency: ch/m=1.95x, ch/(m+n)=0.49x)

Upper German n-apocope: The unstressed final syllable -en is shortened to -e

(singen->singe, sieden->siede). In the VMS, this appears as -y instead of -ech.

This explains the massive y-ending in VMS.

2.5  VOICELESS INITIAL s (s/sh VARIATION)


In High German/Bavarian, the initial s is VOICELESS: it sounds like “sch”. The writer hears the voiceless sound and writes sh:

  s -> sh  (initial, voiceless: so->sho, sich->shedy, sol->shol, sucht->shody)

  s -> d  (medial/final, voiced or z/tz: sal->dal, sein->dain, sol->dol)


This is NOT an inconsistency but phonetically correct:

Voiceless (initial) -> sh, voiced/affricate (medial) -> d

2.6  NON-MONOPHTHONGIZATION (ie, uo, uee)

In Standard German, the Middle High German diphthongs are monophthongized:

  ie -> i,  uo -> u,  uee -> ue

In Bavarian, they are PRESERVED as diphthongs (liebe guete Brueder).

This explains VMS “ee”: It is the PRESERVED diphthong ie/uo/uee.

  siech -> sheedy  (ie remains as ee)

  weiss -> teed    (ei -> ee?)

  guet  -> ked?    (uo -> ee?)

2.7  ELIMINATION OF ch IN “NICHT”

High German: nicht -> nit/net/noet (ch is dropped).

In VMS: pchedy = nicht (with ch=dy), but shortened forms are also possible.

2.8  VOWEL REDUCTION

5 Middle High German vowels in 3 VMS slots: a=a, e=i, o=o/u

Vowel spelling varies: von -> tol (o->o) AND tal (o->a)

2.9  STENOGRAPHY

Personal phonetic stenography. The writer transcribes by ear, not by rules. Therefore:

  - Same words coded differently (tol/tal = von)
  - High-frequency words heavily abbreviated (nim->pchy)
  - Final consonants are often omitted
  - Vowels vary depending on pronunciation

[attachment=14985]


Note: Of course, this is all still "provisional".
New Frequency Comparison: VMS vs. MHD Reference Corpus

I mentioned the frequency comparisons before. Here's the updated table - I've continued working on it.
On the left you see the raw comparison, without any adjustments. On the right, a transformed version that accounts for the cipher's key characteristic: absorption of function words (explained below).

[attachment=14990]

The correspondence is incredible!

I want to be very clear about what we're looking at here. These are not random features. These are the fundamental structures of grammar -- specifically German grammar:
  • Articles

  • Prepositions

  • Verb prefixes and suffixes

  • Conjunctions


Finding this kind of match in a randomly generated text that also possesses internal logic would be virtually impossible. You simply can't argue that away.

And this is the second important point: there is no other language without a Germanic foundation that could produce these frequencies. This structure tells us that the text underlying the VMS is German. The only alternative would be some kind of code deliberately designed to mimic German grammar -- which makes no sense.


------
Method: Forward Transformation

The standard approach is to transform the MHD reference texts "forward" (applying absorption rules to the MHD), then compare the resulting frequencies to what we observe in the VMS. This avoids circularity: I don't decode the VMS and then check whether it looks like MHD.

The MHD reference corpus:

- Ortloff von Baierland (46,565 words) -- medical handbook

- Breslauer Arzneibuch (92,347 words) -- pharmacopoeia

- Admonter Bartholomaeus (29,244 words) -- herbal/medical

- Kochrezeptsammlung Cod. germ. 1 (2,725 words) -- recipes

Total: 170,881 words


All MHD texts are Unicode-normalized before counting (long-s, old-d, old-r, umlauts, combining diacritics stripped).
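A minimal sketch of what such a normalization step could look like in Python. The function name `normalize_token` and the `SPECIAL_FORMS` table are hypothetical, and which code points stand for "old-d" and "old-r" is an assumption (insular d and r rotunda are used here for illustration); this is not the script actually used for the counts.

```python
import unicodedata

# Hypothetical mapping of historical letter forms to plain letters.
# Assumption: "old-d" = insular d (U+A77A), "old-r" = r rotunda (U+A75B).
SPECIAL_FORMS = {
    "\u017f": "s",  # long s
    "\ua77a": "d",  # insular d (assumed)
    "\ua75b": "r",  # r rotunda (assumed)
}

def normalize_token(token: str) -> str:
    """Lowercase, fold special letter forms, strip umlauts and combining marks."""
    token = token.lower()
    token = "".join(SPECIAL_FORMS.get(ch, ch) for ch in token)
    # NFKD splits e.g. u-umlaut into u + combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", token)
    # Drop every combining mark (umlaut dots, superscript e, rings, ...).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this, spelling variants such as "Waſſer" and "wasser" are counted as the same token.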

Why a transformation? 

The problem is that the VMS text, when viewed as an absorption cipher, naturally has fewer words than the MHD text, because function words merge with the content words that follow them.

So I added this merging step to the comparison and called it the transformation.

The following absorption rules were examined:

1. Every ARTICLE (der/die/das/ein... – 104 forms) that directly precedes a content word is integrated into that word as a prefix, thus reducing the number of tokens.

2. Every PREPOSITION (in/von/mit/an... – 78 forms) that precedes an article merges, together with that article, into the following word, also reducing the number of tokens.

3. Every VERB PREFIX (ge-, ver-, be-, er-, zer-, ent-) is integrated into the verb stem.

4. Every "and" (und/vnd/unde... – 10 forms) is integrated as a prefix or suffix into its neighbor.

After the transformation, the number of tokens in the MHD decreases from 170,881 to 129,647, because function words have been integrated into content words.
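The absorption rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: the word sets are tiny stand-ins for the full lists (104 article forms, 78 prepositions, 10 "and" forms), `forward_transform` is a hypothetical name, "und" is only attached as a prefix, and rule 3 (verb prefixes) is left out because it does not change token boundaries in this representation.

```python
# Tiny illustrative subsets; the full lists are much longer.
ARTICLES = {"der", "die", "das", "dem", "ein"}
PREPOSITIONS = {"in", "von", "mit", "an"}
AND_FORMS = {"und", "vnd", "unde"}
FUNCTION_WORDS = ARTICLES | PREPOSITIONS | AND_FORMS

def forward_transform(tokens):
    """Merge function words into the following content word ("a+b+word")."""
    merged = []
    i = 0
    while i < len(tokens):
        parts = []
        # Rule 4 (prefix variant only): "und" attaches to the next token.
        if tokens[i] in AND_FORMS and i + 1 < len(tokens):
            parts.append(tokens[i])
            i += 1
        # Rule 2: a preposition is absorbed only when article + content word follow.
        if (tokens[i] in PREPOSITIONS and i + 2 < len(tokens)
                and tokens[i + 1] in ARTICLES
                and tokens[i + 2] not in FUNCTION_WORDS):
            parts.append(tokens[i])
            i += 1
        # Rule 1: an article is absorbed when a content word follows directly.
        if (tokens[i] in ARTICLES and i + 1 < len(tokens)
                and tokens[i + 1] not in FUNCTION_WORDS):
            parts.append(tokens[i])
            i += 1
        parts.append(tokens[i])  # the head word itself
        i += 1
        merged.append("+".join(parts))
    return merged

sample = "nim die wurz und sied sie in dem wein".split()
result = forward_transform(sample)
# 9 input tokens shrink to 5 merged tokens.
```

The token count after merging is what gets compared with the VMS, which is why the reference corpus shrinks after the transformation.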

Why articles = o + gallows consonants?

In my interpretation, gallows consonants (EVA k, t, p, f) mark the beginning of nouns. Articles precede nouns. Therefore, the prefix of the absorbed article "o-" should specifically appear before gallows consonants.

Otherwise, I would have had the problem of having to introduce a boundary for words beginning with "o," which seemed too variable.

Words like "ol," "or," and "odaiin" begin with the vowel o; they are NOT absorbed articles. Only an o directly followed by a gallows consonant counts as an absorbed article, which makes sense in many respects.
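As a quick illustration of this boundary rule, here is a small Python check. It assumes a plain EVA transliteration in which the gallows are written k, t, p, f; the function name is hypothetical, and "okaiin" and "otedy" are just illustrative inputs.

```python
import re

# Assumption: plain EVA transliteration, simple gallows written as k, t, p, f.
ABSORBED_ARTICLE = re.compile(r"^o[ktpf]")

def has_absorbed_article(word: str) -> bool:
    """True only if the word starts with o directly followed by a gallows glyph."""
    return bool(ABSORBED_ARTICLE.match(word))

for w in ["okaiin", "otedy", "ol", "or", "odaiin"]:
    print(w, has_absorbed_article(w))
```

Counting the words for which this returns True gives the article candidates; everything else starting with o is treated as a plain vowel-initial word.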

What -dy encompasses

The VMS glyph "dy" is a single tokenization unit that likely originated historically in the CH cipher from d + y. It functions as an absorption suffix and encompasses three different MHD sources:

1. -ch (+ inflection): the primary meaning. It includes -ch, -cht, -che, -chen, -chet, -chte, -chten. All inflected forms merge into a single VMS form. Example: mach/machen/machet/macht all become "chedy".

2. -sch (+ inflection): It includes -sch, -schen, -schet. Words like mensch, waschen, mischen, fleisch.

3. d + y (dental + final -e): It includes -de, -te, -se, -ze. This is a graphical homography - it looks identical to the dy glyph but represents the two separate sounds (dental + final -e).

Each component has a unique phonetic source in MHD:

- MHD endings -ch: 5.81% of MHD tokens

- MHD endings -sch: 0.86% of MHD tokens

- MHD dental + final -e: 5.33% of MHD tokens

- Combined: ~12.0% (raw), ~11.0% (after transformation)

VMS frequency -dy: 17.5%, which corresponds to a ratio of 1.59x to the transformed MHD. This is somewhat elevated, but still within a reasonable range, as it also depends on the writer. Furthermore, some words with -dy can encode additional, as yet unidentified, endings.
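To make the three sources and the arithmetic easier to follow, here is a small Python sketch. The classifier `dy_source` is hypothetical and uses only the endings listed above; the order of the checks is an assumption needed so that -sch words are not counted as -ch, and -chte is not counted as dental + e. The percentages are the ones quoted in this post.

```python
# Endings as listed above; longer classes are checked first (assumption).
SCH_ENDINGS = ("sch", "schen", "schet")
CH_ENDINGS = ("ch", "cht", "che", "chen", "chet", "chte", "chten")
DENTAL_E_ENDINGS = ("de", "te", "se", "ze")

def dy_source(word: str):
    """Return which MHD source class a word ending falls into, or None."""
    if word.endswith(SCH_ENDINGS):
        return "sch"
    if word.endswith(CH_ENDINGS):
        return "ch"
    if word.endswith(DENTAL_E_ENDINGS):
        return "dental+e"
    return None

# Shares quoted in the post (percent of MHD tokens).
mhd_ch, mhd_sch, mhd_dental_e = 5.81, 0.86, 5.33
combined_raw = mhd_ch + mhd_sch + mhd_dental_e   # about 12.0
combined_transformed = 11.0                      # value stated in the post
vms_dy = 17.5
ratio = vms_dy / combined_transformed            # about 1.59
```

Running the classifier over the reference tokens and dividing by the token count is how shares of this kind could be reproduced.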

What -ey encompasses

The suffix -ey absorbs MHD -er (with inflectional forms: -ern, -ere, -ert, -ers).

Examples: mehr -> chey (ch=m, ey=er), sehr -> shey, her -> tey, er -> ey.

VMS: 6.0% vs. MHD-transformed: 6.9% – ratio 0.86. Good agreement.

Conjunction "und" (and) = EVA s

"und" appears in three positions in the VMS:

- Standalone "s": 309 tokens (0.81%)

- Word-initial s (first glyph after tokenization): 976 tokens (2.55%)

- Word-final s (last glyph after tokenization): 1,081 tokens (2.83%)

Total: 2,366 tokens (6.19%)

MHD frequency of "und/vnd" in the four reference texts: 6.14%.


Ratio: 1.01 = a near-perfect match, although that is more likely a coincidence, because the MHD average depends on the heavily overrepresented "und" (and) in the recipes.
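For anyone who wants to recount this, a sketch of the three positions in Python. The function `s_position` is hypothetical; it assumes that EVA sh is a separate glyph (so "shedy" does not count as s-initial) and that a word with s at both ends is counted as initial. The totals are the ones given above.

```python
def s_position(word: str):
    """Classify where a functional EVA s occurs in a word, if at all."""
    if word == "s":
        return "standalone"
    # sh is treated as its own glyph, not as s + h (assumption).
    if word.startswith("s") and not word.startswith("sh"):
        return "initial"
    if word.endswith("s"):
        return "final"
    return None

# Counts and shares quoted in the post.
standalone, initial, final = 309, 976, 1081
total_s = standalone + initial + final        # 2366
share_total = 0.81 + 2.55 + 2.83              # about 6.19 percent
mhd_und = 6.14
ratio = share_total / mhd_und                 # about 1.01
```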

Examples of absorbed "and":

- s + aiin = "and a" (119x)
- s + ar = "and he" (77x)
- cheo + s = "...and" (37x)

Note that over 50% of s-initial and s-final VMS words also appear in the corpus without the "s" — exactly what you'd expect if s is a detachable prefix meaning "und".
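A sketch of how this detachability check could be run in Python. It works on word types rather than tokens, which is an assumption (the post does not say which was used), and `detachable_share` is a hypothetical name; the small vocabulary is only a toy example.

```python
def detachable_share(vocabulary):
    """Share of s-initial/s-final word types whose remainder is also a word type."""
    vocab = set(vocabulary)
    candidates = hits = 0
    for word in vocab:
        if len(word) < 2:
            continue  # standalone "s" has nothing left after detaching
        stems = []
        if word.startswith("s") and not word.startswith("sh"):
            stems.append(word[1:])   # detach a leading s
        if word.endswith("s"):
            stems.append(word[:-1])  # detach a trailing s
        if not stems:
            continue
        candidates += 1
        if any(stem in vocab for stem in stems):
            hits += 1
    return hits / candidates if candidates else 0.0

toy_vocab = ["saiin", "aiin", "sar", "ar", "cheos", "cheo", "sol", "shedy", "daiin"]
share = detachable_share(toy_vocab)  # 3 of 4 candidates have a bare counterpart
```

A useful control would be to run the same function with another glyph in place of s, to see how high the share is for a glyph that is not supposed to be detachable.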

----
The real question now is this: does the German structure reflect an actual German source text, or a code built on German linguistic structure but without recoverable sense? I do not know that yet.
I have been arguing that EVA "s" = "und" (and) based on frequency analysis. When I added another Bavarian text to my reference corpus — the "Buch der Natur" (BdN) by Konrad von Megenberg, ca. 1350-75, a natural history text from Regensburg — something jumped out at me. The word "und" is underrepresented in this text compared to the medical recipe books. Here is the frequency table:

[attachment=14995]

The explanation is obvious once you see it: the BdN is literary prose, not a recipe collection. Recipes are full of "take this AND that, mix this AND that" — enumerations. The Bavarian recipe collection hits over 10% "und", the BdN sits at 3.88%. Makes perfect sense.

And then it hit me: if "s" really means "and", then the VMS should show the exact same pattern. The Pharma pages with their ingredient lists should have more "s" than the text-only pages. The prose sections should be low. If the distribution matches what the illustrations tell us about text type, that would be a strong hint for s = "und" (and) - and would simultaneously confirm the content classification of the manuscript.

Here is the section-by-section count:
[attachment=14994]

And there it is.

Pharma (f87-f102): the pages with jars and herb ingredients: 9.73%! Exactly where you would expect a recipe/ingredient section to land — right between the medical recipe books (5-6%) and the pure recipe collection (10.8%) in my MHD reference.

Astro/Zodiac (f67-f73) is the highest at 10.93%. I have not worked on the Astro section myself, but this makes sense if you think about it - these pages may be full of star labels arranged in circles. "Star X and star Y and star Z" - pure enumeration. Even more list-like than recipes. But this has to be verified. If anyone here has worked on the Astro section and can tell me whether this makes sense, I would appreciate it - I simply do not have the time to dig into that section as well.

The Herbal section (f1-f57) sits at 6.01% — plant descriptions with some recipe elements mixed in. Right in the middle. Balneo and Cosmo are similar.

And here is a fun one: the so-called "Recipes" section (f58, f103-f116), the text-only pages with star markers in the margin — they come in at 4.16%.

That is possibly not a recipe section. That could be prose. The s-frequency is clear. These pages read like the Buch der Natur (3.88%), not like a recipe collection.

More Tests

Other glyphs also vary across sections. That is typical for the VMS — the variance between sections tends to be higher than in normal texts. So the question is: is s actually special, or is it just one of many glyphs that happen to fluctuate?

I tested all 19 standard EVA characters across all six manuscript sections. Three criteria must be met simultaneously for a conjunction candidate:

Test 1 — Frequency. A conjunction like "and" should account for roughly 3-10% of tokens. Most EVA characters fail this immediately: i sits at 13.6%, t at 15.5%, d at 16.4%, a at 21% — all far too high. m at 2.8% is too low. s is the only standard EVA character that falls in the 3-10% range.

Test 2 — Pattern. A conjunction should peak in enumeration sections and drop in prose sections. Many glyphs fluctuate across sections, but they peak in the wrong places — e peaks in Balneo, t peaks in Astro but drops in Pharma, q peaks in Balneo. s is the only glyph in the conjunction frequency range that peaks in Pharma.

Test 3 — Degree of variation. Most standard EVA characters vary moderately across sections, with max/min ratios between 1.4x and 2.2x. That is normal fluctuation. MHD "und" across five real German text types varies by 2.78x — significantly more, because conjunctions are genuinely text-type sensitive. VMS s comes in at 2.63x — the strongest variation of any common EVA character, and right in line with real "und". No other frequent glyph reaches this level.

One glyph out of 19. Three independent tests. All matching the behavior of "and" in real medieval German text.

The overall max/min ratio — 2.63x for the functional s across VMS sections, compared to 2.78x for "und" across MHD text types — tells the same story. Not identical, but driven by the same mechanism: lists need more conjunctions than prose.
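The frequency and variation criteria (tests 1 and 3) can be reproduced with a few lines of Python. This sketch uses only the section values quoted in this thread (the Balneo and Cosmo figures are not given in the text, so they are left out), and the helper names are hypothetical. Test 2 is not covered because it needs the per-section values of all glyphs.

```python
def in_conjunction_range(share_percent, low=3.0, high=10.0):
    """Test 1: overall share lies in the range expected for 'and'."""
    return low <= share_percent <= high

def max_min_ratio(rates):
    """Test 3: strongest section rate divided by weakest section rate."""
    values = list(rates.values())
    return max(values) / min(values)

# Section rates for functional s as quoted in the thread (percent).
vms_s = {"Pharma": 9.73, "Astro/Zodiac": 10.93, "Herbal": 6.01, "Recipes": 4.16}
# MHD "und": pure recipe collection vs. Buch der Natur (percent).
mhd_und = {"recipe collection": 10.8, "Buch der Natur": 3.88}

vms_ratio = max_min_ratio(vms_s)    # about 2.63
mhd_ratio = max_min_ratio(mhd_und)  # about 2.78

s_ok = in_conjunction_range(6.19)   # overall share of functional s
m_ok = in_conjunction_range(2.8)    # EVA m, quoted as too low
```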

The idea that “s” is a simple “and”, appearing both on its own and at the beginning or end of a word, has become much more likely.
The problem with Voynich is it has a mirror property that means that on the surface it reflects every language....
(03-04-2026, 03:02 PM)DG97EEB Wrote: The problem with Voynich is it has a mirror property that means that on the surface it reflects every language....

Hi Ed,

U know, I see this quite differently: I see only one language that fits. The "and" could of course be any language, but here, too, this was compared to MHD/Bavarian texts. ;)

JoJo
@Jojo thanks for this incredible work. I will take a look at it as soon as I have time. But clearly you spent a lot of effort on this, and the correspondences are looking great...
@ JustAnotherTheory


Thank you so much - I really do my best. Each of these posts takes hours to put together, making sure the information is presented in a way that’s easy to understand. So it’s nice to hear something positive for a change, and not just criticism ;)

So thank you very much...
(03-04-2026, 03:02 PM)DG97EEB Wrote: The problem with Voynich is it has a mirror property that means that on the surface it reflects every language....

I do agree with this observation- even if it was written in *one* language, the fact that so many people think otherwise suggests it has some obscuring effect. 

Also:

"Possibly AI Generated"

I am curious why you decided to include that in your handle description..
(03-04-2026, 06:35 PM)hatoncat Wrote: "Possibly AI Generated"

I am curious why you decided to include that in your handle description..

Because it's a deliberate wind up to those on this forum who think AI is bullshit :)
(28-03-2026, 03:36 PM)JoJo_Jost Wrote: @ Stefan

You don't need to warn me about anything, Stefan! I'm an adult, and have been for a very long time ;)

As far as I'm aware, the solvers on the Solution List are adults.  The latest few to get their solution added to the list, and whose solution we've extensively discussed in the last year, certainly are adults.

I haven't added you to the Solution List yet because I don't think you're completely convinced (yet) of it... but it is easy to fall over the cliff edge and hard to climb back up again.  

Quote: I started when I noticed that the marginal notes were in Bavarian. It seemed obvious to me to test that. And from then on, this theory was increasingly confirmed... Not the other way around....

Most, I'd imagine all, on the Solution List report the same thing.  

The criticism you get here may seem annoying.  But focusing on reasons why your identifications may not work is really important.  For example, you could ask yourself:
  • If s = and, why is there such uneven distribution of its following glyph?  Initial so (often sol, sor)  and initial sa (usually saiin/sain) dominate.  S+o might make sense under your system ("and the"), but what's with s+a?
  • Why is sq an invalid combination in Voynichese if it means "and + preposition"?  It only appears once, I believe.
  • How do your glyph identifications fit with line patterns/LAAFU identified behaviour?