Why and how the text could be Bavarian


Source: The Voynich Ninja (https://www.voynich.ninja), Voynich Research › Theories & Solutions, thread 5312 (/thread-5312.html)


RE: Why and how the text could be Bavarian - nablator - 24-02-2026

(24-02-2026, 04:15 PM) JoJo_Jost Wrote: absorption cipher

What is it?

RE: Why and how the text could be Bavarian - JoJo_Jost - 24-02-2026

Absorption encryption is one of the core principles of my Bavarian encryption hypothesis. The basic idea is that function words (articles, prepositions, verb prefixes) are not encrypted independently, but are absorbed into the following content word as prefixes.

Specifically, this works in three absorption steps:

Articles → o-prefix: The article (der, die, das, ein, einem, ain...) is deleted and an o-prefix is added to the following content word. So: die wurtzen → owurtzen.

Prepositions → qo-prefix: Prepositions (in, mit, von, auf, durch...) are deleted and the following word is prefixed with qo. If an o from the article is already present, qo replaces it; the prefixes are not stacked. So: von dem keller → qokeller (not qookeller).

Verb prefixes → y-prefix: Prefixes such as ge-, ver-, be-, er- are realised as y-. Thus participles and prefixed verbs become words that begin with y-.

The result is that several words in plain text (e.g. preposition + article + noun) are combined into a single cipher word. Function words disappear as independent tokens and become structural markers within the encrypted word.
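A minimal Python sketch of the three absorption steps as described above. The word lists (`ARTICLES`, `PREPOSITIONS`, `VERB_PREFIXES`) are small hypothetical samples, the verb-prefix test is deliberately naive, and the later steps (gallows substitution, nasal rules, vowel reduction) are not modelled.

```python
# Hypothetical, deliberately small word lists; not the author's full inventory.
ARTICLES = {"der", "die", "das", "den", "dem", "ein", "einem", "ain"}
PREPOSITIONS = {"in", "mit", "von", "auf", "durch"}
VERB_PREFIXES = ("ver", "ge", "be", "er")


def absorb(text: str) -> list[str]:
    """Absorb prepositions and articles into the next content word as qo-/o-."""
    out = []
    pending = ""  # prefix waiting for the next content word: "", "o" or "qo"
    for word in text.lower().split():
        if word in PREPOSITIONS:
            pending = "qo"  # a following article is swallowed by qo (no stacking)
        elif word in ARTICLES:
            if pending != "qo":
                pending = "o"
        else:
            # Naive y- rule: a content word that starts with a listed prefix
            # and is long enough to have a stem is treated as prefixed.
            for prefix in VERB_PREFIXES:
                if word.startswith(prefix) and len(word) > len(prefix) + 2:
                    word = "y" + word[len(prefix):]
                    break
            out.append(pending + word)
            pending = ""
    return out


print(absorb("die wurtzen"))     # ['owurtzen']
print(absorb("von dem keller"))  # ['qokeller']
print(absorb("in den keller"))   # ['qokeller']
```

The last two calls show that the step is many-to-one: different prepositions yield the same qo-word, so the preposition has to be recovered from context.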

This also works with suffixes, but I'm still working on that. For example, aiin could stand for Bavarian ‘ain, ein, en’ (one/a). ‘En’ is one of the most common suffixes in German, so if aiin is attached to a word, it would be the suffix ‘en’; if it stands alone, it would be ‘ein’. With daiin it's more complex: d (the glyph that looks like an 8) could also stand for several letters, such as ß, tz, z, ss and others, so the same ambiguity applies as with aiin.

The elegant thing about this is that this system explains why the VMS has so few short, independent function words and why, at the same time, the statistical frequencies of the prefixes (o-words ~21%, qo-words ~14.8%, y-words ~4.9%) correspond so closely to the frequencies of articles, prepositions and verb prefixes in Bavarian (see above).
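One way to compute such prefix shares from a transcription, sketched in Python. The post does not say how the percentages were counted; this assumes they are shares of all word tokens, with qo-words excluded from the o-count, and `prefix_shares` is a hypothetical helper.

```python
from collections import Counter


def prefix_shares(words: list[str]) -> dict[str, float]:
    """Share of word tokens beginning with o- (but not qo-), qo-, and y-."""
    counts = Counter()
    for w in words:
        if w.startswith("qo"):
            counts["qo"] += 1
        elif w.startswith("o"):
            counts["o"] += 1
        elif w.startswith("y"):
            counts["y"] += 1
    n = len(words) or 1  # avoid division by zero on empty input
    return {key: counts[key] / n for key in ("o", "qo", "y")}


sample = "qokeey okol daiin ykal otal".split()
print(prefix_shares(sample))  # {'o': 0.4, 'qo': 0.2, 'y': 0.2}
```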

The remaining words then undergo gallows substitution (initial consonants and consonant clusters become EVA gallows characters), so the consonant clusters are, so to speak, also absorbed into one glyph. Nasal rules and vowel reduction apply as well, but absorption is the structuring step that creates the VMS word structure.

In addition, word length, which is shorter in plain Bavarian than in the VMS, becomes more evenly distributed after absorption and matches the Voynich. Various other statistical anomalies also fit better.

However, it is still too early to make definitive statements... therefore: work in progress...

RE: Why and how the text could be Bavarian - Mauro - 24-02-2026

If I understood correctly, it's a many-to-one, lossy (not univocally decipherable) cipher? I.e. "from a cellar" and "into the cellar" would both be "qo-cellar"?

RE: Why and how the text could be Bavarian - nablator - 24-02-2026

No way to cipher the Latin in the Breslau Pharmacopoeia?

Centauree
suco adiecto melle inunge oculos. aciem exacuit et
sanat. Item recipe manipulum centauree. et in
uino madebis. et tunc adiecto melle per diem coque.
et ieiunus bibat. Item qui cum apertis oculis non
uident. Serpillum. id est ueltkunel in aqua coque.
et ex ipso oculos assidue laua. Item fel galli mixtum
cum fauo mellis clarificat oculos. probatum est.
Vt pili in oculis non crescant. Primo tolles pilos.
et statim impones illud medicamen sucum edere.
et sucum radicis raphani equis ponderibus. permisces.
et sublatis pilis inpones. Item canitio lacte tanges
et non cresent. Item ypocras dicit. sangwisugas
septem in olla rudi combures. et cinerem earum
ad lenitatem teres. et sublatis pilis pones
ad oculos. Cui oculi dolent. acetum et pulegium
simul coque in uase cupreo adcremtum usque dum
tercia pars remaneat. et cum penna oculos unge.
Item ad caliginem oculorum antiqum feniculum
et rose similiter rute. et mel_atritum. id est humeln
honic. et fel wulturum uteris. Ne tibi oculi doleant.
Quando prima hyrundinem uideris hoc dic ter.
Rogo te hyrundo ut hoc anno oculi mei non lippeant
nec doleant. Item ad uitia uel percussuram
oculorum. Quinque folium tundas cum anxungia
ueteri sine sale et imponas. Item ad caliginem
oculorum rore caulium laua. Item sucum feniculi.
mel estiuale. sanguinem columbinum. hec equa
mensura misces. et cetera albo oui superlinies.

RE: Why and how the text could be Bavarian - JoJo_Jost - 24-02-2026

At first glance, this seems like a valid argument.

However, much can also be deduced from the context. In medieval recipe texts, the sentences are extremely formulaic. And this is exactly what can be inferred from the repetitions in the VMS.

‘Nym X von dem Y’, ‘leg das an die stat’, ‘seud das in einem wasser’, ‘trink das mit wein’ — the combinations of verb + noun almost inevitably determine the preposition. Anyone familiar with the text genre can reconstruct the correct preposition in most cases.

It becomes more difficult, I admit, when it is no longer a matter of standard recipes — for example, in the astronomical or cosmological sections of the VMS.

But! The author probably did not write this cipher for an anonymous recipient who was supposed to decipher the text cold.

If it was a personal notebook — and much in the VMS suggests this — then the author only had to understand himself. And he knew what was meant.

It would then be more of a personal shorthand, stenography, which is why we have problems deciphering it, especially when it is still written phonetically in Bavarian.

PS: And, as I wrote above, I have come to suspect that it simply wasn't a good cipher. ;)

RE: Why and how the text could be Bavarian - JoJo_Jost - 24-02-2026

Example

Nimm die Wurzel und zerschneide sie und gib sie in das Wasser // Take the root, chop it up and put it in the water.
Nimm oWurzel - zerschneiden, qoWasser // Take oRoot, chop, qowater.

(I only realise now that the idea may have originated from the versatile Latin word "quo"...)

RE: Why and how the text could be Bavarian - JoJo_Jost - 26-02-2026

I was interested to see whether applying the existing theoretical cipher would reveal a German structure. So I took a look at folio f51r, which I believed to be Alraune/mandrake. First the rules, then the result:

Rules:
ch=n, y=e/y (y at the end of a word can also remain y, cf. Bavarian ‘sey’), d=s/ts/tz/z/ss (sharp s), o→o (can also be u), sh→sch, tsh→tsch
l=l, r=r, e=e (not yet deciphered)
ee= eu /au / ou
"o" at the beginning of a word = absorbed article, concrete article guessed
"o" in the middle = o / u (Bavarian vowel instability)
"qo" at the beginning of a word = absorbed preposition+article, concrete wording guessed
"y" at the beginning of a word = absorbed verb prefix ge/ver/be
"aiin" = "en", standing alone aiin = ein/ain

Gallows t, k, p, f and bench gallows ckh, cph, cfh remain unchanged; their plain text is unknown.
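A rough Python sketch of these rules as a word-by-word decoder. Where a rule offers alternatives (y = e/y, d = s/ts/tz/z/ss, o = o/u, y- = ge/ver/be), it simply takes the first option, and the bracketed markers for absorbed articles and prepositions are placeholders. Those choices are assumptions of the sketch, so it will not reproduce every hand-made line of the result below.

```python
# Glyph rules from the list above; only the first alternative is used.
GLYPH_MAP = {"ch": "n", "sh": "sch", "ee": "eu", "d": "s", "y": "e", "aiin": "en"}
# Multi-glyph units, longest first; gallows and bench gallows pass through unchanged.
MULTI = ("aiin", "ckh", "cph", "cfh", "cth", "ch", "sh", "ee")


def glyphs(word: str) -> list[str]:
    """Split an EVA word into glyph units, matching multi-glyph units first."""
    out, i = [], 0
    while i < len(word):
        for unit in MULTI:
            if word.startswith(unit, i):
                out.append(unit)
                i += len(unit)
                break
        else:
            out.append(word[i])
            i += 1
    return out


def decode(word: str) -> str:
    if word == "aiin":
        return "ein"  # aiin standing alone
    marker = ""
    if word.startswith("qo"):
        marker, word = "[prep+art] ", word[2:]
    elif word.startswith("o"):
        marker, word = "[art] ", word[1:]
    elif word.startswith("y"):
        marker, word = "ge", word[1:]  # first of ge/ver/be
    return marker + "".join(GLYPH_MAP.get(g, g) for g in glyphs(word))


print([decode(w) for w in "chody daiin ykald okol qokol".split()])
# ['nose', 'sen', 'gekals', '[art] kol', '[prep+art] kol']
```

With these first-option choices, chody → nose and ykald → gekals agree with the hand transliteration of lines L2 and L6.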

Result

L1 tsholdchy qotchy opchear ypchedy
t schusslne in die tne die pchear gepchese

L2 dcheodaiin ckheody ckhody chody
sneosen ckheose ckhose nose

L3 ydchody ckhey oty ckheodar qoky
gesnose ckhee ote ckheosr in die ke

L4 daiinces okol cheody ckhy cheeey
sences die kol neose   ckhe neeee

L5 tcheody  qodaiin okeey qockhey taiin
tneose von dem sen die keee in die ckhee ten

L6 ycho daiin chokaiin ykchodaiin ykald
verno tsen noken geknosen gekals

L7 ychos ar eeckhy kcho qokchy qotal
genos r eenkhe kno in die kne von dem tl

L8 oshol odaiin ckhey ckheody qokey otydy
die schol das sen ckhey ckheose von dem kee das tysy

L9 tol  daiin daim qchodal dal qody   qoetam
tol tsen sem von dem nosl sal in die tse in die etam

L10 ykchol dor shey qokeol kchey shol okam
verknol sor schey von dem keol knee schol die kam

L11 tchodaiinoeody qokol oteodaiin kol otag
  tnosenoeose von dem kol die teosen kol die tag

L12 yoees ckheey kol cheeal   okeor qockhey pchodal
geoeus ckheue kol  naul die keor in die ckhee pnotsal

L13 oaiin ckhol ykieol otchey cpheo daiin ykeoldy
ein ckhol gekieol die tnee cpheo sen gekeoltse

L14 daiiithy qodaiin kaiiidal cphodal s al dam
tsenithe von dem sen kensl cphosl s al tsam

L15 qokol cheor ckhal s or aldy otal
von dem kol neor ckhal s or alse die tal

It is, of course, difficult to comment on such a rough text, but the structure shows certain similarities to the structure of Middle High German texts. The articles and prepositions are well balanced, and it would be possible to form reasonable sentences from the surrounding words. In this respect, this small experiment, even with a very limited sample size, confirms that this classification could be correct.

For example this sentence:
die schol das sen ckhey ckheose von dem ke das tysy
= die soll das sein ckhey ckheose von dem ke das tysy
(roughly: this should be ckhey ckheose from the kee, that tysy)

The problem is that many of you don't speak German and therefore can't “experience” it for yourselves. That's a bit of a shame.

---------------
Pure Eisegesis

You know I'm not a fan of eisegesis (reading meanings into texts), but just for fun, let's do it here. (This is not a serious translation!)

tschusslne: in my opinion, the gallows t at the beginning of each page is either a null, just to make a nice symbol, or an abbreviation for an instruction (do, take, etc.).

That leaves schusslne. Unfortunately, that doesn't fit with mandrake at all. But since I'm assuming a phonetic spelling, and the o can stand not only for u but also for “ü”, it would be schüsslne = Schlüssel (key), i.e. Schlüsselblume (key flower). The Breslauer Arzneibuch has "Primula veris. himmel-sluzel" (key to heaven). [The flowers match, the leaves (well, with a lot of imagination), the roots not so well, although rhizomes are also formed.]

tne could be tine/tinne (the i is phonetically slurred); a tinne is a medieval vessel.

Then it would be something like: “tue Schlüsselblumen in die Tinne” = “put keyflowers in the Tinne”

Then there is an article, so the sentence would have to continue something like this: “die [irgendetwas wie] stark gepresst” “that [something like] strong pressed”

It's funny that it then says "sneosen", which is almost certainly “schnäuzen” (to blow one's nose), and a little later it says “nose”: Nase in modern German, but in Bavarian it is also nose, as in English.

Well, if you look at the effects of the key flower (cowslip):

It contains saponins (soap-like plant substances) which:
loosen mucus
make coughing easier
help with bronchitis

Is this the ultimate proof? Have I now solved the VMS? :D

No, sorry, but it's just eisegesis in its purest form, albeit a funny one.

Let's stick to the facts: theoretically, the absorption cipher presented here could represent a German/Bavarian text well.

RE: Why and how the text could be Bavarian - Mauro - 26-02-2026

(24-02-2026, 06:20 PM) JoJo_Jost Wrote: At first glance, this seems like a valid argument.

Is this an answer to my post? It was not meant as a counterpoint to your theory: I think a many-to-one, lossy (and not univocally decipherable) 'cipher' is a possibility. I don't know Bavarian (nor German, unfortunately), but from what you said about the language I consider it a possibility too (among many other possibilities, 'Chinese' and self-copying included!).

If I understood correctly, your argument is that (Old) Bavarian has some peculiarities in its lexicon which could be compatible with the VMS word structure, so Bavarian as a source could explain the low character entropy (btw, has anybody posted figures for the character entropy of Bavarian? I understand getting an electronic transcription is not easy). Now you're further reducing entropies (including word entropies) by conflating prepositions and articles into a single "qo", etc. It's interesting; I wish you well in your research!

RE: Why and how the text could be Bavarian - JoJo_Jost - 26-02-2026

@Mauro: "Is this an answer to my post?" Yes, it was. ;)

Thanks for the clarification.

Since I still can't write Python myself and am not an expert in entropy calculations, I asked Claude and ChatGPT independently of each other. Getting both to produce nearly identical values was more complicated than I thought, but I hope it is correct now...

The following had to be clarified beforehand:
In order to achieve comparable results, I had to normalise the German texts to the standard 26-letter alphabet. The recipe text (Kochrezept) contains only 24 letters, as j (written as i) and q (only in Latin loanwords) do not occur in this small corpus.

I normalised VMS to the standard glyphs: a, ch, cfh, ckh, cph, cth, d, e, f, i, k, l, m, n, o, p, q, r, s, sh, t, y, in order to be able to compare them.

I myself use a cleaned-up EVA file from which single-letter labels are also removed. All special characters in EVA were also removed for the calculation.

[Attached screenshot: table of the entropy statistics for the VMS and the MHD comparison texts, not reproduced here]

Legend:
H1 — Unigram character entropy (bit)
H1 norm — H1 / log₂(alphabet), normalised to alphabet size
H(X₁,X₂) — Joint bigram entropy (bit)
H(X₂|X₁) — Conditional character entropy = H(X₁,X₂) − H1 (Shannon 2nd order)
Vocab — Number of different word types
Hw — Word entropy (bit)
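For readers who want to reproduce figures like these, a minimal Python sketch of the six measures in the legend. It is not the code that produced the posted values. It assumes each word is already a string in which every character is one glyph (multi-letter EVA glyphs such as ch or ckh would have to be mapped to single symbols first), counts bigrams only inside words, and subtracts the entropy of the bigrams' first symbol rather than H1; on a long text the two are practically equal, and this version cannot go negative on short samples.

```python
import math
from collections import Counter


def entropy(counts: Counter) -> float:
    """Plug-in Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return sum(c / total * math.log2(total / c) for c in counts.values())


def text_stats(words: list[str], alphabet_size: int) -> dict:
    h1 = entropy(Counter("".join(words)))
    pairs = [w[i:i + 2] for w in words for i in range(len(w) - 1)]
    h_joint = entropy(Counter(pairs))
    h_first = entropy(Counter(p[0] for p in pairs))
    return {
        "H1": h1,
        "H1 norm": h1 / math.log2(alphabet_size),
        "H(X1,X2)": h_joint,
        "H(X2|X1)": h_joint - h_first,
        "Vocab": len(set(words)),
        "Hw": entropy(Counter(words)),
    }


print(text_stats(["ab", "ac"], alphabet_size=4))
```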

Key findings:
H1 norm: VMS (0.865) lies just below the range of the MHD texts (0.871–0.901) → no strong anomaly signal at character level
H(X₂|X₁): VMS (2.503) significantly lower than MHD (3.22–3.50) → high predictability of character strings, explainable by fixed prefix patterns of the absorption cipher
Hw: VMS (10.267) higher than all MHD texts → the cipher generates many word types with relatively few tokens through prefix combinations

The combination of low H(X₂|X₁) and high Hw corresponds exactly to the entropy profile that absorption encryption could generate: the highly predictable character strings result from fixed prefixes, while combinatorial prefix attachment increases the diversity at the word level. This profile is not to be expected when generating meaningless text, during self-copying processes, or in unencrypted natural language.

An additional factor to consider: all MHD comparison texts are written in normalised literary language, not in phonetic dialect. Phonetically written Bavarian would have lower entropy than standard MHD due to consonant neutralisation (p/b, t/d, k/g merging), vowel reduction (a→o), and syllable-final reduction (-en→-n or dropped entirely). This would narrow the remaining H1 gap (0.25 bit) further.
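The consonant part of this argument can be checked mechanically. A minimal Python sketch, assuming only the p/b, t/d and k/g merges named above (vowel reduction and syllable-final reduction change word lengths and are not modelled); `NEUTRALISE` is a hypothetical table. Because each merge maps letters onto a coarser set, H1 can only stay the same or drop, never rise; how much it drops depends on the text.

```python
import math
from collections import Counter

# Hypothetical merge table for the p/b, t/d, k/g neutralisation.
NEUTRALISE = str.maketrans({"p": "b", "t": "d", "k": "g"})


def h1(text: str) -> float:
    """Unigram character entropy in bits."""
    n = len(text)
    return sum(c / n * math.log2(n / c) for c in Counter(text).values())


def neutralised(text: str) -> str:
    return text.translate(NEUTRALISE)


sample = "nimm die wurzel und gib sie in das wasser"
print(h1(sample), h1(neutralised(sample)))
```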

The Admonter Bartholomäus comes closest, but is still partially normalised.

RE: Why and how the text could be Bavarian - Jorge_Stolfi - 26-02-2026

(26-02-2026, 07:03 PM) JoJo_Jost Wrote: I myself use a cleaned-up EVA file, in which some labels are also missing, provided they are single letters. All special characters in EVA were also removed for the calculation.

I suggest that you do all statistics separately for each of the major sections (Herbal-A, Herbal-B, Bio, and Starred Parags). Statistics are properties of a text, not of a language; and each of those four sections seems to be a text of a very different nature.

Statistics are heavily influenced by the nature of the text, as you can see by comparing the Bavarian recipe book with the other three. If you run statistics on King Arthur and The Joy of Cooking together, you may find that the numbers match the History of Cannibalism among the Knights Templar better than either of the two actual books.

The other VMS sections are likely to be different from these four, too, but are too small to yield meaningful data. You may come back to them later, once you have got the main conclusions from the meaty sections.

You should also consider only running text in paragraphs -- excluding labels, titles, etc. Labels are likely to have their own statistics, very different from those of the main text.

And you may want to treat the letters p and f as '?'. They are likely to be fancified versions of other glyphs, like our capital letters, or the initials and finals of Arabic script. A text in mixed case will have higher entropy and vocabulary than the same text mapped all to lowercase, even though the meaningful information is practically the same.
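A minimal Python sketch of that preprocessing, assuming a hypothetical input of (section, unit type, text) records; real EVA transcription files encode section and locus type in their own format, which would need its own parser. Masking p and f this way also masks them inside the bench gallows cph and cfh.

```python
from collections import defaultdict

MASK_PF = str.maketrans({"p": "?", "f": "?"})


def tokens_by_section(records):
    """Collect paragraph tokens per section; skip labels/titles, mask p and f."""
    sections = defaultdict(list)
    for section, unit_type, text in records:
        if unit_type != "paragraph":
            continue  # labels, titles etc. have statistics of their own
        sections[section].extend(text.translate(MASK_PF).split())
    return dict(sections)


records = [
    ("Herbal-A", "paragraph", "fachys ykal ar"),
    ("Herbal-A", "label", "otal"),
    ("Bio", "paragraph", "qopchedy qokeedy"),
]
print(tokens_by_section(records))
```

Each section's token list can then be fed separately into whatever statistics are being computed.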

Quote:Vocab — Number of different word types
Hw — Word entropy (bit)

These statistics are affected by the size of the sample text. If word frequencies follow the Zipf distribution, the size of the vocabulary is expected to grow proportionally to the square root of the number N of tokens, or something like that. The word entropy should also increase slightly with N, unless some correction is applied when estimating word probabilities from their observed frequencies.
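One simple way to act on this, sketched in Python: compare vocabulary size and word entropy only at a common token count, here by truncating every corpus to the length of the shortest one (averaging over random samples would be a common alternative). As one example of the kind of correction mentioned above, it also applies the Miller-Madow bias correction, which adds (K − 1)/(2N) nats to the plug-in entropy of K observed types in N tokens. The function and corpus names are hypothetical.

```python
import math
from collections import Counter


def vocab_and_hw(tokens: list[str], n: int) -> tuple[int, float, float]:
    """Vocab size, plug-in word entropy and Miller-Madow-corrected entropy
    (both in bits) over the first n tokens."""
    sample = tokens[:n]
    m = len(sample)
    if m == 0:
        return 0, 0.0, 0.0
    counts = Counter(sample)
    hw = sum(c / m * math.log2(m / c) for c in counts.values())
    hw_mm = hw + (len(counts) - 1) / (2 * m * math.log(2))  # nats -> bits
    return len(counts), hw, hw_mm


def at_common_size(corpora: dict[str, list[str]]) -> dict[str, tuple[int, float, float]]:
    """Truncate every corpus to the shortest one before comparing."""
    n = min(len(tokens) for tokens in corpora.values())
    return {name: vocab_and_hw(tokens, n) for name, tokens in corpora.items()}


corpora = {"A": "a b a b".split(), "B": "x y".split()}
print(at_common_size(corpora))
```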

Quote:H1 norm: VMS (0.865) lies in the middle of the MHD texts (0.871–0.901) → no anomaly signal at character level
H(X₂|X₁): VMS (2.503) significantly lower than MHD (3.22–3.50) → high predictability of character strings, explainable by fixed prefix patterns of the absorption cipher

As I have argued several times, character and bigram statistics are hardly meaningful. They are affected not only by the nature of the text, but even more so by the spelling system. Merely spelling German with "x" instead of "sch", "v" instead of "ch", etc. will increase its character and bigram entropies.
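The spelling effect is easy to try. A minimal Python sketch using the two substitutions from the example above (sch → x, ch → v); `respell` is a hypothetical helper that applies longer patterns first so that sch is not split by the ch rule. On short toy strings the size and even the sign of the change vary, so only whole texts are worth comparing.

```python
import math
from collections import Counter

SPELLING = {"sch": "x", "ch": "v"}


def respell(text: str, table: dict[str, str]) -> str:
    """Replace letter groups by single symbols, longest groups first."""
    for old in sorted(table, key=len, reverse=True):
        text = text.replace(old, table[old])
    return text


def h1(text: str) -> float:
    """Unigram character entropy in bits."""
    n = len(text)
    return sum(c / n * math.log2(n / c) for c in Counter(text).values())


original = "schach"
print(respell(original, SPELLING))  # xav
print(h1(original), h1(respell(original, SPELLING)))
```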

Quote:Hw: VMS (10.267) higher than all MHD texts → the cipher generates many word types with relatively few tokens through prefix combinations

This number is more significant, but one must normalize it for the sample size before comparing it to other texts. And it too can be inflated by trivial factors -- like the presence of spelling and transcription errors (affecting maybe 5% or more of the VMS tokens), inclusion of p and f as distinct letters, the inclusion of labels, the mixing of texts of different nature, etc.

But I see that, in spite of all those entropy-boosting factors, the VMS Hw is only slightly higher than that of other Bavarian texts. I would call that a match.

Quote:The combination of low H(X₂|X₁) and high Hw corresponds exactly to the entropy profile that absorption encryption could generate – the highly predictable character strings result from fixed prefixes, but it increased the diversity at the word level through combinatorial prefix attachment. This profile is not to be expected when generating meaningless text, during self-copying processes, or in unencrypted natural language.

Ahem! You mean in unencrypted European natural language...

All the best, --stolfi