The Voynich Ninja
Why and how the text could be Bavarian - Printable Version




RE: Why and how the text could be Bavarian - JoJo_Jost - 23-05-2026

Okay, let's continue—it's getting more interesting.

Assume the basic structure: VCCV. We're at the first C in VCCV,
so at the first consonant in the sequence.

Apparently, the Core contains two different encodings. We’re familiar with the “e” family, but the “o” and “a” families also shape the pattern of the second half of the Core. We’ll get to that later. For now, we’re only concerned with the first C, that is, what comes before these families.

And here there’s a peculiarity I didn’t anticipate myself. It seems as though the first consonants are subject to a simple substitution cipher. (No, just the first consonants, by no means the entire VMS!) Regarding the frequency comparison with MHG:

(The first C was compared with the word-final consonants of the MHG texts. The first C does not appear only at the end of words, but given the amount of data this still gives a clear picture. Note that the natural word boundary lies between the first C and the second C, so the first C is always at the end of a syllable or word.)

Here is the table:
[table image not preserved]

And the graph:
[graph image not preserved]

Very interesting and astonishing. Could it really be that simple? But I can’t say yet whether this is a possible solution. It would be very premature to conclude that, and as long as I haven’t solved the second C—which likely encompasses the consonant clusters at the beginning of words—it’s impossible to verify.
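
A minimal sketch of the kind of rank comparison described above, assuming the first-C glyphs and the MHG word-final consonants are each available as plain lists of symbols. The function name `rank_match` and the toy inputs are invented for illustration and are not the data behind the table. Pairing by frequency rank only yields a candidate substitution table, which still has to be checked against plaintext.

```python
from collections import Counter

def rank_match(cipher_symbols, plain_symbols):
    """Pair symbols by frequency rank: most common with most common, and so on."""
    cipher_ranked = [s for s, _ in Counter(cipher_symbols).most_common()]
    plain_ranked = [s for s, _ in Counter(plain_symbols).most_common()]
    # zip() stops at the shorter list, so surplus symbols stay unassigned.
    return dict(zip(cipher_ranked, plain_ranked))

# Toy input, invented for illustration only (not real VMS or MHG counts).
toy_first_c = ["l", "l", "l", "r", "r", "m"]
toy_mhg_finals = ["n", "n", "n", "r", "r", "t"]
print(rank_match(toy_first_c, toy_mhg_finals))
```

Symbols with nearly equal counts can swap ranks from one sample to the next, so the lower ranks of such a table are much less reliable than the top ones.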

I checked how many words in Middle High German end with one consonant: 81.45%. That’s a lot, and it fits perfectly with the behavior of the first C in the VMS and with this idea.

But there’s something else that’s more than astonishing in this context:

Take a look at the coverage. In both the VMS and the MHG texts, the first 10 letters cover 94.xx percent of the letters in these positions!
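
The coverage figure can be computed along these lines. This is only a sketch: `top_k_coverage` is a hypothetical helper name, and the toy list stands in for the symbols observed at the position in question.

```python
from collections import Counter

def top_k_coverage(symbols, k=10):
    """Share of all occurrences accounted for by the k most frequent symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total

# Toy data, invented for illustration: the top 2 of 4 symbols cover 7 of 10.
toy = list("aaaabbbccd")
print(f"{top_k_coverage(toy, k=2):.2%}")
```

A similar coverage value in two corpora says the frequency curves have a similar shape at that rank; it does not by itself say which symbol corresponds to which.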


   
   

I’ve also made progress with the vowels; more on that later.

I’m naturally very curious to see if this will just turn out to be another round of Bullshit Bingo, or if there’s actually something substantial here...

Current unsolved problems:
- Determine vowels more precisely
- Decode the second C (quite difficult)
- Assign the aiin family (very difficult)

PS Important: this assignment is positional. It holds for glyphs in the first-C position only and does not generalize to every instance of a glyph in the VMS.


RE: Why and how the text could be Bavarian - JoJo_Jost - 26-05-2026

The real structure of the VMS?

I've made a lot of progress. I can now rewrite the structure of large parts of the VMS into the VCCV (V1 C1 C2 V) structure without making too many assumptions. It actually results in an interesting flow.

The fascinating thing is that a logical flow emerges almost on its own, one that fits well with European and other languages, simply because of its clear vowel and consonant structure.

Nevertheless, I’m sticking with MHG for now, for several reasons. One important one is that since C1 must be very short (you can tell this because clear, familiar, and repetitive glyph sequences often appear after C1), it fits perfectly with MHG, where the ends of words are often single consonants. Here, too, the frequencies align with the observations in the VMS when comparing the number of words ending with one consonant to those ending with two.


Here is an example of what the VMS likely actually looks like:
(Note: It's not perfect yet, but it's pretty close.)

<...> = Vowel block: formed by the glyphs on either side of a former space
(...) = C1: the last consonant of a word or syllable; in the VMS it is placed in the middle of the normal tokens
[...] = C2: the first consonant of a word or syllable; in the VMS it is placed in the middle of the normal tokens
|...| = LSM / line start marker (line start and end markers unsolved)



f3r.1 ORIGINAL:
tsheos qopal chol cthol daimg

f3r.1 SEGMENTED:
|t| <∅sh> (e)    [o]<sqo>(p)   [a]<lch>(o)<lcth>(o)<ld>(a)    [im] <g∅>


f3r.2 ORIGINAL:
ycheor chor dam qotcham cham

f3r.2 SEGMENTED:
|y| <∅ch> (e)   [o]<rch>(o)<rd>(a)<mqo>(t)    [ch]<amch>(a)<m∅>


f3r.3 ORIGINAL:
ochor qocheor chol daiin cthy

f3r.3 SEGMENTED:
|o| <∅ch>(o)<rqo>(ch)     [eo]<rch>(o)<ld>aiin<cth>     (y∅)


f3r.4 ORIGINAL:
schey chor chal chag cham cho

f3r.4 SEGMENTED:
|s| (∅che)<ych>(o)<rch>(a)<lch>(a)<gch> (a)<mch>(o∅)


f3r.5 ORIGINAL:
qokol chololy s cham cthol

f3r.5 SEGMENTED:
|qo| <∅k>(o)<lch>   (o)[lo]<lys> <ch>(a)<mcth>(o)<l∅>


This naturally gives rise to new word boundaries, which, however, are still “assumed” here.

It's also fascinating that part of the code seems to be that the vowels in EVA are actually consonants, and vice versa.


RE: Why and how the text could be Bavarian - JoJo_Jost - 26-05-2026

So is the low entropy of the VMS partly due to a segmentation effect?

I am referring here to the entropy of the glyph or symbol stream, specifically the conditional entropy H(next | current), not to word entropy. Word entropy would require reliable word boundaries, and these have not yet been definitively established here.

I am aware that resegmentation can generally increase entropy. But that is not the point. If certain vowel bigrams and consonant clusters function as merged units, then their internal glyph transitions will naturally be highly predictable.

If these units are then measured as ordinary consecutive glyphs, their internal predictability is counted as part of the text’s entropy. This would artificially increase predictability and correspondingly lower the measured conditional entropy at the glyph level.

In my tests, this new structural segmentation appears to raise the entropy toward the range expected for a normal language-based stream. This is not easy to compare directly, because the MHG comparison material has to be adjusted to the same structural level. But roughly speaking, the direction is correct, and it is also logically what one would expect.
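
A minimal sketch of how H(next | current) can be measured at the glyph level and again after merging multi-glyph units. The helper names and the unit list `["cth", "ch"]` are assumptions made for illustration, not the segmentation actually used in the tests above, and the short toy stream is far too small to give a meaningful value.

```python
import math
from collections import Counter

def conditional_entropy(seq):
    """H(next | current) in bits, estimated from adjacent pairs in seq."""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (cur, _nxt), n in pairs.items():
        # Adds -p(cur, nxt) * log2 p(nxt | cur) for each observed pair.
        h -= (n / total) * math.log2(n / firsts[cur])
    return h

def merge_units(text, units):
    """Greedy left-to-right tokenizer: each listed unit becomes one symbol."""
    ordered = sorted((u for u in units if u), key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for u in ordered:  # longest units are tried first
            if text.startswith(u, i):
                out.append(u)
                i += len(u)
                break
        else:
            out.append(text[i])  # no unit matches here: keep the single glyph
            i += 1
    return out

stream = "chol chor cthol".replace(" ", "")
print(conditional_entropy(stream))                              # glyph level
print(conditional_entropy(merge_units(stream, ["cth", "ch"])))  # unit level
```

The same merging has to be applied to the comparison corpus before the two values are compared, otherwise the alphabets differ in size and the numbers are not on the same scale.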


RE: Why and how the text could be Bavarian - Jorge_Stolfi - 26-05-2026

(Yesterday, 07:15 AM)JoJo_Jost Wrote: I can now rewrite the structure of large parts of the VMS into the VCCV (V1 C1 C2 V) structure without making too many assumptions. [...] This naturally gives rise to new word boundaries, which, however, are still “assumed” here. It's also fascinating that part of the code seems to be that the vowels in EVA are actually consonants, and vice versa.

Quite intriguing. My hunches (not yet worth calling "theories") are somewhat similar, somewhat opposite.

I believe that the VMS spaces are "honest", in that they were meant to be boundaries between syllables; although they are quite "noisy", since many spaces are spurious or missing.   

But I also assume that the syllable structure is CCVVVC, where each part is optional except that there must be at least one V. So, comparing with [link] for the VMS word structure, it is my guess too that the gallows and benches (Ch, Sh, ee) and their -e suffixed variants would be vowels, while the dealers (d, l, r, s) would be consonants.

But that would be too many vowels and too few consonants. So perhaps the circles (o, a, y) are modifiers for the dealers, that turn them into a decent set of consonants.    

But I also have a hunch that the circles may be pitch marks -- say o = low, a = mid, y = high. Thus a syllable with pattern o..y would be in Mandarin's ascending tone (á) while a syllable like a..o..y would be in the "dipping" tone (ǎ). If this hunch were correct, it would explain why the circles seem to be so "noisy": if a syllable that ends with high pitch is followed by one that starts with high pitch, there would be no need to write the y twice, and either one could be omitted (or not).

Anyway, these are still just vague hunches.

(By the way, my model as described in that post needs updating. I am now convinced that m is generally a scribal abbreviation for in or iin, and all instances of ir are typos (rather, "quillos") for iin. And many or all ar may be typos for ain.  So the codas are only n, in, iin, or iiin. And these claims are independent of the Chinese Origin theory.)

(Except that it would mean that there are only four codas.  Just as many as the Mandarin tones.  Hmm...)

All the best, --stolfi


RE: Why and how the text could be Bavarian - JoJo_Jost - 26-05-2026

@ stolfi

Thx - Yes, I read your theory from back then; this is based on it, at least in part, but of course goes much further. The key point isn’t so much the slot structure—that was already known—but rather that the bigrams formed by word boundaries consist of a single letter and are therefore, logically speaking, a vowel (and diphthongs). Only with this specific trick can one achieve a balanced ratio between vowels and consonants. And as already mentioned here, if that were actually the case, it would have been an extremely ingenious trick to conceal the vowels.

We’ll see; of course, I’m always much further along than what I’m writing here, but I haven’t done any plaintext analyses yet—that will be the decisive criterion...


RE: Why and how the text could be Bavarian - Grove - 26-05-2026

I think that the first word of a page can’t really be a continuation of a word from a previous page. If anything that initial glyph has to set the page start somehow. There are other quirks like split gallows that do something specific to the beginning of a page along with the weird paragraph first line use of f and p.


RE: Why and how the text could be Bavarian - Jorge_Stolfi - 26-05-2026

(Yesterday, 01:49 PM)Grove Wrote: I think that the first word of a page can’t really be a continuation of a word from a previous page. If anything that initial glyph has to set the page start somehow. There are other quirks like split gallows that do something specific to the beginning of a page along with the weird paragraph first line use of f and p.

While some page-initial glyphs are extra fancy (like on [link] and f42v), most of them seem to be no fancier than other paragraph-initial glyphs.

So I don't think that the start of a page is a semantically significant place.  I think that the extra-fancy gallows are just decoration provided by the Scribe. 

On the other hand, I expect that the start of a paragraph will be semantically different from other places in the text.  And I don't include here the use of puffs (p and f gallows) in place of other glyphs or glyph combinations, which I take to be a purely decorative decision by the Scribe.

The most likely content of the Herbal section is indeed a herbal, even if "fake" like the [link]. Then each paragraph in a page is likely about one useful part of the respective plant: fruit, bark, leaves, root, etc. Then we can expect each paragraph to have the same loose structure. Like, first the name of the plant or part, then a list of uses, each possibly with dosage, method of preparation, etc. Then maybe end with the type of location where the plant grows, and perhaps method of harvesting (see the entry for "mandrake" in any good medieval herbal).

One consequence of this hypothesis is that the contents of each page should be independent of the other pages, apart from the words and phrases that belong to the "herbal" topic in general (like "herb", "tea", "cures", etc.).

Under this hypothesis, the pages should have no "natural" order. Needless to say, if one orders the pages by some statistical similarity criterion, then their statistics will seem to vary gradually along the sequence...

All the best,--stolfi


RE: Why and how the text could be Bavarian - JoJo_Jost - 26-05-2026

(Yesterday, 11:18 AM)Jorge_Stolfi Wrote: I believe that the VMS spaces are "honest", in that they were meant to be boundaries between syllables; although they are quite "noisy", since many spaces are spurious or missing.


Oh, one more thing:

The problem is that the VMS spacing seems to follow several fairly strict rules. We have seven or eight rules. That can, of course, be made compatible with syllable boundaries, and in that sense it can also roughly fit the Chinese-syllable idea, no question.

But then the ending distribution has to be explained as well. The four most common final blocks cover 81.13% of all token endings, but almost 20% are still formed by other endings. So the system is not simply "a small closed set of syllable endings". It only works if the model also explains why this particular distribution appears.

This is where Occam's Razor becomes relevant for me. I do not mean it as proof! I have no proof. I mean it as a preference between competing structural explanations.

The structure proposed here explains the same spacing rules with one very simple mechanism:

Split the vowel bigrams and place the visible space between the two parts.

That single rule naturally creates strong restrictions at token endings and token beginnings. It also explains why the most common endings are so concentrated, while still allowing a residue of less common endings.

That has a certain charm.
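
As a sketch of that single rule, the function below renders a unit stream as visible tokens by putting the space inside each vowel block. The data structure (a list of `(kind, value)` pairs) and the name `place_spaces` are assumptions made for illustration. The units are the f3r.1 segmentation posted earlier in this thread, with the split point inside each vowel block read off from the spaces in the original line.

```python
def place_spaces(units):
    """Render a unit stream as tokens, putting the space inside each vowel block.

    units: list of (kind, value). For kind "V", value is a (left, right) pair
    holding the two halves of a vowel block; for any other kind it is a string.
    """
    out = []
    for kind, value in units:
        if kind == "V":
            left, right = value
            # An empty half (the empty-set sign at a line edge) gets no space.
            out.append(left + " " + right if left and right else left + right)
        else:
            out.append(value)
    return "".join(out)

# f3r.1: |t| <∅sh> (e) [o]<sqo>(p) [a]<lch>(o)<lcth>(o)<ld>(a) [im] <g∅>
f3r_1 = [
    ("LSM", "t"), ("V", ("", "sh")), ("C1", "e"), ("C2", "o"),
    ("V", ("s", "qo")), ("C1", "p"), ("C2", "a"),
    ("V", ("l", "ch")), ("C1", "o"), ("V", ("l", "cth")), ("C1", "o"),
    ("V", ("l", "d")), ("C1", "a"), ("C2", "im"), ("V", ("g", "")),
]
print(place_spaces(f3r_1))  # tsheos qopal chol cthol daimg
```

Under this reading every visible token starts with the right half of one vowel block and ends with the left half of the next, which is where the restrictions on token beginnings and endings would come from.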


RE: Why and how the text could be Bavarian - JoJo_Jost - 26-05-2026

(Yesterday, 01:49 PM)Grove Wrote: I think that the first word of a page can’t really be a continuation of a word from a previous page. If anything that initial glyph has to set the page start somehow. There are other quirks like split gallows that do something specific to the beginning of a page along with the weird paragraph first line use of f and p.

Yes, I agree. The top of a page is probably a hard reset point. I wouldn't treat the first visible character of a page as a normal continuation of the previous page.

In my model, this belongs to the LSM/line start marker. The current V/C stream must begin somewhere, and often it won’t start with V1 but with a consonant.

So there must be an indication of exactly where the stream begins, and that’s why line-start markers are important. In recipes, paragraphs usually begin with verbs (take, do this, mix that) or with plant names, so usually with a consonant. And I suspect that P says: start at C2.

There are also vowel beginnings, such as y, a typical left vowel (V1). Then it makes sense that the line starts with a vowel, and y, and probably one or two other values, indicates which one...

So the first letter of a page could usually be a P...

But of course, this is still just a theory; however, it explains the necessity and the existence of the LSM in the simplest way possible.