I tested how the VMS behaves when treated as a pure stream of glyphs, that is, ignoring spaces and letting everything run together as one continuous string. Lines remain lines; I did not join text across line breaks.
Then I counted the most frequent (overlapping) bigrams and looked at how quickly the cumulative curve, the share of all bigrams covered by the top n bigram types, rises. (As far as I know, bigrams have so far mostly been studied within word tokens; bigrams that span a space, such as yq (as in yqo, where a word ending in y is followed by one starting with qo), have not been examined. Correct me if I'm wrong here.)
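Here is a rough Python sketch of that counting step, in case anyone wants to reproduce it. The assumptions are mine, not necessarily what I used for the plot: the input is a list of transliterated lines (e.g. EVA), every character counts as one glyph (so multi-letter EVA glyphs like ch get split), and whitespace marks the spaces to be ignored. All function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def line_bigrams(line: str) -> list[str]:
    """Overlapping bigrams of one line after removing all whitespace.
    Assumption: each character is one glyph."""
    glyphs = "".join(line.split())
    return [glyphs[i:i + 2] for i in range(len(glyphs) - 1)]

def count_bigrams(lines: list[str]) -> Counter:
    """Count bigrams line by line, so no bigram spans a line break."""
    counts = Counter()
    for line in lines:
        counts.update(line_bigrams(line))
    return counts

def coverage_curve(counts: Counter) -> list[float]:
    """Cumulative share of all bigram tokens covered by the top 1, 2, 3, ... types."""
    total = sum(counts.values())
    ranked = sorted(counts.values(), reverse=True)
    return [running / total for running in accumulate(ranked)]

# Toy EVA-style input, made up for illustration (not real VMS lines).
lines = ["qokeedy qokedy", "daiin chedy"]
counts = count_bigrams(lines)
print(counts["yq"])   # 1: the cross-space bigram from "...y qo..."
print(counts["yd"])   # 0: line 1 ends in y, line 2 starts with d, but lines are not joined
print(len(counts), round(coverage_curve(counts)[0], 3))  # 14 0.143
```

Plotting `coverage_curve` for each corpus against rank gives curves of the kind compared below.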
I did the same with Middle High German (MHD) and Latin. Then I also tried a strict Bavarian phonetic reduction of the MHD text: merging vowels, shortening endings, and grouping consonants.
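To make the three kinds of reduction concrete, here is a toy Python sketch. The specific rules (which vowels merge, which endings are cut, which consonants are grouped) are hypothetical placeholders chosen for illustration; they are not the rule set behind the red curve and not a description of real Bavarian phonology.

```python
import re

# Hypothetical rules, for illustration only (not the rule set behind the plot).
VOWEL_MERGE = str.maketrans("aeiouäöü", "aeeaaeee")   # merge vowels into two classes
CONSONANT_MERGE = str.maketrans("ptk", "bdg")         # group consonants: p/b, t/d, k/g

def reduce_word(word: str) -> str:
    word = word.lower()
    word = re.sub(r"en$", "n", word)   # shorten endings: final -en -> -n
    word = re.sub(r"e$", "", word)     # shorten endings: drop final -e
    word = word.translate(VOWEL_MERGE)
    return word.translate(CONSONANT_MERGE)

def reduce_text(text: str) -> str:
    return " ".join(reduce_word(w) for w in text.split())

print(reduce_text("Tage leben kint"))  # dag lebn gend
```

Every merge shrinks the alphabet, which matters for the number of bigram types discussed further down.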
The result is this:
The VMS is the blue line. The MHD manuscript corpora (orange) and standard Latin (green) lie far below it, meaning much flatter curves with bigrams spread much more widely. I haven't tested it, but I would expect other Romance and Germanic manuscript languages in their normal written form to behave more like MHD and Latin than like the VMS. That is just a hypothesis, of course.
But when you apply the hard Bavarian reduction to MHD (red), its curve almost overlaps with the VMS. For both, the top 10 bigrams cover over 30% of all bigram occurrences, and the top 100 reach roughly 93%.
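The two figures quoted above are points on the cumulative curve: top-k coverage is the share of all bigram occurrences taken by the k most frequent bigram types. A small Python sketch with hypothetical function names; the maximum gap between two curves is just one possible way (my assumption, not something used for the plot) to put a number on "almost overlaps".

```python
from collections import Counter

def top_k_coverage(counts: Counter, k: int) -> float:
    """Share of all bigram tokens taken by the k most frequent bigram types."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total

def max_gap(a: Counter, b: Counter, k_max: int = 100) -> float:
    """Largest difference between two coverage curves over ranks 1..k_max."""
    return max(abs(top_k_coverage(a, k) - top_k_coverage(b, k))
               for k in range(1, k_max + 1))

# Toy counts, not real data.
a = Counter({"x": 6, "y": 3, "z": 1})
b = Counter({"p": 5, "q": 4, "r": 1})
print(top_k_coverage(a, 2), top_k_coverage(b, 2))  # 0.9 0.9
print(round(max_gap(a, b, k_max=3), 3))            # 0.1
```

With real counts you would call `top_k_coverage(counts, 10)` and `top_k_coverage(counts, 100)` for each corpus.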
Does that mean the VMS is Bavarian? No, and that's not what I'm trying to say. The test has a clear catch: the hard reduction costs me bigram types. The VMS has 514 distinct bigrams, my Bavarian test only 222, so the reduction is too coarse. And in the VMS a few bigrams occur very frequently, whereas normal languages, logically, spread their bigrams more widely.
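Why a hard reduction loses bigram types can be seen in a toy example: merging two symbols into one also merges every bigram type that contained them, and an alphabet of n symbols allows at most n × n bigram types anyway. A minimal Python sketch; the merge mapping is purely illustrative.

```python
def bigram_types(text: str) -> set[str]:
    """Distinct overlapping bigrams after removing whitespace."""
    s = "".join(text.split())
    return {s[i:i + 2] for i in range(len(s) - 1)}

def merge_symbols(text: str, mapping: dict[str, str]) -> str:
    """Apply a many-to-one symbol merge (a hypothetical reduction step)."""
    return text.translate(str.maketrans(mapping))

def max_bigram_types(alphabet_size: int) -> int:
    """Upper bound on bigram types for an alphabet of the given size."""
    return alphabet_size * alphabet_size

text = "pa ba ta da"
merged = merge_symbols(text, {"p": "b", "t": "d"})
print(len(bigram_types(text)), len(bigram_types(merged)))  # 7 4
```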
But it does reveal something interesting: the typical “flat” curve of a standard written language like Middle High German or Latin doesn't really match the VMS. As soon as you simplify the language the way spoken Bavarian does, however, you get suspiciously close to the VMS distribution. I have to admit, though, that my simplifications are rather mechanical, because I don't have a phonetically transcribed Bavarian corpus.
Conclusion 1: What does that mean? Anyone who assumes plaintext or a simple monophonic (one-to-one) substitution cipher and takes a well-structured language, such as the classical European languages, as a basis will never be able to replicate the VMS curve: a one-to-one substitution only renames the symbols, so the bigram curve keeps exactly the shape it had in the plaintext. So there must be some form of strong reduction or shortening in between. A possible Bavarian cipher would have less work to do to reach this curve than a cipher that first had to heavily reduce Middle High German or Latin itself.
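The step "a one-to-one substitution can't change the curve" is easy to check: the substitution relabels bigram types but never changes their counts, so the sorted frequency profile stays identical. A Python sketch with a randomly shuffled key; the sample string is a made-up toy.

```python
import random
from collections import Counter

def bigram_profile(text: str) -> list[int]:
    """Bigram counts (whitespace removed), sorted from most to least frequent."""
    s = "".join(text.split())
    return sorted(Counter(s[i:i + 2] for i in range(len(s) - 1)).values(), reverse=True)

def monoalphabetic(text: str, seed: int = 0) -> str:
    """Encipher with a random one-to-one key over the text's own letters."""
    letters = sorted(set(text) - {" "})
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    key = dict(zip(letters, shuffled))
    return "".join(key.get(ch, ch) for ch in text)

plain = "daz wort was in dem anfange"   # toy sample
for seed in range(5):
    assert bigram_profile(monoalphabetic(plain, seed)) == bigram_profile(plain)
print("profile unchanged under every key tried")
```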
Conclusion 2: A free (unconstrained) homophonic cipher would normally produce a much flatter curve, simply because frequent plaintext units are spread across several cipher symbols. In the VMS, it's the other way around: the curve is highly concentrated. Homophony alone is therefore less likely to explain the VMS curve. If homophony is involved, it must be constrained by some positional, structural, or rule-based mechanism.
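A small simulation makes Conclusion 2 concrete. Under free homophony (disjoint homophone sets, each occurrence choosing freely), every plaintext bigram type is split into several cipher bigram types whose counts add up to the original, so top-k coverage can only stay the same or drop, for every k. The homophone table and sample text are hypothetical toys.

```python
import random
from collections import Counter

# Hypothetical homophone table: each listed letter may be written with any of
# three cipher symbols. The sets are disjoint, so every cipher symbol still
# stands for exactly one plaintext letter.
HOMOPHONES = {"e": "eEQ", "n": "nNZ", "a": "aAX"}

def bigram_counts(text: str) -> Counter:
    s = "".join(text.split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def top_k_coverage(counts: Counter, k: int) -> float:
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total

def free_homophonic(text: str, seed: int = 0) -> str:
    """Unconstrained homophony: each occurrence picks a homophone at random."""
    rng = random.Random(seed)
    return "".join(rng.choice(HOMOPHONES[ch]) if ch in HOMOPHONES else ch for ch in text)

plain = "daz ander an dem ende " * 30
cipher = free_homophonic(plain)
for k in (5, 10):
    print(k, round(top_k_coverage(bigram_counts(plain), k), 3),
          round(top_k_coverage(bigram_counts(cipher), k), 3))
```

A constrained variant (e.g. the homophone chosen by position) would be a different function; this one only models the free case.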
The second question: Is the bigram level the one that can reveal the essence of the VMS to us? In any case, it reveals far more than the single-glyph level. But I would not claim that bigrams are necessarily the final units of the system. They may simply be the first level at which the underlying structure becomes visible.
Possible solutions: We might be dealing with a cipher heavily interspersed with null elements, which was typical for the time. Several frequently occurring null elements would push the curve up in a way that matches the VMS. If we removed them from the text, the resulting curve would be somewhat flatter (but not like Latin / MHD), and there would be fewer bigram types.
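Testing the null idea means choosing a set of candidate nulls, deleting them, and recomputing the counts. A minimal Python sketch of that bookkeeping; which VMS glyphs (if any) are nulls is not established, so the null set here is a hypothetical placeholder and the toy line is made up.

```python
from collections import Counter

CANDIDATE_NULLS = {"q"}   # hypothetical placeholder, not a claim about the VMS

def bigram_counts(lines: list[str]) -> Counter:
    counts = Counter()
    for line in lines:                    # never join across line breaks
        s = "".join(line.split())         # ignore spaces
        counts.update(s[i:i + 2] for i in range(len(s) - 1))
    return counts

def strip_nulls(lines: list[str], nulls: set[str]) -> list[str]:
    return ["".join(ch for ch in line if ch not in nulls) for line in lines]

def summary(counts: Counter, k: int = 10) -> tuple[int, int, float]:
    """(bigram types, bigram tokens, top-k coverage)."""
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(k))
    return len(counts), total, (top / total if total else 0.0)

lines = ["qabcd qabce"]
print(summary(bigram_counts(lines), k=2))                                # 6 types, 9 tokens
print(summary(bigram_counts(strip_nulls(lines, CANDIDATE_NULLS)), k=2))  # 5 types, 7 tokens
```

Note that deleting a glyph can also create new adjacencies (here da), so the type count has to be recomputed rather than assumed.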
If one then made the phonetic Bavarian reduction slightly less restrictive, which would certainly be closer to reality, that would produce more bigram types on that side. And so the two curves might gently overlap again.
---
Along with the marginal notes, the word lengths were the reason I decided to examine whether the VMS was based on Bavarian. Since I no longer assume that the visible word lengths have anything to do with the actual word lengths, I had lost an important argument.
With this analysis, however, I have come a little closer again to the idea that the VMS might be Bavarian.