The Voynich Ninja

Full Version: 22% of 38,000 vords
I was going to do a whole study using Python, but it was easy to find this statistic at [link]. The Voynich is abnormal: in a huge anomaly, 22% of its roughly 38,000 vords start with the single vowel "o". Some say the Voynich has 37,000 vords, in which case the percentage could be a little higher.
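
For anyone who wants to recompute the figure, here is a minimal Python sketch of the count described above. It assumes a plain-text transliteration with vords separated by whitespace and one character per glyph (so digraphs such as "ch" count by their first character). The function name `initial_glyph_shares` and the sample line are made up for illustration and are not real manuscript counts.

```python
from collections import Counter

def initial_glyph_shares(text):
    """Return {glyph: share of vords that start with that glyph}."""
    vords = text.split()
    counts = Counter(v[0] for v in vords)
    total = len(vords)
    return {glyph: n / total for glyph, n in counts.items()}

# Invented toy line in an EVA-like spelling, for illustration only
sample = "okaiin qokeedy otedy daiin chedy okal shedy ol qokain otar"
shares = initial_glyph_shares(sample)
print(f"o-initial vords: {shares['o']:.0%}")
```

On a full transliteration the result depends on which transliteration is used and on how uncertain spaces are handled, so it may not land exactly on 22%.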

What does this suggest to me, along with other signs from the text? The [link] word-before-and-after test shows the Voynich at 100, above the 20 of random text, with languages at 280-300. It is low entropy. No one has been able to decode it with a substitution that is reproducible. I believe that vords whose first letter is this vowel make up 22-23% of the words in the text. There are those who say it is meaningless, but the Voynich does have structure; hence the conlang angle. As for a language or a cipher that reads the same, reproducibly, anywhere in the text under substitution, I would find this hard to believe. You will always eventually output unintelligible syntax.
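
One common way to put a number on "low entropy" is the conditional character entropy: how many bits of surprise the next character carries once the previous one is known. The sketch below is only that measure, written in Python under the assumption of one character per glyph; it is not the word-before-and-after test mentioned above, whose definition is not given in this thread. The function name is hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(text):
    """Estimate H(next char | previous char) in bits from bigram counts."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent character pairs
    firsts = Counter(text[:-1])            # how often each char starts a pair
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        # p(a, b) * log2 p(b | a)
        h -= (n / total) * math.log2(n / firsts[a])
    return h

print(conditional_entropy("abababab"))  # fully predictable sequence
print(conditional_entropy("aabb"))
```

Lower values mean the next character is more predictable from the one before it. Comparing texts this way is only meaningful when they are of similar length and use a comparable alphabet.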

This suggests to me that the Voynich is an attempt at a rules-based conlang, and maybe a messy one. If you have an argument against the conlang angle and believe it behaves more like a known language, what evidence do you have to support your idea?


Here is a reference for English, along with letter placements in words:

[link] by Position Within Word

"t" was the highest with 15.9%, and that is not a vowel. "a" was 11.68%.

If you agree with me, then you might also feel, as I do, that the meaning was lost with the people involved. So we may only get subjective impressions of the Voynich's meaning from its imagery. The text may elude us.
I know you’re not really supposed to bring up your own theory in someone else’s thread, but since you specifically asked about it, I hope that’s okay, because it answers your question. 

There is a possible explanation. You cite 22% of words beginning with the vowel “o” as an anomaly. That figure is correct.

I assume that the “o” represents an absorption of the article in German. However, since “o” is also a normal letter, I excluded all words consisting solely of an “o” and a single letter, that is, all words with only one or two letters.

This brings us to approximately 19%.

I compared this with four Middle High German medical and prescription texts from the same period and region as the VMS parchment, and counted all attested independent definite articles (der, die, das, daz, diu, di) as well as all their case forms and spellings: 

Ortloff von Baierland (47,569 words): 11.7% 
Breslauer Arzneibuch (92,848 words): 17.8% 
Admonter Bartholomaeus (29,244 words): 17.1% 
Cookbook Cod. germ. 1 (2,788 words): 15.0% 

They thus account for between 12 and 18% of all tokens. The VMS prefix “o,” at 19.6%, lies just above the upper end of this range. And this fits with the fact that the VMS has very few short words overall: only 9.7% of tokens have one or two glyphs, compared to 16-20% in the MHG (Middle High German) texts. There is simply no room left for standalone articles...
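
The two percentages being compared here can be sketched in Python as follows. This is a rough illustration, not the exact procedure used for the figures above: the article set holds only the six base forms named in the post (case forms and spelling variants would have to be added), the denominator for the "o" prefix is assumed to be all vords, and the function names and toy inputs are invented.

```python
# Base article forms listed in the post; case forms and spelling variants
# would need to be added for a real count.
ARTICLES = {"der", "die", "das", "daz", "diu", "di"}

def article_share(tokens, articles=ARTICLES):
    """Fraction of tokens that are standalone articles."""
    return sum(t.lower() in articles for t in tokens) / len(tokens)

def o_prefix_share(vords, min_len=3):
    """Fraction of all vords that begin with 'o' and have at least min_len glyphs."""
    hits = sum(v.startswith("o") and len(v) >= min_len for v in vords)
    return hits / len(vords)

# Invented toy inputs, for illustration only
tokens = ["der", "arzt", "nimt", "daz", "krut"]
vords = ["okaiin", "ol", "otedy", "daiin", "or",
         "qokeedy", "okal", "chedy", "shedy", "o"]
print(article_share(tokens))   # 2 of the 5 tokens are articles
print(o_prefix_share(vords))   # 3 of the 10 vords: okaiin, otedy, okal
```

Setting `min_len=1` gives the unfiltered share of o-initial vords, which makes the effect of excluding one- and two-glyph vords easy to see.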

What’s really interesting here is that the distribution of these “o”s as articles fits quite well with the Bavarian / Middle High German language of the 15th century. And that’s just part of what I call the absorption cipher. (This type of cipher can also be used to explain the “qo,” and that fits the frequency pattern as well.) 

It's no proof, I know, but what I’m trying to say is: you should be careful about jumping to conclusions; there are always possibilities you haven’t considered.

It could, of course, be a coincidence, but in principle, when you also consider the other grammatical similarities, a conlang design seems rather unlikely to me.
(28-03-2026, 07:30 AM)JoJo_Jost Wrote: [..] I assume that the “o” represents an absorption of the article in German. [..]

I look forward to you decoding it. I think it would put my mind at ease if I knew there was proof of a legible reading of it, not just of one sentence. Would this letter also be the highest-frequency letter in Bavarian / Middle High German?
I'm still not sure if I can decode it. ;)

Vowels in my MHG (Middle High German) texts:
e = 40-44%
i = 20–23%
a = 15–21%
u = 6–12%
o = 7–9%

ca. 3 % Latin words....
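
A small Python sketch of how such a vowel table could be produced, reading the percentages above as each vowel's share of all vowel letters in the text. That reading is an assumption, as are the function name and the choice to count only the five plain letters a, e, i, o, u (no umlauts or other diacritics).

```python
from collections import Counter

def vowel_shares(text, vowels="aeiou"):
    """Return each vowel's share of all vowel letters in the text."""
    counts = Counter(ch for ch in text.lower() if ch in vowels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {v: counts[v] / total for v in vowels}

# Invented toy input, for illustration only
for vowel, share in vowel_shares("eea").items():
    print(f"{vowel} = {share:.0%}")
```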
(28-03-2026, 06:28 AM)oeesordy Wrote: [..]. The voynich is abnormal with a huge anomaly of 22% of initial vowel being only "o" of the first letter of a vord out 38,000 vords.  Some say the voynich is 37,000 words so the % could be a little bit higher.[..]

This assumes „o“ is a vowel; I am OK with that, since I also assumed it to be a vowel. That makes two assumptions, but they are of course nothing like a proof.

If o is a vowel, for example „e“, it may also cover slight variations like „ye“ or „yö“. Such variations exist in Slavic languages, where they are covered by different characters today. But that was not necessarily so in some old language or alphabet, where o may have covered such additional variants, differentiated and understood by readers only from context.

Finally, the source language might simply have a clear tendency towards e-words.

All in all, the o percentage may be less scary than it seems.

My transcription into a Word document gives 37,242 vords, but I erased those annoying „half-width“ blank positions where other transcriptions set a comma, joining the previous and following vord into one combined vord.
So other counts will point to 38,000-something.
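
The effect of that choice on the total can be sketched in Python. The example assumes the common transliteration convention in which "." marks a vord space and "," an uncertain (half) space; the function name and the sample line are invented for illustration.

```python
def count_vords(line, comma_is_space):
    """Count vords in one transliterated line.

    '.' marks a space and ',' an uncertain space. With comma_is_space=False
    the uncertain spaces are erased, joining the neighbouring vords.
    """
    line = line.replace(",", "." if comma_is_space else "")
    return len([v for v in line.split(".") if v])

# Invented toy line, for illustration only
line = "daiin.ol,chedy.qokeedy,dal"
print(count_vords(line, comma_is_space=True))   # uncertain spaces split vords
print(count_vords(line, comma_is_space=False))  # uncertain spaces join vords
```

Summed over the whole text, the gap between the two counts is the kind of difference described above between 37,242 and 38,000-something.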
Quote: The problem raised by Tiltman is that known examples of such languages are late, and we now know that they are much later than the creation date of the Voynich MS.
Whether that was also the reason why both Friedman and Tiltman did not seem to have gone anywhere with that idea, I do not know.

To be clear, this objection isn't exactly proof that it could not be a constructed language, but it makes it a more challenging proposal.

For me, it is something that seems prohibitively difficult to approach.

On the other hand, my favourite 'key question'  about the Voynich MS may provide some insight.
That is: in case we could answer this key question.

This is:
Is it possible to do a word-by-word substitution of the Voynich MS and come up with a meaningful text?

I see more reasons why the answer should be "no"  rather than "yes".

If it is "no", then we cannot create a dictionary of Voynichese to some known language.
A constructed language is most easily conceived in the form of a dictionary.

If it is "no", then also all types of ciphers are excluded, even the more complicated diplomatic ones.

Essentially all past proposed meaningful solutions assume that some form of dictionary should exist.


The 100 rating given by Davidd for the Voynich, which is barely over the 20 for random, together with the odd statistics, does point to construction, and maybe I should not have included "language" if that needs a dictionary with meaning. We do have a list of vords for the Voynich corpus, so a super autistic person could keep track of bizarre private meanings for the vords. Or maybe it's constructed to look like a language with no dictionary, and therefore no meaning. I do think the text had meaning to the author, yet for anyone else, no way.

There is one image that might be interpreted as a [link] f79v. And a [link] 86v3 contains birds, possibly the Holy Spirit. This is speculative, but if the author of the manuscript, who led the instruction of the manual, believed in speaking in tongues, then a construction without a dictionary is feasible. This would be glossolalia. If this was for healing, then speaking in tongues to invoke God while healing the ill with a tonic elixir is possible, and speculative of course. I do agree with Renee, though a bit differently, that the meaning was lost with the author or the group that created the Voynich. In that post, and I apologize, Renee, if I'm wrong, I think you were leaning towards meaningless and no conlang. And Renee, I know you proposed it as an "if".

If I'm right, it is a huge bummer for interpretation, and the path towards meaning in the Voynich text is null. It's obvious the Voynich Manuscript does not follow a logical, language-based behavior model. I think the stats over time will favor construction with no meaning for public consumption. As far as the text goes, its meaning was lost with the author or group that invented it.
(28-03-2026, 06:28 AM)oeesordy Wrote: The voynich is abnormal with a huge anomaly of 22% of initial vowel being only "o" of the first letter of a vord out 38,000 vords.

I believe that the text is a [link]. Such languages generally use tones (pitch levels or pitch profiles) to increase the number of distinct syllables. Mandarin, for instance, has four tones (or five, depending on definition), and the frequency of each tone is very roughly 25%.

Even if you still don't believe in that claim above, know that tones are a feature of some polysyllabic languages too, like Swedish and Pirahã.

In commonly used phonetic transcriptions of such languages, the tone is usually encoded by a diacritic, like zhǔ, or a postfix digit, like zhu3. But it is sometimes also encoded as one or more digits interspersed with the other letters, like 2ya1o3 to mean "start with mid-level pitch, drop to low pitch, then rise to high pitch".

So maybe Voynichese o means "start with mid-level pitch..."

Also, most languages have two series of consonants distinguished by voicing: p/b, f/v, k/g, t/d, s/z, sh/j, th/dh, tch/dj, ... Many languages also have parallel series of consonants that are distinguished by other features, like "aspiration" or "doubling".  

In many scripts, these differences are denoted by various marks applied to a "base" consonant.  In Japanese hiragana and katakana, for example, voicing is indicated by a "diacritic" that looks like double quotes (as in げいしゃ "geisha"), and consonant "doubling" is indicated by a special sign っ before the consonant (as in にっぽん "Nippon").

So maybe Voynichese o means that the next consonant is voiced, doubled, aspirated, whatever...

All the best, --stolfi