The Voynich Ninja
The oddities of the bigram "ed"



RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(16-02-2026, 12:37 AM)oshfdk Wrote: If VMS was a cipher and different cipher tables (keys, mappings) were used for different folios, this behavior of ed would be easy to explain. Say, for some folios ed maps to plaintext ST, for other folios ed maps to plaintext GP, very different statistics. However, even under this scenario it's hard to explain why it's only ed that behaves like this.

Is it possible to probe trigrams/tetragrams and see whether any particular longer ngram that contains ed causes this effect?

Excellent question. You'd think that if ed exhibits that behavior, then edy would also. But it doesn't. It's the one at the top right of the chart. It only occurs on half the pages, so as a trigram across the entire Voynich, anything with ed will stand out. But within the pages with ed, it still follows the curve. It would stand out only if it appeared on just half of the pages with ed, which would be hard to do.

   

So maybe a tetragram containing ed?  Nope.  Same problem as above.

   

The bigram ed is in a class all by itself.
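For readers who want to try the comparison themselves, here is a minimal sketch of the kind of per-page statistic being discussed: for each n-gram, the total count and the fraction of pages it appears on. It is not the code behind the charts. The function names, the choice to count n-grams only inside words, and the toy EVA-style words are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(word, n):
    """Return all character n-grams inside a single word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def ngram_page_stats(pages, n=2):
    """For each n-gram, return (total count, fraction of pages containing it).

    `pages` is a list of pages; each page is a list of word strings.
    Assumption: n-grams are counted within words, never across spaces.
    """
    totals = Counter()
    pages_with = Counter()
    for words in pages:
        on_page = Counter(g for w in words for g in ngrams(w, n))
        totals.update(on_page)
        pages_with.update(on_page.keys())  # each page counts at most once
    return {g: (totals[g], pages_with[g] / len(pages)) for g in totals}

# Toy pages, invented for illustration only (not real transcription data).
pages = [
    ["qokedy", "chedy", "daiin"],
    ["shedy", "qokeedy", "ol"],
    ["daiin", "chol", "chor"],
    ["chol", "daiin", "dar"],
]
stats = ngram_page_stats(pages, n=2)
print(stats["ed"])  # (total count, page coverage) for "ed"
print(stats["ch"])  # same for "ch"
```

Plotting total count against page coverage for every n-gram gives one point per n-gram. An n-gram with a high count but low coverage sits away from the main curve, which is the behavior claimed for ed.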


RE: The oddities of the bigram "ed" - oshfdk - 16-02-2026

(16-02-2026, 01:02 AM)Dunsel Wrote: Excellent question.  You'd think that if ed exhibits that behavior then edy, would also.  But, it doesn't.

For clarity, could you show ed trigrams on the same graph as all other trigrams, just in a different color?

Given these charts are cut off at Y~0.5, I'm not sure how they relate to other trigrams.


RE: The oddities of the bigram "ed" - Fontanellean - 16-02-2026

(16-02-2026, 12:29 AM)Dunsel Wrote: I understand what you're saying about bigram statistics being driven by common words and subject matter. That absolutely happens in natural languages. Your examples make sense in that context.  But what I'm seeing here doesn’t look like normal topic drift.  I'm not just saying that “ed is frequent.”  I'm saying that a very high-frequency bigram simply disappears on many pages, and then appears heavily on others, within and across topics. That kind of page-level on/off behavior is what caught my attention.

If there’s a natural-language example where a very high-frequency digraph shows this kind of behavior, I’d genuinely be interested to see it. It would help clarify whether what we’re seeing here is unusual or not.

You don't even have to look at bigrams. Take an instruction manual with one page written in British English and another in Pinyin for Mandarin. "Z" will hardly appear at all on the former but will be all over the place in the latter.


RE: The oddities of the bigram "ed" - oshfdk - 16-02-2026

(16-02-2026, 01:07 AM)Fontanellean Wrote: You don't even have to look at bigrams. Take an instruction manual with one page written in British English and another in Pinyin for Mandarin. "Z" will hardly appear at all on the former but will be all over the place in the latter.

But then the same would happen with X and Y and n-grams like IAO and EONG and generally many characters or bigrams would prefer either English or Mandarin. The peculiar thing about ed is that it is the only bigram that clearly shows this behavior in the Voynich MS, as far as I understand.


RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(16-02-2026, 01:07 AM)Fontanellean Wrote: You don't even have to look at bigrams. Take an instruction manual with one page written in British English and another in Pinyin for Mandarin. "Z" will hardly appear at all on the former but will be all over the place in the latter.

Well, I suppose that's one explanation.


RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(16-02-2026, 01:07 AM)oshfdk Wrote: For clarity, could you show ed trigrams on the same graph as all other trigrams, just in a different color?

Given these charts are cut off at Y~0.5, I'm not sure how they relate to other trigrams.

Sorry about that. My charts are designed to automatically compress like that. Here's what you asked for.

   

That's the top 100 trigrams when only looking at the pages with no ed.

   

That's the top 100 trigrams when only looking at pages with an ed.

And the one you really wanted to see.

   

That's trigrams across the entire Voynich.  The ones containing ed are orange.

So, ed is essentially the driving force behind all this. Any trigram with a large count, like edy (the orange dot on the far right), is going to fit into the same spot on the chart as ed. But if the trigram counts are low, any ed trigram will get buried in the cluster with all the other trigrams.
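The three views above (pages without ed, pages with ed, and the whole text with ed trigrams highlighted) can be approximated with a short script. This is only a sketch under assumptions: the helper names are hypothetical, trigrams are counted inside words only, and the pages are invented toy data rather than a real transcription.

```python
from collections import Counter

def trigram_counts(pages):
    """Count character trigrams inside words across a list of pages."""
    counts = Counter()
    for words in pages:
        for w in words:
            counts.update(w[i:i + 3] for i in range(len(w) - 2))
    return counts

def split_by_bigram(pages, bigram="ed"):
    """Split pages into those that contain `bigram` in any word and those that do not."""
    with_bg = [p for p in pages if any(bigram in w for w in p)]
    without_bg = [p for p in pages if not any(bigram in w for w in p)]
    return with_bg, without_bg

# Toy pages, invented for illustration only.
pages = [
    ["qokedy", "chedy", "daiin"],
    ["shedy", "qokeedy", "ol"],
    ["daiin", "chol", "chor"],
    ["chol", "daiin", "dar"],
]
with_ed, without_ed = split_by_bigram(pages)
top_with = trigram_counts(with_ed).most_common(100)        # top 100 on ed pages
top_without = trigram_counts(without_ed).most_common(100)  # top 100 on ed-free pages
all_counts = trigram_counts(pages)
ed_trigrams = {t for t in all_counts if "ed" in t}         # the ones to color differently
print(sorted(ed_trigrams), all_counts["edy"])
```

On real data, `ed_trigrams` would be the set drawn in orange on the whole-text plot.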


RE: The oddities of the bigram "ed" - oshfdk - 16-02-2026

I think I remember reading that e is missing or almost missing in some folios. Could low e folios be the reason?


RE: The oddities of the bigram "ed" - Jorge_Stolfi - 16-02-2026

(16-02-2026, 12:29 AM)Dunsel Wrote: If there’s a natural-language example where a very high-frequency digraph shows this kind of behavior, I’d genuinely be interested to see it. It would help clarify whether what we’re seeing here is unusual or not.

Your "control" examples seem to be all inherently homogeneous, in that every page is likely to use the same common words, and every bigram occurs in many of those words.  Thus it is not surprising that the plots have that shape -- which is probably predictable under the assumption that each text is produced by a single first-order Markov model.  (The "Chymical" treatise could be heterogeneous, but from the plots it does not seem to be.)
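To illustrate why a homogeneous text gives a predictable curve, here is a rough sketch using a model even simpler than the first-order Markov model mentioned above. The assumption is that each of the N occurrences of a bigram lands independently and uniformly on one of P equal-sized pages. Then the expected fraction of pages containing the bigram is 1 - (1 - 1/P)^N. The function name and the page count of 200 are hypothetical.

```python
def expected_coverage(total_count, num_pages):
    """Expected fraction of pages with at least one occurrence.

    Assumes occurrences fall independently and uniformly on equal-sized pages,
    which is a simplification and not the Markov model itself.
    """
    return 1.0 - (1.0 - 1.0 / num_pages) ** total_count

# Hypothetical corpus of 200 pages; print the predicted coverage for a few counts.
for n in (10, 100, 1000):
    print(n, round(expected_coverage(n, 200), 3))
```

Under this assumption, a bigram with a very high count should appear on nearly every page. A high-count bigram covering only about half the pages would fall well below this curve.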

[link] is an English herbal (Culpeper's) and [link] is the graph of the number of occurrences of "rk" on each "page", defined as 100 consecutive lines. Each bar is a "page".  As you can see, there are some "pages" with 9 occurrences of "rk", mostly from the word "bark" in entries about trees, and several runs of consecutive pages with no "rk" at all, which seem to be about herbs, not trees.  Maybe it is not as dramatic as the case of ed in the VMS, but it should suggest the explanation for it.

(The number of occurrences above is actually the number of lines with 'rk'.  Multiple 'rk' on the same line are counted as one.)

(The source file has many @-lines which specify titles and such. I forgot to remove them, but they should not make much difference.)
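The counting described above (100-line "pages", each line counted once if it contains the bigram) can be sketched as follows. This is not the script that produced the graph. The function name and the `skip_prefix` option are assumptions; by default the @-lines are left in, matching the description.

```python
def lines_with_bigram_per_page(lines, bigram="rk", page_size=100, skip_prefix=None):
    """Count, for each block of `page_size` lines, how many lines contain `bigram`."""
    if skip_prefix is not None:
        # Optional: drop marker lines such as the @-lines mentioned above.
        lines = [ln for ln in lines if not ln.startswith(skip_prefix)]
    counts = []
    for start in range(0, len(lines), page_size):
        page = lines[start:start + page_size]
        # A line counts once no matter how many times the bigram occurs in it.
        counts.append(sum(1 for ln in page if bigram in ln.lower()))
    return counts

# Toy lines, invented for illustration; a tiny page size keeps the example short.
sample = ["the bark of the oak", "dark bark", "a herb", "@ title", "mint"]
print(lines_with_bigram_per_page(sample, page_size=2))
print(lines_with_bigram_per_page(sample, page_size=2, skip_prefix="@"))
```

Each number in the returned list corresponds to one bar in a per-page bar chart.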

What is the average size of a page in your plots?  Culpeper's herbal has on average ~3900 letters per "page". 

All the best, --stolfi


RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(16-02-2026, 03:10 AM)Jorge_Stolfi Wrote: What is the average size of a page in your plots?  Culpeper's herbal has on average ~3900 letters per "page".

All the best, --stolfi

Culpeper's. I already had a normalized copy from Gutenberg that I was going to use on my website.  I typically strip all punctuation, lowercase the text, and then split it into 100-word groups, something close to herbal page sizes.  For these, I strip out all Gutenberg headers and footers and then go in and remove any transcription notes, publisher's notes, etc., so I'm just testing the raw text from the author.
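A minimal sketch of that normalization, for anyone who wants to reproduce it: strip punctuation, lowercase, and cut into 100-word pages. The function names are hypothetical. Two assumptions to be aware of: `string.punctuation` covers ASCII punctuation only, and the final short page is kept rather than dropped. Removing Gutenberg headers, footers and notes is a manual step not shown here.

```python
import string

def normalize(text):
    """Strip ASCII punctuation, lowercase, and split into words."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return no_punct.lower().split()

def paginate(words, page_size=100):
    """Cut a word list into consecutive pages of `page_size` words."""
    return [words[i:i + page_size] for i in range(0, len(words), page_size)]

# Note: removing the hyphen joins "OAK-tree" into a single word.
words = normalize("The Bark, of the OAK-tree!")
print(words)

pages = paginate(["w"] * 250, page_size=100)
print([len(p) for p in pages])
```

Typographic quotes and dashes in a Gutenberg file would survive this step and need their own handling.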

And here's a link to that treatise.  It's not terribly long, but the use of language is period correct. [link]

   

I also had a Gutenberg copy of Hypnerotomachia Poliphili, from 1592 I think, in my collection of test corpora, just to throw in some old English.

   


Edit: It took some digging.  The rk bigram in Culpeper's isn't in the top 200.  I had to do a top 300 plot to find it.

   


RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

And one more 'similar' text I had forgotten I stuffed away. A New Light of Alchymie.  It's from the 17th century and in English but I had it lying around. I've had a real hard time finding herbal or alchemical transcriptions.