The Voynich Ninja
The oddities of the bigram "ed"



The oddities of the bigram "ed" - Dunsel - 15-02-2026

Ok, so I have this paper I've been working on and I have a very rough draft on Zenodo.  I've decided to put the things I've been digging into on ninja in the hope that additional pairs of eyes will clue me in to things I've been missing before I make a complete fool of myself and submit it for peer review.  For all of these tests, I've used the EVA Takahashi transliteration (I'm old and have used it for years), cross-verified against the EVA Zandbergen/Landini transliteration.

I'm going to try to break all of this down into multiple posts because I have a bunch of territory to cover.  Each will refer back to previous ones.  Much of what I'll cover won't be new territory to the old hands at the Voynich. Some may be.


The bigram "ed"

It's been known for many years that the bigram "ed" is just plain odd. It occurs in the Voynich as a midfix 4,474 times and as a suffix 186 times.  Never as a prefix.  That may not sound that striking but this chart shows just how striking it is.
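
For anyone who wants to check the prefix/midfix/suffix counts against their own transliteration, here is a minimal sketch of how such a position count could be made. It is not the script behind the numbers above. It assumes the text has already been split into a plain list of EVA words; the function name and the sample words are only illustrative.

```python
from collections import Counter

def bigram_positions(words, bigram="ed"):
    """Count occurrences of `bigram` at the start, middle, and end of words.

    Assumption: a word that consists only of the bigram is counted as a prefix.
    """
    counts = Counter()
    n = len(bigram)
    for word in words:
        start = word.find(bigram)
        while start != -1:
            if start == 0:
                counts["prefix"] += 1
            elif start + n == len(word):
                counts["suffix"] += 1
            else:
                counts["midfix"] += 1
            # Look for a further occurrence later in the same word.
            start = word.find(bigram, start + 1)
    return counts

# Illustrative words only, not a real page.
sample = ["chedy", "qokeedy", "shed", "daiin", "oteed"]
print(bigram_positions(sample))
```

Counting per occurrence rather than per word means a word containing the bigram twice contributes twice, which may or may not match how the totals above were produced.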

   

That is "ed" compared to the top 100 bigrams by total count and percentage of pages.  It occurs on roughly 56% of pages but is in the top 10 for total bigram count (#9).
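
The two axes of that comparison (total count and percentage of pages) can be sketched roughly as below. This is only an illustration under my own assumptions: `pages` is a hypothetical dict of page name to word list, and bigrams are counted inside words only, never across a word break.

```python
from collections import Counter, defaultdict

def bigram_stats(pages):
    """pages: dict mapping page name -> list of words.

    Returns {bigram: (total_count, percent_of_pages_containing_it)}.
    """
    totals = Counter()
    pages_with = defaultdict(set)
    for page, words in pages.items():
        for word in words:
            for i in range(len(word) - 1):
                bg = word[i:i + 2]
                totals[bg] += 1
                pages_with[bg].add(page)
    n_pages = len(pages)
    return {bg: (totals[bg], 100.0 * len(pages_with[bg]) / n_pages)
            for bg in totals}

# Toy input, not real data.
toy = {"p1": ["chedy", "daiin"], "p2": ["daiin"]}
stats = bigram_stats(toy)
print(stats["ed"], stats["ai"])
```

Sorting the result by total count and taking the first 100 entries would give the comparison set for a chart of this kind.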


Currier and "ed"

Currier noticed a difference when he described his language A and language B.  He could never quite put his finger on all of the differences between the two.  I'll suggest that the big thing he noticed was the bigram "ed".

   

This chart shows the locations of the bigram "ed" with the background shaded to represent Currier A and Currier B.  The dot colors are: blue = no ed bigrams on the page, orange = 1 ed bigram on the page, and green = 2+ ed bigrams on the page.

Side note: You'll notice 2 orange dots early in the herbal section, on [link not preserved] and f11r.  On both of those pages, ed occurs once and it's inside a hapax token.  The total number of pages where ed occurs only once is 19.  On 6 of those 19 pages, it's in a hapax token.


So, just from comparing Currier to ed, we see there's a very close match.  He apparently never fully defined the zodiac section as either so it has the white background.


"ed" by section

The first thing I noticed was, the first 25 folios only have those 2 occurrences of ed.  That seemed pretty odd for a bigram that's one of the top 10 by count.  So, I decided to dig further.

This chart shows the bigram ed by section.  I lumped the ed's into buckets: no ed on the page, 1 ed per page, and low, medium and high buckets that split the ed per-folio count into 3 groups of around 40 pages each. This chart is also normalized by folio word count, which shows the differences even better than the previous chart.  On the left, you'll notice again that the first 25 folios have only the two hapax-token ed occurrences.  At f26r, ed gets introduced, but not all at once: it skips around between pages with ed and pages with no ed.  The pharma section does the same thing; some pages have ed, some do not. The same goes for zodiac, where about half the pages have either no ed or 1 ed.  Balneo, rosette and recipes all have the highest count and ratio of ed in the entire Voynich.
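
The bucketing can be sketched as follows. This is a rough illustration, not the code used for the chart. It assumes a hypothetical dict of page name to (ed count, word count), and it assumes the low/medium/high split is made by ranking the normalized rate (ed per word) into three equal-sized groups.

```python
def bucket_pages(page_counts):
    """page_counts: dict page -> (ed_count, word_count).

    Returns dict page -> one of "none", "one", "low", "medium", "high".
    """
    buckets = {}
    rest = []
    for page, (ed, words) in page_counts.items():
        if ed == 0:
            buckets[page] = "none"
        elif ed == 1:
            buckets[page] = "one"
        else:
            # Normalize by the page's word count before ranking.
            rest.append((ed / words, page))
    rest.sort()
    n = len(rest)
    for rank, (_, page) in enumerate(rest):
        # Rank-based tertiles: first third "low", last third "high".
        buckets[page] = ("low", "medium", "high")[min(3 * rank // n, 2)]
    return buckets

# Toy numbers only.
toy = {"a": (0, 100), "b": (1, 100), "c": (2, 100), "d": (10, 100), "e": (30, 100)}
print(bucket_pages(toy))
```

Ranking on the raw count instead of the normalized rate would only need `ed` in place of `ed / words`.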


"ed" by sheet?

Now here's where things get a bit strange.  I'm not going to interject my theory here. I'm going to be really interested in hearing yours.

   

I downloaded the quire diagram from Voynich.nu and converted it into a csv that I could import into my Python script.  I then changed the background color to match the quire sheet number.  With one exception, f27v, you will notice that the herbal-section pages where the bigram ed is highest are all on the same sheet.  But they're intermixed with sheets that contain no ed.

F26 and F31 are on sheet 2
F33 and F41 are on sheet 1
F34 and F40 are on sheet 2
F41 and F48 are on sheet 1
F43 and F46 are on sheet 3
F50 and F55 are on sheet 1

You can also see a similar pattern in pharma. All have a relatively low ed count with those in the middle having a higher normalized count.  Those appear on sheets marked as sheet 1.  Again, no theory, but if the Voynich is in some semblance of a chronological order, this, combined with the no ed pages in other sections made me seriously scratch my head.


Which came first?

One thing Dr. Davis has mentioned in some of her talks is that she believes the folios are not in original order (I can't wait to see the results of that!).  And looking at these charts, it struck me as interesting that the ed bigram appears to be in clusters and groups, not so much by region as by quire sheet.  Since we truly have no idea what order this book was written in, I developed a theory.  Assume that all of the pages where ed never occurred or was in a hapax token were created first and that the bigram ed was brought into prominence later (or the reverse of that).  What kind of differences would they have?   So, I split the Voynich into 2 "halves": the 0ed "half", which included pages where ed never occurred or occurred once in a hapax token, and the ed+ "half", where it occurred at least once and was not in a hapax token.

Here's a csv list of the pages I identified and began classifying as 0ed and ed+ pages.

0ed

f1r,f1v,f2r,f2v,f3r,f3v,f4r,f4v,f5r,f5v,f6r,f6v,f7r,f7v,f8r,f8v,f9r,f9v,f10r,f10v,f11r,f11v,f13r,f13v,f14r,f14v,f15r,f15v,f16r,f16v,f17r,f17v,f18r,f18v,f19r,f19v,f20r,f20v,f21r,f21v,f22r,f22v,f23r,f23v,f24r,f24v,f25r,f25v,f27r,f28r,f28v,f29r,f29v,f30r,f30v,f32v,f35r,f35v,f36r,f36v,f37r,f37v,f38r,f38v,f42r,f42v,f44r,f44v,f45r,f45v,f47r,f47v,f49r,f49v,f51v,f52v,f53r,f53v,f54r,f54v,f56r,f56v,f65r,f67v2,f68r2,f71v,f72r2,f87v,f88r,f89v1,f90r2,f90v1,f90v2,f93r,f93v,f96r,f96v,f99r,f100r,f100v,f101r,f102r2,f102v1,f102v2


ed+


f26r,f26v,f27v,f31r,f31v,f32r,f33r,f33v,f34r,f34v,f39r,f39v,f40r,f40v,f41r,f41v,f43r,f43v,f46r,f46v,f48r,f48v,f50r,f50v,f51r,f52r,f55r,f55v,f57r,f57v,f58r,f58v,f59r,f59v,f60r,f60v,f61r,f61v,f62r,f62v,f63r,f63v,f64r,f64v,f66r,f66v,f67r1,f67r2,f67v1,f68r1,f68r3,f68v1,f68v2,f68v3,f69r,f69v,f70r1,f70r2,f70v1,f70v2,f71r,f72r1,f72r3,f72v1,f72v2,f72v3,f73r,f73v,f74r,f74v,f75r,f75v,f76r,f76v,f77r,f77v,f78r,f78v,f79r,f79v,f80r,f80v,f81r,f81v,f82r,f82v,f83r,f83v,f84r,f84v,f85r1,f85r2,f86v3,f86v4,f86v5,f86v6,f87r,f88v,f89r1,f89r2,f89v2,f90r1,f91r,f91v,f92r,f92v,f94r,f94v,f95r1,f95r2,f95v1,f95v2,f97r,f97v,f98r,f98v,f99v,f101v,f102r1,f103r,f103v,f104r,f104v,f105r,f105v,f106r,f106v,f107r,f107v,f108r,f108v,f111r,f111v,f112r,f112v,f113r,f113v,f114r,f114v,f115r,f115v,f116r
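
The split rule described above can be sketched like this. It is an illustration under stated assumptions, not the script that produced the two lists: `pages` is a hypothetical dict of page name to word list, "hapax" is taken to mean a word that occurs exactly once in the whole corpus, and a page with two or more ed occurrences always counts as ed+.

```python
from collections import Counter

def split_pages(pages, bigram="ed"):
    """pages: dict page -> list of words. Returns (zero_pages, plus_pages)."""
    # Word frequencies over the whole corpus, used to detect hapax tokens.
    corpus_freq = Counter(w for words in pages.values() for w in words)
    zero, plus = [], []
    for page, words in pages.items():
        # One entry per occurrence of the bigram on this page.
        hits = [w for w in words for _ in range(w.count(bigram))]
        if not hits or (len(hits) == 1 and corpus_freq[hits[0]] == 1):
            zero.append(page)
        else:
            plus.append(page)
    return zero, plus

# Toy pages with made-up contents.
toy = {
    "f1r": ["daiin", "chol"],
    "f2r": ["daiin", "okedal"],
    "f3r": ["chedy", "chedy"],
    "f4r": ["chedy", "chol"],
}
print(split_pages(toy))
```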

So, this is how I entered the rabbit hole.  There's a bit to digest here when you consider the implications so I'll end the post here.  But, there's also lots more to pile on top of this so I'll be referring back to this post.  I'll be sure to link it when I continue this in a new thread in the near future™.

Thanks for looking it over and I'm eager to hear opinions.


RE: The oddities of the bigram "ed" - ReneZ - 15-02-2026

I like your first graph, which is very striking.
Just a minor question or perhaps a nitpick: what exactly do you mean with:

Quote:That is "ed" compared to the top 100 bigrams by total count and percentage of pages.

These would be two different lists. Is it the superset of these two lists, or rather the cross-section?

With respect to Currier, in his paper he has a list of points that help to distinguish his A vs. B languages. While 'ed' only appears in B language, he never mentions this bigram as a discriminator.

Have you checked this page: [link not preserved]?
There is some overlap with what you are showing.

Also, in your first graph, there are some points near the top (close to 100% of page coverage) but with a relatively low total count. These may be of interest as well.

Finally, bigram statistics depend heavily on the transliteration alphabet. It would be worth using something different than Eva, not because it would be better or worse, but because it may bring a different (additional) perspective. This will not affect the specific bigram 'ed' too much though, so if that is your main focus, it is not too important.


RE: The oddities of the bigram "ed" - Jorge_Stolfi - 15-02-2026

(15-02-2026, 03:04 AM)Dunsel Wrote: It's been known for many years that the bigram "ed" is just plain odd. It occurs in the Voynich as a midfix 4,474 times and as a suffix 186 times.  Never as a prefix.  That may not sound that striking but this chart shows just how striking it is.

In my structural model for Voynichese words, the e glyph either occurs as a pair ee, which works as a single "letter" in the same class as Ch and Sh (and may be just a handwriting variant of Ch) or singly as a modifier for a preceding Ch, Sh, ee, k, t (not p or f), or a platform gallows (CTh, CKh, CPh, CFh).

A consequence of this model of the "true alphabet" is that ed cannot appear at the beginning of a word.  It can appear in words like keedy (parsed as {k}{ee}{d}{y}), chedaiin (parsed as {che}{d}{a}{iin}), yCThedam (parsed as {y}{CThe}{d}{a}{m}) etc.

But I must insist that statistics of characters and digraphs are bound to be more confusing than illuminating.  Their frequencies are mostly determined by whether they occur in the most common words; and these in turn may be highly dependent on the topic.

For example, the digraph "rb" may be significantly more common in a Latin herbal text than in an astronomical text, because it occurs in the word "herba".  

The digraph "ed" may be more frequent in an English chronicle than in an English herbal, because of its occurrence in verbs inflected in the past tense.

And the "th" digraph may be less common at the beginning of a line in any English text, because its frequency there is determined by the occurrence of the common words that begin with it: "the", "this", "that", "then", "they", "them", "there", "thus", etc -- but those common words are relatively short, and when text is formatted into paragraphs the first word of each line tends to be longer than average, while short words are more likely to fit at the end of each line.  

Thus statistics of words and word pairs are usually more illuminating than statistics of characters or digraphs.

All the best, --stolfi


RE: The oddities of the bigram "ed" - ReneZ - 15-02-2026

Well, bigram statistics show the behaviour of bigrams.

Comparisons with English (or Latin etc) may not be relevant because the Voynich MS text behaves in a very different way from these natural languages.

Their frequency may depend on the subject matter but it remains to be seen to what extent.

Should the Herbal-A and Herbal-B pages be about the same subject matter?
Their bigram statistics are wildly different, so we have a strong statistical observation that requires an explanation.

Bigram statistics are not the only thing. They are a piece of the puzzle.


RE: The oddities of the bigram "ed" - eggyk - 15-02-2026

(15-02-2026, 08:22 AM)Jorge_Stolfi Wrote: But I must insist that statistics of characters and digraphs are bound to be more confusing than illuminating.  Their frequencies are mostly determined by whether they occur in the most common words; and these in turn may be highly dependent on the topic.

For example, the digraph "rb" may be significantly more common in a Latin herbal text than in an astronomical text, because it occurs in the word "herba".  

The digraph "ed" may be more frequent in an English chronicle than in an English herbal, because of its occurrence in verbs inflected in the past tense.

And the "th" digraph may be less common at the beginning of a line in any English text, because its frequency there is determined by the occurrence of the common words that begin with it: "the", "this", "that", "then", "they", "them", "there", "thus", etc -- but those common words are relatively short, and when text is formatted into paragraphs the first word of each line tends to be longer than average, while short words are more likely to fit at the end of each line.  

All the best, --stolfi

Yes, but in this graph it's not a matter of relative frequency alone, but rather that there are many pages with 0 instances. This is highly unusual for what seems to be a very common bigram elsewhere in the text.  

Although, your "rb" example is interesting. Perhaps it would be worthwhile looking at "conditionally common" bigrams found within other manuscripts, using an analysis similar to the one in this post. I would imagine such bigrams would largely be consonant pairs.


RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(15-02-2026, 04:09 AM)ReneZ Wrote: I like your first graph, which is very striking.
Just a minor question or perhaps a nitpick: what exactly do you mean with:

Quote:That is "ed" compared to the top 100 bigrams by total count and percentage of pages.

These would be two different lists. Is it the superset of these two lists, or rather the cross-section?

With respect to Currier, in his paper he has a list of points that help to distinguish his A vs. B languages. While 'ed' only appears in B language, he never mentions this bigram as a discriminator.

Have you checked this page: [link not preserved]?
There is some overlap with what you are showing.

Also, in your first graph, there are some points near the top (close to 100% of page coverage) but with a relatively low total count. These may be of interest as well.

Finally, bigram statistics depend heavily on the transliteration alphabet. It would be worth using something different than Eva, not because it would be better or worse, but because it may bring a different (additional) perspective. This will not affect the specific bigram 'ed' too much though, so if that is your main focus, it is not too important.

First, thank you for the reply. Getting a chance to speak to the Voynich legends on here constantly humbles me.

My labels may not have been clear. What I did was count the number of times ed occurs anywhere in the Voynich (x axis) and count the number of pages it appears on (y axis) and show that as a percentage.  It's compared to the top 100 bigrams using the same measurements.  I refined my charts below so it's normalized rather than by percentage.

I have studied some of your findings in the past but for the purpose of these tests I tried to go in pretending ignorance, without any assumptions, and see where the numbers led me.  I knew of the existence of what Currier found but I specifically avoided knowing exactly which pages were defined. I didn't overlay his work until I identified the ed "expansion."  In my next post I'll show some of the tests I've conducted. I believe that, while there are a number of differences on those pages, language A and B can be explained as the same system that is used in the pages without ed, with just a few 'enhancements'.

As for other bigrams, yes I have looked at some. And here's an interesting tidbit.  The letter o.  Very early, o has 12 other letters that can be added to the end of it to make a bigram.  By the time you get to the end of the 0ed pages I identified, it has 17 of the 19 possible. (I'm only counting 20 glyphs, no weirdos, and excluding v as it appears on exactly one page as both a prefix and a suffix, I've yet to really dig into it.)  And, I think that's part of what explains the ed explosion. I'll get more into detail about that in a later post.
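
A rough way to track how the set of glyphs following o grows from page to page is sketched below. It assumes a hypothetical list of (page, words) pairs in manuscript order and treats every EVA character as one glyph; it is not the exact script behind the 12 and 17 figures above.

```python
def successor_growth(pages_in_order, glyph="o"):
    """pages_in_order: list of (page, words) in manuscript order.

    Returns a list of (page, number of distinct glyphs seen after `glyph` so far).
    """
    seen = set()
    growth = []
    for page, words in pages_in_order:
        for word in words:
            # Walk over adjacent character pairs inside the word.
            for a, b in zip(word, word[1:]):
                if a == glyph:
                    seen.add(b)
        growth.append((page, len(seen)))
    return growth

# Toy sequence with made-up contents.
toy = [("f1r", ["ol", "or"]), ("f1v", ["ok", "ol"]), ("f2r", ["daiin"])]
print(successor_growth(toy))
```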

To answer your question regarding the transliteration, ed is not the main focus.  Finding it was more the keystone event that led me down a rabbit hole, and it is significant in its ability to define the two distinct sections that Currier identified some 50 years ago.  But your questions got my curiosity up as well.  I had checked against your transcription, but had made no other comparison.  So here they are.


   

Here's ed from your transcription.

   

This is from the FSG which I downloaded from your site.

   

Glen Claston 1b

   

Bram Stoker's Dracula 

   

And Julius Caesar's Cicero for comparison

Thank you again for making me think a bit harder about this!


RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(15-02-2026, 09:52 AM)ReneZ Wrote: Well, bigram statistics show the behaviour of bigrams.

Comparisons with English (or Latin etc) may not be relevant because the Voynich MS text behaves in a very different way from these natural languages.

Their frequency may depend on the subject matter but it remains to be seen to what extent.

Should the Herbal-A and Herbal-B pages be about the same subject matter?
Their bigram statistics are wildly different, so we have a strong statistical observation that requires an explanation.

Bigram statistics are not the only thing. They are a piece of the puzzle.

Again, making me think. 

A Chymicall treatise of the Ancient and highly illuminated Philosopher, Devine and Physitian, Arnoldus de Nova Villa

   


RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(15-02-2026, 08:22 AM)Jorge_Stolfi Wrote: Thus statistics of words and word pairs are usually more illuminating than statistics of characters or digraphs.

All the best, --stolfi

I understand what you're saying about bigram statistics being driven by common words and subject matter. That absolutely happens in natural languages. Your examples make sense in that context.  But what I'm seeing here doesn’t look like normal topic drift.  I'm not just saying that “ed is frequent.”  I'm saying that a very high-frequency bigram simply disappears on many pages, and then appears heavily on others, within and across topics. That kind of page-level on/off behavior is what caught my attention.

If there’s a natural-language example where a very high-frequency digraph shows this kind of behavior, I’d genuinely be interested to see it. It would help clarify whether what we’re seeing here is unusual or not.

And yes, I agree that word-level analysis is important. That’s exactly what I’ll be looking at in the next post. This one was just meant to introduce the pattern and explain why I split the Voynich into two regimes for further testing.


RE: The oddities of the bigram "ed" - oshfdk - 16-02-2026

If VMS was a cipher and different cipher tables (keys, mappings) were used for different folios, this behavior of ed would be easy to explain. Say, for some folios ed maps to plaintext ST, for other folios ed maps to plaintext GP, very different statistics. However, even under this scenario it's hard to explain why it's only ed that behaves like this.

Is it possible to probe trigrams/tetragrams and see whether any particular longer ngram that contains ed causes this effect?
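
A probe like that could start from something as simple as the sketch below, which counts every n-gram inside a word that contains ed. The function name and the sample words are only illustrative, and the input is assumed to be a plain list of EVA words.

```python
from collections import Counter

def ngrams_containing(words, bigram="ed", n=3):
    """Count every n-gram (within a word) that contains `bigram`."""
    counts = Counter()
    for word in words:
        # Words shorter than n produce an empty range and are skipped.
        for i in range(len(word) - n + 1):
            gram = word[i:i + n]
            if bigram in gram:
                counts[gram] += 1
    return counts

# Illustrative words only.
sample = ["chedy", "shedy", "qokeedy"]
print(ngrams_containing(sample, n=3).most_common())
```

Running it once per page group (for example pages with and without ed) with n=3 and n=4 would show whether one longer n-gram accounts for most of the effect.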


RE: The oddities of the bigram "ed" - Rafal - 16-02-2026

It is easy to explain if we assume that the text is gibberish. One scribe liked to use it while the other didn't. And Currier languages correspond with supposed different hands. That would be all.
Meaningful explanation is harder. Writing in another plain language would probably change more text properties, not only this