The Voynich Ninja
The oddities of the bigram "ed" pt. 3 : It's not just "ed"


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Dunsel - 19-02-2026

(19-02-2026, 01:10 PM)Koen G Wrote: So shedy eats shol?

Regarding the shuffling of pages, this is understood to have happened afterwards, possibly by someone who did not know everything about the MS. Accidentally bound out of order seems more likely than intentionally shuffled.

Edit to add: I cannot understand why the study of glyphs is being discouraged. The manuscript is made of glyphs. Study them and you may understand it better.

From what I've heard, yes, folios are out of order and Lisa Fagan Davis is working on that.  I have not seen anyone produce evidence of shuffling as a result of the rebinding.  At the top of each page is a number that was obviously added later.  Again, whether that was to keep track of binding order or not is anyone's guess.

All I can do is produce the numbers and make a good guess.  My guess, given the other evidence, is obfuscation.


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - oshfdk - 19-02-2026

(19-02-2026, 10:23 PM)Dunsel Wrote: You are correct.  Two random tuples would produce exactly that.  But you forgot the boolean.  That chart is not measuring 2 things, it's measuring 3.  So, taking your same tuple math with a random boolean attached, I get this.

That is not what my chart shows.

But why attach a random boolean? As far as I understand, the boolean in your chart is not random, but some metric that depends on the variables.

Quote: The 0ed half, which included pages where it never occurred or occurred only once in a hapax token, and the ed+ "half", where it occurred at least once and was not in a hapax token.

So, it appears this is a simple cutoff filter on one of the variables? What would the random tuples chart show if the background color was selected as A < 5, for example?
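oshfdk's question can be checked with a quick simulation. This is a hypothetical sketch throughout: the ranges for ed and ho, the 2000 random "pages", and the "lower-half share" used as a crude clustering measure are illustrative choices, not Dunsel's data or method. Pages are sorted by ed/ho, and we count how many "flagged" pages land in the lower half of that order, once with a flag drawn at random and once with a flag given by a cutoff on ed.

```python
import random

def lower_half_share(pairs, flags):
    """Share of flagged pages that fall in the lower half when pages are sorted by ed/ho."""
    order = sorted(range(len(pairs)), key=lambda i: pairs[i][0] / pairs[i][1])
    lower = set(order[: len(order) // 2])
    flagged = [i for i, flag in enumerate(flags) if flag]
    return sum(1 for i in flagged if i in lower) / len(flagged)

random.seed(1)
# Hypothetical "pages": ed count 0..9, ho count 1..20 (ho >= 1 avoids division by zero).
pairs = [(random.randint(0, 9), random.randint(1, 20)) for _ in range(2000)]

# Background colour drawn independently of the plotted variables.
random_flags = [random.random() < 0.5 for _ in pairs]
# Background colour given by a cutoff on the numerator of the sort key.
cutoff_flags = [ed < 2 for ed, _ in pairs]

random_share = lower_half_share(pairs, random_flags)
cutoff_share = lower_half_share(pairs, cutoff_flags)
print(f"random boolean: {random_share:.2f}  cutoff boolean: {cutoff_share:.2f}")
```

With these ranges the random flag should land near 0.5, while the ed < 2 cutoff lands well above it, because pages with 0 or 1 ed get small ed/ho values for most values of ho.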


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Dunsel - 19-02-2026

(19-02-2026, 01:27 PM)oshfdk Wrote:
(19-02-2026, 01:12 PM)ReneZ Wrote: When solving a cipher (and by extension any textual mystery) you need to attack the odd things that stand out. There is the weakness.

I don't think I can name many things about the Voynich MS that don't stand out. Who knows, maybe it's not an elephant but a porcupine, and focusing on the longest quills may not get us closer to the truth.

In all the tests I've run and all the charts I've created, the one thing every single one has told me is what the Voynich isn't.  Digging into the Voynich is nothing but a series of repeated failures.  And the minute you think you do see something, like in my last 3 posts, someone will point out that it's something else.  

So I don't know if it's an elephant or a porcupine.  But, if I look at those quills, I may not be able to identify it as a porcupine, but I'll be pretty certain it's not an elephant.  Only by eliminating the impossible can you then see what's possible.  Now if I'm looking at the longest quills, then I know it's a porcupine and I'm trying to identify the species.  So far, nobody can even agree on if it's any animal we know.


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - oshfdk - 19-02-2026

(19-02-2026, 10:42 PM)Dunsel Wrote: So I don't know if it's an elephant or a porcupine.  But, if I look at those quills, I may not be able to identify it as a porcupine, but I'll be pretty certain it's not an elephant.  Only by eliminating the impossible can you then see what's possible.  Now if I'm looking at the longest quills, then I know it's a porcupine and I'm trying to identify the species.  So far, nobody can even agree on if it's any animal we know.

I'm not sure any single thing has ever been decisively eliminated about the manuscript, except that its creation started after the 1360s (or whatever the earliest C14 match was, I don't remember), and even then some people argue that the MS is a copy of a work created earlier.

For example, simple 1:1 substitution for a common European language (Latin, French, German), with glyphs interpreted according to the popular transliterations (EVA, etc.), has been eliminated. However, simple substitution with an exotic language, simple substitution with different glyph assignments (edit: that is, interpreting variations of some EVA characters as different glyphs), and multi-character substitution have not been eliminated, as far as I know.


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Dunsel - 19-02-2026

(19-02-2026, 10:37 PM)oshfdk Wrote: But why attach a random boolean? As far as I understand, the boolean in your chart is not random, but some metric that depends on the variables.

Exactly! That’s the whole point.  The simulation chart I showed you uses a random boolean to demonstrate what happens when the background is independent of the variables being sorted.  In the Voynich chart, the boolean (0ed vs ed+) is not random, it is dependent, and that is precisely why the clustering does exist.

(19-02-2026, 10:37 PM)oshfdk Wrote: So, it appears this is a simple cutoff filter on one of the variables? What would the random tuples chart show if the background color was selected as A < 5, for example?

That would work, but then you'd be adding a 4th number to the calculation: a constant acting as a filter.  My chart only looks at 3 things.  The boolean is whether the page has (0 ed, or 1 ed in a hapax) or (ed >= 2, or an ed in a non-hapax).  The normalized count of ho and the count of ed per page are the two variables.  There is no numeric cutoff applied to ho or ed density.  There is no threshold on the plotted variables.  There is no constant added as a filter.  Again, I'm no mathematician, but if you can find some functional dependency between a random boolean and the sort key, I'm all ears.  That would explain your hypothesis.


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - oshfdk - 19-02-2026

(19-02-2026, 11:14 PM)Dunsel Wrote: My chart only looks at 3 things, if the page has (0 ed or 1 ed in a hapax) or (ed>2 or in a non hapax)  that's the boolean.  The normalized count of ho and the count of ed per page are the two variables.  There is no numeric cutoff applied to ho or ed density.  There is no threshold on the plotted variables.  There is no constant added as a filter.  Again, I'm no mathematician, but if you can find some functional dependency between a random boolean and the sort key, I'm all ears.  That would explain your hypothesis.

The sorting key is ed/ho, and the boolean is ed < 2 (or ed <= 2; from your comment it's not clear what happens when ed equals 2). To me they look obviously dependent in a straightforward way: if ed < 2, the value of ed/ho will be low for most values of ho.
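The dependence oshfdk describes can be made explicit with a one-line bound. A sketch, assuming ed is an integer count per page (so ed < 2 means ed <= 1) and ho is strictly positive:

```latex
% ed = number of "ed" bigrams on a page (integer), ho = (normalized) "ho" count, ho > 0
\text{ed} \le 1 \;\Rightarrow\; \frac{\text{ed}}{\text{ho}} \le \frac{1}{\text{ho}},
\qquad
\text{ed} \ge 2 \;\Rightarrow\; \frac{\text{ed}}{\text{ho}} \ge \frac{2}{\text{ho}}
```

So at any fixed ho, every low-ed page sorts strictly below every page with ed >= 2, and once ho is moderately large, 1/ho is small. A boolean defined mostly by ed < 2 will therefore cluster at the low end of an ed/ho ordering even if nothing else in the text is going on.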


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Dunsel - 19-02-2026

(19-02-2026, 10:59 PM)oshfdk Wrote: The sorting key is (ed/ho), the boolean is ed < 2 (or <= 2, from your comment it's not clear what happens if ed equals 2). To me this looks like they are obviously dependent in a straightforward way. If ed < 2 the value of ed/ho will be low for most values of ho.

My mistake, I left out the =.  The sorting is: 0ed = no ed bigrams on the page, or a single ed bigram that sits in a global hapax; ed+ = a single ed on the page that is not in a global hapax, or >= 2 ed on the page.
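The corrected two-way split can be written as a small per-page classifier. This is one possible reading, sketched under assumptions: pages are lists of EVA word tokens, a "global hapax" is a word type that occurs exactly once in the whole transcription, and `page_class` plus the toy pages are hypothetical names and data, not anything from the charts discussed here.

```python
from collections import Counter

def page_class(page_words, global_counts):
    """Return '0ed' or 'ed+' for one page.

    page_words: list of EVA word tokens on the page
    global_counts: Counter of word tokens over the whole transcription
    """
    # One entry per "ed" occurrence, recording the word it sits in.
    # "ed" cannot overlap itself, so str.count gives the exact number.
    hits = []
    for w in page_words:
        hits.extend([w] * w.count("ed"))
    if not hits:
        return "0ed"  # no ed on the page
    if len(hits) == 1 and global_counts[hits[0]] == 1:
        return "0ed"  # a single ed, inside a global hapax
    return "ed+"      # one ed outside a hapax, or two or more ed

# Toy pages (invented words, not real folios).
pages = {"p1": ["daiin", "qokedy", "chol"],
         "p2": ["shedy", "shedy", "ol"],
         "p3": ["chol", "daiin"]}
global_counts = Counter(w for words in pages.values() for w in words)
for name, words in pages.items():
    print(name, page_class(words, global_counts))
```

Here p1 falls in 0ed because its only ed is in "qokedy", which occurs once overall; p2 has two ed and is ed+; p3 has none.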


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Jorge_Stolfi - 20-02-2026

(19-02-2026, 12:57 PM)ReneZ Wrote: While I don't disagree, I think that it works in the opposite direction. The observation that ed (in fact e) does not appear at the start of words suggests that e is something attached to the preceding character.

Definitely, the various word models, including mine, were deduced from bigram frequencies.

But those models already incorporate a lot of information about bigrams, and much more.  So simple bigram statistics may be partially wasted work, because they will rediscover things that the models already say (like "ed cannot appear at the start of a word"), and/or provide information that cannot be easily interpreted.  If one adopts my list of elements, it would be more useful to compute the distribution of the elements that can precede d in each language.  

IIUC, in language B the {d} element can be preceded by some or all of E = {che} {she} {ee} {eee} or F = {ke} {te} {ckhe} {cthe} (forget the puffs and other rare combos for now).   Which ones, exactly?  (Wild guess: ee is much less common than eee in that context.)

But it seems none of the E and F elements can occur before {d} in language A.   Can we say that some of the elements of E or F that precede {d} in language B get replaced by other specific elements in language A?
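The suggested count (which element sits immediately before d, split by language) can be sketched in a few lines. This is a simplification, assuming: tokens are EVA strings, each token already carries a Currier language label ("A" or "B"), and the element before d is found by longest-suffix matching against only the eight E and F elements named above, with anything else counted as "other". It is not a full parse with Stolfi's element model, and the token list is toy data.

```python
from collections import Counter, defaultdict

# E and F elements from the post, tried longest first so that
# "eee" is preferred over "ee" (sorted() is stable for equal lengths).
ELEMENTS = sorted(["che", "she", "ee", "eee", "ke", "te", "ckhe", "cthe"],
                  key=len, reverse=True)

def elements_before_d(word):
    """For each 'd' in an EVA word, return the listed element ending right before it."""
    found = []
    for i, ch in enumerate(word):
        if ch != "d":
            continue
        if i == 0:
            found.append("(word start)")
            continue
        prefix = word[:i]
        found.append(next((e for e in ELEMENTS if prefix.endswith(e)), "other"))
    return found

def preceding_d_by_language(tokens):
    """tokens: iterable of (language, word) pairs, e.g. ("A", "chody")."""
    table = defaultdict(Counter)
    for lang, word in tokens:
        table[lang].update(elements_before_d(word))
    return table

# Toy input, not real transcription data.
tokens = [("B", "shedy"), ("B", "qokeedy"), ("B", "chedy"),
          ("A", "chody"), ("A", "daiin")]
for lang, counts in sorted(preceding_d_by_language(tokens).items()):
    print(lang, dict(counts))
```

Run over a real transcription, comparing the A and B tables side by side would show which E/F elements before d in language B have no counterpart in language A.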

Quote: Word statistics are also useful, but we do not know that Voynich words represent plain text words

Even if they are encrypted, the Zipf plots suggest that the encryption is mostly one-to-one on words.  So word statistics and word-level dependency analysis could help us understand the "grammar" (or absence thereof) of the underlying language.
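Why a one-to-one encryption of words would leave Zipf plots intact can be shown directly: relabeling words bijectively does not change the multiset of word frequencies, so the rank-frequency curve, and any slope fitted to it, is identical. A sketch with a hypothetical codebook and toy text; the log-log least-squares slope is a crude exponent estimate chosen for brevity (maximum-likelihood fits are generally more reliable).

```python
import math
from collections import Counter

def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank).

    A crude Zipf-exponent estimate; needs at least two distinct word types.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

plain = "a b a c a b d a b c".split()
# Hypothetical one-to-one word cipher: each plaintext word always maps to the same code word.
codebook = {"a": "daiin", "b": "chol", "c": "shedy", "d": "qokeedy"}
cipher = [codebook[w] for w in plain]

print(zipf_slope(plain) == zipf_slope(cipher))  # True
```

The converse does not hold: a Zipf-like curve is consistent with word-level one-to-one encryption but does not by itself prove it.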

Quote: word statistics are much more affected by spelling and spacing uncertainties.

Yes, when analyzing word statistics one must always keep in mind that possibly 10% or more of the words in any transcription file may be wrong.  

(But glyphs may be wrong too: I bet that 10% or more of the transcribed r were in fact s, 10% of the o were a, 10% of the ee should have been Ch, etc. Or vice versa.) 

All the best, --stolfi


RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Jorge_Stolfi - 20-02-2026

(19-02-2026, 01:10 PM)Koen G Wrote: Edit to add: I cannot understand why the study of glyphs is being discouraged. The manuscript is made of glyphs. Study them and you may understand it better.

See my reply to Rene.  Statistics of glyphs and digraphs is definitely the first thing one should do when faced with a new cipher or "cipher". But they cannot tell much.  To make real advances one must make some non-trivial hypotheses about the plaintext and/or the encoding, and then devise statistical tests that could confirm or refute them.

All the best, --stolfi