The Voynich Ninja

Full Version: Transitional Probabilities and Repetitive Loops
A few months ago I spent a while looking at first-order transitional probabilities in the VM -- e.g., if all we know is that the current glyph is [d], what's the likelihood of the next glyph being [o]?  I wrote about this at the time, but I'd like to summarize one set of observations here because they've continued to tug at my curiosity and I'm not sure how to explore them any further.

Transitional probabilities are radically different for Currier A and Currier B -- so much so that there seems to be no value in working them out for the VM as a whole.

So let's start with Currier B, and also with the glyph [d].  What's the most probable sequence of glyphs to follow from that, ignoring spaces, as if we were using an extremely crude auto-complete algorithm?

[d] --> [y] (63.71%)
[y] --> [q] (28.11%)
[q] --> [o] (97.69%)
[o] --> [k] (30.31%)
[k] --> [e] group (41.86%); probability of that [e] group then being specifically [ee] (54.28%)
[ee] --> [d] (39.79%)

In other words, the most probable path forward turns out to be a closed loop: [qokeedyqokeedyqokeedy....].  Of course, this resembles a common repetitive pattern we actually see in Currier B.  A single choice of alternate transition would typically lead to a familiar-looking "path" such as these:
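To make the greedy walk concrete, here's a rough Python sketch; the transition table is hard-coded from the percentages quoted above (treating [ee] as a single state), not recomputed from a transcription:

```python
# Most probable next glyph in Currier B, copied from the figures above.
# Illustrative values only -- a real run would derive these from a transcription.
most_likely = {
    "d": "y", "y": "q", "q": "o", "o": "k", "k": "ee", "ee": "d",
}

def greedy_walk(start, table, max_steps=50):
    """Follow the single most probable transition until a state repeats."""
    path, seen = [start], {start}
    state = start
    for _ in range(max_steps):
        state = table[state]
        path.append(state)
        if state in seen:          # closed loop found
            return path
        seen.add(state)
    return path

print("".join(greedy_walk("d", most_likely)))  # -> dyqokeed
```

Starting from any glyph in the table lands in the same [d y q o k ee] cycle, which is what makes the loop "closed."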

[qokeey.qokeedy]
[qokaiin.okeedy]
[qokeedy.chedy]
[qolkeedy.qokeedy]
[qotedy.qokeedy]

If we try the same thing in Currier A, again starting with [d], we get:

[d] --> [a] (50.40%)
[a] --> [i] group (51.96%); probability of that [i] group then being specifically [ii] (75.52%)
[ii] --> [n] (94.80%)
[n] --> [ch] (21.25%)
[ch] --> [o] (45.67%)
[o] --> [l] (24.99%)
[l] --> [d] (21.92%)

Hence, a different closed loop: [daiincholdaiincholdaiincholdaiin.....].  But this time even some of the most probable transitions are still less than 22% probable: [n] --> [ch] and [l] --> [d].  And if we examine line-position statistics, the most probable transitions actually vary from point to point, so that there seem to be separate, non-overlapping [chol] and [daiin] regions.  Thus, a more nuanced analysis might predict [cholcholchol...daiindaiindaiin] over the course of a line, or maybe something even more varied.

Torsten Timm identifies three vord "series" -- a [daiin] series, an [ol] series including [chol], and a [chedy] series including [qokeedy] -- and finds that vords tend to be more common the more closely they resemble [daiin], [ol], or [chedy].  But within each Timm series, the vord most often found repeating identically is the specific one corresponding to the inferred loop sequence:

[daiin.daiin] ×13
[chol.chol] ×23
[qokeedy.qokeedy] ×19

All of which has led me to wonder whether Voynichese might default to some sort of looping pattern whenever there's minimal "signal" present, analogous to an unmodulated carrier signal.  But I can't think of a good way to move from that vague notion to any more concrete kind of experiment, and I also worry that there's some circular logic in here somewhere.  I don't *think* the commonness of specific vords such as [qokeedy] could itself be responsible for the patterns these vords seem best to exemplify -- but if it were, I suppose that would be one way to discount this line of speculation.

I'll also admit that first-order transitional probabilities don't have very good predictive power.  I tried using them as a basis for generating random text and came up with this for Currier B (with spaces inserted wherever two adjacent glyphs most often have one):


qol.dy.dor.ol.Shey.or.Shokaiin.Shotalkar.chedy.Shy.chopcholkedy.Sheokeey.s.chcKhdy.chy.okeor.
odytey.odytodain.SheotShey.pchdy.keedy.dal.Shdy.Shetaiin.ol.ol.ody.dytchedy.qol.ol.Shekeedain.
Shedy.qol.l.chedy.dytar.olal.dy.qotey.qosal.cheokaiin.y.otchokchokeedain.cheey.y.pcholkar.Shar.
cheeotchedy.keedar.ain.cheeyty.Sheey.ol.chcKhoteey.l.dy.kaiin


That's not very good pseudo-Voynichese.  Note in particular the frequent vords containing multiple gallows.

But if we advance to second-order transitional probabilities, the [qokeedy] and [chol/daiin] loops persist, and the results of generating random text start to feel a little more plausible (to me, at least):


ol.qokeodar.ar.okaiin.Shkchedy.Shdal.qotam.ytol.dal.cheokeedy.chkal.Shedy.qokair.odain.al.ol.daiin.
cheal.qokeeey.lkain.chcPhedy.kchdy.cheey.otar.cheor.aiin.Shedy.dal.dochey.opchol.okchy.Sheoar.ol.
oeey.otcheol.dy.chShy.lkar.ain.okchedy.l.chkedy.oteedar.ShecKhey.okaiin.chor.olteodar.okal.qokeShedy.
ol.ol.Sheey.kain.cheky.chey.chol.chedy


With these randomly generated text examples, I don't mean to imply I favor a stochastic-process solution -- I'm just trying to illustrate how well or poorly a relatively simple transitional-probability model fits what we're used to Voynichese looking like.  These examples aren't based on any "word structure" model as such.  They also ignore all line-position patterning.
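For anyone who wants to replicate the generation experiment, both the first- and second-order versions can be driven by the same order-k sampler.  This is a minimal sketch with a toy training string standing in for a real transcription, and a deliberately naive one-character-per-state tokenization:

```python
import random
from collections import Counter, defaultdict

def train(tokens, k):
    """Count how often each token follows each k-token context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - k):
        model[tuple(tokens[i:i + k])][tokens[i + k]] += 1
    return model

def generate(model, k, length, seed=0):
    """Sample tokens weighted by the observed transition counts."""
    rng = random.Random(seed)
    context = rng.choice(list(model))      # start from a random seen context
    out = list(context)
    for _ in range(length):
        options = model.get(tuple(out[-k:]))
        if not options:                    # dead end: fall back to a random context
            options = model[rng.choice(list(model))]
        glyphs, weights = zip(*options.items())
        out.append(rng.choices(glyphs, weights=weights)[0])
    return "".join(out)

# Toy corpus, spaces stripped -- purely illustrative.
corpus = list("qokeedyqokaiinchedyqokeedycholdaiinqokeey")
model2 = train(corpus, k=2)     # k=1 gives the first-order version
print(generate(model2, k=2, length=40))
```

The same `train`/`generate` pair with k=1 reproduces the cruder first-order experiment, so the two outputs above differ only in how much context the sampler sees.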

Apologies as always for any and all reinvented wheels.
Please forgive my ignorance. I taught myself everything I know about Markov chains from Wikipedia, in order to follow some of the good discussions that are happening these days about the statistical properties of the VMs text. My understanding of a Markov model of a dynamic system is one where the probability of any given state depends only on the state immediately prior. By "second-order transitional probability", I understand this to mean the probability of a state transition happening, given the prior two states, in the order they occurred. Do I understand this correctly?

Your idea of an "unmodulated carrier signal" has stuck with me, and I've been trying to think of a good way to model a periodically repeating system where many different paths are possible for each repetition, and each repetition's deviation from the default path can be brought into focus. Some kind of flowchart, where a line's deviation from qokeedy.qokeedy.qokeedy or daiin.chol.daiin.chol.daiin.chol can be meaningfully visualized. One of these days I'm going to start doodling on a whiteboard and come up with something. That said, this is a much easier project when the only state that matters in calculating the probability is the one immediately prior. But as your experiment above shows, taking the last two states into account produces a result that looks and feels much more like actual Voynichese.

Let's say we took the last X states into account when calculating the probability of the next state. I wonder how high a value X could have, before there were no appreciable gains in the similarity to actual Voynichese, to justify the rapid increase in calculation complexity.

A related question I'm wondering about: how many states properly comprise a complete repetition? I don't consider a space a state, as your work has clearly demonstrated that spaces supervene upon state transitions. If [ch] and [sh] are considered a single state each, then it looks like the [qokeedy] loop involves 7 state transitions per repetition, and the [daiin.chol] loop involves 8. I'm curious as to whether lines of Voynichese tend to break down neatly into multiples of 7 or 8 state transitions. I'm going to take a look at some Currier B pages as soon as I get the chance with this filter on my mind, and report back here what I find.
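A per-line transition count could be tallied with a crude greedy tokenizer along these lines; note that which glyph clusters count as a single state is itself an assumption, so the multigraph list here is just one possible choice:

```python
# Which clusters count as one "state" is an open question; this list is
# an assumption, not settled fact.
MULTIGRAPHS = sorted(["ch", "Sh", "iin", "ii", "eee", "ee"], key=len, reverse=True)

def tokenize(vord):
    """Greedy longest-match split of an EVA string into states."""
    states, i = [], 0
    while i < len(vord):
        for m in MULTIGRAPHS:
            if vord.startswith(m, i):
                states.append(m)
                i += len(m)
                break
        else:
            states.append(vord[i])
            i += 1
    return states

def transitions_per_line(line):
    """Spaces are not states, so count transitions across the whole line."""
    states = tokenize(line.replace(".", ""))
    return max(len(states) - 1, 0)

print(transitions_per_line("qokeedy.qokeedy"))  # 11, with [ee] as one state
```

Different choices in MULTIGRAPHS change the count per repetition, which is exactly the 7-vs-8 ambiguity in question.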
(15-12-2021, 12:33 AM)RenegadeHealer Wrote: Please forgive my ignorance. I taught myself everything I know about Markov chains from Wikipedia, in order to follow some of the good discussions that are happening these days about the statistical properties of the VMs text. My understanding of a Markov model of a dynamic system is one where the probability of any given state depends only on the state immediately prior. By "second-order transitional probability", I understand this to mean the probability of a state transition happening, given the prior two states, in the order they occurred. Do I understand this correctly?

That's what I meant by "second order" -- for example, the probability that [l] will occur after [do] rather than just [o].  But I'm admittedly a neophyte myself when it comes to Markov processes and the terminology behind them.

(15-12-2021, 12:33 AM)RenegadeHealer Wrote: A related question I'm wondering about: how many states properly comprise a complete repetition?

My sense is that there may be no fixed cycle length, much as there's no fixed vord length in "word grammars."  Otherwise I'd be tempted to speculate about something like a Vigenère key with one row assigned to nulls.  It seems as though a loop cycle can be longer or shorter depending on how many steps it takes to get back to an equivalent point again.  Some seemingly interchangeable paths appear shorter than others, e.g., [qok] versus [ch].  Of course I don't know why or what it means, if there's anything to it in the first place.
The most frequent transitions from particular glyphs to other individual glyphs are very different in Currier A and Currier B.  I'm not sure whether this way of looking at differences between the two "languages" has any advantages over (say) the study of bigram frequencies.  But it's at least a different way.

So, for example, it's common to observe that [ed] is common in Currier B but almost nonexistent in Currier A.  With transitional probability statistics (limited to "paragraphic" text and ignoring spaces), this comes out instead as:

Currier A: e>o (51.92%); e>y (28.65%)...
Currier B: e>d (47.36%); e>y (19.85%)...

That is, in Currier A, any given token of [e] (as distinct from [ee], [eee], etc.) has a 51.92% probability of being followed by [o], while in Currier B, it has a 47.36% probability of being followed by [d]; and in both cases, the next-most-probable following glyph other than the "most favored" one is [y].  There are plenty of statistical patterns that these single glyph-to-glyph transitions won't catch, of course, but that's also true of bigram frequency statistics, so I figure these are worth a try.
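The statistic described here is straightforward to recompute.  A minimal sketch, again with naive one-character states and a toy string standing in for real paragraphic text from a transcription:

```python
from collections import Counter, defaultdict

def transition_probs(glyphs):
    """P(next glyph | current glyph), spaces already removed."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(glyphs, glyphs[1:]):
        counts[cur][nxt] += 1
    return {
        g: {n: c / sum(f.values()) for n, c in f.most_common()}
        for g, f in counts.items()
    }

# Toy input only -- the real figures would come from a full transcription,
# restricted to paragraphic text.
text = "qokeedy.chedy.qokeey".replace(".", "")
probs = transition_probs(list(text))
print(probs["e"])   # how often [e] is followed by each glyph
```

Running this per bifolio instead of on the whole corpus gives the matrices used for the "evolutionary sequence" below in this post.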

One question that sometimes comes up is whether Currier A "evolved into" Currier B, and I was curious to see whether calculating transitional probability matrices for individual bifolios (limited again to "paragraphic" text) would suggest any particular sequence of transitional phases.

The most common transitions from [Sh], [d], and [e] seemed to be most reliable for distinguishing A bifolios from B bifolios overall, but some A bifolios diverged further from the typical B profile than others, consistent with a "gradual introduction" of new features.  With that idea in mind, here's a tentative evolutionary sequence ("e+" means any quantity of [e]):

Sh>o / d>a / e>y: 1+8, 2+7, 9+16, 11+14, 13, 18+23, 25+32, 27+30, 35+38, 36+37, 42+47, 44+45
Sh>o / d>a / e>o tied with e>y: 10+15, 49+56
Sh>o / d>a / e>o: 3+6, 4+5, 17+24, 19+22, 20+21, 28+29, 51+54, 52+53, 93+96
Sh>e+ / d>a / e>o: 58+65, 87+90, 88+89, 100+101
Sh>e+ / d>y / e>o: 57+66, 99+102
Sh>e+ / d>y / e>d (standard "Currier B"): everything else

I didn't find bifolios with combinations outside this sequence -- for example, a bifolio in which [e] was most often followed by [y] and [Sh] was most often followed by [e+].  So, based just on this possibly sketchy evidence, the most likely sequence of "evolutionary" steps would seem to be:

(1) e>y is overtaken by e>o, after briefly tying with it
(2) Sh>o is overtaken by Sh>e+
(3) d>a is overtaken by d>y
(4) e>o is overtaken by e>d

I haven't tried calculating probabilities separately for different parts of bifolios to see if these statistics are consistent across them.  That might be interesting, although the reduced size of the dataset would amplify the statistical noise.

Some other distinctive "most favored" transitions consistently match one of the above combinations.  For example, l>k goes with the combination Sh>e+/d>y/e>d.

Other distinctions nearly follow the same sequence, with exceptions clustering around the "borderline" or "transitional" or "intermediary" combinations Sh>o / d>a / e>o and Sh>e+ / d>a / e>o.  Thus, for k> and t>:
  • k>o, k>ch, t>o, t>ch are all with Sh>o (only exception: tie of t>o with t>e+ on 88+89, in Sh>e+ /d>a /e>o)
  • k>a and t>a are with Sh>e+ (only exception = t>a on 51+54, in Sh>o / d>a / e>o)
  • t>e+ is with Sh>e+ (only exception: 93+96, in Sh>o / d>a / e>o)

I don't have high hopes of this leading anywhere, but the post has been sitting in my "drafts" folder for long enough that I figured it was time to let it loose.