The Voynich Ninja

Rightward and Downward in the Voynich Manuscript - Patrick Feaster
(04-12-2022, 12:41 AM) Emma May Smith Wrote: Patrick, I love that we can bring different phenomena together and consider how they combine. Do we know which way causation runs between overlap and under/overrepresentation? Can we rule out the final glyph of a word influencing the distribution of the initial glyph?

I suppose causality could run in either direction: positional distribution patterns with differing amounts of overlap could cause under/overrepresentation, or under/overrepresentation could cause positional distribution patterns with differing amounts of overlap.  Or there may be cases in which word-break combinations are still significantly skewed even *after* controlling for degrees of overlap.  This one example of [y.q] versus [n.q], even if it holds up to scrutiny, is still just one example.

But of course neither the overlaps in positional distribution patterns nor the weak-strong word-break combination attractions and repulsions constitute an explanation in themselves -- even if one "leads to" the other, that still leaves open the question of what's causing the "leading" phenomenon to occur in the first place.
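One way to make the word-break "attractions and repulsions" concrete is to compare how often each (final glyph, next initial glyph) pair occurs with how often it would occur if the two were independent. This is only a sketch, not the method behind the [y.q] and [n.q] figures. It assumes lines are already split into EVA words and treats a glyph as a single character, so multi-character glyphs like [ch] are not handled. The function name and the toy lines are hypothetical.

```python
from collections import Counter


def word_break_ratios(lines):
    """Observed/expected ratio for each (final glyph, next initial glyph) pair.

    Only adjacent words within the same line are paired. A glyph is taken to
    be a single character, which is a simplification for illustration.
    """
    pairs = Counter()
    for words in lines:
        for left, right in zip(words, words[1:]):
            pairs[(left[-1], right[0])] += 1

    total = sum(pairs.values())
    finals = Counter()
    initials = Counter()
    for (final, initial), n in pairs.items():
        finals[final] += n
        initials[initial] += n

    # Expected count under independence: row total * column total / grand total.
    return {
        (final, initial): n / (finals[final] * initials[initial] / total)
        for (final, initial), n in pairs.items()
    }


# Hypothetical toy lines, not real transliteration data.
lines = [
    ["daiin", "qokeedy", "qokain", "chedy"],
    ["shedy", "qokedy", "okaiin", "qoteey"],
]
ratios = word_break_ratios(lines)
print(ratios[("y", "q")], ratios[("n", "c")])  # prints: 1.0 2.0
```

Ratios above 1 mark overrepresented combinations and ratios below 1 underrepresented ones. On real data, rare pairs give unstable ratios, so some significance test would also be needed.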

The kinds of experiment you're hinting at seem very interesting and worthwhile, and I'd love to read about any conclusions you're able to draw from them.

With regard to the positional distribution patterns as a whole, the dynamics behind them are still very unclear (at least to me).  Are certain glyphs and words really "tied" to particular positions in lines?  Or does the process of text creation have a cumulative aspect over the course of a line, such that certain glyphs and words become progressively more or less probable over time because of what has preceded them?

An example of the first type would be the Trithemian Polygraphia III cipher, used "as intended," where words representing plaintext characters are selected from successive columns to fill out a cycle (which could be one line), and with each column containing words that share a distinctive morphological profile.  If the fifth column consists entirely of words that begin with [p], for example, then words beginning with [p] will consistently appear as the fifth words of cycles/lines.  I'm thinking of Jürgen Hermes (hermesj)'s sample encipherment of the word "secret" from the conference: [abril madu badir cadeler pasu ador].
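A toy model shows how this kind of process ties morphology to line position. It is not Trithemius's actual table. The column prefixes only loosely echo the openings of hermesj's sample words, and the letter-to-word mapping is invented. Only the mechanism follows the description above: one column per slot in the cycle, with each column having its own shared prefix.

```python
import string

# Hypothetical column prefixes, one per slot in a six-word cycle.
# These are invented for illustration; they are NOT Trithemius's tables.
PREFIXES = ["ab", "ma", "ba", "ca", "pa", "ad"]

# Each column maps every plaintext letter to a word sharing that column's prefix,
# so every word in the fifth column (index 4) begins with "pa".
COLUMNS = [{c: p + c + "r" for c in string.ascii_lowercase} for p in PREFIXES]


def encipher(plaintext):
    """Pick the word for each letter from the next column in the cycle."""
    letters = [c for c in plaintext.lower() if c in string.ascii_lowercase]
    return [COLUMNS[i % len(COLUMNS)][c] for i, c in enumerate(letters)]


words = encipher("secret")
print(words)  # prints: ['absr', 'maer', 'bacr', 'carr', 'paer', 'adtr']
```

Whatever the plaintext is, the fifth word of every cycle starts with [p]. This is the rigid, position-tied kind of patterning.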

An example of the second type would be an additive cipher in which each plaintext letter is assigned a number and the ciphertext is a running cumulative sum expressed in Roman numerals.  Thus, ABRACADABRA -- in which the characters are individually 1;2;18;1;3;1;4;1;2;18;1 -- would be encoded numerically as 1;3;21;22;25;26;30;31;33;51;52, and written [i iii xxi xxii xxv xxvi xxx xxxi xxxiii li lii].  Here it's not that [v] or [x] or [l] "prefers" a later position in the line as such; it's just that it takes time for the preceding text to "build up" to those values.
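The arithmetic of the running-sum example can be checked mechanically. The sketch below assumes A=1 through Z=26 and ignores anything that is not an unaccented Latin letter. It uses standard subtractive Roman numerals, although none of the values in this example need them.

```python
def to_roman(n):
    """Convert a positive integer to lowercase Roman numerals."""
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
              (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
              (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in values:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def additive_encipher(plaintext):
    """Running cumulative sum of letter values (A=1 ... Z=26), as Roman numerals."""
    total = 0
    numerals = []
    for ch in plaintext.upper():
        if not "A" <= ch <= "Z":
            continue  # ignore spaces, punctuation, and accented letters
        total += ord(ch) - ord("A") + 1
        numerals.append(to_roman(total))
    return numerals


print(" ".join(additive_encipher("ABRACADABRA")))
# prints: i iii xxi xxii xxv xxvi xxx xxxi xxxiii li lii
```

Because the sum only grows, symbols for larger values ([x], then [l]) can only appear once enough text has accumulated. No symbol needs a positional preference of its own.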

I've chosen a pair of ciphers as examples just because they're relatively easy to describe and contrast with each other -- I don't mean to imply that the patterns necessarily point to a cipher rather than some other kind of solution.  And of course either of these two specific ciphers, as described, would produce more rigid patterning than what we see in the Voynich Manuscript.  But they may be useful as models if we want to try to design a test that could distinguish between processes of these two kinds.
(02-12-2022, 08:27 PM) Emma May Smith Wrote: Thus we could potentially explain rightwardness in this way: words which appear more to the left have the "ability" to make lines longer or words which appear more to the right have the "ability" to make lines shorter.

I'm not sure if this is quite the sort of experiment you have in mind, but I just tried calculating the average line length (in words) for tokens of some of the word pair categories where one type tends to score rightward of the other.

Mostly the results ended up looking like insignificant statistical noise.  But there was one case that looks as though it might fit your hypothesis somewhat consistently.

In Currier A, the average length of a line containing a word that contains [k] is 7.27 words, while the average length of a line containing a word that contains [t] is 7.04 words.  Nine of the ten specific word pairs of this kind with the highest token counts follow this same pattern, with the [k] word tending to appear in longer lines than the [t] word:

[cThy] 6.79, [cKhy] 7.29
[qotchy] 6.13, [qokchy] 6.53
[oty] 7, [oky] 7.74
[cThol] 6.45, [cKhol] 8.11
[otol] 7.69, [okol] 8.11
[otaiin] 7.27, [okaiin] 7.38
[qoty] 7.4, [qoky] 7.10 (the exception)
[qotol] 6.63, [qokol] 8.37
[cThor] 6.85, [cKhor] 7.88
[cThey] 7.48, [cKhey] 7.85

The [t] words appear more to the right and are associated with shorter lines, while the [k] words appear more to the left and are associated with longer lines.
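The averages above can be computed by collecting the length of the host line for every matching token. This is a minimal sketch that assumes lines are lists of lowercase EVA words, so [cTh] appears as "cth" and is caught by a test for "t". It also assumes that each matching token counts once, which is one reading of "average line length for tokens". The function name and toy lines are hypothetical.

```python
from statistics import mean


def mean_line_length(lines, predicate):
    """Average length (in words) of the line hosting each matching token.

    Each matching token counts once, so a line with three matches
    contributes its length three times.
    """
    lengths = [len(words) for words in lines for word in words if predicate(word)]
    return mean(lengths) if lengths else float("nan")


# Hypothetical toy lines, not real transliteration data.
toy_lines = [
    ["qokchy", "okaiin", "chol", "daiin", "qokol", "chedy", "shol", "dy"],
    ["cthy", "otol", "daiin"],
    ["okol", "qoty", "chol", "dar"],
]
print(mean_line_length(toy_lines, lambda w: "k" in w))  # prints: 7
print(mean_line_length(toy_lines, lambda w: "t" in w))
```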

But this pattern doesn't extend to Currier B.  And neither of my other two "sample" sets of contrastive graphemic minimal word pairs shows a similar pattern.  That's not to say there aren't any potentially significant-looking differences.  For example, in Currier B, the words starting [Sh] are in lines averaging 8.97 words in length, while the words starting [ch] are in lines averaging 9.32 words in length, so now the more rightward type [ch] seems to correlate with longer lines -- but this time the most common individual word pairs don't show any coherent-looking pattern.
One year ago, I posted a similar diagram. This plot compares words containing 's' and 't' in the lines of Shakespeare's sonnets.
(EDIT: I updated this diagram to make it comparable with the paragraph diagram below)

[attachment=7032]

As can be seen, 't' has a rather flat distribution that decreases at the end of a line, while 's' increases from left to right toward the end. This could be because lines tend to be sentences, and words like 'the', 'then' and several pronouns ('it', 'they', 'thou') often appear at the beginning of a sentence. The increase for 's' could be due to the suffixes of third-person verbs and plural nouns (both of which can appear at the end of sentences).

Cherry-picked examples to illustrate what I mean:

then being asked where all thy beauty lies
the age to come would say this poet lies
they do but sweetly chide thee who confounds
with virtuous wish would bear you living flowers
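A profile of this kind can be built by binning each word by its relative position in the line and measuring the share of words in each bin that contain the letter. This is a minimal sketch and not necessarily how the attached diagram was produced. The bin count and the function name are assumptions.

```python
def positional_profile(lines, letter, bins=5):
    """Fraction of words containing `letter` in each relative-position bin.

    A word's bin is its index scaled by line length, so bin 0 holds
    line-initial words and bin `bins - 1` holds line-final ones.
    """
    hits = [0] * bins
    totals = [0] * bins
    for line in lines:
        words = line.lower().split()
        for i, word in enumerate(words):
            b = i * bins // len(words)
            totals[b] += 1
            if letter in word:
                hits[b] += 1
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


sonnet_lines = [
    "then being asked where all thy beauty lies",
    "the age to come would say this poet lies",
    "they do but sweetly chide thee who confounds",
    "with virtuous wish would bear you living flowers",
]
print(positional_profile(sonnet_lines, "t", bins=2))
print(positional_profile(sonnet_lines, "s", bins=2))
```

On these four lines, the share of 't' words falls from the first half to the second (8/17 to 5/16), and the share of 's' words rises (4/17 to 6/16). Four cherry-picked lines prove nothing by themselves, of course.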


Treating each sonnet as a paragraph, there could also be some vertical patterns, at least in the last lines:
[attachment=7030]

This is a third example (in addition to the ciphers described by Patrick above) of something that can cause rightwardness patterns. Of course there are strong reasons to think that what we see in Shakespeare's sonnets is different from what we see in Voynichese, but I think it's useful to consider all systems that can cause similar effects: at the very least, it would be great to find experimental ways to tell one from the others.



On a different aspect of the topic, I ran a quick and dirty test using Q13 from the data set that includes coordinates for each word. These are the average X positions for words containing a few patterns. Unless I made errors, this shows that the second member of each pair tends to appear slightly closer to the right margin of the page:

sh 541.4
ch 580.2

ol 595.3
al 638.4

ok 587.9
ot 614.6
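These averages come from grouping word records by pattern and averaging their X values. The minimal sketch below assumes the coordinate data have already been parsed into (word, x) pairs. The record layout, function name, and sample values are hypothetical and are not taken from the actual data set.

```python
from collections import defaultdict


def mean_x_by_pattern(records, patterns):
    """Average X coordinate of the words that contain each pattern.

    `records` is a list of (word, x) pairs; a word containing several
    patterns contributes to each of them.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for word, x in records:
        for p in patterns:
            if p in word:
                sums[p] += x
                counts[p] += 1
    return {p: sums[p] / counts[p] for p in patterns if counts[p]}


# Hypothetical records, not real coordinates.
records = [("shedy", 120), ("chedy", 610), ("qokol", 300), ("qotal", 700), ("chol", 450)]
print(mean_x_by_pattern(records, ["sh", "ch", "ol", "al", "ok", "ot"]))
```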
I ran a couple of less rough experiments with the word-coordinate data. For each word, a set of coordinates is given, including an X value corresponding to its leftmost pixel. I divided each page into 10 vertical stripes of variable width so that each stripe contains 10% of the words. The results appear to be very close to those of Patrick's system (compare his original plots).
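Stripes of variable width that each hold 10% of a page's words are quantile bins over X. This is a minimal sketch for a single page, assuming the words are given by their leftmost-pixel X values. The function names are hypothetical. With many tied X values, the stripes can only be approximately equal in size.

```python
import bisect


def stripe_edges(xs, n_stripes=10):
    """X boundaries that split one page's words into equal-count stripes."""
    ordered = sorted(xs)
    return [ordered[len(ordered) * k // n_stripes] for k in range(1, n_stripes)]


def assign_stripe(x, edges):
    """Stripe index for x: 0 is the leftmost stripe, len(edges) the rightmost."""
    return bisect.bisect_right(edges, x)


# Hypothetical page with 100 words at X = 0, 5, 10, ..., 495.
page_xs = [5 * i for i in range(100)]
edges = stripe_edges(page_xs)
stripes = [assign_stripe(x, edges) for x in page_xs]
print(edges)  # prints: [50, 100, 150, 200, 250, 300, 350, 400, 450]
```

Computing the edges separately for each page means that pages with different margins or text widths are normalized individually.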
Just as a note, Sean Palmer's Voynich MS glyph position stacks might be useful for this topic.
(06-12-2022, 09:41 PM) Scarecrow Wrote: Just as a note, Sean Palmer's Voynich MS glyph position stacks might be useful for this topic.

That page clearly illustrates the rigid word structure of Voynichese. It is possible to draw similar plots where cells represent word statistics (rather than character statistics) and each row represents a set of lines with a certain number of words (rather than a set of words with a certain number of characters). I limited line lengths to the 7-12 range in order to have a sufficient number of samples.
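A grid indexed by (line length, word position) can be filled by counting, for every line with 7 to 12 words, how many words at each position match a pattern. This is a minimal sketch that assumes lines are lists of EVA words and the pattern is a simple test on each word. The function name is hypothetical, and this is not necessarily how the attached plots were made.

```python
from collections import defaultdict


def position_matrix(lines, predicate, min_len=7, max_len=12):
    """Map each line length n in [min_len, max_len] to a list of n values:
    the fraction of words at each position that satisfy `predicate`."""
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(int))
    for words in lines:
        n = len(words)
        if not min_len <= n <= max_len:
            continue  # too few lines of this length for stable statistics
        for i, word in enumerate(words):
            totals[n][i] += 1
            if predicate(word):
                hits[n][i] += 1
    return {n: [hits[n][i] / totals[n][i] for i in range(n)] for n in sorted(totals)}


# Hypothetical toy lines, not real transliteration data.
toy_lines = [
    ["ot", "ok", "dy", "dy", "dy", "dy", "ot"],
    ["ot", "dy", "ok", "dy", "dy", "dy", "dy"],
    ["ok", "ok", "ok"],  # 3 words: outside the 7-12 range, ignored
]
print(position_matrix(toy_lines, lambda w: "t" in w))
# prints: {7: [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]}
```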
Palmer's plots show that there are pairs of similar characters that behave quite similarly in relation to word structure. Patrick's findings suggest that similar characters tend to be complementary in relation to line structure: when one is more frequent, the other is rarer, as if they were selected on the basis of the word's position in the line.

These are plots for 'k' and 't', where Palmer's plots have been negated so that 0 corresponds to black and 1 to bright red (k) or green (t); contrast has also been increased. The plots for 'k' and 't' look similar and, if they are combined as RGB channels of a single image (at the bottom), the result mostly consists of lighter or darker yellow tones, with little or no green or red.
Line plots (on the right) look very different from each other and combining them mostly results in reddish / greenish areas corresponding to the prevalence of 'k' or 't'. In particular, 'k' is more prevalent just before mid-line, while 't' is more frequent at line-start and near line-end, resulting in two greenish areas at the left and right.

[attachment=7039]
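Two plots can be combined as colour channels along the following lines. This sketch assumes each plot is a grid of densities in which 0 already means "absent", that is, after any negation of Palmer's originals. It uses scaling by each grid's own maximum as a simple stand-in for the contrast increase. The function name and numbers are hypothetical.

```python
def combine_rgb(red, green):
    """Merge two same-shaped density grids into rows of (r, g, b) pixels.

    Each grid is scaled by its own maximum (a simple contrast stretch), so
    equal relative densities give yellow, while a dominant red or green
    channel marks where one pattern prevails. Blue stays 0.
    """
    def scale(grid):
        top = max(max(row) for row in grid) or 1.0  # avoid dividing by zero
        return [[v / top for v in row] for row in grid]

    r, g = scale(red), scale(green)
    return [
        [(round(255 * rv), round(255 * gv), 0) for rv, gv in zip(r_row, g_row)]
        for r_row, g_row in zip(r, g)
    ]


# Hypothetical counts for 'k' (red) and 't' (green) on a 2 x 2 grid.
k_density = [[4, 8], [2, 0]]
t_density = [[8, 2], [4, 0]]
print(combine_rgb(k_density, t_density))
```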

The same can be done for 'ch' and 'sh'. The v101 transliteration used by Palmer includes different codes for sh with different plumes: since the one coded 2 appears to be the most frequent, I only considered that. Again, in Palmer's plots, 'ch' and 'sh' are similar, and the combined plot shows much yellow and little green or red. By contrast, the line plots show two coloured areas: greenish at the left, where 'sh' is particularly frequent, and reddish at the center, where 'ch' prevails.

[attachment=7040]
(08-12-2022, 04:07 PM) MarcoP Wrote: The plots for 'k' and 't' look similar and, if they are combined as RGB channels of a single image (at the bottom), the result mostly consists of lighter or darker yellow tones, with little or no green or red. Line plots (on the right) look very different from each other and combining them mostly results in reddish / greenish areas corresponding to the prevalence of 'k' or 't'.

This new type of multicolor display seems really effective.  Your examples inspired me to adapt one of my existing scripts to generate something similar, although it arranges line lengths in the opposite direction (shorter lines at the bottom, longer lines at the top) and the height of rows varies to keep the area per word position consistent.

I thought it might be interesting to see how the position of the character within the word affected distributions within lines.  Here are three different plots for [Sh] in green versus [ch] in red, where the first plot is limited to word-initial [Sh] and [ch] and the second plot to word-internal [Sh] and [ch], while the third plot shows both word-initial and word-internal cases combined.

[attachment=7044]

The word-initial (left) and word-internal (middle) plots look strikingly complementary, which probably supports the idea that line-start words containing [Sh] and [ch] are somehow "equivalent" to mid-line words that start with [Sh] and [ch].  But it also looks as though the first and second words of lines tend to be yellower (indicating a higher [Sh] to [ch] ratio) for longer lines than for shorter ones.
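The split between word-initial and word-internal cases can be sketched as follows. This assumes lines are lists of EVA words with [Sh] written as "sh". A word counts as "initial" if it starts with the glyph and as "internal" if the glyph occurs anywhere after the first character, so a single word can count in both categories. The function name and toy lines are hypothetical.

```python
from collections import Counter


def split_by_word_position(lines, glyph):
    """Count words with word-initial vs word-internal `glyph`, by line position."""
    counts = {"initial": Counter(), "internal": Counter()}
    for words in lines:
        for pos, word in enumerate(words):
            if word.startswith(glyph):
                counts["initial"][pos] += 1
            # Any occurrence starting at index 1 or later lies inside word[1:].
            if glyph in word[1:]:
                counts["internal"][pos] += 1
    return counts


# Hypothetical toy lines, not real transliteration data.
toy_lines = [["shedy", "qoshey", "chol"], ["chey", "sheol", "ocheey"]]
sh = split_by_word_position(toy_lines, "sh")
ch = split_by_word_position(toy_lines, "ch")
print(dict(sh["initial"]), dict(sh["internal"]))  # prints: {0: 1, 1: 1} {1: 1}
```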

Here's a similar set of three plots for [k] in green versus [t] in red.

[attachment=7048]
 
The overall prominence of [t] along the left edge appears to be due primarily to word-initial [t], and also (although these plots don't show it) mainly to paragraph-initial [t].  Meanwhile, the gradual "reddening" from left to right seems to be due almost entirely to word-internal [t].  But as with [Sh] and [ch], shorter lines seem to follow the contrastive patterns less clearly than longer ones, at least if we separate the word-initial cases from the word-internal ones.
(09-12-2022, 06:27 PM) pfeaster Wrote: The word-initial (left) and word-internal (middle) plots look strikingly complementary, which probably supports the idea that line-start words containing [Sh] and [ch] are somehow "equivalent" to mid-line words that start with [Sh] and [ch].  But it also looks as though the first and second words of lines tend to be yellower (indicating a higher [Sh] to [ch] ratio) for longer lines than for shorter ones.

Thanks for the new plots, Patrick! The complementary nature of some of these pairs is extremely interesting, in particular because it is often the opposite of what happens at word level.
The special preference of 'sh' for the first and second positions can also be seen (though less clearly) in the green plot I posted above. While the first position is notoriously special, it's fascinating that here also the second stands out so clearly.
I wonder if the different behaviours correlated with line length might be due to Currier A vs B: lines in A paragraphs are on average about 2 words shorter than lines in B (maybe because of all the image intrusions in the Herbal). Or could they depend on the last line of each paragraph?
(10-12-2022, 12:20 PM) MarcoP Wrote: I wonder if the different behaviours correlated with line length might be due to Currier A vs B: lines in A paragraphs are on average about 2 words shorter than lines in B (maybe because of all the image intrusions in the Herbal). Or could they depend on the last line of each paragraph?

Both of the factors you're suggesting seem very likely to have an impact.  I tried generating some plots limited to particular scribes/hands and ignoring all final lines of paragraphs.  The results do look a bit more consistent (or at least less skewed) across different line lengths that way, although the statistical noise is naturally worse due to the reduced data size.
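Once the text is grouped by paragraph, restricting it to particular hands and dropping paragraph-final lines is a simple filter. This minimal sketch assumes a hypothetical pre-parsed layout of (hand, lines) pairs, one per paragraph. The hand labels are placeholders and do not represent any particular scribe attribution.

```python
def filter_lines(paragraphs, hands):
    """Keep lines from paragraphs written in one of `hands`,
    dropping each paragraph's final line.

    `paragraphs` is a list of (hand, lines) tuples, where `lines` is the
    paragraph's lines in order (each a list of words).
    """
    kept = []
    for hand, lines in paragraphs:
        if hand in hands:
            kept.extend(lines[:-1])  # everything except the last line
    return kept


# Hypothetical paragraphs with placeholder hand labels.
paragraphs = [
    ("1", [["a"], ["b"], ["c"]]),
    ("2", [["d"], ["e"]]),
    ("1", [["f"]]),  # a one-line paragraph contributes nothing
]
print(filter_lines(paragraphs, {"1"}))  # prints: [['a'], ['b']]
```

The cost is visible even in the toy data: every excluded line and hand shrinks the sample, which is why the plots get noisier.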
(09-12-2022, 06:27 PM) pfeaster Wrote: This new type of multicolor display seems really effective.

Like @pfeaster, I appreciate visualizations that summarize large amounts of data for consumption by the human visual channel (with its enormous bandwidth and pattern recognition ability).

This type of modified Palmer stack -- character density indexed by word position -- devotes the vertical axis to line length, instead of downwardness.  But is there anything happening in this dimension?  A survey of numerous glyphs and combinations finds that the density per line generally follows the global line-length distribution.  The stack for y (anywhere within a word) is typical:

[attachment=7075]

The brightest row occurs at the most probable line length of ~10 words. Density fades above (steeply) and below (gradually) in accord with the global distribution. The single EVA glyph that obviously deviates is T, with ~900 instances in the paragraph-text sample as cTh. It prefers shorter lines (and greater rightwardness). Among EVA bigrams, only ho has a clear bias for shorter lines. Distributions strongly favoring longer-than-average lines are not evident.
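A row that "follows the global line-length distribution" can be tested by dividing the glyph's share in lines of each length by the share of all words found in lines of that length. This collapses the word-position axis of the stack, so it is only a rough companion to the plots. It assumes lowercase EVA, where the cTh above is the substring "cth". The function name and toy lines are hypothetical.

```python
from collections import Counter


def length_bias(lines, glyph):
    """For each line length n: (share of `glyph` occurrences in lines of
    length n) / (share of all words in lines of length n).

    Values near 1 mean the glyph simply follows the global line-length
    distribution; values above 1 mark lengths where it is overrepresented.
    """
    glyph_by_len = Counter()
    words_by_len = Counter()
    for words in lines:
        n = len(words)
        words_by_len[n] += n
        glyph_by_len[n] += sum(word.count(glyph) for word in words)
    total_glyph = sum(glyph_by_len.values())
    total_words = sum(words_by_len.values())
    if total_glyph == 0:
        return {}
    return {
        n: (glyph_by_len[n] / total_glyph) / (words_by_len[n] / total_words)
        for n in sorted(words_by_len)
        if words_by_len[n]  # skip empty lines
    }


# Hypothetical toy lines, not real transliteration data.
toy_lines = [["cthy", "chol"], ["chol", "daiin", "cthor", "dy"]]
print(length_bias(toy_lines, "cth"))
```

In this toy case, the short line holds half of the "cth" tokens but only a third of the words, so its ratio comes out near 1.5. The long line's ratio comes out near 0.75.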

@pfeaster has illustrated complementary pairs by multiplexing the patterns by color.  When we generate multiple stacks and look for complementarity, some dramatic contrasts can be found, even between T and t:

[attachment=7079]

The rightwardness alternation early in short lines is real.  The apparent transition as a function of line length, however, is driven by T alone.  No deep relationship between glyphs is asserted or implied here;  the idea is just to test-drive this new method for the visual display of line-position information.

Readers searching for spatial relations between gallows characters, or camouflage for a holiday party, may wish to contemplate the full glory of the gallows-gallows Palmer stack array:

[attachment=7064]

Entries on the diagonal necessarily show the densities of individual characters, as saturation of {1,1,0} yellow.  Entries reflected across the diagonal are redundant, with the hues simply reversed.  A naive summary of visible patterns would include:
  • The densities of several gallows character pairs delineate the boundary between the first and second word, as well as the penultimate and ultimate word, while k and f alone do so weakly at best.
  • Some pairs involving k and t further delineate the boundary between the second and third word.
  • Only T (and perhaps the infrequent P, as cPh) drives bias along the vertical line-length axis, visible here as vertically stratified hues.
  • ...?
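The diagonal and mirror properties follow directly from how such an array is built. Panel (a, b) puts glyph a in red and glyph b in green, so panel (b, a) holds the same data with the channels swapped, and a diagonal panel has equal red and green, giving yellow. This minimal sketch assumes per-glyph density grids already scaled to [0, 1]. The glyph list and function name are hypothetical.

```python
from itertools import product

# Hypothetical default glyph set; the attached array also covers other gallows forms.
GALLOWS = ["k", "t", "p", "f"]


def pair_array(profiles, glyphs=GALLOWS):
    """Build every (red, green) pairing of per-glyph density grids.

    `profiles` maps a glyph to a grid (list of rows) of densities in [0, 1].
    Each panel is rows of (r, g, 0) triples: on the diagonal r == g (yellow),
    and panel (a, b) is panel (b, a) with red and green swapped.
    """
    array = {}
    for a, b in product(glyphs, repeat=2):
        array[(a, b)] = [
            [(ra, gb, 0) for ra, gb in zip(row_a, row_b)]
            for row_a, row_b in zip(profiles[a], profiles[b])
        ]
    return array


# Hypothetical one-row densities.
profiles = {"k": [[0.2, 0.8]], "t": [[0.6, 0.1]], "p": [[0.0, 0.0]], "f": [[0.3, 0.3]]}
arr = pair_array(profiles)
print(arr[("k", "t")])  # prints: [[(0.2, 0.6, 0), (0.8, 0.1, 0)]]
```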