The Voynich Ninja

Full Version: Rightward and Downward in the Voynich Manuscript - Patrick Feaster
Three days ago, Patrick Feaster published a new blog post [link]. Like the previous one (discussed here [link]), it is full of observations, data and graphics (that's why reading it took me a few days). Patrick's work is innovative and thought-provoking: not only does he compute interesting quantitative measures, but he also presents them visually, making his results much more accessible.

For instance, these two graphs show line position for words starting with o- and qo-:

[attachment=5764]

Words in each line are assigned to one of 10 positions (longer lines will have more than one word for some positions, while shorter lines will not contribute to some positions). In the first plot, the Y axis shows the % of words starting with o- and qo-.
The plot on the right has the same 10 positions on the X axis and shows the o/qo ratio on the Y. It makes visually clear that, in the first half of the line (with the exception of position 2), the two classes of words have similar frequencies; in the second half of the line, the frequency of o- rises, while that of qo- drops, so that the ratio increases steeply.

This research really deserves to be read carefully. But if you don't feel like reading the whole post, examining the graphs and reading the conclusions will give you an idea of Patrick's results.
Hi Marco, I've just managed to finish the article too. I agree it is hugely interesting and worth the time taken to read it. (I only wish Patrick would write for the average human and let us take his work in smaller chunks!)

The first point I would like to make is about the difference between absolute and continuous distribution of words and glyphs. Patrick makes this distinction very clear at the start, though it's slightly obscured by the time we get to the end. The absolute distribution is well known: Currier knew it, I've discussed it, and so have many others. The continuous distribution is what's new and really what we're seeking (at this moment) to prove or disprove. If proven, it's a significant new discovery.

Patrick covers most of the angles and there are certainly no obvious gaps or blind spots in his discussion. I think the one I would want to investigate most is the variability of glyphs and words by section. This is addressed, but maybe not as much as it could be. We know that different sections have different line lengths and paragraph lengths (as you know, paragraphs in Quire 20 are usually very short). Average word length between sections might also differ, such as with the greater use of [dy].

As [sh] is relatively more common in Quire 13 than other sections, [ch] more common in Hand 1 Herbal, [a] more common in Quire 20, and [q] three times as common in Quire 13 as in Hand 1 Herbal, I would really like to see those factored out. I think proof based on glyphs alone is probably best to ensure comparability and a robust sample size.

The other thing I would like to state is that even if continuous distribution is proven, it doesn't subsume absolute distribution. Final [m] occurs at the end of a line due to an absolute effect, not a continuous one. Likewise for a number of other patterns: they're not just near a position, or tending toward it, they are at an absolute position. Continuous distribution overlays that, adding to the complexity, but cannot replace it.
Hi Emma,
thank you for your comments! I agree that absolute effects clearly play a major role in line-structure. It would also be interesting to understand if such absolute effects are limited to the first and last position or if they involve other absolute positions (the second looks like a particularly good candidate).

In order to investigate different sections and scribes, I reproduced one of Patrick's experiments (the one discussed in the first post in this thread). The figures are about 'o' and 'qo' as word prefixes. My results (bottom) are well aligned with Patrick's (top).

[Image]

Then I used the same script to analyse the three larger subsections that were written by a single scribe and are uniform in apparent contents:
  • Scribe1 Herbal (aka Herbal A)
  • Scribe3 Q20 (almost all the starred-paragraphs section)
  • Scribe2 Q13 (Balneo / Bio)
These are the resulting plots:
[Image]

As one could expect from your observations, the behaviour pointed out by Patrick is the result (the average) of different behaviours. But my impression is that there is an overall coherence:
  • though there is much variability at line boundaries, the three sections show an increasing o/qo ratio in positions 4-9
  • changes in the distributions follow an A to B progression, with Scribe 3 (Q20) being intermediate between Scribe 1 (HerbalA) and Scribe 2 (Q13, extreme B).
(22-08-2021, 10:15 PM) Emma May Smith Wrote: As [sh] is relatively more common in Quire 13 than other sections, [ch] more common in Hand 1 Herbal, [a] more common in Quire 20, and [q] three times as common in Quire 13 as in Hand 1 Herbal, I would really like to see those factored out. I think proof based on glyphs alone is probably best to ensure comparability and a robust sample size.
Emma -- thanks for your comments.  I know I have a bad habit of letting too much stuff accumulate before posting, so that when I do it tends to come out in chunks that are disagreeably large.  I'll try to work on that.  

In the past few days, I retooled one of my scripts to tackle the issue of "continuous" patterns more in earnest, and I think my latest results may be even more persuasive than the ones I posted before.  More soon, I hope.  But since you raise the issue of variation by section (and Nick Pelling raised this same issue in a comment on the blog post itself), here's another example in the meantime that might speak to it.

The experiment is to remove the first glyph and last two glyphs* from lines; to divide the remaining glyphs into five parts that are as equal as they can be, not counting spaces; and then to compare and contrast the occurrences of phenomena of interest in each of the five parts (such as glyph counts, spacing patterns, and bigrams).
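
For anyone who wants to try the same segmentation, here is a minimal Python sketch of the procedure as described above. It assumes each line has already been tokenised into a list of glyphs with spaces removed. The function names, and the way leftover glyphs are spread across the five parts, are assumptions made for illustration and may differ from the original script.

```python
from collections import Counter

def line_fifths(glyphs, parts=5, trim_start=1, trim_end=2):
    """Trim the line ends, then split the remaining glyphs into near-equal parts.

    `glyphs` is a list of glyph tokens for one line, spaces already removed.
    """
    core = glyphs[trim_start:len(glyphs) - trim_end]
    n = len(core)
    # Integer boundaries k*n//parts keep part sizes within one glyph of each other.
    bounds = [k * n // parts for k in range(parts + 1)]
    return [core[bounds[k]:bounds[k + 1]] for k in range(parts)]

def count_by_fifth(lines, parts=5):
    """Return one Counter of glyph tokens per line section, summed over all lines."""
    totals = [Counter() for _ in range(parts)]
    for glyphs in lines:
        for k, segment in enumerate(line_fifths(glyphs, parts)):
            totals[k].update(segment)
    return totals
```

With this in place, `count_by_fifth(lines)[0]["d"]` would give the number of [d] tokens in the first fifth of all lines, and so on for the other sections.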

If I do this for everything generally classified as Currier A, I get the following glyph counts in the five line sections:

[s]: 92, 134, 139, 185, 204
[ch]: 804, 692, 678, 673, 672
[Sh]: 312, 287, 229, 187, 149
[d]: 287, 379, 454, 536, 771

It's easy to see that the counts for [s] and [d] go continuously up, while the counts for [ch] and [Sh] go continuously down, and that the rates of change also differ from case to case.

If I limit the same study to just Quires 1-3, I instead get:

[s]: 29, 40, 34*, 64, 49*
[ch]: 355, 277, 294*, 276, 258
[Sh]: 109, 114*, 88, 70, 58
[d]: 113, 135, 164, 200, 300

This time there are four exceptions, which I've marked with asterisks, but the overall trend is otherwise the same for at least [ch] and [Sh], and there are no exceptions at all for [d].  Meanwhile, the low token counts for [s] look as though they could easily have fallen prey to random noise.
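
The exceptions can be checked mechanically. The short Python sketch below takes the Quires 1-3 counts quoted above, together with the direction each glyph is expected to move in, and lists the steps that run against that direction. The helper name `trend_exceptions` is made up for this illustration, and equal neighbouring counts are not treated as exceptions.

```python
def trend_exceptions(counts, direction):
    """Return zero-based indices where a step runs against the expected direction.

    direction is +1 for an expected rise, -1 for an expected fall.
    """
    return [i for i in range(1, len(counts))
            if (counts[i] - counts[i - 1]) * direction < 0]

# Counts per fifth of the line for Quires 1-3, as quoted above.
quires_1_3 = {
    "s":  ([29, 40, 34, 64, 49], +1),
    "ch": ([355, 277, 294, 276, 258], -1),
    "Sh": ([109, 114, 88, 70, 58], -1),
    "d":  ([113, 135, 164, 200, 300], +1),
}

exceptions = {g: trend_exceptions(c, d) for g, (c, d) in quires_1_3.items()}
print(exceptions)  # {'s': [2, 4], 'ch': [2], 'Sh': [1], 'd': []}
```

The flagged indices correspond to the four asterisked values, and [d] has none.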

(If there's a better subset of Currier A to use for experiments like this than Quires 1-3, I hope someone will let me know.  I'd prefer to use a contiguous block of pages, but I'm not sure what the best Currier A equivalent is to Quires 13 and 20.)

I know this is just one example, but based on what I've seen so far, it seems to be typical of the relationship between findings for a whole "language" and findings for an associated section.  On the other hand, Currier A and Currier B seem to behave quite differently from one another in this respect as they do in so many others.  Of course, I have a constant dread that this will all trace back to some embarrassing mistake I made in my Python coding, but I seem to be getting consistent enough results by different methods that I'm feeling increasingly confident about them.

I have my own ideas about what these "continuous" patterns might mean, which I hinted at in my post, but I'm curious what others make of them and their potential significance.  Unless I'm mistaken, their existence would throw a bit of a monkey wrench into some theories about how Voynichese works.

* Why the first glyph and last two glyphs, you ask?  That choice actually had nothing to do with this particular experiment and came about only because I wrote the script mainly to analyze TRANSITIONAL probabilities (the likelihood of a given glyph type being followed by another given glyph type) and wanted to exclude the first and last transitions.  The fact that it counts the first glyphs of bigrams in each line segment is just a handy byproduct.
Hi Patrick, I don't think you're at all mistaken that the existence of these continuous changes in distribution could throw a monkey wrench into some theories! I'm very interested in them, and my comments (and any future comments) are based on a desire to get this right. I would like it to be explained in some other way, by an unaccounted factor, but that's not how research works. We have to observe and then account for those observations.

I'll see if I can write a little programme myself tonight (I doubt I'm as quick as you or Marco though) and bring some additional numbers. I have an idea of doing it a little differently so at least it will bring a slightly new angle.

A few things which interest me: does every possible glyph pair have a rightward difference? Are some pairs significantly more different than others? And are there clear most/least rightward glyphs? I suppose my concern is that, if this is systemic, what does that mean? Does that make it more or less likely to be an artifact or a phenomenon?
(23-08-2021, 05:13 PM) Emma May Smith Wrote: A few things which interest me: does every possible glyph pair have a rightward difference? Are some pairs significantly more different than others? And are there clear most/least rightward glyphs? I suppose my concern is that, if this is systemic, what does that mean? Does that make it more or less likely to be an artifact or a phenomenon?
Those are all excellent questions, and I'd imagine the first few, at least, ought to be reasonably straightforward to figure out with further work. My sense so far is that if we disregard [g] and [m], the glyph with the strongest tendency (leftward in this case) is [Sh]. But there also seems to be the "shape" of the distribution to be considered, along with overall average rightwardness. For what it's worth, I created a couple of graphs today showing the variation in token quantities in each fifth of the line for [a], [ch], [d], [l], [n], [q], [r], [Sh], and [y] in Currier A and Currier B. As with the figures I gave before, these calculations leave out the first glyph and last two glyphs of each line, so any patterns should be basically line-internal. I kept the color-coding the same for both "languages" in hopes of making comparisons easier to draw.

[Image: glyph-rightwardness-currier-a-and-b.jpg]
Okay, I went a little AWOL, but I measured rightwardness of individual glyphs in Quire 13 lines according to line (glyph) length with a 10-point index:



Code:
a :  228  164  154  199  215  177  198  220  239  241
d :  167  270  252  262  313  298  250  280  305  258
e :  484  487  511  467  443  431  447  431  384  172
f :    3    5    1    3    1    5    8    5    3    1
i :  142  150  142  130  179  136  168  164  152  126
k :  215  174  253  221  239  202  235  204  176   91
l :  233  217  202  220  220  211  202  214  257  371
m :    0    1    0    0    0    0    0    2    0   60
n :   51   97   93   88  118   89   99  116   93  113
o :  469  387  389  430  399  378  393  399  384  280
p :    3    8   18   15   16   19   10    8   10    8
q :   29  230  201  215  155  167  169  162  120   30
r :   87   76   74   73   75   64   82  105  117  168
s :   17   28   18   32   33   27   23   27   34   37
t :   57   61   68   82   98   72   93  113  104   44
y :  140  421  366  397  388  382  360  382  390  531
C :  177  153  150  132  159  160  143  159  150   80
S :  187  154  115  117  115   98  100   94   73   17
K :   11   28   23   14   22   24   24   27   23    4
T :    3   11   14    8   13   12   16   13   19    7
F :    0    2    0    0    0    0    0    1    1    0
P :    1    2    3    1    2    2    1    2    5    0

This isn't quite what we were discussing, I know. I'll try to reproduce your results properly tomorrow.
Hi Patrick, I've been thinking more about your results and just wanted some clarification on how you calculate the rightwardness scores. You say in your post:

Quote:My usual method for calculating rightwardness tendencies for words has been to number the words in each line, starting at zero; then to divide these numbers by the quantity of words in the line (minus one), so that each word ends up assigned a value between 0 (first word in line) and 1 (last word in line); and finally to take the mean average of the values for all tokens of a particular word, or of some group of words sharing a common characteristic, so that higher values will correspond to greater overall rightwardness much as higher numerical temperatures correspond to greater heat.

So, for example:
  1. If a word appears 10 times in position 5 of a 9-word line, it will have a score of .500.
  2. If the same word appears one of those ten times in position 4 (rather than 5) in a 9-word line, it will have a score of .488.
  3. If the same word appears one of those ten times in position 5 of a 10-word line (rather than a 9-word line), it will have a score of .494.
In cases 2 and 3 the difference from case 1 will be .012 and .006 respectively. Is that correct?
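
The three cases can be reproduced with a few lines of Python. This is only a sketch of the method as described in the quote: positions are given counting from 1 and converted to the zero-based numbering the quote uses, and the function name `rightwardness` plus the choice to skip one-word lines are assumptions made here.

```python
def rightwardness(occurrences):
    """Mean relative line position of a word.

    `occurrences` is a list of (position, line_length) pairs, with position
    counted from 1; each maps to (position - 1) / (line_length - 1).
    One-word lines are skipped because the ratio is undefined for them.
    """
    scores = [(pos - 1) / (length - 1)
              for pos, length in occurrences if length > 1]
    return sum(scores) / len(scores)

case1 = [(5, 9)] * 10               # ten tokens, all in position 5 of 9
case2 = [(5, 9)] * 9 + [(4, 9)]     # one token moved to position 4
case3 = [(5, 9)] * 9 + [(5, 10)]    # one token in a 10-word line instead

for name, case in (("case 1", case1), ("case 2", case2), ("case 3", case3)):
    print(name, rightwardness(case))
```

Unrounded, the three scores are 0.5, 0.4875 and about 0.4944, so the differences from case 1 are 0.0125 and about 0.0056, in line with the rounded figures above.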

Also, when you divided the line into ten parts, you give the word counts for each:

Quote:One option is to take our fractional measures of line position in a range from one to zero, multiply them by some factor, and then round each of them to the nearest integer.  The resulting groups will vary in size, but we can normalize for that in our subsequent calculations.  If we divide the line into ten groups as I’ve described, we find that they contain 4134, 2489, 3294, 3152, 2223, 4244, 3152, 3294, 2489, and 4107 words respectively.

There is quite a bit of difference between the number of words in some of these groups. I'm particularly interested in groups five and six, which have word counts of 2,223 and 4,244, respectively. That feels really "lumpy". Is there some reason why the number of words are distributed in this way? Is it just that the most common line lengths are such that some position deciles get more words than others?
"In cases 2 and 3 the difference from case 1 will be .012 and .006 respectively. Is that correct?"

Yes, your example conforms to what I did.

"Is there some reason why the number of words are distributed in this way? Is it just that the most common line lengths are such that some position deciles get more words than others?"

I've been assuming the explanation you give is the right one.  Most line lengths don't divide evenly into tenths, so that when rounding assigns each word to whichever tenth is nearest, words from lines of certain lengths get excluded from certain groupings.
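
A small Python sketch can show how rounding alone produces uneven groups. The quoted passage does not say which factor or tie rule was used, so this sketch assumes a factor of 9 (giving groups 0 to 9) and rounds exact halves upward; both choices, and the function names, are assumptions.

```python
from collections import Counter

def decile(index, n_words, groups=10):
    """Assign a zero-based word index to one of `groups` bins.

    Equivalent to rounding index / (n_words - 1) * (groups - 1) to the
    nearest integer, with exact halves rounded up, done in integer arithmetic.
    """
    span = n_words - 1
    return (2 * (groups - 1) * index + span) // (2 * span)

def bin_counts(line_lengths, groups=10):
    """Count how many words land in each bin across lines of the given lengths."""
    counts = Counter()
    for n in line_lengths:
        if n < 2:
            continue  # relative position is undefined for one-word lines
        counts.update(decile(i, n, groups) for i in range(n))
    return [counts[g] for g in range(groups)]

print(bin_counts([9] * 100))   # [100, 100, 100, 100, 0, 100, 100, 100, 100, 100]
print(bin_counts([10] * 100))  # [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
```

Under these assumptions, ten-word lines fill every group evenly, while nine-word lines contribute nothing to the fifth group, because the middle word falls exactly between two bins and the tie rule sends it to the sixth. Whether that explains the specific gap between groups five and six in the real counts would need checking against the actual line-length distribution.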
I've also noticed that if we exclude the first two and the last line positions from the rightwardness scores, the average possible score (assuming perfect distribution) is higher for shorter lines. If the distribution is uneven and the words tend earlier or later, then the differences are even greater.

I suppose what I'm saying is that a small difference in average line length could have a significant impact on the rightwardness score.
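
To put numbers on that, here is a short Python sketch. It assumes the same scoring as before (zero-based word index divided by line length minus one), drops the first two and the last positions, and treats every remaining slot as equally likely. The function name is made up for the illustration.

```python
from fractions import Fraction

def expected_score(n_words, skip_start=2, skip_end=1):
    """Mean relative position over the retained slots of an n-word line.

    Assumes every retained slot is equally likely ("perfect distribution").
    Needs at least one retained slot, i.e. n_words >= 4 with the defaults.
    """
    kept = range(skip_start, n_words - skip_end)
    return Fraction(sum(kept), len(kept) * (n_words - 1))

for n in (5, 8, 10, 12):
    print(n, expected_score(n), float(expected_score(n)))
```

With the default exclusions this works out to n / (2(n - 1)): 5/8 for a five-word line, 5/9 for a ten-word line, and closer to 1/2 as lines get longer.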

The average length (in words) of the Quire 13 lines in which each word occurs (not controlling for line ends):

ar 10.000
or 9.736
al 10.500
ol 9.414

chey 10.011
shey 9.871
chedy 9.635
shedy 9.506

qokeedy 8.791
okeedy 8.610
qokedy 9.121
okedy 9.773

Interestingly, these aren't consistent with your results in every case, but are the differences enough to have an impact? Could something like this be the cause?