The Voynich Ninja - Rightward and Downward in the Voynich Manuscript


(03-12-2022, 04:46 PM) pfeaster wrote: So I think we might have been able to predict that [n.q] would be underrepresented, and that [y.q] would be overrepresented, based just on the degree of overlap in distribution of words ending [n] and [y] and words beginning [q].

I've investigated this a little further by calculating actual and predicted frequencies for the word-break combinations [y.q], [n.q], [n.ch], and [y.ch] within specific parts of paragraphs and lines. I divided paragraphs into four bins (first line, last line, first half of what's left, second half of what's left) and lines into six bins (first break, second break, last break, and first, second, and third thirds of what's left in between), for 24 bins in total. I limited this to paragraphic text in the ZL transcription and ignored comma breaks.
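A minimal Python sketch of the 24-bin scheme described above, under assumptions the post doesn't spell out: each paragraph arrives as a list of lines and each line as a list of words (comma breaks already merged away); a "break" is the gap between two adjacent words within a line; a one-line paragraph counts as "first line"; in short lines the first and second break take priority over the last; and when a middle stretch doesn't divide evenly, leftover items go to the earlier segments. All function names are hypothetical.

```python
def paragraph_bin(line_idx, n_lines):
    """0 = first line, 1 = first half of the rest, 2 = second half, 3 = last line."""
    if line_idx == 0:
        return 0                      # a one-line paragraph also lands here
    if line_idx == n_lines - 1:
        return 3
    middle = n_lines - 2              # lines strictly between first and last
    k = line_idx - 1                  # position within that middle stretch
    return 1 if 2 * k < middle else 2


def line_bin(break_idx, n_breaks):
    """0 = first break, 1 = second break, 2-4 = thirds of the rest, 5 = last break."""
    if break_idx == 0:
        return 0
    if break_idx == 1:
        return 1
    if break_idx == n_breaks - 1:
        return 5
    rest = n_breaks - 3               # breaks strictly between 2nd and last
    k = break_idx - 2
    return 2 + (3 * k) // rest        # 3*k // rest is always 0, 1 or 2


def combined_bin(line_idx, n_lines, break_idx, n_breaks):
    """Map a break to one of 4 x 6 = 24 bins (0-23)."""
    return paragraph_bin(line_idx, n_lines) * 6 + line_bin(break_idx, n_breaks)


def iter_breaks(paragraph):
    """Yield (left_word, right_word, bin_id) for every in-line word break."""
    n_lines = len(paragraph)
    for i, words in enumerate(paragraph):
        n_breaks = len(words) - 1
        for j in range(n_breaks):
            yield words[j], words[j + 1], combined_bin(i, n_lines, j, n_breaks)
```

For example, `list(iter_breaks([["a", "b", "c"], ["d", "e"]]))` puts the two breaks of the first line in bins 0 and 1 and the single break of the last line in bin 18.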

Factoring in positionality in this way pushes the predictions slightly closer to the actual counts, but not by nearly enough to close the gap: most of the discrepancy remains.

[y.q] 3291 actual, 1940 predicted without factoring in position, 1976 predicted factoring in position
[n.q] 370 actual, 797 predicted without factoring in position, 786 predicted factoring in position
[n.ch] 1210 actual, 880 predicted without factoring in position, 903 predicted factoring in position
[y.ch] 1505 actual, 2141 predicted without factoring in position, 2129 predicted factoring in position
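To put a number on "most of the discrepancy remains", this short Python calculation uses only the four counts listed above and asks what fraction of the gap between the flat (non-positional) prediction and the actual count is removed by the positional prediction:

```python
# (actual, predicted ignoring position, predicted factoring in position)
COUNTS = {
    "y.q": (3291, 1940, 1976),
    "n.q": (370, 797, 786),
    "n.ch": (1210, 880, 903),
    "y.ch": (1505, 2141, 2129),
}


def gap_closed(actual, pred_flat, pred_pos):
    """Fraction of the flat-prediction gap that the positional prediction removes."""
    flat_gap = abs(actual - pred_flat)
    pos_gap = abs(actual - pred_pos)
    return (flat_gap - pos_gap) / flat_gap


for combo, (actual, flat, pos) in COUNTS.items():
    print(f"[{combo}] gap closed: {gap_closed(actual, flat, pos):.1%}")
```

This gives about 2.7% for [y.q], 2.6% for [n.q], 7.0% for [n.ch], and 1.9% for [y.ch], so well over 90% of each gap survives the positional correction.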

At the same time, if we compare predictions against actual counts within each of the 24 bins, we find that the two rise and fall roughly in sync. That is, each word-break combination tends to be more or less common in just those parts of paragraphs and lines where we'd predict it to be more or less common, based on the overlap between the distributions of its component parts. (We've already seen evidence of this in some of the earlier charts, but this approach provides some quantitative confirmation.) The y axis in the charts below represents the percentage of total word breaks within each bin, and red lines mark the boundaries of the four paragraph-level divisions.

[Three attached charts: actual vs. predicted percentages of word breaks for these combinations across the 24 bins]
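The post doesn't give the exact prediction formula. One natural reading of predicting from the overlap of the component distributions is an independence model: within each bin, expected [y.q] = (breaks whose left word ends in [y]) × (breaks whose right word begins with [q]) / (all breaks in that bin). Summing over the 24 bins would then give the positional prediction, and lumping every break into a single bin would give the flat one. The sketch below is that assumption in Python; matching on plain EVA suffixes and prefixes (e.g. whether [cth] should count as [ch]-initial is not addressed) and the function names are illustrative, not the author's method.

```python
from collections import Counter


def per_bin_counts(breaks, left_end, right_start):
    """Actual and independence-model predicted counts of one combination per bin.

    breaks: iterable of (left_word, right_word, bin_id) tuples.
    Returns {bin_id: (actual, predicted, total_breaks_in_bin)}.
    """
    total, left, right, both = Counter(), Counter(), Counter(), Counter()
    for left_word, right_word, b in breaks:
        l_hit = left_word.endswith(left_end)
        r_hit = right_word.startswith(right_start)
        total[b] += 1
        left[b] += l_hit
        right[b] += r_hit
        both[b] += l_hit and r_hit
    # Expected co-occurrences if the left ending and right beginning were independent.
    return {b: (both[b], left[b] * right[b] / total[b], total[b]) for b in sorted(total)}


def predicted_total(breaks, left_end, right_start, positional=True):
    """Sum predictions over bins; positional=False lumps every break into one bin."""
    if not positional:
        breaks = [(lw, rw, 0) for lw, rw, _ in breaks]
    rows = per_bin_counts(breaks, left_end, right_start)
    return sum(pred for _, pred, _ in rows.values())
```

Dividing each bin's actual and predicted count by its total and multiplying by 100 gives percentages of word breaks per bin, the quantity plotted on the y axis.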

The actual values seem to be deflected upward or downward from the predicted values by a fairly consistent amount. That is, the distributions of these word-break combinations are positionally uneven almost exactly as we'd predict them to be, but some other factor seems to be making each word-break combination consistently more or less prevalent on top of those patterns.
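One way to test whether the deflection really is "fairly consistent" is to divide actual by predicted in each bin and see how much that ratio varies. Treating the offset as multiplicative is an assumption (an additive offset in percentage points would be another reading), and the function name is hypothetical:

```python
from statistics import mean, pstdev


def deflection(rows):
    """rows: {bin_id: (actual, predicted)}.

    Returns (mean actual/predicted ratio, coefficient of variation of that ratio).
    A small coefficient of variation suggests a roughly constant multiplier.
    """
    ratios = [actual / predicted for actual, predicted in rows.values() if predicted > 0]
    m = mean(ratios)
    return m, pstdev(ratios) / m
```

For the overall totals listed earlier, the actual/positional-prediction ratios are about 1.67 for [y.q], 0.47 for [n.q], 1.34 for [n.ch], and 0.71 for [y.ch]. Bins with few breaks will give noisy ratios, so it may be worth weighting or pooling them before reading much into the spread.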

I'm still trying to interpret all this but thought I'd share it in case anyone else has any ideas.
