The Voynich Ninja

Full Version: Rightward and Downward in the Voynich Manuscript - Patrick Feaster
From a forensic perspective, I would be interested in the following.
There are places where "daiin" occurs three times in succession.
At such a place I can rule out more than one writer, and differences in the writer's physical condition also drop out.
The ink has the same consistency (same base, freshly stirred, and same age).

A conclusion can be drawn from the result here.
The three words should look the same. If they do not, I think the cause is the amount of ink on the nib, which decreases during writing.

(01-12-2021, 12:50 AM)lurker Wrote: Each pair of recto and verso pages belong to the same folio. Therefore it would be a surprise if recto and verso groups would behave differently.

Not necessarily, the experiment he performed confirms that the different patterns are not just a result of random variation.
(01-12-2021, 12:50 AM)lurker Wrote: Each pair of recto and verso pages belong to the same folio. Therefore it would be a surprise if recto and verso groups would behave differently.

Instead I suggest dividing scribe 1 into Botanical/Pharmaceutical folios and scribe 2 into Botanical/Biological folios.

That would be a worthwhile experiment too, but I think it would involve a subtly different question.  The recto/verso division only tests whether the patterns that seem to exist in the results for all pages attributed to a particular scribe still appear when we use only one or another half of the dataset.  The apparent patterns could be illusory, after all, and just artifacts of statistical noise.  But if the two halves of the dataset yield similar results, that suggests the patterns are real and meaningful.

The division you suggest would test the coherence of the scribal groupings themselves -- i.e., whether individual sections attributed to the same scribe are more like each other (in this way) than they are like sections attributed to other scribes.  I wasn't being quite that ambitious yet, but I agree it would be a good follow-up step, depending on the results of the first experiment.
(01-12-2021, 12:08 AM)Koen G Wrote: Cool! Thanks for checking this so quickly. 3, 2, 3, 2? (if this is incorrect it is because I am reading incorrectly, not because the data is inconclusive).

Koen's answer is correct.  I wanted to wait a few hours before confirming in case anyone else wanted to participate.  Specifically, the order from left to right is:

a) Scribe 3 verso
b) Scribe 2 verso
c) Scribe 3 recto
d) Scribe 2 recto

It's possible that differences in average line and paragraph length will have impacted the appearance of the images in addition to actual differences in distribution.  A larger proportion of short paragraphs could produce more vertical banding, for example.

But some differences still seem fairly decisive.  The sharp contrast in [daiin] between first and second line positions, with a peak in first position, appears for Scribe 3 but not for Scribe 2.  With Scribe 3, the first line position favors vords beginning [o]+gallows; with Scribe 2, it favors vords beginning [qo]+gallows.  The near-end-of-line peak for vords beginning [o]+gallows is stronger and more consistent throughout the paragraph for Scribe 3 than for Scribe 2 (though paragraph lengths might be a factor there).  And so on.

The differences between results for recto and verso page groups attributed to the same scribe seem informative too.  

I went back and repeated the experiment using two groups that alternate between recto and verso pages in order to check whether the recto/verso distinction itself has any impact.  Surprisingly enough, it looks as though it does.  For Scribe 3, [daiin] shows a much stronger preference for first line position on verso pages than on recto pages, and this difference vanishes with scrambled recto/verso groupings.  On the recto pages, [daiin] seems to be about as common in third line position as in first line position; on the verso pages, it seems to be virtually absent there.  If this can be confirmed, I think it would present a challenging hurdle for hypotheses about what positional patterns "mean."
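
A rough sketch of how the two kinds of grouping could be built, for anyone who wants to try the same control. It assumes simple page labels of the form f103r / f103v (foldout labels with extra suffixes are not handled), and the function names are made up for illustration; it is not the script used for the images in this thread.

```python
import re

def split_recto_verso(pages):
    """Split page labels such as 'f103r' / 'f103v' into recto and verso groups."""
    recto = [p for p in pages if p.endswith("r")]
    verso = [p for p in pages if p.endswith("v")]
    return recto, verso

def split_alternating(pages):
    """Control split: each group alternates between recto and verso pages."""
    def key(label):
        m = re.fullmatch(r"f(\d+)([rv])", label)
        return int(m.group(1)), m.group(2)

    group_a, group_b = [], []
    for label in sorted(pages, key=key):
        folio, side = key(label)
        # Even folio + recto, or odd folio + verso, goes to group A; the rest to B.
        if (folio % 2 == 0) == (side == "r"):
            group_a.append(label)
        else:
            group_b.append(label)
    return group_a, group_b

pages = ["f103r", "f103v", "f104r", "f104v", "f105r", "f105v"]
print(split_recto_verso(pages))
print(split_alternating(pages))
```

If a pattern shows up in the recto/verso split but disappears in the alternating split, that points to the recto/verso distinction itself rather than to the pages that happen to be in each half.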

Otherwise, the exact positions of mid-line banding phenomena (as well as the features of the second line position) seem to vary quite a lot both with the recto/verso groups and with the mixed groups.  Maybe that's to be expected with different combinations of line lengths.  So, for example, the near-end-of-line peak of [o]+gallows vords for Scribe 3 occurs a little earlier on average on verso pages than on recto pages, but even when I mix recto/verso pages, its position and vertical consistency still vary noticeably.  I guess the finer structure of banding is likely to be due to random variation and probably isn't trustworthy as a source of insight.

Here are some modest hypotheses I'd like to put forward for critique, based on the foregoing:

1. If the opposite of "flat" is "bumpy," then the text of the VM is fundamentally bumpy, not just superficially bumpy.
2. Its bumpiness isn't limited to beginnings of paragraphs or to beginnings or ends of lines (although it especially affects those places) but extends throughout the whole body of the text.
3. Its bumpiness isn't limited to nonce similarities among nearby vords (basic self-citation) but constrains the text by position according to consistent patterns.
4. Its bumpiness affects the text structure thoroughly enough to preclude any method of text generation (whether natural-language plaintext, cipher, invented language, stochastic hoax, or "other") that is fundamentally flat with only minor tweaks made to the result.

And some questions:

1. The above observations are limited to paragraphic text.  Do labels, circles, and radii show any similar patterning by position (based e.g. on outwardness or angle)?
2. Lines and paragraphs seem to be positionally patterned units, but do larger units show overarching positional patterns too (e.g., multi-paragraph pages, whole bifolios)?
3. Which patterns are tied to absolute positions (e.g., the first vord of a line) and which are tied to relative positions (e.g., the middle third of a line) -- and given chronic uncertainty about vord divisions, how can we reliably tell?
4. Do line length and paragraph length affect positional patterns and, if so, how?
Thanks again to Patrick and all other participants for this great discussion!

I was wondering if single-scribe corpora are large enough for detailed word-oriented analysis. For instance (unless I am miscounting), Scribe2 and Scribe3 each wrote about 150 occurrences of daiin. This means that, when considering recto and verso separately, each cell in a 10x10 grid represents less than one occurrence on average (roughly 75 occurrences spread over 100 cells, if recto and verso share them about evenly). I am sure that Patrick can find a way to assess the meaningfulness of these results: possibly, comparing with random sets of pages as he did might be enough. I am definitely too ignorant about statistics to have any idea about how to tackle this.

Another side of the subject is the possible impact of page layout. In particular, I noticed that image-breaks (frequent in Herbal pages) appear to have an effect on nearby words. The extra-long lines in the Pharma section might also be of interest.

I am looking forward to reading more!
(01-12-2021, 04:11 PM)MarcoP Wrote: I was wondering if single-scribe corpora are large enough for detailed word-oriented analysis. For instance (unless I am miscounting), Scribe2 and Scribe3 each wrote about 150 occurrences of daiin. This means that, when considering recto and verso separately, each cell in a 10x10 grid represents less than one occurrence on average.

You make a good point, and while I'm not sure what tests of statistical significance would be most useful, I can supply one more piece of data.  In these displays, the numerical value of pixel brightness is proportional in a linear way to the number of tokens at each point.  Tokens in shorter lines and shorter paragraphs have a bigger footprint, but when I've tried to adjust for that by scaling brightness in inverse proportion to footprint size, it hasn't seemed to make much of a difference, so I stopped doing it.  So the brightest areas are now simply those where the greatest number of token footprints overlap.  The numerical value of each point before I boost contrast should approximately equal the number of tokens contributing to it, although because of the spline interpolation, a "direct hit" on the center of a token's position receives a higher score than an oblique one, so this won't be exact.
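
A minimal sketch of the "overlapping footprints" idea, under some simplifying assumptions: it stretches each line and paragraph in whole blocks rather than with spline interpolation (so the "direct hit" versus "oblique hit" effect is not reproduced), it uses a small grid instead of a full-size image, and the function name and data layout are hypothetical.

```python
def accumulate(paragraphs, target, width=20, height=20):
    """Count overlapping token footprints on a width x height grid.

    paragraphs: list of paragraphs, each a list of lines, each a list of vords.
    Every line is stretched to the full width and every paragraph to the full
    height, so tokens in short lines or short paragraphs cover more cells.
    """
    grid = [[0.0] * width for _ in range(height)]
    for para in paragraphs:
        n_lines = len(para)
        for li, line in enumerate(para):
            n_vords = len(line)
            y0, y1 = li * height // n_lines, (li + 1) * height // n_lines
            for vi, vord in enumerate(line):
                if vord != target:
                    continue
                x0, x1 = vi * width // n_vords, (vi + 1) * width // n_vords
                for y in range(y0, y1):
                    for x in range(x0, x1):
                        grid[y][x] += 1.0  # one more footprint covers this cell
    return grid

paragraphs = [
    [["daiin", "chedy"], ["qokeedy", "daiin"]],  # two lines of two vords
    [["daiin"]],                                 # one-vord paragraph covers everything
]
for row in accumulate(paragraphs, "daiin", width=4, height=2):
    print(row)
```

Each cell value is simply the number of tokens whose footprint covers it, with no adjustment for footprint size. Paragraphs with more lines than the grid has rows would need a taller grid.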

For the four charts of [daiin], the maximum values were originally:

Scribe 2 recto: 5.3
Scribe 2 verso: 3.1
Scribe 3 recto: 8.1
Scribe 3 verso: 16.0

So in the Scribe 3 verso image, the brightness scale runs from 16 tokens at brightest to 0 tokens at darkest, while in the Scribe 2 verso image, it runs from only around 3 tokens at brightest to 0 tokens at darkest (which I'll admit isn't much of a range).

For the qo+gallows / o+gallows charts, the maximum values were:

Scribe 2 recto: 14.2
Scribe 2 verso: 15.0
Scribe 3 recto: 34.0
Scribe 3 verso: 32.0

I'm also subtracting the minimum value before rescaling, but I'm pretty sure that in each of these cases there's at least one point with no tokens, so that the "real" minimum is indeed zero.
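
The rescaling step described above can be sketched like this. The 0-255 output range and the function name are assumptions; the post only says that the minimum is subtracted before rescaling.

```python
def rescale(grid, top=255):
    """Subtract the minimum, then scale linearly so the maximum maps to `top`."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A completely flat grid has no contrast to show.
        return [[0 for _ in row] for row in grid]
    return [[round((v - lo) * top / (hi - lo)) for v in row] for row in grid]

print(rescale([[0.0, 4.0], [16.0, 16.0]]))
```

Because each image is scaled to its own maximum, the brightest pixel stands for about 16 tokens in one image and about 3 in another, so brightness is only comparable within a single image, not across images.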
(01-12-2021, 03:04 PM)pfeaster Wrote: Here are some modest hypotheses I'd like to put forward for critique, based on the foregoing:

1. If the opposite of "flat" is "bumpy," then the text of the VM is fundamentally bumpy, not just superficially bumpy.
2. Its bumpiness isn't limited to beginnings of paragraphs or to beginnings or ends of lines (although it especially affects those places) but extends throughout the whole body of the text.
3. Its bumpiness isn't limited to nonce similarities among nearby vords (basic self-citation) but constrains the text by position according to consistent patterns.
4. Its bumpiness affects the text structure thoroughly enough to preclude any method of text generation (whether natural-language plaintext, cipher, invented language, stochastic hoax, or "other") that is fundamentally flat with only minor tweaks made to the result.

First off, thank you for another exciting spelunk into the VMs text, Patrick. This is clearly a much deeper and more complex network of tunnels than most of us could have imagined. What impresses me the most about your work is your willingness and ability to think outside the boxes of "the arrangement of glyphs in vords" and "the arrangement of vords in lines", when looking for statistical patterns in the text. It's humbling, though very much in a good way, to have the unconscious blinders on one's thinking pointed out.

I feel like a Shang Dynasty oracle bone reader when I look at the "shapes" — the lines and blobs and holes — produced by your and Obelus' two-tone frequency maps. I can almost make out a massive capital letter A in one of them. I feel like these shapes might offer meaningful clues, but I'm not at all sure why, or in what way, and I'm wary of falling prey to pareidolia or mistaking maps for territories. These "shapes" are in all likelihood mere epiphenomena of your data collection and presentation methods. But they got me thinking seriously about something which has been kind of a dirty word in Voynich studies since William Romaine Newbold's fall from grace: steganography. I don't mean this specifically the way Newbold did. That is, I don't think the Voynichese glyphs encode information in their tiniest details. I mean steganography in a broader sense. Think ASCII art and its predecessor the calligram. Or connect-the-dots or paint-by-numbers activities. All three of these involve complex and deliberate spatial arrangements of written glyphs. These glyph arrangements are not linguistic, but they are most definitely meaningful. The meaning is directions for how to connect them together visually to form images, including large scale glyphs or symbols. Could Voynichese ngrams be encoded directions for how to place lines or colors or some other sort of information array on a page?
(28-11-2021, 10:38 PM)pfeaster Wrote:
(28-11-2021, 08:13 PM)Davidsch Wrote: Very interesting images, but what does the RGB / brightness exactly show in these images, could you please explain.

If, for example with daiin, the brightness shows the left position, the image cannot be correct, because daiin does not have a preferred position at the beginning of lines.

Brightness corresponds to greater prevalence in a given position.  
...<snip>
Here are new grayscale images for [daiin] in Currier A and B, side by side.

...<snip>


It's the image for Currier B that shows a bright peak at the left of lines and the vertical center of paragraphs.  If I'm reading my spreadsheets correctly, in Currier B, about 30% of tokens of [daiin] are the first vord in a line; about 8.5% are the last vord in a line; and about 61.5% fall somewhere in the middle of the line.  In all of Currier B, by contrast, about 11.5% of vords are first in their lines and about 11.5% are last, while about 77% fall somewhere in the middle.  So [daiin] does seem significantly overrepresented in the first line position in Currier B (roughly 30% against a baseline of about 11.5%).  The image suggests that this overrepresentation is also limited to the vertical center of the paragraph.  

I haven't double-checked that, but if it's true, then the anomaly would be even stronger than the line-position statistics would suggest by themselves.

...<snip>


Could you find a way to merge these with the following: horizontal line size in amount of words and/or vertical line size in amount of words?
(02-12-2021, 04:06 PM)Davidsch Wrote: could you find a way to merge these with the following: horizontal line size in amount of words and/or vertical line size in amount of words?

I built an option into my script that allows for generating separate images for different line lengths -- say, all seven-vord lines, then all eight-vord lines, then all nine-vord lines, etc.  For each pass, any line containing a different quantity of vords is ignored and also factored out of the total possible value at each point (which I use to help scale brightness at the end, so that, for example, lines that have fewer than four vords or paragraphs that have fewer than three lines don't "count against" missing positions).
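
A minimal sketch of one pass of that option, reduced to a single horizontal profile for brevity. The function name, the small default width, and the choice to divide by the number of qualifying lines are assumptions made for illustration; the actual script is not shown in this thread.

```python
def profile_for_length(lines, target, length, width=12):
    """Share of lines of exactly `length` vords with `target` at each point.

    Returns a list of `width` values between 0 and 1, or None if no line
    has the requested length.
    """
    hits = [0] * width
    possible = 0  # lines of other lengths never enter this total
    for line in lines:
        if len(line) != length:
            continue
        possible += 1
        for vi, vord in enumerate(line):
            if vord == target:
                for x in range(vi * width // length, (vi + 1) * width // length):
                    hits[x] += 1
    if possible == 0:
        return None
    return [h / possible for h in hits]

lines = [
    ["daiin", "shedy", "qokeey"],
    ["okaiin", "daiin", "chedy"],
    ["daiin", "ol"],               # ignored in the three-vord pass
    ["daiin", "chol", "dy"],
]
print(profile_for_length(lines, "daiin", length=3, width=6))
```

Dividing by the count of qualifying lines is what keeps ignored lines from "counting against" a position: a value of 1.0 would mean every line of that length has the target there.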

I haven't tried this approach yet with paragraph length, partly because I'm less sure what different sizes or groups to use.  But it should be equally possible in principle.

The main problem, I think, is that limiting results by line or paragraph length exacerbates the problem of dataset size which Marco brought up.  If the quantity of tokens of [daiin] starts to get uncomfortably small when we limit our scope to scribe 2 verso pages, it's easy to imagine a similar outcome if we were instead to limit our scope to something like lines containing seven vords or paragraphs containing five lines -- or both.

And then on top of that, there's the detail that a single uncertain vord space means the difference between (say) a seven-vord line and an eight-vord line.

One alternative I've tried has been to analyze lines as continuous glyph strings, either ignoring vord divisions or treating [.] as its own glyph.  Of course, that still means making decisions about what counts as one glyph.  In my case, I've been arbitrarily treating [ch], [Sh], benched gallows, and any quantity of [e] or [i] as single glyphs.  The idea is then to go through each line comparing a target string such as [r] or [ot] or [edy] against every successive string of that size with a moving window -- so, for example, matching [edy] against [chedychey] would yield 0 [ched] 1 [edy] 0 [dych] 0 [yche] 0 [chey] = 01000.  Then the result for each line can be stretched to 500 pixels, as before.
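
The glyph splitting and moving-window comparison could be sketched as follows. The regular expression is one possible reading of the glyph rules listed above (a run of [e] or [i] is kept as one glyph that remembers its length, and lowercase [sh] is accepted alongside [Sh]); the stretch here is a plain nearest-neighbour one. All names are hypothetical.

```python
import re

# One "glyph" per match: benched gallows, [ch], [Sh]/[sh], runs of e or i,
# and otherwise any single character (including the vord separator '.').
GLYPH = re.compile(r"c[tkpf]h|ch|[Ss]h|e+|i+|.")

def glyphs(text):
    """Split a transliterated string into glyphs."""
    return GLYPH.findall(text)

def window_matches(line, target):
    """Slide a window the size of `target` along `line`; 1 = match, 0 = no match."""
    g, t = glyphs(line), glyphs(target)
    return [int(g[i:i + len(t)] == t) for i in range(len(g) - len(t) + 1)]

def stretch(values, width=500):
    """Nearest-neighbour stretch of a per-position list to `width` pixels."""
    n = len(values)
    return [values[x * n // width] for x in range(width)] if n else [0] * width

print(glyphs("chedychey"))                 # ['ch', 'e', 'd', 'y', 'ch', 'e', 'y']
print(window_matches("chedychey", "edy"))  # [0, 1, 0, 0, 0]
print(len(stretch(window_matches("chedychey", "edy"))))
```

To ignore vord divisions entirely, the [.] characters can be stripped from the line before it is passed in; leaving them in treats [.] as its own glyph.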

But if the results of vord-by-vord analysis quickly go "out of focus" in mid-line when we mix different line lengths, the same is true of glyph-by-glyph analysis, or even more so, since there are -- of course -- many more line lengths in glyphs than there are line lengths in vords.

Since I think we care less about *exactly* where a glyph falls in mid-line than we do about *approximately* where it falls, I've tried blurring the result by compressing it by a particular factor before expanding it to 500 pixels.  That factor should be our best guess about average cycle length -- that is, the number of "steps" that would typically separate one [ot] from the nearest other [ot].  Fortunately, this probably doesn't need to be exact.  I've generally been going with a factor of four or five.  What we're then measuring is the count of occurrences of the target string within four or five positions of wherever we're at in a line.
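
One way to read "compressing by a factor before expanding" is to sum each block of neighbouring positions and then stretch the shorter list back out. The sketch below does exactly that; block sums (rather than, say, a sliding average) are an assumption, as are the function names.

```python
def compress(values, factor=4):
    """Sum each run of `factor` consecutive positions into one coarser bin."""
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]

def expand(values, width=500):
    """Nearest-neighbour stretch of the coarse bins to `width` pixels."""
    n = len(values)
    return [values[x * n // width] for x in range(width)] if n else [0] * width

def blurred_profile(matches, factor=4, width=500):
    """Per-line match list -> blurred profile of fixed pixel width."""
    return expand(compress(matches, factor), width)

matches = [0, 1, 0, 0, 0, 0, 1, 1, 0]      # e.g. output of a moving-window match
print(compress(matches, factor=4))          # [1, 2, 0]
print(blurred_profile(matches, factor=4, width=6))
```

The last bin can cover fewer positions than the others when the line length is not a multiple of the factor, which slightly understates counts at the very end of a line.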

The results tend not to differ radically from those that come from vord-by-vord analysis, but they should be free of any distortions due to uncertain vord breaks -- albeit maybe at the expense of something else.

One advantage of this other approach is that it lets us examine positions of phenomena that cross vord boundaries, such as Smith-Ponzi word-break combinations (which could also be tracked more directly, but this way seems to work).

Here's a pair of charts for Davis Scribe 1 / Currier A plotting [nd] in blue, [no] in green, and [nch] in red.  They actually show all occurrences of those glyph strings with or without spacing, but all these strings contain a space in an overwhelming majority of cases, as [n.d], [n.o], [n.ch].  The chart on the right separates out first and last lines of paragraphs.

[attachment=6088]

The brightness scale runs from roughly 11.5 tokens maximum to 1.5 tokens minimum, which seems like enough of a range to be statistically meaningful, though I'd welcome a more expert opinion on that point.  The blurring factor is four positions.

It looks like a vord ending [n] is most likely to be followed by a vord beginning [o] in the first line of a paragraph; by a vord beginning [ch] in the first third of a line or the last line of a paragraph; and by a vord beginning [d] in the final third of a line.

But is that distinctive?  For comparison, here's a chart of [.d] in blue, [.o] in green, and [.ch] in red to show the overall distribution of these glyphs in vord-initial position (but not line-initial position) regardless of preceding glyph.  This time the brightness scale runs from around 40 tokens maximum to around 1.5 tokens minimum; same blurring factor as before.

[attachment=6089]

I'm not sure whether the pattern after [n.] is significantly different from the general pattern or not -- the contrasts look stronger, particularly in the red [n.ch] region, but that may just be due to the smaller amount of data.

Still, adapting the ingenious Smith-Ponzi method of analyzing word-break combinations, I suppose we could in principle take the chart for [n.] in vord-final but not line-final position --

[attachment=6090]

-- and multiply it by the chart for [.d], [.o], and [.ch] to generate a chart representing the "expected" distribution of [n.d], [n.o], and [n.ch], and then contrast the actual and expected distributions -- and we could try this with other combinations as well.  It would be interesting to see if the intriguing disparities Emma and Marco wrote about have a positional dimension to them.
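
A minimal sketch of that comparison, treating each chart as a grid of token counts of the same size. The post only says "multiply"; rescaling the product so that its total matches the observed total is an assumption added here to make the two charts comparable, and the names are hypothetical.

```python
def expected_chart(final_n, initial_x, observed):
    """Pointwise product of two charts, rescaled to the observed total."""
    product = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(final_n, initial_x)]
    total_product = sum(map(sum, product))
    total_observed = sum(map(sum, observed))
    scale = total_observed / total_product if total_product else 0.0
    return [[v * scale for v in row] for row in product]

def contrast(observed, expected):
    """Observed / expected at each point; None where nothing is expected."""
    return [
        [(o / e if e else None) for o, e in zip(row_o, row_e)]
        for row_o, row_e in zip(observed, expected)
    ]

# Toy 1 x 2 charts with invented counts, purely to show the mechanics.
final_n = [[2.0, 4.0]]    # chart for [n.]
initial_d = [[3.0, 1.0]]  # chart for [.d]
observed = [[5.0, 5.0]]   # chart for [n.d]
exp = expected_chart(final_n, initial_d, observed)
print(exp)
print(contrast(observed, exp))
```

Ratios well above 1 would mark positions where the combination is more common than the two separate charts predict, and ratios well below 1 the opposite; with small counts the ratios will be noisy.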
(02-12-2021, 04:06 PM)Davidsch Wrote: could you find a way to merge these with the following: horizontal line size in amount of words and/or vertical line size in amount of words?

I've done some experimenting with a different kind of display that assigns the vertical axis to line length (in vords) rather than downwardness in paragraphs.  The blue and green color channels represent the target phenomenon (again by vord), while the red color channel shows the total quantity of lines of each length for reference.  I've also varied the height of each row by line length to try to make the area taken up by each vord position roughly equal.

Here's a display made that way of the distribution of [daiin] on Scribe 3 pages.

[attachment=6095]

I've tried the same approach to show distribution by glyph string, ignoring spaces, although without varying the height of rows.  Here's an example of that, again for Scribe 3 pages, with [ok] and [ot] alternating in an animated GIF.

[attachment=6096]

So far I don't notice any obvious statistically significant differences by line length, although I wouldn't want to draw any serious conclusions from just these few examples.

I'm less sure how to handle variations in paragraph height.