The Voynich Ninja - Scribes and authorship of the text

(24-08-2025, 06:57 PM)R. Sale Wrote: The various parts of the VMs may have been done by different people and potentially at different times over a couple of decades.

You might be right. The manuscript might not have been written in one go. Each section could have been a separate piece of work, with the sections later bound into one volume for the convenience of having all works in the unknown writing in one manuscript.

Writing the sections at different times might also explain some of the differences in the language. In the gaps of time between the sections the authors could have lost some fluency in the use of the 'method'. In particular, the language in the Bio B2 pages (most of quire 13) suggests they could have been written at a different time. The language there seems to show less variability than in the rest of the manuscript, with a high frequency of just a few words: 8 words make up 20% of the total, with ol coming top and daiin and aiin not in this list. But in the Herbal B2 pages 20% of the total is made up of 12 words, with ol now not in this list and daiin and aiin now at the top. This is odd, since both sets of pages are in hand 2, supposedly written by the same person, and marked as language B. It suggests the two sets of pages could have been written at different times, with some loss of 'method' in between.
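For anyone who wants to repeat this kind of count, here is a minimal Python sketch of "how many of the most frequent word types are needed to reach 20% of all tokens". The function name words_to_cover and the toy token list are made up for illustration; real input would be the paragraph tokens of one section, taken from whichever transcription you use.

```python
from collections import Counter

def words_to_cover(tokens, share=0.20):
    """Return the most frequent word types needed to reach `share` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    top = []
    # Walk the word types from most to least frequent until the share is reached.
    for word, n in counts.most_common():
        if covered >= share * total:
            break
        top.append(word)
        covered += n
    return top

# Toy token list, invented for illustration only (not real VMS counts).
toy = (["ol"] * 5 + ["shedy"] * 3 + ["qokain"] * 2 +
       ["chedy", "daiin", "aiin", "qol", "dar", "or", "ar", "al", "dal", "okain"])

print(words_to_cover(toy, 0.20))
print(words_to_cover(toy, 0.50))
```

A shorter list for the same share means the text is dominated by fewer word types, which is the sense of "less variability" used above.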

(25-08-2025, 08:12 AM)dashstofsk Wrote: a high frequency of just a few words.

For your information and amusement here are the lists

[Link: not available]

(25-08-2025, 08:12 AM)dashstofsk Wrote:
(24-08-2025, 06:57 PM)R. Sale Wrote: The various parts of the VMs may have been done by different people and potentially at different times over a couple of decades.

You might be right. The manuscript might not have been written in one go. Each section could have been a separate piece of work, with the sections later bound into one volume for the convenience of having all works in the unknown writing in one manuscript.

Writing the sections at different times might also explain some of the differences in the language. In the gaps of time between the sections the authors could have lost some fluency in the use of the 'method'. In particular, the language in the Bio B2 pages (most of quire 13) suggests they could have been written at a different time. The language there seems to show less variability than in the rest of the manuscript, with a high frequency of just a few words: 8 words make up 20% of the total, with ol coming top and daiin and aiin not in this list. But in the Herbal B2 pages 20% of the total is made up of 12 words, with ol now not in this list and daiin and aiin now at the top. This is odd, since both sets of pages are in hand 2, supposedly written by the same person, and marked as language B. It suggests the two sets of pages could have been written at different times, with some loss of 'method' in between.

You raise a very interesting point.

In fact, if we look at the results of my topic–hand and topic–language analyses ([Link: not available] and [Link: not available]), we "could" (with many quotation marks) even reach the conclusion that the scribes themselves were divided by topic. In other words, each scribe might have been more of a specialist in certain “topics” of the manuscript.

This would fit the idea that the manuscript was not produced in one go but rather compiled from different pieces of work, possibly created at different times. If so, it could explain why topic distributions correlate more strongly with scribal hands than with Currier languages in my results.

Of course, this doesn’t prove the scribes consciously worked by thematic specialization, but the statistical correlation does suggest that some structural separation exists beyond pure chance. Whether that reflects real “expertise” in topics, different time periods of writing, or even different stages of mastering the underlying method, remains open to interpretation.

(25-08-2025, 08:40 AM)quimqu Wrote: it could explain why topic distributions correlate more strongly with scribal hands

Perhaps it would help to count the number of words (paragraph words only, no labels, no radial or circular text) by topic, hand and language.

[Link: not available]

In particular all biological page text is in hand 2, language B. 90% of the stars page text is in hand 3, language B. These two make up 50% of the whole.

I have always found it useful to restrict my analysis to just the following works:

Bio_B2 6549
Herbal_A1 8086
Herbal_B2 2369
Pharma_A1 2490
Stars_A3 781
Stars_B3 11319
Text_B2 1830

With relatively few words outside of these works it might be difficult to get proper correlations between topic and hand and language.

Inspired by Claire Bowern's recent work on scribal spacing habits, I have some new quantitative results at least partially connected to the scribes question.

Several months ago, Bowern described [Link: not available] an extremely clever method for studying how spacing width varies across a given line of VMS text, using a resource [Link: not available] that describes spatial bounding boxes around the words. By using these bounding boxes to calculate the gaps in between adjacent VMS tokens, Bowern found that spaces tend to get smaller toward the right margin of the page, consistent with a scribe seeking to cram more text into a given line.
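As a rough illustration of the gap calculation, here is a minimal Python sketch. It assumes each word box on a line is given as a hypothetical (x, width) pair in pixels; the actual field names and layout of the bounding-box data may well differ, and this is not Bowern's code.

```python
def gaps_along_line(boxes):
    """Given (x, width) word boxes for one line, return the horizontal gaps
    between the right edge of each word and the left edge of the next one."""
    ordered = sorted(boxes)  # left to right by x
    return [next_x - (x + width)
            for (x, width), (next_x, _) in zip(ordered, ordered[1:])]

# Toy line of four word boxes, invented for illustration only.
line = [(10, 50), (75, 40), (127, 60), (195, 30)]
print(gaps_along_line(line))
```

Collecting these gaps by position in the line (first gap, second gap, and so on) across many lines gives the kind of curve discussed here.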

There are some pretty cool things you can do with this dataset. For one, I have independently replicated Bowern's spacing findings, here presented just for f103r-f116r:

[Image: NID750C.png]

You can also quantify the average pixel area and aspect ratio (width/height) for a given word type across the VMS, based on the Voynichese.com transcription, as well as how much a given instance of that word type varies from the VMS-wide average. I hypothesized that when controlling for word type, there would be consistent scribal differences in typical handwriting size and proportions (i.e., big vs. small, shorter and squatter glyphs vs. taller and skinnier ones).

First, I restricted the analysis to the word types that appear at least 10 times across the VMS (within this particular transcription). For each bifolio represented in the Voynichese.com transcription, I then calculated the average area and average aspect ratio of each unique word type within the bifolio. Next, I Z-scored each word type's average bifolio-level area and aspect ratio against the VMS-wide equivalent for each word. (That is, if the average Z-scored area of <daiin> within a given bifolio is reported as -1.0, that means it's one standard deviation below the average area of all <daiin> tokens within the VMS.) To obtain a rough bifolio-level summary statistic, I then averaged all the Z-scored word averages within a given bifolio.
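The steps above can be sketched in a few lines of Python. This is only one reading of the description, not the code actually used: the record layout (bifolio, word, value), the function name bifolio_scores, and the choice of the population standard deviation are all assumptions. The same function would be run once on areas and once on aspect ratios.

```python
from collections import defaultdict
from statistics import mean, pstdev

def bifolio_scores(records, min_count=10):
    """records: iterable of (bifolio, word, value), where value is a token's
    area or aspect ratio. Returns {bifolio: mean z-score over word types}."""
    records = list(records)

    # VMS-wide mean and standard deviation per word type (token level).
    by_word = defaultdict(list)
    for _, word, value in records:
        by_word[word].append(value)
    stats = {}
    for word, values in by_word.items():
        sd = pstdev(values)  # assumption: population SD
        if len(values) >= min_count and sd > 0:
            stats[word] = (mean(values), sd)

    # Average value of each retained word type within each bifolio.
    cell = defaultdict(list)
    for bifolio, word, value in records:
        if word in stats:
            cell[(bifolio, word)].append(value)

    # Z-score each bifolio-level word average, then average per bifolio.
    z_by_bifolio = defaultdict(list)
    for (bifolio, word), values in cell.items():
        mu, sd = stats[word]
        z_by_bifolio[bifolio].append((mean(values) - mu) / sd)
    return {b: mean(zs) for b, zs in z_by_bifolio.items()}

# Toy records, invented for illustration only.
records = ([("b1", "daiin", 120.0)] * 5 + [("b2", "daiin", 80.0)] * 5 +
           [("b1", "rare", 50.0), ("b2", "rare", 70.0)])
print(bifolio_scores(records))
```

In the toy data, the word "rare" is dropped because it has fewer than 10 tokens, so each bifolio's score comes from <daiin> alone.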

This is a deliberately high-level analysis; each bifolio is reduced down to a 2D coordinate based on two average-of-averages. But if you then perform a k-means clustering analysis on this collection of points, you get meaningful clustering—and that clustering correlates pretty well with previously proposed groupings.
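For readers unfamiliar with k-means, here is a minimal pure-Python sketch of the standard iteration (Lloyd's algorithm) on 2D points. The farthest-point initialisation is chosen here only to make the toy example deterministic; the post does not say which implementation or initialisation was used, and the toy points are invented.

```python
from math import dist

def kmeans(points, k, iterations=100):
    """Cluster tuples of coordinates into k groups; returns (labels, centroids)."""
    # Farthest-point initialisation: start from the first point, then
    # repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

# Toy (z-scored area, z-scored aspect ratio) points, invented for illustration.
pts = [(1.0, 0.1), (1.2, -0.1), (0.9, 0.0),
       (-0.8, 0.2), (-1.0, 0.0), (-0.9, -0.2),
       (-0.5, -2.0), (-0.6, -2.2), (-0.4, -1.9)]
labels, centroids = kmeans(pts, k=3)
print(labels)
```

The cluster numbers themselves are arbitrary; only which points share a label is meaningful.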

I find that k-means analysis recovers three clusters of bifolia:

[Image: DAAFjcy.png]

As it turns out, these clusters are generally consistent with the long-observed divide between Voynich A and Voynich B, where the handwriting of Voynich A tends to be bigger on average:

[Image: 8ULGngJ.png]

And here's how Davis's five scribes map onto these points:

[Image: INNC5jP.png]

We consistently see a grouping of large-handwriting bifolia that's similar to Voynich A and Davis's Scribe 1. This is not surprising; for one, Davis herself has noted that her Scribe 1 has bigger, clearer, and more spacious writing than other scribes, whereas her Scribe 2 is more "cramped."

The results across Voynich B are more mixed. Davis's Scribe 4 cleanly separates out based on the aspect ratio—but because those pages are folios 67-73, a nonzero number of words are written vertically or at non-horizontal angles, which would therefore lower these sections' average aspect ratio relative to other portions of the VMS. There's substantial overlap between Davis's Scribes 2 and 3, and I don't recover a clean, consistent grouping of the two bifolia associated with Davis's Scribe 5.

I interpret these results as evidence that there are nonzero handwriting differences between Voynich A and Voynich B, consistent with both Currier and Davis's previous findings. This analysis does not recover clean distinctions among Davis's Scribes 2, 3, and 5—but that doesn't mean that inter-scribal differences don't exist. It just means that in the average sizing of their Voynichese glyphs and the average proportions of their lettering, these putative scribes don't cleanly separate out from one another. In addition, the distinctiveness of Scribe 4 within this analysis can be ascribed mainly to the presence of non-horizontal words within folios 67-73, so that would require follow-up.

Anyway, I see a lot of potential utility in this kind of approach.

(15-10-2025, 08:11 PM)magnesium Wrote: Bowern found that spaces tend to get smaller toward the right margin of the page, consistent with a scribe seeking to cram more text into a given line.

Great analysis!

The puzzling bit is why there is a slight increase from the first graph point to the second, while the following ones decrease just as expected.

It could be an artifact of how word spaces are identified by transcribers, and how dubious spaces (",") are treated in the analysis. When the first glyph of the line is y or o, it is often followed by a slightly wider space that some transcribers mark with period, comma, or nothing, often based on subjective criteria that seem to be different from those used when a word-initial o or y appears in the middle of a line. Could this inconsistency be the cause of that anomaly near the left edge?

All the best, --jorge

(15-10-2025, 08:32 PM)Jorge_Stolfi Wrote: The puzzling bit is why there is a slight increase from the first graph point to the second, while the following ones decrease just as expected.

It could be an artifact of how word spaces are identified by transcribers, and how dubious spaces (",") are treated in the analysis. When the first glyph of the line is y or o, it is often followed by a slightly wider space that some transcribers mark with period, comma, or nothing, often based on subjective criteria that seem to be different from those used when a word-initial o or y appears in the middle of a line. Could this inconsistency be the cause of that anomaly near the left edge?

I haven't looked at this in detail, so I'm not sure. I also wonder whether there is a kind of "relaxation" in spacing from the end of one line to the start of the next. In principle, a scribe wrote the beginning of one line very shortly after finishing the previous line, so over the first couple of words in the new line, the scribe is adjusting to having much more room to write than they just did. Then the scribe settles into a spacing groove, only to start to narrow their spacing—and increase their cramming—once they're about halfway done with a line.

Very interesting!
How did you handle scribe 4, whose output is mostly along circles? The word boxes in voynichese.com are not along the direction of writing, but along the page orientation (horizontal, vertical). Did you skip these?

I have a first set of boxes along the direction of writing, but the word boundaries are not yet very accurate. Would you be interested in using these?

(15-10-2025, 11:53 PM)ReneZ Wrote: Very interesting!
How did you handle scribe 4, whose output is mostly along circles? The word boxes in voynichese.com are not along the direction of writing, but along the page orientation (horizontal, vertical). Did you skip these?

I have a first set of boxes along the direction of writing, but the word boundaries are not yet very accurate. Would you be interested in using these?

I used the voynichese.com bounding boxes without any corrections, so Scribe 4's results are almost certainly affected by the fact that many of those words are not horizontal. By far the biggest expected issue would be a lower-than-expected average aspect ratio. The voynichese.com bounding box orientation lists some of Scribe 4's words as being taller than they are wide because those words are written vertically relative to the page orientation. I'm fully aware of this as an issue. Milder versions of this problem almost certainly affect other pages. If the text on a given line is written at a slight angle relative to the page's true horizontal, the resulting bounding boxes for the words will on average be slightly bigger than their writing-direction bounding boxes would be.

I have some ideas for how to correct for this. Assume that a word's "true" bounding box oriented along the direction of writing has width w and height h. Also assume that the direction of writing is at some angle q relative to the page's horizontal orientation. The ratio of the area of the "horizontal" bounding box, analogous to what Voynichese.com provides, to the area of the "true" bounding box is given as:

R = ((w*sin(q)+h*cos(q))*(w*cos(q)+h*sin(q)))/(w*h)

At small q, it's not so bad, but it's a non-negligible problem.
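To get a feel for the size of the effect, here is a minimal Python sketch of that ratio. The function name area_ratio is made up, and the absolute values are an addition so the expression also behaves for angles outside 0 to 90 degrees; within that range it is the same formula. The word dimensions are invented.

```python
from math import sin, cos, radians

def area_ratio(w, h, q_degrees):
    """Area of the page-aligned bounding box divided by the area of the
    w-by-h box aligned with the writing direction, for a slope of q degrees."""
    q = radians(q_degrees)
    aligned_width = abs(w * cos(q)) + abs(h * sin(q))
    aligned_height = abs(w * sin(q)) + abs(h * cos(q))
    return (aligned_width * aligned_height) / (w * h)

# A long, flat word box (toy dimensions, in pixels) at a few small slopes.
for q in (0, 1, 2, 5):
    print(q, round(area_ratio(100, 20, q), 3))
```

Because the height term picks up w*sin(q), long words are inflated more than short ones at the same slope, so the bias also depends on word length.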

As a next step, I want to try and automatically provide an initial correction at the line level by calculating the average slope of a given line of text, converting that to an angle, and then applying a line-average correction using the formula above treating the slope angle as q (or at least as an input parameter for how to calculate q). That doesn't correct for the Scribe 4 circles, but it could tighten everything else up.

(16-10-2025, 01:04 AM)magnesium Wrote: I have some ideas for how to correct for this. Assume that a word's "true" bounding box oriented along the direction of writing has width w and height h. Also assume that the direction of writing is at some angle q relative to the page's horizontal orientation. The ratio of the area of the "horizontal" bounding box, analogous to what Voynichese.com provides, to the area of the "true" bounding box is given as:

R = ((w*sin(q)+h*cos(q))*(w*cos(q)+h*sin(q)))/(w*h)

At small q, it's not so bad, but it's a non-negligible problem.

As a next step, I want to try and automatically provide an initial correction at the line level by calculating the average slope of a given line of text, converting that to an angle, and then applying a line-average correction using the formula above treating the slope angle as q (or at least as an input parameter for how to calculate q). That doesn't correct for the Scribe 4 circles, but it could tighten everything else up.

I did these things a couple of years ago. Converting the horizontal boxes to slanted boxes using a formula like this cannot work without some assumptions, and when the angle is 45 degrees (mod 90) the problem becomes singular.
It is very ill-determined for angles around these values.
I have average slopes for all non-curved loci.
If you're interested in following this up, best send me a short E-mail.
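To illustrate where the singularity comes from, here is a minimal Python sketch under one specific assumption: the page-aligned box (W, H) is exactly the bounding box of a w-by-h rectangle rotated by q, with q between 0 and 90 degrees. Then W = w*cos(q) + h*sin(q) and H = w*sin(q) + h*cos(q), a 2x2 linear system whose determinant is cos(q)^2 - sin(q)^2 = cos(2q), which vanishes at 45 degrees. The function name true_box is made up, and this is not the procedure described above.

```python
from math import sin, cos, radians, isclose

def true_box(W, H, q_degrees):
    """Recover (w, h) of the writing-direction box from the page-aligned
    box (W, H), assuming a pure rotation by q degrees (0 <= q <= 90)."""
    q = radians(q_degrees)
    det = cos(q) ** 2 - sin(q) ** 2  # equals cos(2q)
    if isclose(det, 0.0, abs_tol=1e-9):
        raise ValueError("system is singular at 45 degrees")
    w = (W * cos(q) - H * sin(q)) / det
    h = (H * cos(q) - W * sin(q)) / det
    return w, h

# Round trip on a toy box: rotate a 100 x 20 box by 10 degrees, then invert.
q = radians(10)
W = 100 * cos(q) + 20 * sin(q)
H = 100 * sin(q) + 20 * cos(q)
print(true_box(W, H, 10))
```

Since the result is divided by cos(2q), any measurement error in W or H is amplified more and more as q approaches 45 degrees, which is the ill-determined region mentioned above.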
