Inspired by Claire Bowern's recent work on scribal spacing habits, I have some new quantitative results at least partially connected to the scribes question.
Several months ago, Bowern presented an extremely clever method for studying how spacing width varies across a given line of VMS text, built on a dataset that describes spatial bounding boxes around the words. By using these bounding boxes to calculate the gaps between adjacent VMS tokens, Bowern found that spaces tend to get smaller toward the right margin of the page, consistent with a scribe seeking to cram more text into a given line.
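For anyone who wants to try the gap calculation, here is a minimal sketch in plain Python. The record layout `(line_id, x, width)` and the function names are hypothetical, not the dataset's actual schema or Bowern's code, and it assumes the boxes are in pixels with text running left to right.

```python
from statistics import mean

def gaps_by_position(boxes):
    """boxes: iterable of (line_id, x, width) per word token (assumed layout).
    Returns (relative_position, gap_px) for each pair of adjacent tokens,
    where relative_position runs from 0 (line start) to 1 (line end)."""
    lines = {}
    for line_id, x, width in boxes:
        lines.setdefault(line_id, []).append((x, width))
    pairs = []
    for tokens in lines.values():
        tokens.sort()  # order tokens left to right by x
        left = tokens[0][0]
        right = tokens[-1][0] + tokens[-1][1]
        span = right - left
        if len(tokens) < 2 or span <= 0:
            continue  # a one-word line has no gaps to measure
        for (x0, w0), (x1, _) in zip(tokens, tokens[1:]):
            gap = x1 - (x0 + w0)           # right edge of one box to left edge of the next
            midpoint = x0 + w0 + gap / 2   # where the gap sits on the line
            pairs.append(((midpoint - left) / span, gap))
    return pairs

def slope(pairs):
    """Least-squares slope of gap width on relative position.
    Needs at least two distinct positions."""
    xs = [p for p, _ in pairs]
    ys = [g for _, g in pairs]
    mx, my = mean(xs), mean(ys)
    den = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den
```

A negative slope over many lines would be in keeping with gaps shrinking toward the right margin. A single straight-line fit is only one crude way to summarize the trend.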
There are some pretty cool things you can do with this dataset. For one, I have independently replicated Bowern's spacing findings, here presented just for f103r-f116r:
You can also quantify the average pixel area and aspect ratio (width/height) for a given word type across the VMS, based on the Voynichese.com transcription, as well as how much a given instance of that word type varies from the VMS-wide average. I hypothesized that when controlling for word type, there would be consistent scribal differences in typical handwriting size and proportions (i.e., big vs. small, shorter and squatter glyphs vs. taller and skinnier ones).
First, I restricted the analysis to word types that appear at least 10 times across the VMS (within this particular transcription). For each bifolio represented in the Voynichese.com transcription, I then calculated the average area and average aspect ratio of each unique word type within the bifolio. Next, I Z-scored each word type's bifolio-level average area and aspect ratio against the VMS-wide equivalent for that word. (That is, if the average Z-scored area of <daiin> within a given bifolio is reported as -1.0, that means it's one standard deviation below the average area of all <daiin> tokens within the VMS.) To obtain a rough bifolio-level summary statistic, I then averaged all the Z-scored word averages within a given bifolio.
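The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code behind the results: the input layout `(bifolio, word, width, height)` and the function name are hypothetical, and the population standard deviation is an arbitrary choice, since the text doesn't say which was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

MIN_COUNT = 10  # minimum occurrences of a word type, as stated above

def bifolio_summary(tokens, min_count=MIN_COUNT):
    """tokens: iterable of (bifolio, word, width, height) in pixels (assumed layout).
    Returns {bifolio: (mean_z_area, mean_z_aspect)}."""
    by_word = defaultdict(list)
    for bifolio, word, w, h in tokens:
        by_word[word].append((bifolio, w * h, w / h))  # area, aspect ratio

    per_bifolio = defaultdict(list)
    for word, rows in by_word.items():
        if len(rows) < min_count:
            continue  # drop rare word types
        areas = [a for _, a, _ in rows]
        aspects = [r for _, _, r in rows]
        mu_a, sd_a = mean(areas), pstdev(areas)      # VMS-wide stats for this word
        mu_r, sd_r = mean(aspects), pstdev(aspects)
        if sd_a == 0 or sd_r == 0:
            continue  # no variation, so a Z-score is undefined
        grouped = defaultdict(list)
        for bifolio, a, r in rows:
            grouped[bifolio].append((a, r))
        for bifolio, vals in grouped.items():
            # Z-score the bifolio-level average against the VMS-wide distribution
            z_a = (mean(a for a, _ in vals) - mu_a) / sd_a
            z_r = (mean(r for _, r in vals) - mu_r) / sd_r
            per_bifolio[bifolio].append((z_a, z_r))

    # Average of the per-word Z-scores: one 2D point per bifolio
    return {b: (mean(z for z, _ in zs), mean(z for _, z in zs))
            for b, zs in per_bifolio.items()}
```

Each word type counts equally in the final average, whether it occurs twice or fifty times in a bifolio. Weighting by token count would be a different, equally defensible choice.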
This is a deliberately high-level analysis; each bifolio is reduced to a 2D coordinate based on two averages of averages. But if you then perform a k-means clustering analysis on this collection of points, you get meaningful clustering, and that clustering correlates pretty well with previously proposed groupings.
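For readers unfamiliar with k-means, here is a small self-contained sketch of it (Lloyd's algorithm) on 2D points. It is not the implementation used for the results here, and the farthest-first seeding is my own assumption to keep the example deterministic; a library routine such as scikit-learn's `KMeans` would be the usual choice in practice.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on 2D points. Returns (labels, centroids).
    Seeding is farthest-first (an assumption, chosen for determinism)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # next seed: the point farthest from all seeds chosen so far
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = None
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids

# Made-up (mean Z area, mean Z aspect) points for illustration, not real VMS data
toy_points = [(-1.0, -1.0), (-1.2, -0.8), (-0.9, -1.1),
              (1.0, 1.0), (1.1, 0.9), (0.9, 1.2),
              (1.0, -1.0), (1.2, -1.1), (0.9, -0.8)]
toy_labels, toy_centroids = kmeans(toy_points, k=3)
```

Note that k-means always returns exactly `k` clusters, so the number of clusters is something the analyst picks, not something the algorithm discovers.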
I find that k-means analysis recovers three clusters of bifolia:
As it turns out, these clusters are generally consistent with the long-observed divide between Voynich A and Voynich B, where the handwriting of Voynich A tends to be bigger on average:
And here's how Davis's five scribes map onto these points:
We consistently see a grouping of large-handwriting bifolia that's similar to Voynich A and Davis's Scribe 1. This is not surprising; for one, Davis herself has noted that her Scribe 1 has bigger, clearer, and more spacious writing than other scribes, whereas her Scribe 2 is more "cramped."
The results across Voynich B are more mixed. Davis's Scribe 4 cleanly separates out based on the aspect ratio—but because those pages are folios 67-73, a nonzero number of words are written vertically or at non-horizontal angles, which would therefore lower these sections' average aspect ratio relative to other portions of the VMS. There's substantial overlap between Davis's Scribes 2 and 3, and I don't recover a clean, consistent grouping of the two bifolia associated with Davis's Scribe 5.
I interpret these results as evidence that there are nonzero handwriting differences between Voynich A and Voynich B, consistent with both Currier's and Davis's previous findings. This analysis does not recover clean distinctions among Davis's Scribes 2, 3, and 5, but that doesn't mean that inter-scribal differences don't exist. It just means that these putative scribes don't cleanly separate from one another in the average size of their Voynichese glyphs or the average proportions of their lettering. In addition, the distinctiveness of Scribe 4 within this analysis can be ascribed mainly to the presence of non-horizontal words within folios 67-73, so that would require follow-up.
Anyway, I see a lot of potential utility in this kind of approach.