I haven't been able to share the slides from the presentation I gave at Voynich Day because the file is 40 MB. I recorded the audio into the slides themselves, so I have to work out how to undo that. I'm also working on how to explain the work better, but in the meantime here's a summary of the presentation.
The aim was to outline line patterns beyond the immediately apparent ones, such as gallows usually being the paragraph-start initial; p and f appearing mostly on the Top Row; final m being disproportionately at Line End; and initial a and ch rarely being at Line Start. Do we see other distinct behaviour in initials and finals (and sometimes word-middle glyphs) at Line Start, Line End, and Top Row? Does this vary by scribe, or do we see similar trends?
Line Behaviour at Key Positions (Line Start, Line End, Top Row)
My method is rather unsophisticated compared to Emma's; it's just comparing percentages. For the line patterns, firstly I took three large separate chunks for comparison:
Herbal A by Lisa's Scribe 1;
Balneological by Lisa's Scribe 2; and
Stars by Lisa's Scribe 3. This was both to confirm when there is or isn't a cross-scribal tendency and to reduce distortions in the stats where scribes might cancel each other out, etc.
I also split the text according to its position: for each scribal section, I separated top row words, line start words, line end words, and the "pure middle" (though it does include the bottom row mid-line words) from each other. This was to help spot any "Top Row effect", "Line End effect", etc., and also to limit their interference with each other.
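A minimal sketch of how such a split could be done. It assumes a paragraph is already available as a list of lines, each a list of word strings. The function name, the bucket names, and the toy words are all hypothetical. How the first and last word of the top row were handled isn't stated above, so here they simply count as Line Start and Line End.

```python
from collections import defaultdict

def split_by_position(paragraph):
    """Sort the words of one paragraph into position buckets.

    `paragraph` is a list of lines; each line is a list of word strings.
    ASSUMPTION: the first word of every line (top row included) counts as
    Line Start, the last word as Line End, and a one-word line counts as
    Line Start only.
    """
    buckets = defaultdict(list)
    for row, line in enumerate(paragraph):
        if not line:
            continue
        buckets["line_start"].append(line[0])
        if len(line) > 1:
            buckets["line_end"].append(line[-1])
        # Mid-line words go to "top_row" on the first line, else "middle"
        # (so "middle" also contains bottom-row mid-line words).
        mid_bucket = "top_row" if row == 0 else "middle"
        buckets[mid_bucket].extend(line[1:-1])
    return dict(buckets)

# Toy paragraph in EVA-style spelling; invented lines, not a transcription.
paragraph = [
    ["pchor", "opchey", "qopchy", "dam"],
    ["ychol", "chol", "daiin", "dam"],
    ["dchor", "chey", "okaiin"],
]
buckets = split_by_position(paragraph)
```

Running this over every paragraph of a scribal section and pooling the buckets would give the four word lists that the percentages are then computed from.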
So for example, ch is 24% of middle initials in Herbal A. Ceteris paribus, we'd expect it to be 24% of Line Start initials too, which would be about 300 ch. We see only 64, so there are over 200 "missing" instances. This makes initial ch "averse" to Line Start.
General issues that could distort or affect the statistics include uncertain word breaks (likely especially affecting Line End initials); transcription errors; choosing to focus on the scribal level rather than the folio or even smaller level; and my general incompetence.
The three scribes did often show similar tendencies, even if the size of their gaps can vary. Key similarities for initials include:
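The arithmetic behind that example can be sketched as below. The function name is hypothetical, and the total of 1250 Line Start words is an assumption back-calculated from "about 300" being 24%; the real total isn't given here.

```python
def expected_and_gap(middle_share, position_total, observed):
    """Return (expected count, observed minus expected) for one glyph.

    middle_share   -- the glyph's share of initials in the "pure middle"
    position_total -- number of words at the position being tested
    observed       -- how many of those words actually start with the glyph
    """
    expected = middle_share * position_total
    return expected, observed - expected

# ASSUMPTION: 1250 Line Start words, back-calculated from the rounded
# figures in the text (about 300 expected at a 24% share).
expected, gap = expected_and_gap(0.24, 1250, 64)
```

A negative gap means "missing" instances (aversion) and a positive gap means a surplus (attraction). With the rounded figures above, the gap comes out at roughly 236 missing ch.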
- At Line Start: all three hate initial ch and are attracted to initial y, d, and s, hence the clusters ych, dch, etc. Simultaneously, all were attracted to /ch/ and /sh/ as word-middle glyphs, and to /ai/ and /ee/.
- At Line End: all were averse to initial q, ch, and sh.
- At Top Row (middle): all were averse to initial ch and attracted to initial o. Clusters like /opch/ are common.
- At Paragraph End: all were averse to initial q and preferred initial ch.
For finals:
- At Line Start: all three were averse to final y and attracted to final n.
- At Line End: all were averse to final y and attracted to final m.
- At Top Row (middle): all were attracted to final y and averse to final n.
Despite these similarities, there were also some striking differences. The top ones included:
- At Line Start: Scribe 1 in Herbal A is attracted to initial o and q. But Scribe 2 and Scribe 3 are averse to initial o (with some variance in the subclusters), and Scribe 3 in Stars hates initial q.
- At Line End: Scribe 1 in Herbal A absolutely loves initial da (as Marco points out, this can reflect Patrick's findings of d becoming more prevalent as you go rightward). Meanwhile, Scribe 2 and Scribe 3 develop a fondness for rare clusters often beginning with l, r, etc., and we see words that are either wholly or mostly exclusive to the Line End position.
- For finals, Scribe 1 in Herbal A really likes final or at Line Start. This isn't a passion particularly shared by the others, but Scribe 3 in Stars is a little attached to final r at Paragraph Start.
I tried some reckless mapping of "missing" glyphs to "surplus" glyphs. I won't reproduce this here. But sometimes there was a vague resemblance between the missing and surplus word types:
- Line Start is the obvious example. We see tons of missing initial ch, and simultaneously tons of surplus word-middle ch in clusters like ych, dch etc.
- Top Row is similar. In its middle, initial ch vanishes, and simultaneously clusters like opch or qopch appear.
These may well be the same word types but the numbers don't always match up, and the finals are often different.
Other times, there is little resemblance between missing word types and surplus word types:
- At Line Start, Scribe 2 and Scribe 3 tend to have large deficits for initial oka/ota/oke/ote words (with some exceptions), and for the q version e.g. initial qoka type words. But we don't see any clear simple surplus word types at Line Start that could be replacing them in sufficient numbers like initial ych etc might replace initial ch. If they are replaced by different word forms with the same meaning, we'd need to look at more creative mutations like initial s or d, which carry large surpluses at Line Start.
- The Line End patterns mentioned above are another example. Scribe 1 in Herbal A has a love affair with initial da words, while Scribes 2 and 3 explore initial l and r words like "lol", yet the missing word types don't look very similar and so are hard to match.
"Vertical Impact Effect"
This was based on looking at how often glyphs are immediately under each other at Line Start. On folio 10r, you can see two lines - the 9th and 10th - that both start with o. I call this a "vertical pair" and denote it as o-o.
What's odd is that we rarely see this vertical pair in Scribe 1's Herbal A. It's really odd since Scribe 1 loves starting lines with initial o (and the others hate it). My calculations (hopefully right) were that we should see over 40 o-o vertical pairs. Yet there's this one and...well you can check it out on Voynichese.com.
Simultaneously and suspiciously, we see a similarly sized surplus of o-q vertical pairs.
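One way such an expected count could be worked out is sketched below. This is a sketch under my own assumptions, not necessarily the calculation used for the figures above: it treats the upper and lower glyph of a pair as independent and takes each paragraph's Line Start initials as already extracted. The function names and toy data are hypothetical.

```python
from collections import Counter

def vertical_pairs(paragraphs):
    """Yield (upper, lower) Line Start initials for consecutive lines.

    `paragraphs` is a list of paragraphs; each paragraph is a list of the
    Line Start initials of its lines, top to bottom. Pairs never span a
    paragraph boundary.
    """
    for initials in paragraphs:
        yield from zip(initials, initials[1:])

def observed_vs_expected(paragraphs):
    """Count observed pairs and the count expected under independence."""
    pairs = list(vertical_pairs(paragraphs))
    n = len(pairs)
    observed = Counter(pairs)
    upper = Counter(u for u, _ in pairs)
    lower = Counter(l for _, l in pairs)
    # ASSUMPTION: independence, so expected(u, l) = n * P(upper=u) * P(lower=l).
    expected = {(u, l): upper[u] * lower[l] / n for u in upper for l in lower}
    return observed, expected

# Toy data only: two invented paragraphs, not real folio counts.
paragraphs = [["o", "q", "o", "q"], ["q", "ch", "o", "q"]]
observed, expected = observed_vs_expected(paragraphs)
```

In this toy data, o-o is expected once but never occurs, while o-q occurs more often than expected. That is the same shape as the Herbal A pattern, just at a tiny scale.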
Scribe 1 in Herbal A really dislikes q-q vertical pairs despite loving q as a Line Start initial.
Scribe 2 in Balneological also shows a distaste for q-q. Both show a fondness for q-ch and q-sh, despite ch and sh being averse to Line Starts.
We also see Scribe 1 in Herbal A being attracted to the y-o vertical pairs, and Scribe 2 in Balneological liking d-q vertical pairs. And y in Scribe 3 in Stars seems to like hanging out too much on the bottom row of paragraphs. There were other patterns but those were the ones I thought most worth highlighting.
This seemed really bizarre. It seems the lower glyph in the pair is conditioned on what the upper glyph is. Assuming lines were written in order, that is. Why might it occur?
- Is there an innate anti-duplication sentiment in the scribes where they hate reusing the same glyph immediately below another at Line Start (in the midline, there's more space to play around with)? But we don't see such a marked tendency with y-y or d-d. And s-s actually performs well.
- Is it about space saving or avoiding clashes with the glyphs below (e.g. q-t is messy)? But surely o-o is fine in this regard.
- Is it something to do with a role each glyph plays at Line Start? But what?
More general questions and thoughts
Does it also show scribal awareness, with the scribes adapting to the circumstances, and so imply an understanding of what they are writing? Or do the strong patterns imply there was a system of rules for them to blindly follow?
I couldn't think of a "natural" (e.g. plaintext or linguistic) reason for the Vertical Impact behaviour. Such reasons may well be at play for the other line pattern behaviour above, but it was hard for me to imagine they could be the main overarching cause.
Could the cause be that the text is meaningless? If meaningless, it doesn't really matter what glyphs are where. But these patterns would require a system with some strong and seemingly arbitrary rules, e.g.
"Avoid writing an o directly under another o at Line Start and write a q instead; avoid writing a q under another q and write ch/sh instead, etc"
The same thing would apply if we consider the Line Start initials to be meaningless nulls attached to the real word as part of a cypher hypothesis.
If they are not nulls, are they real? And does that mean the "shorter" word in the midline is an abbreviation, e.g. ychol becomes chol (which, as Koen noted, is a really weird way to abbreviate)?
And lastly as part of the cypher paradigm, if the "mappings" reflect homophones and abbreviations, the apparent interchangeability of glyphs may pose the risk of running out of plaintext letters and making it illegible for even the authorized reader, unless we posit some further internal distinctions or external references.