Over the last couple of months, I have been slowly transcribing part of an abbreviated Latin manuscript, trying to preserve as many of the abbreviations as I could. The experiment is based on a manuscript from St. Gallen, Stiftsbibliothek, in particular the Soliloquium by Bonaventura, which starts at p.181.
I encountered a number of difficulties: in particular, word spacing and the differences between glyphs are not always clear-cut. I am not entirely happy with the result, but since this kind of work is rather boring, especially when you have already done several "passes" over the text, I decided to stop here.
Currently, I have transcribed ten pages: a little more than 13000 characters, 2000 words.
I have processed the transcription to create a "clean" version, in which I converted upper-case characters to lower case, kept only the spaces that appear in the manuscript, removed punctuation, and rejoined the two halves of words that were hyphenated at line breaks.
These are the non-alphabetic symbols that I used:
^ missing ‘r’ - usually as superscript mirrored ‘c’ but also other shapes
? hooked curl, similar to ^ (typically ‘tr’ ligature)
~ ‘wavy’ macron
_ macron
; comma-like truncated word ending
" superscript ‘v’ (typically in natVra)
) curl macron ligature
* beginning of superscript word ending
& loop or vertical bar marking truncation (e.g. -2& -rum, -t& -tis)
# double macron (for double ‘s’ in ‘esse’)
+ crossed p for ‘per’
| long-s
1 superscript ‘i’
2 both 'et' and round-r
3 for final ‘m’ but also |3 for 'sed'
4 arabic numeral (similar to l in this ms)
9 con- / -us
These symbols appear in the transcription, but were removed from the clean version:
. basically the only punctuation used in the ms
, space missing in manuscript
% space added in manuscript (space in the clean version)
= hyphenated word at line break (rejoined in the clean version)
: hyphen missing in manuscript (still split in the clean version)
{ } notes
< > text deleted / corrected in ms
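The cleaning step could be sketched roughly as follows. This is only an illustration, not the script I actually used: the function name `clean_transcription` is made up, and it assumes that notes in { } and deleted text in < > are dropped entirely, that ',' marks a spot where the manuscript has no space (so the neighbours are joined), and that ':' is followed by a line break that remains a word boundary.

```python
import re


def clean_transcription(raw: str) -> str:
    """Hypothetical sketch of the "cleaning" pass described above."""
    text = re.sub(r"\{[^}]*\}", "", raw)    # drop { } notes (assumption)
    text = re.sub(r"<[^>]*>", "", text)     # drop < > deleted text (assumption)
    text = re.sub(r"=\s*\n\s*", "", text)   # rejoin words hyphenated at line breaks
    text = text.replace(":", " ")           # missing hyphen: the word stays split
    text = text.replace(",", "")            # no space in the manuscript: join
    text = text.replace("%", " ")           # space added in the manuscript: keep it
    text = text.replace(".", "")            # remove punctuation
    text = text.lower()                     # remove upper-case characters
    return " ".join(text.split())           # collapse line breaks and extra spaces


if __name__ == "__main__":
    sample = "In,p^ncipio. {a note}\nna=\ntura e#e% t <del>bo:\nnum"
    print(clean_transcription(sample))
```

The abbreviation symbols (^, ~, &, the digits and so on) are left untouched, since they are part of the words in the clean version.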
For comparison, I have also edited a printed edition (Bonaventura Opera 08 -1898- Opuscula Varia Ad Theologiam Mysticam Et Res Ordinis Fratrum Minorum Spectantia) in order to remove the greatest differences from the manuscript version (words were occasionally added / deleted / moved in the two texts). I have "cleaned" the printed edition in the same way as the transcription.
Here is an example that illustrates the usage of most symbols (before "cleaning"):
[attachment=3749]
I have overlaid the TTR (type-token ratio) results on one of the existing plots. For reference, I have included two VMS sections and two "extreme" Latin texts. As one could expect, the printed edition of Bonaventura has a considerably lower TTR than the transcription, yet the difference is not as large as that between different languages, nor as large as that between very different styles in the same language. At W=1000, the TTR differences are:
transcription-printed = 0.11
Virgil-Vulgate = 0.27
VMS_Q20-VMS_Q13 = 0.14
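For readers who want to reproduce this kind of figure, here is a minimal sketch of a windowed TTR. It assumes that W is a window size in words and that the value is the mean over consecutive non-overlapping windows; the original plots may have been computed differently (for instance with a moving window), and the name `windowed_ttr` is just for illustration.

```python
def windowed_ttr(tokens, window=1000):
    """Mean type-token ratio over consecutive non-overlapping windows.

    Assumption: TTR at W is (distinct words in the window) / W, averaged
    over all complete windows. A trailing partial window is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    ratios = []
    for start in range(0, len(tokens) - window + 1, window):
        chunk = tokens[start:start + window]
        ratios.append(len(set(chunk)) / window)
    if not ratios:
        raise ValueError("text is shorter than one window")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    words = "a b a b c d c d".split()
    print(windowed_ttr(words, window=4))
```

With a 2000-word sample and W=1000, this averages just two windows, so the values are necessarily noisy.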
[attachment=3750]
In the plot on the right, I have shifted the three transcription samples (Bonaventura, Q20, Q13) so that the transcription of Bonaventura matches the printed edition. The result is that VMS samples, which already were at the bottom of the Latin cloud, are moved towards the bottom of the whole cloud, near English, for instance.
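The shift amounts to applying one constant offset to all the transcription-based samples. A rough sketch for a single TTR value per sample, with hypothetical names (`shift_to_reference`, and whatever dictionary keys one chooses), is given below; for a plot with two axes the same offset logic would be applied to each axis separately.

```python
def shift_to_reference(ttr_values, transcription_key, printed_ttr):
    """Shift every sample by the offset that maps the transcription onto
    the printed edition (hypothetical helper, for illustration only).

    ttr_values: dict mapping sample name -> TTR measured on a transcription
    transcription_key: name of the sample that also has a printed edition
    printed_ttr: TTR of that printed edition
    """
    offset = printed_ttr - ttr_values[transcription_key]
    return {name: value + offset for name, value in ttr_values.items()}
```

At W=1000 the offset is about -0.11, so each of the three samples moves down by that amount.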
My impression is that this correction is likely to be excessive: Voynichese appears to be more regular than the cursive script of Sang.942, and its number of glyphs appears to be smaller than in a regular alphabet, whereas an abbreviation system adds symbols to the alphabet. Also, some of the additional variability may be due to my own inconsistencies in the transcription.
Anyway, I think we should consider the possibility that TTR measured on printed texts is reduced by the "normalization" process that takes place when abbreviations and scribal inconsistencies are removed.