In another thread, focusing on the specific subset of zodiac labels, Rene observed that words that are very common in the text do not appear in labels.
In another post, considering the whole set of all labels in the manuscript, I observed that common words are very rare in single-word labels, but more frequent in multi-word labels (bottom of that post).
In this thread, I will try to collect some more information about multi-word labels. It is not a huge set, so it should be relatively easy to examine it in detail. My ambition is to also consider other manuscripts, seeing what readable labels can suggest about the labels in the VMS. I expect I will make errors, but I am sure that the subject is interesting and maybe others will carry on the task with more accuracy. Rene's approach of focusing on a single section is likely more robust, but for this first post I considered all the labels together. Separately examining label subsets is one of the things I hope to do in later posts.
In order to have a first readable manuscript to compare with, I have transcribed the labels in Egerton 747 (see attachment). I encountered a number of difficulties in the process:
* I opted for expanding all the abbreviations I could interpret, so that the text in the labels can be compared with normal Latin. This was not an easy choice, and I understand that a diplomatic transcription would have had its advantages too.
* The labels are written in several different hands. I only considered those that seemed to me reasonably close to the hand that wrote the main text.
* Labels are often interrupted by the image of a plant. I considered the two halves as a single label.
* Labels sometimes occur on two lines: I considered these as two distinct labels (I understand that this is also what Voynich transcriptions do).
* In a few cases, there are labels for illustrations that were never drawn. I still included these in my transcription.
* It is not always clear what should be considered a label. For instance, some include as many as seven words.
In total, 232 labels were transcribed.
I compared the set with the 1023 lines marked L (label) in the Zandbergen-Landini VMS transcription, both ignoring and considering uncertain spaces (commas). A handful of labels corresponding to special characters were ignored.
The following histogram compares the number of words in the sets of labels. The numbers are presented as percentages: keep in mind that the VMS has many more labels than Egerton 747. The VMS has about 888 single-word labels when uncertain spaces are counted as word breaks and about 926 when they are ignored, vs 147 in Egerton 747. In the VMS, labels with 4 or more words are extremely rare (at most 4 in total) and will not be discussed in this post.
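As a rough sketch of how the percentages behind such a histogram could be computed, the snippet below splits each label into words under the two treatments of uncertain spaces. It assumes that a period marks a certain space and a comma an uncertain one in the transliteration; the function names and the sample labels (apart from the real label "dcho char ar") are made up for illustration and this is not the script I actually used.

```python
from collections import Counter

def split_label(text, use_uncertain_spaces):
    """Split a transliterated label into words.

    Assumption: '.' is a certain space and ',' an uncertain space.
    """
    if use_uncertain_spaces:
        text = text.replace(",", ".")   # treat uncertain spaces as word breaks
    else:
        text = text.replace(",", "")    # ignore uncertain spaces (join the halves)
    return [w for w in text.split(".") if w]

def length_percentages(labels, use_uncertain_spaces):
    """Map number-of-words -> percentage of labels with that many words."""
    counts = Counter(len(split_label(l, use_uncertain_spaces)) for l in labels)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}

# Placeholder labels; only "dcho.char.ar" corresponds to a real label
sample = ["aaa", "bbb,cc", "dcho.char.ar", "ddd"]
print(length_percentages(sample, use_uncertain_spaces=False))
print(length_percentages(sample, use_uncertain_spaces=True))
```

Working with percentages rather than raw counts is what makes the two manuscripts comparable, since the label sets differ so much in size.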
I also examined the position of the most frequent word in multi-word labels. In order to assess the frequency of the text in Egerton 747, I typed about 1000 words from those parts of the transcription of the manuscript published by Iolanda Ventura that are available online.
Of course such a small text is not enough to measure frequencies, in particular with Latin that has so many different word-types. So I relied on a composite lexicon made of the small fragment from Egerton 747 and more extensive parts of the Vulgate Bible, Virgil's Aeneid and Mattioli's commentary on Dioscorides, collecting a total number of words (tokens) comparable with that of the Voynich transcriptions (about 38,000 words).
Most common word in two-word labels
The following is the position of the most frequent word for labels made of two words. The counts for two-word labels in the three sets are:
78 ZL VMS ignoring uncertain spaces
104 ZL VMS with uncertain spaces
47 Egerton 747
The undefined cases are those in which the two words have the same frequency (typically, both appear only once in the reference corpus).
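The classification described above can be sketched in a few lines. This is only an illustration under stated assumptions: the reference corpus here is a toy string rather than the roughly 38,000-token lexicon, the function names are hypothetical, and for labels longer than two words I assume that a tie for the highest frequency counts as undefined.

```python
from collections import Counter

def most_frequent_position(label_words, freq):
    """Return the 1-based position of the most frequent word, or None on a tie."""
    counts = [freq.get(w, 0) for w in label_words]
    best = max(counts)
    if counts.count(best) > 1:      # two or more words share the top frequency
        return None
    return counts.index(best) + 1

def tally_positions(labels, freq):
    """Count how often the most frequent word falls in each position."""
    tally = Counter()
    for words in labels:
        pos = most_frequent_position(words, freq)
        tally["undefined" if pos is None else pos] += 1
    return tally

# Toy reference corpus (an assumption, standing in for the composite lexicon)
reference = "vitis alba et papaver nigrum et vitis alba alba".split()
freq = Counter(reference)

labels = [["vitis", "alba"], ["papaver", "nigrum"], ["pes", "leporinus"]]
print(tally_positions(labels, freq))
```

With this toy corpus, "vitis alba" is classified as second position, while the other two labels come out as undefined because both words have the same count, which shows how a small reference corpus inflates the undefined category.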
While in the Tractatus the most frequent word tends to appear in the second position, in the VMS it tends to appear in the first. In the Latin text, the two positions typically correspond to different types of plant names:
* Most common word in the first position (the less frequent case). The name of the plant is made of a noun and an adjective, and both words are an integral part of the plant's name. E.g.:
pes leporinus (hare's foot)
In other cases, the first word is the generic type of plant and the second word the specific name:
arbor abiete (arbor=tree, abiete=fir: fir tree)
herba vitis (herba=plant, vitis=grapevine: grapevine plant)
* Most common word in the second position (twice as frequent as the other case). Typically, the first word is a specific plant name; the second word is an adjective that identifies a variant of the plant "family". Examples:
centaurea maior (greater centaurea)
papaver nigrum (black poppy)
vitis alba (white vine)
centaurea, papaver, vitis are plant names
maior, nigrum, alba are generic adjectives that are obviously more common words.
In most cases (68%), 2-word labels are made of a noun followed by an adjective: of course, both words are in the nominative case and agree in gender and number. Labels that consist of two nouns (6%) also are in the nominative case and tend to have the same gender. A few cases feature a first noun in the nominative case and a second one in the genitive (e.g. sponsa solis, the wife of the sun).
A possible task for future posts could be checking if 2-word labels in the VMS exhibit signs of extensive concordance (e.g. sharing the same suffix).
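One crude way to approach such a check, sketched here only as an assumption about how it might be done, is to measure how many final characters the two words of a label share. The helper names are hypothetical, and the pairs are the Latin examples quoted in this post.

```python
def shared_suffix_length(a, b):
    """Number of identical characters at the end of both words."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def concordance_rate(pairs, min_len=1):
    """Fraction of two-word labels whose words share at least min_len final characters."""
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if shared_suffix_length(a, b) >= min_len)
    return hits / len(pairs)

latin_pairs = [
    ("pes", "leporinus"),
    ("centaurea", "maior"),
    ("papaver", "nigrum"),
    ("vitis", "alba"),
]
for a, b in latin_pairs:
    print(a, b, shared_suffix_length(a, b))
print(concordance_rate(latin_pairs))
```

Note that only one of these four Latin pairs shares even a final letter, although all of them agree grammatically: identical endings are a weak proxy for concordance, so any rate measured on Voynich labels would need to be compared against a baseline such as randomly paired words.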
Most common word in three-word labels
Counts of three-word labels:
16 ZL VMS ignoring uncertain spaces
27 ZL VMS with uncertain spaces
17 Egerton 747
The Tractatus only shows that the most common word tends not to occur at the end: it is frequent both in the first and central position.
* Most common word in the first position. 6 of the 8 occurrences follow the pattern "nomen herbe/herba X".
* Most common word in the central position. 4 of the 7 occurrences present two different names for one plant, separated by a disjunction ("sive" / "vel" two occurrences each).
f.18r.1 brusci sive bruscus
f.40r.2 fragia sive fragula
f.74v.1 tapinum vel pinea
f.92r.2 sauma vel brachteos
* Most common word in the last position. The only two cases are actually labels that are split across two lines (f.5r, f.52v).
The VMS shows a preference to present the most common word in the central position. It is tempting to speculate that this might be due to the presence of a disjunction, but this does not seem likely. The most obvious check is seeing if the central word in Voynich 3-word labels tends to be constant, or at least clearly biased towards a limited set of choices, but this is not the case. Among the 27 labels in the ZL-with-uncertain-spaces transcription, only two words repeatedly appear in the central position, and each only appears twice: ar and char. They also appear consecutively in one of the labels:
<f69r.9,&L0> dcho char ar
Also, nothing as simple as the "nomen herbe ..." pattern seems to occur in Voynich 3-word labels.
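The check on the central word described above amounts to counting which words occupy the middle slot of three-word labels and seeing whether a few of them dominate. A minimal sketch, with a hypothetical function name and using only the labels quoted in this post as input:

```python
from collections import Counter

def central_word_counts(labels):
    """Count the words found in the middle position of three-word labels."""
    return Counter(words[1] for words in labels if len(words) == 3)

labels = [
    ["brusci", "sive", "bruscus"],   # Egerton 747
    ["fragia", "sive", "fragula"],   # Egerton 747
    ["tapinum", "vel", "pinea"],     # Egerton 747
    ["sauma", "vel", "brachteos"],   # Egerton 747
    ["dcho", "char", "ar"],          # VMS f69r
]

counts = central_word_counts(labels)
repeated = {w for w, c in counts.items() if c > 1}
print(counts.most_common())
print(repeated)
```

In a set dominated by disjunctions, a small number of central words would account for a large share of the labels, as "sive" and "vel" do in this tiny Latin sample; in the 27 Voynich labels no word reaches more than two occurrences.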