The Voynich Ninja
Multi-word labels - Printable Version




Multi-word labels - MarcoP - 10-09-2019

Elsewhere, focusing on the specific subset of zodiac labels, Rene observed that words that are very common in the text do not appear in labels.
Considering the whole set of labels in the manuscript, I previously observed that common words are very rare in single-word labels, but more frequent in multi-word labels (bottom of this post).

In this thread, I will try to collect some more information about multi-word labels. It is not a huge set, so it should be relatively easy to examine it in detail. My ambition is to also consider other manuscripts, seeing what readable labels can suggest about the labels in the VMS. I expect I will make errors, but I am sure that the subject is interesting and maybe others will carry on the task with more accuracy. Rene's approach of focusing on a single section is likely more robust, but for this first post I considered all the labels together. Separately examining label subsets is one of the things I hope to do in later posts.

In order to have a first readable manuscript to compare with, I have transcribed the labels in Egerton 747 (see attachment). I encountered a number of difficulties in the process:

* I opted for expanding all the abbreviations I could interpret, so that the text in the labels can be compared with normal Latin. This was not an easy choice, and I understand that a diplomatic transcription would have had its advantages too.
* The labels are written in several different hands. I only considered those that seemed to me reasonably close to the hand that wrote the main text.
* Labels are often interrupted by the image of a plant. I considered the two halves as a single label.
* Labels sometimes occur on two lines: I considered these as two distinct labels (I understand that this is also what Voynich transcriptions do).
* In a few cases, there are labels for illustrations that were never drawn. I still included these in my transcription.
* It is not always clear what should be considered a label. For instance, some include as many as seven words.

In total, 232 labels were transcribed.

I compared the set with the 1023 lines marked L (label) in the Zandbergen-Landini VMS transcription, both ignoring and considering uncertain spaces (commas). A handful of labels corresponding to special characters were ignored.

The following histogram compares the number of words in the sets of labels. The numbers are presented as percentages: keep in mind that the VMS has many more labels than Egerton 747. Single-word labels are about 888-926 (with and without uncertain spaces) vs 147 in Egerton 747. In the VMS, labels with 4 or more words are extremely rare (at most 4 in total) and will not be discussed in this post.
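The word counts behind the histogram can be sketched roughly as follows. This is only an illustration, not the script actually used: it assumes labels are stored as strings in which "." marks a certain space and "," an uncertain one, and the sample labels (apart from dcho.char.ar, quoted later in this post) are made up.

```python
import re
from collections import Counter

def count_words(label, use_uncertain_spaces):
    """Count words in a label; '.' = certain space, ',' = uncertain space (assumed notation)."""
    if use_uncertain_spaces:
        # Uncertain spaces are treated as real word breaks.
        parts = re.split(r"[.,\s]+", label)
    else:
        # Uncertain spaces are ignored: the two halves are joined into one word.
        parts = re.split(r"[.\s]+", label.replace(",", ""))
    return sum(1 for p in parts if p)

def length_percentages(labels, use_uncertain_spaces):
    """Percentage of labels for each word count."""
    counts = Counter(count_words(l, use_uncertain_spaces) for l in labels)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}

# Hypothetical sample, for illustration only.
sample_labels = ["otol", "okal,dy", "dcho.char.ar", "otaly"]
with_uncertain = length_percentages(sample_labels, True)
ignoring_uncertain = length_percentages(sample_labels, False)
```

Using percentages rather than raw counts is what makes the two sets comparable, since the VMS has several times as many labels as Egerton 747.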

   

I also examined the position of the most frequent word in multi-word labels. In order to assess the frequency of the text in Egerton 747, I typed about 1000 words from those parts of the transcription of the manuscript published by Iolanda Ventura that are available online.

Of course such a small text is not enough to measure frequencies, in particular with Latin that has so many different word-types. So I relied on a composite lexicon made of the small fragment from Egerton 747 and more extensive parts of the Vulgate Bible, Virgil's Aeneid and Mattioli's commentary on Dioscorides, collecting a total number of words (tokens) comparable with that of the Voynich transcriptions (about 38,000 words).
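A composite lexicon of this kind could be built along the following lines. This is a minimal sketch: the tokenizer (lower-casing and keeping only runs of a-z) and the function names are my assumptions, and the two short strings merely stand in for the much longer source texts.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep runs of letters as word tokens (assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())

def composite_lexicon(texts):
    """Merge the token counts of several texts into one frequency table."""
    freq = Counter()
    for text in texts:
        freq.update(tokenize(text))
    return freq

# Tiny stand-ins for the real sources (Aeneid and Vulgate fragments).
texts = ["Arma virumque cano", "In principio erat Verbum, et Verbum erat"]
freq = composite_lexicon(texts)
total_tokens = sum(freq.values())
```

The total token count can then be checked against the target size of about 38,000 words.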


Most common word in two-word labels

The following is the position of the most frequent word for labels made of two words. The counts for two-word labels in the three sets are:

78 ZL VMS ignoring uncertain spaces
104 ZL VMS with uncertain spaces
47 Egerton 747

The undefined cases are those in which the two words have the same frequency (typically, both appear only once in the reference corpus).
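The classification can be sketched like this. It is an illustration under assumptions rather than the code I actually ran: words missing from the reference corpus are given frequency 0, and the small frequency table is invented for the example.

```python
from collections import Counter

def most_frequent_position(label_words, freq):
    """Return the 0-based position of the most frequent word, or None when undefined (tie)."""
    counts = [freq.get(w, 0) for w in label_words]  # assumption: unseen words count as 0
    top = max(counts)
    if counts.count(top) > 1:
        return None  # two or more words share the top frequency
    return counts.index(top)

# Invented reference frequencies, for illustration only.
freq = Counter("vitis alba papaver nigrum alba rosa alba nigrum".split())

labels = [["vitis", "alba"], ["papaver", "nigrum"], ["pes", "leporinus"]]
tally = Counter(most_frequent_position(words, freq) for words in labels)
```

Here the first two labels have their most frequent word in the second position (index 1), while the third is undefined because neither word occurs in the toy corpus.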
   

While in the Tractatus the most frequent word tends to appear in the second position, in the VMS it tends to appear in the first. In the Latin text, the two positions typically correspond to different types of plant names:

* Most common word in the first position (the less frequent case). The name of the plant is made of a noun and an adjective, and both words form an integral part of the plant's name. E.g.:
pes leporinus (hare's foot)
In other cases, the first word is the generic type of plant and the second word the specific name:
arbor abiete (arbor=tree, abiete=fir: fir tree)
herba vitis (herba=plant, vitis=grapevine: grapevine plant)

* Most common word in the second position (twice as frequent as the other case). Typically, the first word is a specific plant name; the second word is an adjective that identifies a variant of the plant "family". Examples:
centaurea maior (greater centaurea)
papaver nigrum (black poppy)
vitis alba (white vine)
centaurea, papaver, vitis are plant names
maior, nigrum, alba are generic adjectives that are obviously more common words.

In most cases (68%), 2-word labels are made of a noun followed by an adjective: of course, both words are in the nominative case and agree in gender and number. Labels that consist of two nouns (6%) also are in the nominative case and tend to have the same gender. A few cases feature a first noun in the nominative case and a second one in the genitive (e.g. sponsa solis, the wife of the sun).

A possible task for future posts could be checking if 2-word labels in the VMS exhibit signs of extensive concordance (e.g. sharing the same suffix).
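One very crude way to start on that check is to measure how often the two words of a label end in the same letters. The sketch below is only a first approximation under assumptions: the minimum shared-suffix length of 2 is arbitrary, the function names are mine, and the example pairs are illustrative rather than taken from any transcription. Any rate obtained this way would need to be compared with the rate for randomly paired words before drawing conclusions.

```python
def shared_suffix_len(a, b):
    """Length of the longest common suffix of two words."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def concordance_rate(pairs, min_len=2):
    """Fraction of two-word labels whose words share a suffix of at least min_len letters."""
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if shared_suffix_len(a, b) >= min_len)
    return hits / len(pairs)

# Illustrative pairs only.
pairs = [("herba", "alba"), ("papaver", "nigrum")]
rate = concordance_rate(pairs)
```

Note that Latin agreement does not always produce identical endings (vitis alba agrees in gender and number but shares no suffix), so a low rate would not by itself rule out concordance.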


Most common word in three-word labels


Counts of three-word labels:
16 ZL VMS ignoring uncertain spaces
27 ZL VMS with uncertain spaces
17 Egerton 747

The Tractatus only shows that the most common word tends not to occur at the end: it is frequent both in the first and central position.
   

* Most common word in the first position. 6 of the 8 occurrences follow the pattern "nomen herbe/herba X".

* Most common word in the central position. 4 of the 7 occurrences present two different names for one plant, separated by a disjunction ("sive" or "vel", two occurrences each).
f.18r.1 brusci sive bruscus
f.40r.2 fragia sive fragula
f.74v.1 tapinum vel pinea
f.92r.2 sauma vel brachteos

* Most common word in the last position. The only two cases are actually labels that are split over two lines (f.5r, f.52v).

The VMS shows a preference to present the most common word in the central position. It is tempting to speculate that this might be due to the presence of a disjunction, but this does not seem likely. The most obvious check is seeing if the central word in Voynich 3-word labels tends to be constant, or at least clearly biased towards a limited set of choices, but this is not the case. Among the 27 labels in the ZL-with-uncertain-spaces transcription, only two words repeatedly appear in the central position, and each only appears twice: ar and char. They also appear consecutively in one of the labels:
<f69r.9,&L0>    dcho char ar
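The check on the central word amounts to a simple tally, which could be sketched as follows. The function name is mine; the example uses the four Egerton 747 disjunction labels listed above, where the central word is clearly restricted to a couple of choices.

```python
from collections import Counter

def central_word_counts(labels):
    """Count how often each word occupies the middle of a three-word label."""
    return Counter(words[1] for words in labels if len(words) == 3)

# The four disjunction labels from Egerton 747 quoted in this post.
egerton_labels = [
    "brusci sive bruscus".split(),
    "fragia sive fragula".split(),
    "tapinum vel pinea".split(),
    "sauma vel brachteos".split(),
]
central = central_word_counts(egerton_labels)
repeated = {w: c for w, c in central.items() if c > 1}
```

Running the same tally on the 27 VMS labels is what shows only ar and char repeating, twice each.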

Also, nothing as simple as the "nomen herbe ..." pattern seems to occur in Voynich 3-word labels.


RE: Multi-word labels - Koen G - 10-09-2019

To check the position of the most common word was a great idea, Marco. It makes me wonder whether this would be reversed in a Germanic text. In theory it should, since all those common adjectives like sizes and colors would go before the noun.

The lack of extremely common words as the middle word of three-word labels is something I find hard to explain from the reference point of the European languages I know, though it might not be unusual in other languages?


RE: Multi-word labels - MarcoP - 11-09-2019

(10-09-2019, 09:09 AM)Koen G Wrote: To check the position of the most common word was a great idea, Marco. It makes me wonder whether this would be reversed in a Germanic text. In theory it should, since all those common adjectives like sizes and colors would go before the noun.


Though I am totally ignorant of German, I think you are right. Browsing through Fuchs' printed herbal, one can see that two-word titles appear to cluster together, precisely because they share the same initial word (klein, gross, rot, wilder...). These titles also occur as labels in the book.
   


Also in Auslasser's manuscript herbal, the first word is sometimes a common adjective.


But there are cases where the ending -kraut or -wurtz is written as a separate word, producing labels where the first position is occupied by what should be the most specific, less common word.

With a good deal of wishful thinking, one could say that, in early-modern herbal two-word labels, the position of the most common word correlates with noun-adjective languages (like Latin and related languages) vs adjective-noun languages (like German and English). With even more wishful thinking, one could speculate that the VMS appears to belong to the second category: but this really is a wild guess, since only a minority of Voynich labels are related to plants. More work is clearly needed.

Anyway, two-word labels are likely to be noun phrases. If Voynichese is written phonetically, or if it is a cipher where each word corresponds to a single source word, these labels could tell us something about the structure of the underlying language.

(10-09-2019, 09:09 AM)Koen G Wrote: The lack of extremely common words as the middle word of three-word labels is something I find hard to explain from the reference point of the European languages I know, though it might not be unusual in other languages?

Three-word labels are both more complex (since they obviously offer more possibilities) and rarer: analysis is even more difficult than for two-word labels. From what I have seen, in three-word labels, extremely common words are rare both in the VMS and in Egerton 747. In Egerton 747, the most common word that occurs in the middle position is "vel": it only occurs twice and it ranks 54th by frequency in my small Latin corpus.

In the VMS there seems to be an occurrence of daiin (the most common word) in the middle of a three-word label.


RE: Multi-word labels - Koen G - 11-09-2019

I was thinking three word labels would be like "lords and ladies" or "x of y". But upon closer consideration, the second type would already be absent in Latin and the first type is rare in plant names.

A three-word label with daiin is interesting. Indeed, I think with enough data, multi-word label statistics could point to language type. But you're right that it's all quite complex. For example you have already shown that depending on the space usage, the first part (modifier) in Germanic languages can be a noun specifying another noun.