The following are prefix / suffix histograms for two XV-XVI Century scientific texts:
1) The Alchemical Herbal (Italian version), Biblioteca Queriniana ms B.V.24, Brescia (XV Century). The text sample is [link] of the two pages I have images of. The labels are the complete [link] provided by Philip Neal (published by Ragazzini, Segre Rutz etc.)
2) Matthioli's “Commentarii in libros sex Pedacii Dioscoridis de medica materia” (1554). I have used the transcription published [link]. [link] doesn't exactly have “labels,” I have used instead the titles of each chapter / engraving.
To simplify the comparison, I have based the histograms on the single initial / final character only. I have also produced the corresponding histograms for the Voynich “pharma” section.
The number of occurrences of distinct words was ignored: each word-form was counted once.
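For anyone who wants to reproduce this kind of count, here is a minimal sketch of the procedure described above. It is not the script I used: the function name `edge_histogram`, the lowercasing step and the sample words are assumptions made only for illustration.

```python
from collections import Counter

def edge_histogram(words, position="first"):
    """Relative frequency of first (or last) characters over distinct word-forms."""
    # Each word-form is counted once, however often it occurs.
    # Lowercasing is an assumption of this sketch, not part of the original method.
    forms = {w.lower() for w in words if w}
    index = 0 if position == "first" else -1
    counts = Counter(form[index] for form in forms)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

# Made-up sample, not taken from any of the manuscripts.
sample = "rosa rosa viola salvia ruta rose".split()
prefixes = edge_histogram(sample, "first")
suffixes = edge_histogram(sample, "last")
print(prefixes)
print(suffixes)
```

Running the same function once on the text words and once on the labels gives the two series plotted in each histogram.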
When examining the graphs, please take note of the vertical scale: the ranges vary considerably.
Comments:
Italian alchemical herbal (Brescia B.V.24)
[attachment=1603]
Prefixes: the text has a rather uniform distribution, with limited differences between frequencies. For labels, frequencies cover a slightly wider range. In several cases (r, g, b, t, c) the frequency in the labels is considerably higher than in the text. In other cases (v, o, d) the difference is in the opposite direction. Clearly, alchemical plant names have preferred initials that don't exactly match those of the Italian language. Yet the differences are about 5% or less.
Suffixes: the text clearly shows the marked preference of Italian for ending vowels. Note that the scale of this histogram is about 4 times that of the prefixes. The labels show an even greater preference for the -a ending (in Latin and Italian, plant names tend to be feminine and end with -a). -i and -e endings typically correspond to plural forms (masculine and feminine respectively) and are markedly under-represented in the labels. The alchemical herbal includes several plants with Latin-like or Greek-like names (Basiles, Caspitres, Tofanas): this causes the isolated spike for the suffix -s in labels only.
Latin Matthioli
[attachment=1604]
Prefixes: the range of the prefix histogram is limited in this case also. The distribution of frequencies is regular and the differences between labels and text are even less marked than in the alchemical herbal. The greatest difference is the higher frequency of i- in the text with respect to the labels: this seems to be due to numerous medical terms originating from the “in” preposition or using “in” as a negative prefix (e.g. includit, infectum, infusione, ingestus, insanabilia, inspexisse). The fact that the a- prefix is more common in labels could be due to the Greek origin of many of the names (17% of Greek words start with alpha); Greek names could also have an influence on the relative rarity of i- in the labels.
Suffixes: the numbers vary much more than for the prefixes. Latin features different suffixes correlated with specific functions. The endings that typically correspond to masculine (-s), neuter (-m) and feminine (-a and -s again) nouns in the nominative form are preponderant for both text and labels, but each of them occurs in the labels with about 10% higher frequency. Endings that correlate with plural forms (-i), other noun cases (-o, -e) and verbs (-t, -r) are frequent in the text but almost absent in the labels.
Voynich Pharma / Small-Plants
[attachment=1605]
Prefixes: there are fewer prefixes than in Italian and Latin and the range of the frequencies is consequently higher. Two marked differences are apparent: the most common prefix o- is much more common in the labels (almost half of the labels start with o-); as is well known, the prefix q- is rather common in the text but almost completely absent in the labels.
Suffixes: Voynichese has even fewer common suffixes than prefixes (basically, only 8 characters). -y is by far the most common. The more detailed two-character analysis I previously posted shows that -y endings (e.g. -ey and -ly) are differently distributed between labels and text, but this is not visible in this graph and will not be discussed here. If one only considers the last character, Voynichese doesn't show differences between labels and ordinary text.
_______
Comparison of Voynichese with Latin and Italian highlights the fact that the three “languages” behave in markedly different ways. The main difference between Latin and Italian is that Italian only has a limited number of frequent ending characters (-i, -o, -e, -a). Latin has more endings, but for both languages the endings provide the clearest differences between text and labels.
On the other hand, Voynichese last letters are almost identically distributed in text and labels. First letters exhibit noticeable differences; these are not as wide as those that appear in Latin and Italian endings but the statistics for Voynichese o- and q- can be visually compared (for instance) with Italian -a and -i (Brescia ms).
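The visual comparison can also be put into numbers by subtracting the text frequency from the label frequency for each character and ranking the results. The following is only a sketch: `frequency_gaps` is a hypothetical helper and the frequencies are invented for illustration, not read off the histograms.

```python
def frequency_gaps(labels, text):
    """Per-character gap (labels minus text), largest absolute gap first."""
    chars = sorted(set(labels) | set(text))  # sorted so ties come out in a fixed order
    gaps = [(ch, labels.get(ch, 0.0) - text.get(ch, 0.0)) for ch in chars]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)

# Invented relative frequencies (NOT the measured Voynich values).
label_freq = {"o": 0.5, "c": 0.25, "d": 0.25}
text_freq = {"o": 0.25, "q": 0.25, "c": 0.25, "d": 0.25}

ranked = frequency_gaps(label_freq, text_freq)
for ch, gap in ranked:
    print(f"{ch}: {gap:+.2f}")
```

A positive gap means the character is more frequent in the labels, a negative one that it is more frequent in the text.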
In both the Latin and Italian examples it seems that the differences between plain text and labels are partly related to the labels being borrowed from different languages (with Italian borrowing from Latin and Greek, and Latin borrowing from Greek). This is something I hadn't considered and that could also have an influence on the Voynichese statistics.