The Voynich Ninja

Full Version: Matching “pharma” / “small plants” labels in context
Sorry for digging up this old thread, but this is something I've been thinking about lately. It seems clear that for each set of labels, there is one corresponding paragraph. So we could expect the labels to return in their respective paragraph, but apparently they don't. 

Even when we assume a natural language solution though, there are reasons why a label may not return in the text:

  • The language is highly inflected. When words appear in a sentence, they are rarely in their unmodified "dictionary form". For example, in Latin, labels "x, y, z" would be in the nominative form, but a sentence like "against snake bites, use x, y, z" would require accusatives. In English this would not be a problem, but in more inflected languages it means that label forms may not appear as such in the text. A simple plural may be enough to render a word form unrecognizable. Examples like English goose/geese or Dutch schip/schepen are now exceptions, but the change of vowel in the stem was once common and still is in other languages. Another such relic in Dutch is lam/lammeren, which almost triples the length of the stem just to form a plural. In languages where modifications to word forms are more systematic, this could present a challenge to label-matching that requires more ingenuity than just looking for the label in the text.
  • Since labels appear in proximity and an obvious relation to their text, they may not be repeated. If your labels are "banana, strawberry, grape", then the text may read "use the above ingredients for a delicious fruit salad". For more realistic examples, think something like "you can find these plants in mountainous terrain" or "these should be finely ground to extract oil" and so on. If the plants are grouped by property, then the text may describe this property without repeating the (clearly labelled) plants' names.
There are VM-specific dangers too. The text may be meaningless. Or unrelated to the images and labels. Or too different for us to find out which items of "vocabulary" are equivalent. The more I think about it, the less likely it seems that the text should repeat labels, especially since most of these pages are really crowded. They would probably want to avoid redundancy.

If we were certain that the labels are repeated in the text, these pages would be a dream come true. Each group of labels is clearly paired with a few lines of text, so we might be able to figure out how "labelese" modifies normal Voynichese. But on the other hand, precisely this close proximity of the labelled images to the text might make mention of the labels in the text redundant.


We do know that "labelese" has different properties than the normal text though, so I would not entirely exclude the possibility of some transformation. Would there be a brute-force method to detect whether there are transformation patterns between label groups and words from their respective paragraphs?
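One possible shape for such a brute-force check is sketched below. It is only an illustration under stated assumptions: the function names are hypothetical, the pairing of each label with its "closest" paragraph word uses Python's difflib similarity ratio as an arbitrary choice, and the input is assumed to be a list of (labels, paragraph words) groups taken from a transliteration. Each label is aligned with its most similar paragraph word, and the differences between the two are tallied across all groups, so that any systematic transformation would show up as an unusually frequent entry.

```python
from collections import Counter
from difflib import SequenceMatcher


def closest_word(label, words):
    """Return the paragraph word most similar to the label (difflib ratio)."""
    return max(words, key=lambda w: SequenceMatcher(None, label, w).ratio())


def edit_ops(label, word):
    """List the (label fragment, word fragment) pairs where the two differ."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, label, word).get_opcodes():
        if tag != "equal":
            ops.append((label[i1:i2], word[j1:j2]))
    return ops


def tally_transformations(groups):
    """Count differences between labels and their closest paragraph words.

    groups: iterable of (labels, paragraph_words) pairs, one per label group.
    """
    counts = Counter()
    for labels, words in groups:
        if not words:
            continue
        for label in labels:
            counts.update(edit_ops(label, closest_word(label, words)))
    return counts
```

A tally like this would only be meaningful against a baseline, for example the same count computed with each label group paired with a paragraph from a different page; otherwise frequent operations may just reflect common Voynichese glyph sequences.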
Hi Koen,
thank you for digging this out! I still find the subject fascinating, and your recent posts about Nicander have rekindled my interest.
I tried to recover the scripts I used four years ago and make the results somewhat readable. I attach the results. Since I must have altered the code since then, I am not sure this is 100% consistent, but it should be close enough. I used the Takahashi transliteration.
Each label is compared with what apparently is the corresponding paragraph. This matching allows for some variation between the two words.
These are the main variations I allowed for:
  • a label matches any longer word that contains it
  • 'k' and 't' are treated as equivalent
  • all gallows are equivalent to the corresponding benched gallows
  • 'o', 'a' and 'y' (Stolfi's "circles") are treated as equivalent
  • 'm' is equivalent to 'r'
  • 'l' in the label can be absent in the matching word
These variations are partly arbitrary and partly based on analyses done by others (e.g. Stolfi, Emma). It can be argued that they are not enough: for example, one could treat word endings more aggressively in order to have a better chance of handling inflection. All in all, there are not so many "pharma" paragraphs, and a careful manual analysis is a viable option in this case.
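For readers who want to experiment, the variations listed above could be expressed roughly as follows. This is a minimal sketch, not the script that produced the attached results: the function names are hypothetical, the benched-gallows spellings (ckh, cth, cph, cfh) assume an EVA-style transliteration, and handling the optional 'l' with a regular expression is just one possible reading of that rule.

```python
import re

# Benched gallows collapsed to the corresponding plain gallows
# (assumed EVA spellings).
BENCHED = {"ckh": "k", "cth": "t", "cph": "p", "cfh": "f"}

# Single-glyph equivalences: k~t, o~a~y, m~r.
SINGLE = str.maketrans({"t": "k", "a": "o", "y": "o", "m": "r"})


def normalize(word):
    """Map a word to a canonical form under the equivalences above."""
    for benched, plain in BENCHED.items():
        word = word.replace(benched, plain)
    return word.translate(SINGLE)


def label_matches(label, word):
    """True if the normalized word contains the normalized label.

    Every 'l' in the label is optional in the word.
    """
    pattern = "".join(
        "l?" if ch == "l" else re.escape(ch) for ch in normalize(label)
    )
    return re.search(pattern, normalize(word)) is not None


def matches_in_paragraph(label, paragraph_words):
    """Return the paragraph words that match the label."""
    return [w for w in paragraph_words if label_matches(label, w)]
```

Because both the label and the paragraph word are normalized before comparison, adding a further equivalence (for instance, a looser treatment of word endings) only requires extending the two tables at the top.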

(14-03-2020, 11:15 PM)Koen G Wrote: The more I think about it, the less likely it seems that the text should repeat labels, especially since most of these pages are really crowded. They would probably want to avoid redundancy.

When discussing likelihood, there is always the problem of how to estimate it. One can ask oneself: do labels in medieval manuscripts typically appear in the text? I think they do. A hypothesis could be that the VMS should behave like other manuscripts, and one could then see if this is the case (it does not appear to be).

Another possibility is the pessimistic approach: since all attempts to find structure in Voynichese consistently fail, one can say that experiments like this are very unlikely to be successful. I am more and more convinced that it is so. There could be features of the text (e.g. qokedy qokeedy qokoy qotedy otedy qokedy qokedy) that suggest that avoiding redundancy was not the main concern of the author(s), but who knows?
When you write "o,a,y" you mean "o,a,9".
So the one "y" that looks like a "9"?

So they're definitely not the same.
"o and 9" have something in common when they're first in the word, but not otherwise. That's what my research tells me.
@Marco thank you for that PDF file. I think it illustrates very well the general trend that the pharma labels do not recur in the paragraphs next to them. One can quibble with the allowances you've made (I would probably have made all gallows equivalent, and made EVA n, m, and r all equivalent), but that's beside the point. Even with a lot of reasonable equivalencies, matches between labels and paragraphs are still rare.

I'm more and more open to the idea that if the VMS's text is meaningful, its meaning is purposely obscured. I remember someone (perhaps it was you?) recently demonstrating with statistics that medieval herbals in all languages show distinctive word distribution patterns at the paragraph and page levels, which are not matched at all in the VMS. So it's probably not an herbal, only designed to resemble one to mislead people about its real purpose.

Koen has a point. I think it's worth asking ourselves, "What use of writing in the Middle Ages would potentially feature a bullet list, where each bulleted phrase was accompanied by an 8~9 line paragraph that seldom, if ever, included the bulleted phrase?"

The PDF made it easy to scan down the list of transcribed labels in the order they occur in the VMS. When I did I noticed something: each label is often fairly similar to the one before it, with only 1~2 characters different. Has anyone looked for patterns in these transformations, over the course of the entire set of pharma labels, preferably in the order in which the pages were originally bound? If a numbering system or alphabetical order could be wrangled out of these labels, based on the order they're written in, that could be very valuable. I'm going to spend some time staring at these a bit more with a scratch pad next to me, and see what I can discern.
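As a starting point for that kind of comparison, the similarity between each label and the one before it could be measured with a plain edit distance. The sketch below is only an assumption about how one might do it: the function names are hypothetical, and the labels are expected as a list of transliterated strings in page order.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def consecutive_distances(labels):
    """Pair each label with the next one and report their edit distance."""
    return [(a, b, edit_distance(a, b)) for a, b in zip(labels, labels[1:])]
```

Runs of small distances in the output would mark stretches where neighbouring labels differ by only one or two glyphs, which are the places worth inspecting by hand for a numbering or ordering scheme.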
(15-03-2020, 05:17 PM)RenegadeHealer Wrote: @Marco thank you for that PDF file. I think it illustrates very well the general trend that the pharma labels do not recur in the paragraphs next to them. One can quibble with the allowances you've made (I would probably have made all gallows equivalent, and made EVA n, m, and r all equivalent), but that's beside the point. Even with a lot of reasonable equivalencies, matches between labels and paragraphs are still rare.

Hi RenegadeHealer,
I agree with what you write, both about the rules I applied not being optimal and about my preliminary results not giving much hope for success. As you also point out, it is still possible that there are other more subtle patterns to be found. This is an area I could come back to in the future and I also hope that others will look into it: it is the most structured part of the VMS and, as I said, it is small enough to make non-computational analysis quite approachable.