14-03-2020, 11:15 PM
Sorry for digging up this old thread, but this is something I've been thinking about lately. It seems clear that for each set of labels, there is one corresponding paragraph. So we could expect the labels to reappear in their respective paragraphs, but apparently they don't.
Even if we assume a natural language solution, though, there are reasons why a label may not reappear in the text:
- The language is highly inflected. When words appear in a sentence, it is rarely in their unmodified "dictionary form". For example, in Latin, labels "x, y, z" would be in the nominative form, but a sentence like "against snake bites, use x, y, z" would require accusatives. In English this would not be a problem, but in more inflected languages it means that label forms may not appear as such in the text. A simple plural may be enough to render a word form unrecognizable. Examples like English goose/geese or Dutch schip/schepen are now exceptions, but the change of vowel in the stem was once common and still is in other languages. Another such relic in Dutch is lam/lammeren, which almost triples the length of the word just to form a plural. In languages where modifications to word forms are more systematic, this could present a challenge to label-matching that requires more ingenuity than just looking for the label in the text.
- Since the labels sit close to their text and are obviously related to it, they may not be repeated there. If your labels are "banana, strawberry, grape", then the text may read "use the above ingredients for a delicious fruit salad". For more realistic examples, think of something like "you can find these plants in mountainous terrain" or "these should be finely ground to extract oil" and so on. If the plants are grouped by property, then the text may describe this property without repeating the (clearly labelled) plants' names.
If we were certain that the labels are repeated in the text, these pages would be a dream come true. Each group of labels is clearly paired with a few lines of text, so we might be able to figure out how "labelese" modifies normal Voynichese. But on the other hand, precisely this close proximity of the labelled images to the text might make mention of the labels in the text redundant.
We do know that "labelese" has different properties than the normal text though, so I would not entirely exclude the possibility of some transformation. Would there be a brute-force method to detect whether there are transformation patterns between label groups and words from their respective paragraphs?
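To make the question a bit more concrete, here is a rough sketch of what such a brute-force search could look like. It is only one possible approach under assumptions of my own: that you already have each label group and the words of its paragraph as lists of transliterated strings, and that a "transformation" shows up as a recurring set of character edits between a label and a paragraph word. The function names (`edit_pattern`, `count_patterns`, `rotated`) and the `min_ratio` threshold are made up for illustration. The idea is to count edit patterns for the real label/paragraph pairing and compare them with a control where each label group is paired with the wrong paragraph.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_pattern(label, word):
    """Return the non-matching edit operations that turn `label` into `word`."""
    ops = SequenceMatcher(None, label, word, autojunk=False).get_opcodes()
    return tuple(
        (tag, label[i1:i2], word[j1:j2])
        for tag, i1, i2, j1, j2 in ops
        if tag != "equal"
    )


def count_patterns(pairs, min_ratio=0.5):
    """Count edit patterns over every (label, word) combination per group.

    pairs: list of (labels, paragraph_words) tuples, one per label group.
    min_ratio: hypothetical similarity cut-off, so that completely
    unrelated words do not flood the counts.
    """
    counts = Counter()
    for labels, words in pairs:
        for label in labels:
            for word in set(words):
                if label == word:
                    continue  # exact repeats need no transformation
                ratio = SequenceMatcher(None, label, word, autojunk=False).ratio()
                if ratio >= min_ratio:
                    counts[edit_pattern(label, word)] += 1
    return counts


def rotated(pairs, shift=1):
    """Control: pair each label group with another group's paragraph."""
    labels = [group for group, _ in pairs]
    words = [paragraph for _, paragraph in pairs]
    words = words[shift:] + words[:shift]
    return list(zip(labels, words))


# Toy data (invented strings, not Voynichese): both "labels" gain "-en".
toy = [(["kat"], ["katen", "xyz"]), (["dor"], ["doren"])]
true_counts = count_patterns(toy)
control_counts = count_patterns(rotated(toy))
for pattern, n in true_counts.most_common(5):
    print(pattern, n, control_counts[pattern])
```

A pattern that is clearly more frequent in the true pairing than in the rotated control would be a candidate worth looking at by hand. With so few labels per page, chance matches will be common, so this would at best suggest hypotheses rather than prove anything.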