The Voynich Ninja

What are the characteristics of Labelese?
(28-07-2019, 05:39 PM)ReneZ Wrote:
(26-07-2019, 11:58 AM)bi3mw Wrote: Isn't it natural that labels should consist of a lot of unique words (if they make sense)? Of course, the author(s) could have "simulated" this property, but I think that's unlikely.

Indeed, if the text of the MS is meaningful, one can expect the label words to be mostly nouns, adjectives or also numbers. They would also be expected to be largely (or even completely) unique.

In my view, the observed behaviour of the labels is not quite proof that the MS has meaning, but it shows that there is some 'intention' behind it. The text is clearly not just arbitrary filler.


Yes, this is almost the same as what I wrote, but I would like to add, for everybody, the observation that the VMS labels need not be labels as you define them ("nouns, adjectives or also numbers"). The labels could also be just an ordinary piece of text. For example: here......................is..........................she:...........the.....person..........that...........I gave...........the red rose, where the text is scattered across the page among the images.
But TTR (type-token ratio) values argue against exactly this scenario. The percentage of unique words is much higher in labelese than in any normal text. Only texts such as glossaries or dictionaries contain more unique words.
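For readers unfamiliar with the measure: TTR is the number of distinct word types divided by the total number of tokens. A minimal Python sketch follows; the function name `ttr` and the toy word lists are illustrative only and are not taken from any analysis in this thread.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct word types / number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)

# Toy example: a label-like list where every word is unique,
# and a running-text-like list with repeated frequent words.
labels = ["otaly", "okal", "oky", "otedy"]
text = ["daiin", "ol", "chedy", "daiin", "ol", "shedy"]
print(ttr(labels))            # 1.0
print(round(ttr(text), 3))    # 0.667
```

Raw TTR falls as a sample grows, so comparisons are only fair between samples of similar size.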
This histogram compares the frequency of the last character in three sets of Latin words:
  1. the text of [link]
  2. the upper-case titles (pseudo-labels) in Matthioli's first book
  3. the labels in the image of [link] (the "Passion" set discussed below)
[Histogram (attachment 3101): frequency of the last character in the three sets of Latin words]

The second and third sets are mostly made up of nouns in the nominative case. The Passion set is clearly too small to provide meaningful statistics. What is peculiar about this set is that it contains several plural nouns; this is also clear from the illustrations, where two identical objects appear in a single cell. This is why -i is more frequent in the Passion labels than in Matthioli (where almost all titles consist of singular nouns and adjectives).

The high value for -t in the plain text, unmatched in the labels, is due to particles such as "et", "aut" and "ut", and to verbs (third-person verb forms typically end in -t).
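The kind of count behind the first histogram can be sketched as follows. This is an assumed reconstruction, not MarcoP's actual script, and the word lists are toy examples rather than the three sets described above.

```python
from collections import Counter

def final_char_freq(words):
    """Percentage of words ending in each character (case-insensitive)."""
    counts = Counter(w[-1].lower() for w in words if w)
    total = sum(counts.values())
    # most_common() orders the result from the most to the least frequent ending
    return {ch: 100 * n / total for ch, n in counts.most_common()}

# Toy Latin samples: particles and verbs push -t up in running text,
# while plural nouns such as "clavi" add -i to a label list.
plain = ["et", "aut", "ut", "dicit", "rosa"]
labels = ["clavi", "spinae", "flagella", "clavi"]
print(final_char_freq(plain))   # {'t': 80.0, 'a': 20.0}
print(final_char_freq(labels))
```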



This other histogram is about characters in the VMS (independent of their position inside words). Graphs for prefixes and suffixes were discussed [link].

[Histogram (attachment 3100): character-group frequencies in VMS labels and text]

K T P F stand for the benched-gallows
C S for the benches
I E for i and e sequences

The overall frequency of gallows appears to be similar in the two sets.
On the other hand, -in is less frequent in labels.
Benches, benched gallows and e-sequences are also less frequent in labels: since [e] is strongly correlated with benches and benched gallows, it is not surprising that the reduction in those two groups goes together with a reduction in [e].
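A rough way to count these glyph groups in EVA-transcribed words is sketched below. The grouping is an assumption based on the legend above (benched gallows as EVA ckh/cth/cph/cfh, benches as ch/sh, plus plain gallows and runs of i and e); the function names are hypothetical and the sketch does not reproduce the attached graph.

```python
import re
from collections import Counter

# Assumed EVA glyph groups, loosely following the legend above:
#   benched gallows: ckh, cth, cph, cfh   (K T P F)
#   benches:         ch, sh               (C S)
#   plain gallows:   k, t, p, f
#   runs of i and runs of e               (I E)
# Matching starts at the leftmost position, so in "ckh" the whole trigraph
# is consumed at "c" and the "k" is not counted again as a plain gallows.
GROUPS = re.compile(r"c[ktpf]h|[cs]h|[ktpf]|i+|e+")

def classify(unit):
    if unit[0] == "i":
        return "i-seq"
    if unit[0] == "e":
        return "e-seq"
    if unit.endswith("h"):
        return "benched-gallows" if len(unit) == 3 else "bench"
    return "gallows"

def group_rates(words):
    """Occurrences of each glyph group per 100 EVA characters."""
    counts = Counter()
    n_chars = sum(len(w) for w in words)
    for w in words:
        for unit in GROUPS.findall(w):
            counts[classify(unit)] += 1
    return {g: 100 * c / n_chars for g, c in counts.items()}

def share_ending_in(words, suffix="in"):
    """Fraction of words ending with the given suffix (e.g. EVA -in)."""
    return sum(w.endswith(suffix) for w in words) / len(words)

print(group_rates(["chedy"]))   # {'bench': 20.0, 'e-seq': 20.0}
```

Running `group_rates` separately on label words and on running-text words gives the kind of side-by-side comparison shown in the histogram.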



According to Stolfi, the top 10 most frequent words are:
886 daiin
548 ol
515 chedy
462 aiin
437 shedy
403 chol
365 or
360 ar
348 chey
338 dar
totalling 4662 occurrences, about 12% of Voynichese tokens.
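The total is easy to check, and it also gives a rough idea of the corpus size implied by the 12% figure. The corpus size is a back-of-the-envelope estimate, not a number stated in the thread.

```python
# Counts of the ten most frequent words, as quoted from Stolfi above.
top10 = {"daiin": 886, "ol": 548, "chedy": 515, "aiin": 462, "shedy": 437,
         "chol": 403, "or": 365, "ar": 360, "chey": 348, "dar": 338}
total = sum(top10.values())
print(total)               # 4662

# If these 4662 tokens are about 12% of the text, the whole text has roughly
# 4662 / 0.12 tokens; since the 12% is rounded, this is only approximate.
print(round(total / 0.12))  # 38850
```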

Using the Zandbergen-Landini classification, these 10 frequent words appear 14 times among the more than 1000 words that occur in labels: 1.4%, one order of magnitude less than their average frequency.

Among the roughly 850 single-word labels (i.e. ignoring labels made of more than one word) there are only 2 occurrences of the top 10 words (0.2%):
<f73r.30,&Lz>    ar (a star-nymph in Sagittarius)
<f99r.35,@Lf>    dar (a plant label that Takahashi transcribes "dor")

One can speculate that maybe the labels are nouns/adjectives, while the top 10 most frequent words are some kind of "function words".
Anyway (unless I have made some major mistake) we can say that the set of one-word labels is almost totally disjoint from the set of the 10 most frequent words.
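The percentages above can be recomputed directly, and the same check can be written as a small function over a list of label tokens. The function name is hypothetical; the denominators ("more than 1000", "about 850") are approximate, so the rates are approximate too.

```python
def share_of_frequent(label_tokens, frequent):
    """Fraction of label tokens that belong to the given set of frequent words."""
    frequent = set(frequent)
    return sum(tok in frequent for tok in label_tokens) / len(label_tokens)

TOP10 = {"daiin", "ol", "chedy", "aiin", "shedy", "chol", "or", "ar", "chey", "dar"}

print(14 / 1000)           # 0.014, i.e. 1.4% of label words (an upper bound, as there are >1000)
print(2 / 850)             # about 0.0024, i.e. roughly 0.2% of single-word labels
print(0.12 / (14 / 1000))  # about 8.6 times lower than the 12% text-wide share

# Toy check on a few labels mentioned in this thread:
print(share_of_frequent(["ar", "dar", "otaly", "okal"], TOP10))  # 0.5
```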
(29-07-2019, 03:27 PM)MarcoP Wrote: Anyway (unless I have made some major mistake) we can say that the set of one-word labels is almost totally disjoint from the set of the 10 most frequent words.
It's fascinating that the VM labels behave quite well according to what we'd expect of labels. 
For multi-word labels it's quite natural to mix word types. For example, a plant labelled "lily of the valley" combines two infrequent nouns with two very frequent function words.
Or take the traditional labels on the older Wheels of Fortune with kings: three single-word labels and one three-word label that includes a common verb:
  • regno
  • regnavi
  • regnabo
  • sum sine regno
(29-07-2019, 08:28 PM)Koen G Wrote: It's fascinating that the VM labels behave quite well according to what we'd expect of labels.

I agree, but...
Something that doesn't fit the nice picture (I don't think this has been mentioned in this thread yet):
Many of the labels that occur more than once appear to refer to different things.


The most frequent repeating single-word labels:

otaly occurs 8 times as a label: 5 times as zodiac-nymph; once as a "bio" "stream"; twice as "pharma" small-plants.
Problems with this label were previously discussed [link].

otedy occurs 6 times: twice each as zodiac-nymph, "bio" pool/tube and "bio" nymph

oky occurs 6 times: 3 zodiac-nymphs; once at the center of the bottom-right Rosette (but this could be a short text crammed inside the picture); twice as "pharma" small-plants

okal occurs 6 times: once as a red moon in f67r2; 4 times as zodiac-nymph; once as a bathing-nymph or "pine-cone"

I attach the results of my search for repeating single-word labels. This was based on the transcription file by Zandbergen-Landini.
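A search like this can be sketched in Python. The locus-line format assumed here is inferred from the two transcription lines quoted earlier (for example `<f73r.30,&Lz>    ar`): a tag whose type letter `L` marks a label, followed by the text, with `.` separating words. Real Zandbergen-Landini files contain additional markup (comments, alternative readings) that this hypothetical parser ignores.

```python
import re
from collections import defaultdict

# Assumed locus-line shape: <folio.number,XLs> text
# X is a one-character position marker, "L" marks a label locus and s is an
# optional subtype (e.g. "z", "f"). Words are assumed to be separated by ".".
LABEL_LOCUS = re.compile(r"<(f\w+)\.\d+,.L\w*>\s+(\S+)")

def single_word_labels(lines):
    """Yield (folio, word) for every label locus that contains a single word."""
    for line in lines:
        m = LABEL_LOCUS.match(line.strip())
        if m and "." not in m.group(2):
            yield m.group(1), m.group(2)

def repeating_labels(lines, min_count=2):
    """Map each single-word label seen at least min_count times to its folios."""
    folios = defaultdict(list)
    for folio, word in single_word_labels(lines):
        folios[word].append(folio)
    return {w: f for w, f in folios.items() if len(f) >= min_count}

# The first two lines are quoted in this thread; the others are invented
# to exercise the code (a repeat, a multi-word label and a paragraph line).
sample = [
    "<f73r.30,&Lz>    ar",
    "<f99r.35,@Lf>    dar",
    "<f99r.36,@Lf>    ar",
    "<f99r.37,@Lf>    ol.dar",
    "<f1r.1,@P0>      fachys.ykal",
]
print(repeating_labels(sample))  # {'ar': ['f73r', 'f99r']}
```

The code only finds repeated spellings; deciding whether the repeats refer to different things (zodiac nymph, "pharma" plant and so on) still requires looking at the images.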
Hi Marco, (I emailed you some pages last week; they are probably in your spam folder)

Currently I am working on another Passion manuscript, and your first graph compares unrelated things: words with no relation whatsoever. To me, it does not really show anything, sorry.


The second graph is something, but really, arriving at a conclusion worded as you did:

"One can speculate that maybe the labels are nouns/adjectives, while the top 10 most frequent words are some kind of "function words".  Anyway (unless I have made some major mistake) we can say that the set of one-word labels is almost totally disjoint from the set of the 10 most frequent words."

makes me wonder whether this can be a satisfactory discussion.

I've looked up my old material, in which I compared, among other things, the labels, the letters and their positions in words with other textual possibilities.
Look for the graph labelled "CAB labels" and compare it to the previous graph, "CAB language", or to an NST one; it does not really matter which. You will see all the letters and their positions. They are similar. [link]


I repeat that here one thing does *not* rule out the other. You cannot assume that labels do, or do not, contain nouns or verbs.
Letter occurrences by themselves are not really a measure of a noun or of any word. Furthermore, words, their positions or their occurrences are also *not* a basis on which one can draw conclusions about either of them.

At most you can say the reverse: if the word labels are nouns, if the text in them behaves the same as the other text, and if they have not been found explicitly in the other text, partly or as a whole, then they very probably have no direct word relation within that same text scheme.

There could be another textual relation: labels could, for example, use a table reference or another encryption, or perhaps they are nonsense filler. But those are just examples.
You're right Marco. For multi-word labels this would be no problem (perhaps even expected, because adjectives like "large" appear in many contexts). But since they appear as single labels, they do point to an underlying problem.

Either the labels don't really (or don't always) name the thing they're with. Or else, this is a consequence of the way Voynichese encodes its language, perhaps creating an abnormal amount of homographs.
Quick list of ways a single label could apply to multiple things:

Attributes / properties   
Relationships / mappings
Numeric / quantities     
Multiple meanings 
Multiple encryptions       
Multiple languages       
Multiple sounds/Homonymy   
Categories
Metadata
Nonsense

(List derived from ideas by VViews, Anton, Koen G, Marco P, Davidsch).
(30-07-2019, 05:23 PM)RobGea Wrote: Quick list of ways a single label could apply to multiple things: [...]

Bummer. I've been saying all that for years and I don't get credit?
(30-07-2019, 11:19 AM)Koen G Wrote: You're right Marco. For multi-word labels this would be no problem (perhaps even expected, because adjectives like "large" appear in many contexts). But since they appear as single labels, they do point to an underlying problem.


Either the labels don't really (or don't always) name the thing they're with. Or else, this is a consequence of the way Voynichese encodes its language, perhaps creating an abnormal amount of homographs.

I used to think that the presence of many homographs might explain some of the weirdness of Voynichese, but now I am not so sure.

Single-word labels with apparently different meanings are quite frequent, something like 10% of the whole set. I believe that such a rate of homographs is unlikely in general: most sentences would be hopelessly ambiguous. Moreover, if a single word type were used to express two or more different concepts 10% of the time, TTR would be significantly reduced, which does not seem to be the case.
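The effect of homographs on TTR can be illustrated with a toy model: take a fraction of word types and give each of them the spelling of another, remaining type. This is a hypothetical illustration of the reasoning, not a claim about how Voynichese works; `collapse_types` is an invented helper.

```python
import random

def ttr(tokens):
    """Type-token ratio: distinct types / tokens."""
    return len(set(tokens)) / len(tokens)

def collapse_types(tokens, fraction, seed=0):
    """Toy homograph model: a given fraction of word types takes the
    spelling of one of the remaining types, so those types disappear."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    types = sorted(set(tokens))
    rng = random.Random(seed)
    merged = set(rng.sample(types, int(len(types) * fraction)))
    kept = [t for t in types if t not in merged]
    spelling = {t: rng.choice(kept) for t in merged}
    return [spelling.get(t, t) for t in tokens]

tokens = [f"w{i}" for i in range(100)] * 2   # 100 types, 200 tokens
print(ttr(tokens))                            # 0.5
print(ttr(collapse_types(tokens, 0.10)))      # 0.45
```

In this model the number of types, and hence TTR, drops by exactly the merged fraction; real homography would mix with other effects, so this only shows the direction of the change.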
Your TTR research shows that Voynichese behaves like an inflected language: it has more types per N tokens than weakly inflected languages such as English or Italian. Even compared with an inflected language like Greek, Voynichese tends to have a higher TTR.
[Image: TTR comparison chart (ttr1.jpg)]
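Because raw TTR shrinks as a text gets longer, "types per N tokens" comparisons are usually made on equal-sized samples. One common approach is the mean segmental TTR, sketched below; it is a generic illustration and not necessarily the method behind the chart above.

```python
def mean_segmental_ttr(tokens, n):
    """Mean TTR over consecutive, non-overlapping windows of n tokens.
    Leftover tokens that do not fill a whole window are ignored."""
    windows = [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
    if not windows:
        raise ValueError("need at least n tokens")
    return sum(len(set(w)) / n for w in windows) / len(windows)

tokens = "a b a b c d c d".split()
print(mean_segmental_ttr(tokens, 4))  # 0.5
print(mean_segmental_ttr(tokens, 2))  # 1.0
```

The window size matters: the same text scores higher with smaller windows, so languages should be compared at the same N.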

This leaves the other option you mentioned:
the labels don't really (or don't always) name the thing they're with
In this case, the most obvious difficulty for me is that labels in other manuscripts work as expected: there is little (if any) overlap among single-word labels in herbals and bestiaries, and the labels are mostly plant names and animal names respectively. Once again, finding parallels could suggest explanations that are plausible, rather than merely possible.
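The overlap between the single-word labels of two sections (or two manuscripts) can be quantified, for instance, with the Jaccard index. This measure is chosen here for illustration; the thread does not specify one. The sets below contain only the four repeating labels listed earlier, split by section, not the full label sets.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection / size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Only the four repeating labels discussed earlier in the thread.
zodiac = {"otaly", "otedy", "oky", "okal"}
pharma = {"otaly", "oky"}
print(jaccard(zodiac, pharma))  # 0.5
```

For a herbal and a bestiary with labels that name their pictures, one would expect a value close to 0.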