What are the characteristics of Labelese? - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: What are the characteristics of Labelese? (/thread-2862.html)
RE: What are the characteristics of Labelese? - Davidsch - 29-07-2019

(28-07-2019, 05:39 PM)ReneZ Wrote: (26-07-2019, 11:58 AM)bi3mw Wrote:
Isn't it natural that labels should consist of a lot of unique words (if they make sense)? Of course, the author(s) could have "simulated" this property, but I think that's unlikely. Yes, this is almost the same as what I wrote, but I would like to add for everybody the observation that it is not necessary that the VMS-labels are labels as you define them by: "nouns, adjectives or also numbers". The labels could also be just a normal piece of text. For example:

here......................is..........................she:...........the.....person..........that...........I gave...........the red rose.

Where the text is thrown across the page with images.

RE: What are the characteristics of Labelese? - Koen G - 29-07-2019

But TTR values argue against exactly this scenario. The percentage of unique words is much higher in labelese than in any normal text. Only texts like glossaries / dictionaries contain more unique words.

RE: What are the characteristics of Labelese? - MarcoP - 29-07-2019

This histogram compares the frequency of the last character in three sets of Latin words:
The second and third sets are mostly made of nouns in the nominative case. The Passion set is clearly too small to provide meaningful statistics. What is peculiar in this set is that it contains several plural nouns: this is also clear from the illustrations, with two identical objects appearing in a single cell. This is the reason why -i is more frequent in the Passion labels than in Matthioli (where almost all titles are made of singular nouns and adjectives). The high value for -t in the plain text, unmatched in the labels, is due to particles like "et", "aut", "ut" and to verbs (the third-person ending is typically -t).

This other histogram is about characters in the VMS (independent of character position inside words). Graphs for prefixes and suffixes were discussed [link not shown].

K T P F stand for the benched-gallows
C S for the benches
I E for i and e sequences

The overall frequency of gallows appears to be close in the two sets. On the other hand, -in is less frequent in labels. Also benches, benched-gallows and e-sequences are less frequent in labels: since [e] is strongly correlated with benches and benched-gallows, it is not surprising that a reduction in the first two groups co-occurs with a reduction of its occurrences.

According to Stolfi, the top 10 most frequent words are:

886 daiin
548 ol
515 chedy
462 aiin
437 shedy
403 chol
365 or
360 ar
348 chey
338 dar

totalling 4662 occurrences, about 12% of Voynichese tokens.

Using Zandbergen-Landini classification, in the more than 1000 words that appear in the labels, these 10 frequent words appear 14 times: 1.4%. One order of magnitude less than their average frequency. In the about 850 single-word labels (i.e.
ignoring labels that are made of more than a single word) there are only 2 occurrences of the top 10 words (0.2%):

<f73r.30,&Lz> ar (a star-nymph in Sagittarius)
<f99r.35,@Lf> dar (a plant label that Takahashi transcribes "dor")

One can speculate that maybe the labels are nouns/adjectives, while the top 10 most frequent words are some kind of "function words". Anyway (unless I have made some major mistake) we can say that the set of one-word labels is almost totally disjoint from the set of the 10 most frequent words.

RE: What are the characteristics of Labelese? - Koen G - 29-07-2019

(29-07-2019, 03:27 PM)MarcoP Wrote: Anyway (unless I have made some major mistake) we can say that the set of one-word labels is almost totally disjoint from the set of the 10 most frequent words.

It's fascinating that the VM labels behave quite well according to what we'd expect of labels. For multi-word labels it's quite natural to mix word types. For example, a plant labelled "lily of the valley" combines two infrequent nouns with two very frequent function words. Or the traditional labels in the older wheels of Fortune with kings; these are three single-word labels and one three-word label which includes a common verb:
RE: What are the characteristics of Labelese? - MarcoP - 30-07-2019

(29-07-2019, 08:28 PM)Koen G Wrote: It's fascinating that the VM labels behave quite well according to what we'd expect of labels.

I agree, but... Something that doesn't fit the nice picture (I don't think this has been mentioned in this thread yet): Many of the labels that occur more than once appear to refer to different things. The most frequent repeating single-word labels:

otaly occurs 8 times as a label: 5 times as zodiac-nymph; once as a "bio" "stream"; twice as "pharma" small-plants. Problems with this label were previously [link not shown].
otedy occurs 6 times: twice each as zodiac-nymph, "bio" pool/tube and "bio" nymph
oky occurs 6 times: 3 zodiac-nymphs; once at the center of the bottom-right Rosettes (but this could be a short text crammed inside the picture); twice as "pharma" small-plants
okal occurs 6 times: as a red moon in f67r2; 4 zodiac-nymphs; once as a bathing-nymph or "pine-cone"

I attach the results of my search for repeating single-word labels. This was based on the transcription file by Zandbergen-Landini.

RE: What are the characteristics of Labelese? - Davidsch - 30-07-2019

Hi Marco, (I've emailed you some pages last week, probably it's in your spam folder)

Currently I am working on another passion manuscript, and your first graph is comparing random things, words, with no relation whatsoever; it does not really show anything, to me. Sorry. The second graph is something, but really, coming to a wording such as you wrote: "One can speculate that maybe the labels are nouns/adjectives, while the top 10 most frequent words are some kind of "function words". Anyway (unless I have made some major mistake) we can say that the set of one-word labels is almost totally disjoint from the set of the 10 most frequent words."
makes me wonder if this is a discussion that can be satisfactory.

I've looked up my old stuff, and here I compared, among other things, the labels, the letters, their position in the words, with other textual possibilities. Look for the graph that says: CAB labels and compare it to the previous graph, CAB language or a NST, it does not really matter. You will see all the letters and their positions. These are similar. [link not shown]

I repeat that the one thing does *not* rule out the other thing here. You cannot assume nouns or verbs are inside labels or not. Letter occurrences by themselves are not really a measure for a noun or any word. Furthermore, words, or their positions or their occurrences are also *not* a basis on which one can draw conclusions on either of them.

At the max you can say the reversed thing: if the word-labels are nouns, and the text in them behaves the same as the other text, and they have not been found partly or as a whole, explicitly, in the other text, they very probably have no direct word-relation in that same text scheme. There could be another textual relation: labels could, for example, use a table reference, use another encryption, or perhaps be nonsense or filler. But those are just examples.

RE: What are the characteristics of Labelese? - Koen G - 30-07-2019

You're right Marco. For multi-word labels this would be no problem (perhaps even expected, because adjectives like "large" appear in many contexts). But since they appear as single labels, they do point to an underlying problem. Either the labels don't really (or don't always) name the thing they're with. Or else, this is a consequence of the way Voynichese encodes its language, perhaps creating an abnormal amount of homographs.

RE: What are the characteristics of Labelese?
- RobGea - 30-07-2019

Quick list of ways a single label could apply to multiple things:

Attributes / properties
Relationships / mappings
Numeric / quantities
Multiple meanings
Multiple encryptions
Multiple languages
Multiple sounds/Homonymy
Categories
Metadata
Nonsense

(List derived from ideas by VViews, Anton, Koen G, Marco P, Davidsch).

RE: What are the characteristics of Labelese? - -JKP- - 31-07-2019

(30-07-2019, 05:23 PM)RobGea Wrote: Quick list of ways a single label could apply to multiple things:

Bummer. I've been saying all that for years and I don't get credit?

RE: What are the characteristics of Labelese? - MarcoP - 01-08-2019

(30-07-2019, 11:19 AM)Koen G Wrote: You're right Marco. For multi-word labels this would be no problem (perhaps even expected, because adjectives like "large" appear in many contexts). But since they appear as single labels, they do point to an underlying problem.

I used to think that the presence of many homographs might explain some of the weirdness of Voynichese, but now I am not so sure. Single-word labels with apparently different meanings are quite frequent, something like 10% of the whole set. I believe that such a rate of homographs is unlikely in general: most sentences would be hopelessly ambiguous. Moreover, if 10% of the times a single word-type is used to express two or more different concepts, TTR would be significantly reduced, which does not seem to be the case. Your TTR research points out that Voynichese behaves like an inflected language: it has more types per N tokens than little-inflected languages like English or Italian. Also in comparison with an inflected language like Greek, Voynichese tends to have a higher TTR.
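For readers who want to see the TTR argument in concrete terms, here is a minimal sketch, assuming TTR means the plain type-token ratio (distinct word types divided by total tokens). The word list and the helper names `type_token_ratio` and `merge_homographs` are hypothetical toy illustrations, not data or code from anyone in this thread.

```python
def type_token_ratio(tokens):
    """Return distinct word types divided by total tokens (0.0 for empty input)."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def merge_homographs(tokens, merged):
    """Replace each token by its shared spelling when it appears in `merged`."""
    return [merged.get(token, token) for token in tokens]


# Toy data (hypothetical): six distinct concepts, one token each.
concepts = ["rose", "lily", "moon", "star", "pool", "tube"]

# Hypothetical homograph: two different concepts end up with the same spelling.
spelled = merge_homographs(concepts, {"lily": "rose"})

ttr_concepts = type_token_ratio(concepts)  # 6 types over 6 tokens
ttr_spelled = type_token_ratio(spelled)    # 5 types over 6 tokens

print(f"TTR without homographs: {ttr_concepts:.3f}")
print(f"TTR with one homograph: {ttr_spelled:.3f}")
```

Every homograph removes one type while the token count stays the same, so the ratio can only go down. Real comparisons would also need to fix the sample size N, since TTR depends on text length.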
This leaves the other option you mentioned: the labels don't really (or don't always) name the thing they're with. In this case, the most obvious difficulty for me is that labels in other manuscripts work as expected: there is not a great overlap (if any) among single-word labels in herbals and bestiaries, and the labels mostly are plant and animal names respectively. Once again, finding parallels could suggest explanations that are somehow plausible, rather than merely possible.
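As a quick check on the top-10 figures quoted earlier in the thread, the arithmetic can be reproduced in a few lines. This is only a sketch: the word counts are the Stolfi numbers as quoted, while the label totals (1000 and 850) are the rounded values from the post ("more than 1000", "about 850"), so the percentages are approximate.

```python
# Top-10 word counts as quoted in the thread (attributed there to Stolfi).
top10 = {
    "daiin": 886, "ol": 548, "chedy": 515, "aiin": 462, "shedy": 437,
    "chol": 403, "or": 365, "ar": 360, "chey": 348, "dar": 338,
}
top10_total = sum(top10.values())

# Rounded totals from the post; the exact counts are not given there.
label_tokens = 1000          # "more than 1000 words that appear in the labels"
top10_in_labels = 14
single_word_labels = 850     # "about 850 single-word labels"
top10_in_single_labels = 2

share_in_text = 0.12         # "about 12% of Voynichese tokens", as quoted
share_in_labels = top10_in_labels / label_tokens
share_in_single = top10_in_single_labels / single_word_labels

print(f"top-10 occurrences in the text: {top10_total}")
print(f"share in all labels: {share_in_labels:.1%}")
print(f"share in single-word labels: {share_in_single:.1%}")
print(f"text share / label share: {share_in_text / share_in_labels:.1f}x")
```

The last ratio comes out between 8 and 9, which is consistent with the "one order of magnitude" remark.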