The Voynich Ninja

Full Version: labels as words
(18-09-2016, 01:47 PM)Davidsch Wrote: This is exactly what I was trying to do for several months (last year).
I also find it incredible that Marco did the exact same things I already did and published.

Hi David, if I actually duplicated your research, we now have the possibility of checking the correctness of our results.

Could you please post here your mapping of labels from each section to "paragraph words" of each section? I have now examined what you previously linked [link], but I cannot find anything directly comparable there.

One sentence that could be a useful comparison with the [link] of this thread is:
Thus, 1110 label words do occur in the (other) text pages, that is 45% of all words in the label pages

You have twice as many labels as I considered, I think mainly because you included labels formed of multiple words. Given that difference, our results for the percentage of labels matched by words are rather close (mine is 47%). Excellent.
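For anyone wanting to reproduce this kind of figure, here is a minimal sketch of the matching step, assuming labels and paragraph words are available as plain lists of EVA strings. The function name and the toy word lists are made up for illustration; they are not taken from either data set discussed here, and multi-word labels would have to be split or handled separately before calling it.

```python
def label_match_rate(labels, text_words):
    """Return (matched_count, fraction) of labels that also occur as text words."""
    vocabulary = set(text_words)  # exact-form matching only, no inflections
    matched = [label for label in labels if label in vocabulary]
    fraction = len(matched) / len(labels) if labels else 0.0
    return len(matched), fraction


# Toy data for illustration only, not real transcription counts.
labels = ["otaly", "okary", "otoldy", "sary"]
text_words = ["daiin", "otaly", "chedy", "okary", "qokeedy"]

count, fraction = label_match_rate(labels, text_words)
print(f"{count} of {len(labels)} labels matched ({fraction:.0%})")
```

Because the comparison is on exact forms, labels that recur in the text with a different inflection would not be counted by this sketch.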
Sorry, I did not read these posts earlier. I was frustrated, as you might have noticed, by the duplication of work on one hand, while on the other hand there is so much work still to be done (such as splitting up pages and matrix-comparing pages). But never mind: I am making this a long-term project; my expectations were a bit too high. Concerning the info on my page, you are right: I removed the information and it is no longer publicly viewable.

This thread "word as labels" is a fundamental thread and one of the pillars of text examination,
because if you consider that the labels really identify the things drawn in the VMS,
then things could become rather easy once you have actually identified one or two labels.
It's a fact that most unknown scripts were solved that way.

I assume the labels are repeated inside the text, but not always in the same form. 
See previous [link].
(10-10-2016, 09:20 AM)Davidsch Wrote: I assume the labels are repeated inside the text, but not always in the same form.

Many are repeated in the same form and I agree that others might be repeated with different inflections.
Reading my own comments from about 5-6 weeks ago, I realize that my insight and opinions are not as sturdy as I thought.

It is clear that they change and shift based on the gathered information, but they are also influenced by the research of that particular moment.

The research "words as tokens" has completely taken over, and shows an overview of the entire Voynich word structure.
In her discussion of [link], Emma wrote:

Quote:The theory explains labelese—the different word statistics associated with label words—by putting such words effectively outside the transformation the text undergoes. If transformation works by altering words according to their environment then labels, which have no environment because they are usually isolated, should not be transformed. Labels are then the normal text. Of course, some labelese words are found in the main text, but there is no reason why every word in the transformed text must be altered, if the environment does not cause it.

Under the assumption that Voynichese is linguistic and phonetic, labels could be different from the main body of text in several ways, for instance:
  • Words in the text might be phonetically altered on the basis of the adjacent words. Labels have no adjacent words, so they are likely unaltered (as Emma suggests)
  • Labels might typically share several syntactic features. For instance, they might mostly be nouns in their singular form (and “nominative case”?)
  • Most of the Voynich labels apply to similar objects: nymphs and “small” plants. The limited range of the illustrated objects could correspond to a limited semantic range of the corresponding labels.

While I currently have no precise idea of how to analyze the differences between labels and the main text, I have collected a couple of simple diagrams that highlight some of the superficial differences of the labels.

This histogram compares the 25 most common prefixes for all the words in the ms with the corresponding percentages for the words that appear in labels (I have included multi-word labels, counting each word individually). I have simply considered the first two EVA characters, which of course causes some distortion (such as the high numbers for ch-, sh-, which typically behave like digraphs).
[attachment=1591]

The main differences are that:
  • qo-, sh-, ch- are less frequent in labels
  • the o- prefix is more frequent in labels
  • the y- prefix is more frequent in labels (but much less clearly so than o-)
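A comparison of this kind can be sketched roughly as follows. This is only an illustration under stated assumptions: words are plain EVA strings, a "prefix" is just the first two characters, words shorter than that are skipped, and the function names and toy word lists are hypothetical rather than the code or data behind the attached histogram.

```python
from collections import Counter


def affix_percentages(words, length=2, where="start"):
    """Percentage of words carrying each affix of the given length."""
    # Assumption: words shorter than `length` are skipped entirely.
    affixes = [w[:length] if where == "start" else w[-length:]
               for w in words if len(w) >= length]
    total = len(affixes)
    if total == 0:
        return {}
    return {a: 100.0 * n / total for a, n in Counter(affixes).items()}


def compare_affixes(all_words, label_words, top=25, length=2, where="start"):
    """Rows of (affix, % in all words, % in labels) for the most common affixes."""
    all_pct = affix_percentages(all_words, length, where)
    label_pct = affix_percentages(label_words, length, where)
    # Rank by frequency in the whole text, as in a "25 most common" chart.
    ranked = sorted(all_pct, key=all_pct.get, reverse=True)[:top]
    return [(a, all_pct[a], label_pct.get(a, 0.0)) for a in ranked]


# Toy data for illustration only.
all_words = ["qokedy", "qokain", "chedy", "shedy", "otaly", "okary"]
label_words = ["otaly", "okary", "otary", "sary"]

rows = compare_affixes(all_words, label_words)
for affix, pct_all, pct_labels in rows:
    print(f"{affix}-  all: {pct_all:5.1f}%  labels: {pct_labels:5.1f}%")
```

Passing `where="end"` gives the corresponding suffix table. Note that an affix which is common in labels but absent from the top of the whole-text ranking will not show up in the rows at all.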


This is the corresponding histogram for the 25 most common suffixes.
[attachment=1592]
  • -ey,-in are less frequent in labels
  • -ry, -ly, -sy are much more frequent in labels

This diagram highlights something that has also been discussed by Emma: -y suffixes cannot be lumped all together.
Here are two similar histograms that only consider the “small-plants” / pharma folios 88, 89, 99, 100, 101, 102.

Prefixes:
[attachment=1593]

The main differences with respect to the global histogram in the previous post are: 
  • so-, sa- show a clearer correlation with labels.
  • ot- is rarer in the text and more frequent in the labels. In these pages, the frequency of ot- is 7 times greater in the labels than in the text.

Suffixes:
[attachment=1594]

The suffix -ol is quite popular in the text of this section, while its frequency in labels is not very different from the global measure.
The fact that the -ly, -ry, -sy endings are more common in labels is also apparent in these local data, as well as the fact that the suffixes -ey and -in are less common in the labels than in other text.
-am also is a frequent ending for labels in this section.
There are some interesting stats, Marco, and I've certainly never seen labelese so clearly described. Researchers often talk about how it differs and so it's good to see the numbers.

I'll add my thoughts, which may duplicate some of yours.

Word starts:

[qo]: it's no surprise to see this at lower levels, as this is the classic mark of labelese.
[ch, sh]: the lower levels of these two is interesting, as I can think of two different kinds of word which begin [ch, sh]. Some are short, like [chy] and [sheo], others are long with a gallows, such as [chckhy] and [chety]. I would guess it's the former which is changing the numbers and they're more common. But if so, what do these short words mean? Are they not in the labels because they're grammatical?
[ot, ok]: the increase in these should be a result of the lower [qo]. I note that for all labels the levels are similar and increase similarly, though for the small plants labels [ot] start much lower and grow much further.
[yt, yk, sa, so]: this is the biggest surprise! These are line start patterns, so to see them here is interesting. This is a link between labels and line patterns, though it's hard to see immediately how.

Word ends:
[in]: I seem to recall discussing with you the possibility that [in] was a line pattern. We couldn't come to an agreement because the evidence was weak, I think. But there's the further fact that multisyllable words tend not to end with [in].
[ey]: I admit to not knowing what the lower levels of [ey] could mean, except that longer words also don't end with [ey].
[ly, ry]: it might seem that these endings account for the lower [ol, or] by the addition of [y], but I think more of these are [aly, ary].
(20-08-2017, 01:14 PM)Emma May Smith Wrote: There are some interesting stats, Marco, and I've certainly never seen labelese so clearly described. Researchers often talk about how it differs and so it's good to see the numbers.

Thank you, Emma! Actually, I understood from these diagrams that I had been assuming the whole ms has statistics that truly belong to the labels. For instance, I somehow thought that o- words were more common than the sum of ch- and sh- words.

(20-08-2017, 01:14 PM)Emma May Smith Wrote: I'll add my thoughts, which may duplicate some of yours.

Word starts:

[qo]: it's no surprise to see this at lower levels, as this is the classic mark of labelese.

Yes, this is a true cornerstone :)

(20-08-2017, 01:14 PM)Emma May Smith Wrote: [ch, sh]: the lower levels of these two is interesting, as I can think of two different kinds of word which begin [ch, sh]. Some are short, like [chy] and [sheo], others are long with a gallows, such as [chckhy] and [chety]. I would guess it's the former which is changing the numbers and they're more common. But if so, what do these short words mean? Are they not in the labels because they're grammatical?

I think I will draw another version of the charts using unique words (i.e. counting each word once, not the number of occurrences). The difference could tell us how much the short / high-frequency words impact.
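The difference between the two ways of counting can be illustrated with a small sketch. The function name and the word list are hypothetical and chosen only to show how a few frequent short words can move the numbers; nothing here is real transcription data.

```python
def prefix_share(words, prefix, unique=False):
    """Fraction of words starting with `prefix`.

    unique=False counts every occurrence (tokens);
    unique=True counts each distinct word-form once (types).
    """
    items = set(words) if unique else list(words)
    if not items:
        return 0.0
    return sum(1 for w in items if w.startswith(prefix)) / len(items)


# Toy text in which the short word "chy" is repeated several times.
text = ["chy", "chy", "chy", "sheo", "chckhy", "otaly", "daiin", "daiin"]

by_tokens = prefix_share(text, "ch")               # 4 of 8 occurrences
by_types = prefix_share(text, "ch", unique=True)   # 2 of 5 distinct forms
print(f"ch- by occurrences: {by_tokens:.0%}, by unique words: {by_types:.0%}")
```

If a prefix owes its frequency mostly to a handful of very common short words, its share should drop noticeably when switching to unique words, as it does in this toy case.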

(20-08-2017, 01:14 PM)Emma May Smith Wrote: [ot, ok]: the increase in these should be a result of the lower [qo]. I note that for all labels the levels are similar and increase similarly, though for the small plants labels [ot] start much lower and grow much further.

I agree with the idea of a relation with the rarity of qo- in the labels. The strong correlation between initial o- and labels reminds me of the al- definite article in the labels of Arabic manuscripts, as discussed by Stephen Bax (and others, I guess); but of course it might be something entirely different.

(20-08-2017, 01:14 PM)Emma May Smith Wrote: [yt, yk, sa, so]: this is the biggest surprise! These are line start patterns, so to see them here is interesting. This is a link between labels and line patterns, though it's hard to see immediately how.

I hadn't thought of line start patterns. That's an excellent observation!

(20-08-2017, 01:14 PM)Emma May Smith Wrote: Word ends:
...
[ly, ry]: it might seem that these endings account for the lower [ol, or] by the addition of [y], but I think more of these are [aly, ary].

I confirm these labels are about 60% [-aly, -ary], 30% [-oly, -ory], 10% other characters.
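A breakdown like this can be checked with a few lines of code. The sketch below assumes the labels are plain EVA strings; the function name and the sample labels are invented for illustration and do not reproduce the percentages above.

```python
from collections import Counter


def preceding_char_breakdown(labels, endings=("ly", "ry")):
    """Among labels ending in -ly / -ry, share preceded by a, by o, or by anything else."""
    endings = tuple(endings)  # str.endswith accepts a tuple of suffixes
    counts = Counter()
    for label in labels:
        # Labels consisting of the bare ending have no preceding character; skip them.
        if len(label) >= 3 and label.endswith(endings):
            before = label[-3]
            counts[before if before in ("a", "o") else "other"] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Toy labels for illustration only.
sample = ["otaly", "okary", "sary", "otoly", "chery", "daiin"]
shares = preceding_char_breakdown(sample)
for key in ("a", "o", "other"):
    print(f"{key}: {shares.get(key, 0.0):.0%}")
```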
Here are the global histograms based on unique words (ignoring the number of occurrences). The ch- / sh- difference between labels and text is indeed somewhat reduced, but it's still clearly visible.
The following are prefix / suffix histograms for two XV-XVI Century scientific texts:

1) The Alchemical Herbal (Italian version) Biblioteca Queriniana ms B.V.24, Brescia (XV Century). The text sample is [link] of the two pages I have images of. The labels are the complete [link] provided by Philip Neal (published by Ragazzini, Segre Rutz etc.)

2) Matthioli's “Commentarii in libros sex Pedacii Dioscoridis de medica materia” (1554). I have used the transcription published [link]. [link] doesn't exactly have “labels”; I have instead used the titles of each chapter / engraving.
In order to simplify the comparison, I have based the histogram on the single initial/final character only. I have produced the corresponding histograms for the Voynich “pharma” section.
The number of occurrences of distinct words was ignored: each word-form was counted once.
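The counting rule described here (single first or last character, each word-form counted once) can be sketched as below. The function name and the toy word list are assumptions made for the example; the real charts were of course built from the transcriptions mentioned above.

```python
from collections import Counter


def edge_letter_distribution(words, position="first"):
    """Percentage of distinct word-forms starting (or ending) with each character."""
    forms = {w for w in words if w}  # each word-form counts once; drop empty strings
    index = 0 if position == "first" else -1
    counts = Counter(form[index] for form in forms)
    total = sum(counts.values())
    return {ch: 100.0 * n / total for ch, n in counts.items()} if total else {}


# Toy data for illustration only; "otaly" is repeated but counted once.
words = ["otaly", "otaly", "okary", "qokedy", "daiin"]

first = edge_letter_distribution(words, "first")
last = edge_letter_distribution(words, "last")
print("first:", sorted(first.items()))
print("last: ", sorted(last.items()))
```

Running the same function separately on the text sample and on the labels gives the two series that are plotted side by side.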

When examining the graphs, please take note of the vertical scale: the ranges vary considerably.

Comments:

Italian alchemical herbal (Brescia  B.V.24)

[attachment=1603]

Prefixes: the text has a rather uniform distribution, with limited differences between frequencies. For labels, frequencies cover a slightly wider range. In several cases (r g b t c) the frequency in the labels is considerably higher than in the text. In other cases (v o d) the difference is in the opposite direction. Clearly, alchemical plant names have preferred initials that don't exactly match those of the Italian language. Yet the differences are about 5% or less.

Suffixes: the text clearly shows the marked preference of Italian for ending vowels. Note that the scale of this histogram is about 4 times that of the prefixes. The labels show an even greater preference for the -a ending (in Latin and Italian, plant names tend to be feminine and end with -a). -i and -e endings typically correspond to plural forms (masculine and feminine respectively) and are markedly under-represented in the labels. The alchemical herbal includes several plants with Latin-like or Greek-like names (Basiles, Caspitres, Tofanas): this causes the isolated spike for the suffix -s in labels only.


Latin Matthioli

[attachment=1604]

Prefixes: the range of the prefix histogram is limited in this case also. The distribution of frequencies is regular and the differences between labels and text are even less marked than in the alchemical herbal. The greatest difference is the higher frequency of i- in the text with respect to the labels: this seems to be due to numerous medical terms originating from the “in” preposition or using “in” as a negative prefix (e.g. includit, infectum, infusione, ingestus, insanabilia, inspexisse). The fact that the a- prefix is more common in labels could be due to the Greek origin of many of the names (17% of Greek words start with alpha); Greek names could also have an influence on the relative rarity of i- in the labels.

Suffixes: the numbers vary much more than for the prefixes. Latin features different suffixes correlated with specific functions. The endings that typically correspond to masculine (-s), neuter (-m) and feminine (-a and -s again) nouns in the nominative form are preponderant for both text and labels, but each of them occurs in the labels with about 10% higher frequencies. Endings that correlate with plural forms (-i), other noun cases (-o, -e) and verbs (-t, -r) are frequent in the text but almost absent in the labels.


Voynich Pharma / Small-Plants

[attachment=1605]

Prefixes: there are fewer prefixes than in Italian and Latin and the range of the frequencies is consequently wider. Two marked differences are apparent: the most common prefix o- is much more common in the labels (almost half of the labels start with o-); as is well known, the prefix q- is rather common in the text but almost completely absent in the labels.

Suffixes: Voynichese has even fewer common suffixes than prefixes (basically, only 8 characters). -y is by far the most common. The more detailed two-character analysis I previously posted shows that -y endings (e.g. -ey and -ly) are differently distributed between labels and text, but this is not visible in this graph and will not be discussed here. If one only considers the last character, Voynichese doesn't show differences between labels and ordinary text.

_______

Comparison of Voynichese with Latin and Italian highlights the fact that the three “languages” behave in markedly different ways. The main difference between Latin and Italian is that Italian only has a limited number of frequent ending characters (-i, -o, -e, -a). Latin has more endings, but for both languages the endings provide the clearest differences between text and labels.
On the other hand, Voynichese last letters are almost identically distributed in text and labels. First letters exhibit noticeable differences; these are not as wide as those that appear in Latin and Italian endings but the statistics for Voynichese o- and q- can be visually compared (for instance) with Italian -a and -i (Brescia ms).
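The comparisons in this post are visual. If one wanted a single number for how far apart the text and label distributions are, one option (my own suggestion here, not something used for the charts above) is the total variation distance between the two frequency tables. The sketch below uses made-up proportions and a hypothetical function name.

```python
def total_variation(p, q):
    """Total variation distance between two distributions given as {symbol: proportion}.

    0.0 means identical distributions, 1.0 means no overlap at all.
    """
    symbols = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in symbols)


# Made-up proportions for illustration only (each table sums to 1).
text_endings = {"y": 0.6, "n": 0.2, "l": 0.2}
label_endings = {"y": 0.6, "n": 0.1, "l": 0.2, "r": 0.1}

distance = total_variation(text_endings, label_endings)
print(f"total variation distance: {distance:.2f}")
```

With real data, a value near zero for last letters and a clearly larger value for first letters would match what the histograms show by eye.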

In both the Latin and Italian examples it seems that the differences between plain text and labels are partly related to the labels being borrowed from different languages (with Italian borrowing from Latin and Greek, and Latin borrowing from Greek). This is something I hadn't considered and that could also have an influence on the Voynichese statistics.