The Voynich Ninja

Full Version: Vord frequency histogram as an indicator of the text category
Hello Anton,
thanks for this info about the recipe section. I have been looking for parallels to it after reading Nick Pelling's "block paradigm" post about it, and this is very helpful information.
I have been wondering the same about the astronomical section: beyond the few paragraph-blocks of text in this section, does the first-vord uniqueness also appear if we take the markers on the circular text as indicating the beginning of "paragraphs" (I realize this is not possible for all of them)?
I think I may as well disclose my methodology in advance (and hopefully gather some early critique). The reason I have been concerned with the first vords of folios is that I took them as a possible disproving criterion for the assumption that the first vords of botanical folios represent plant names.

Indeed, if the first vords of botanical folios are not plant names but some more commonplace words, they should exhibit a low degree of uniqueness and quite high frequency counts. On the contrary, if they are plant names, then it would be unlikely for them to have high frequency counts, because a plant name is a specific term not suitable for each and every context, all the more so as there are 100+ Voynich plants. (What I mean is the overall picture; of course, certain "important" plants may be mentioned very frequently.)

I have shown that the uniqueness of the first vords of botanical folios is 66%, while for the second, third and last vords of botanical folios it is about half that, in the range of 33 to 35%. (Of course I could investigate the fourth, fifth and so on vords, ideally covering all vords in all botanical folios, but that would have been very tedious, since I did everything manually.)
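
The manual count described here could be automated along the following lines. This is only a minimal sketch in Python: it assumes that a vord is "unique" when it occurs exactly once in the whole transcription, which is an assumed reading and not a definition stated in this thread. The function name `positional_uniqueness` and the toy folios are hypothetical placeholders, not real transcription data.

```python
from collections import Counter


def positional_uniqueness(folios, position):
    """Share of folios whose vord at `position` occurs exactly once in the corpus.

    `folios` is a list of folios, each a list of vords in reading order.
    `position` is a list index: 0 = first vord, 1 = second, -1 = last.
    ASSUMPTION: "unique" means a corpus-wide frequency count of 1.
    """
    # Frequency of every vord over all folios taken together.
    counts = Counter(vord for folio in folios for vord in folio)

    picked = []
    for folio in folios:
        try:
            picked.append(folio[position])
        except IndexError:
            # Folio too short to have a vord at this position: skip it.
            continue

    if not picked:
        return 0.0
    return sum(1 for vord in picked if counts[vord] == 1) / len(picked)


# Toy corpus with placeholder tokens (NOT real Voynich transcription data).
folios = [
    ["p1", "c1", "c2", "c3"],
    ["p2", "c1", "c3", "c2"],
    ["p3", "c2", "c1", "c1"],
    ["c1", "c3", "c2", "c4"],
]

for pos in (0, 1, -1):
    print(pos, positional_uniqueness(folios, pos))
```

Running the same function over every position, rather than only the first, second, third and last, would cover the "fourth, fifth and so on" case without the manual effort.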

Thus the first vord of a botanical folio looks like something specific as compared to the rest of the folio. Of course, that does not prove that it stands for a plant's name (especially in each and every case), but neither does it disprove it. This result suits me, because the subsequent research (not explained in this post) is based on the assumption that first vords are indeed plant names.

However, to be consistent, one needs to compare the behaviour of the first vords of botanical folios with the behaviour of the first vords of folios in other sections. If they exhibit a comparable degree of uniqueness, then it may mean one of two things:

a) first vords of folios in other sections are also some specific terms, relevant to those sections;

OR:

b) first vords of folios owe their uniqueness not to their meaning, but to their position in the folio.

So I performed some counts as specified above and found that the first vords of folios in the balneo and recipe sections are also highly unique. (BTW, the first vords of astro folios, where there are explicit paragraphs, do not seem to be highly unique, although I did only a basic screening.)

To investigate option b), we need to distinguish between folios and paragraphs. Indeed, the first vord of a folio is, at the same time, the first vord of the first paragraph in that folio. So which position matters (if any): being the first vord in a paragraph, or only being the first vord in a folio? As mentioned above, the second paragraphs of recipes yield high uniqueness of their first vords. So, if position matters, then it is the position in a paragraph, not the position in a folio. But does position matter at all? So far I have checked botanical folios up to 10v (I will check them all shortly), and found that, all paragraphs taken together, their first vords exhibit only 50% uniqueness. Since this pooled figure is an average over first and later paragraphs, excluding the first paragraphs would leave the second and later paragraphs with a quite low degree of uniqueness of their first vords - in contrast to the first paragraphs, the first vords of which are at the same time the first vords of folios.
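
The last step of this argument is a weighted-average calculation, which can be sketched in Python as follows. The paragraph counts used here are purely hypothetical, since the actual numbers of first and later paragraphs in the checked folios are not given in this thread. The sketch also assumes that the 66% figure, measured over all botanical folios, holds for the folios up to 10v as well. The function name `implied_rest_uniqueness` is made up for illustration.

```python
def implied_rest_uniqueness(u_all, u_first, n_first, n_rest):
    """Uniqueness of non-first paragraphs implied by a pooled figure.

    Solves  u_all = (n_first * u_first + n_rest * u_rest) / (n_first + n_rest)
    for u_rest.
    """
    if n_first < 0 or n_rest <= 0:
        raise ValueError("need n_first >= 0 and n_rest > 0")
    return (u_all * (n_first + n_rest) - u_first * n_first) / n_rest


# HYPOTHETICAL counts: 20 first paragraphs and 20 later paragraphs.
u_rest = implied_rest_uniqueness(u_all=0.50, u_first=0.66, n_first=20, n_rest=20)
print(f"implied uniqueness of later paragraphs: {u_rest:.0%}")
```

The more later paragraphs there are relative to first paragraphs, the closer their implied uniqueness sits to the pooled 50%, so the real counts matter for how low the figure actually is.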

To sum this up, if no evidence to the contrary is discovered during the forthcoming checks (I will not make all of them, at least for now, because currently my interest is limited to the botanical section, but maybe someone else kindly will), the following picture presents itself:

1) First vords of folios of botanical, balneo and recipe sections are highly unique, while first vords of astro folios are not.
2) First vords of paragraphs of the recipe section are highly unique.
3) First vords of second, third etc. paragraphs of folios in the botanical section are not highly unique.

What does this mean? Probably this: botanical and balneo folios generally open with some specific term, like, in the former case, the name of the plant. Each paragraph of the recipe section generally opens with some specific term (I checked only the 1st and 2nd paragraphs; the subsequent ones are just my guess, to be checked by someone else).

All this will be incorporated in the forthcoming second part of my "Contextual Analysis of Voynich Objects".


If your question about the circular charts in the astro section is answered, we could draw some conclusions about that also. The difficulty is that we don't know if those represent coherent phrases or rather sets of disjoint vords, and also (as you note) that we don't always know where the phrase begins.
If you want to check whether the first words are unique plant names, wouldn't it be worthwhile to compare their uniqueness to that of labels in the pharma section? Those are almost certainly plant names so I'd expect a similar degree of uniqueness.
After your series of posts about mnemonics, I got increasingly uncertain about the labels in the pharma section being plant names. The logic may well be that the plant names there are conveyed by means of mnemonics, while the labels convey something else, like modus operandi or associated objects. In an example that I provided in a parallel thread, "otol" is used at least three times as a label, each time for an object of a different nature.

An analysis that is worth performing, though, is to check for the appearance of pharma labels across the botanical section. From a brief check, not all of the pharma labels appear in the botanical section. If those were plant names, then we must admit that the author left some of the plants strangely undescribed in the herbal portion of his opus.
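
Such a check amounts to a set comparison between the label list and the vocabulary of the botanical section. A minimal Python sketch follows, assuming exact string matching of single vords (multi-word labels and spelling variants would need extra handling). The function name and the placeholder tokens are hypothetical and are not real transcription data.

```python
def split_labels_by_presence(labels, section_vords):
    """Split labels into those found and those not found in a section's text.

    ASSUMPTION: a label "appears" only if it matches a vord exactly.
    Returns (present, absent) as sorted lists of distinct labels.
    """
    vocabulary = set(section_vords)
    distinct = set(labels)
    present = sorted(label for label in distinct if label in vocabulary)
    absent = sorted(label for label in distinct if label not in vocabulary)
    return present, absent


# Placeholder tokens (NOT real Voynich transcription data).
pharma_labels = ["lab1", "lab2", "lab3", "lab2"]
botanical_vords = ["w1", "lab1", "w2", "w1", "lab3"]

present, absent = split_labels_by_presence(pharma_labels, botanical_vords)
print("found in botanical section:", present)
print("not found in botanical section:", absent)
```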

One additional complication with the pharma section is that in some places there are more labels than there are objects (e.g. f99v, third row of plants).
Anton, this example explains my assumption that these labels are the names of the ingredients. From the first and third plants, two ingredients must be taken for the recipe.
This is how I would assign the labels in this row. One label consists of two words (this happens a lot; for example, the name for cinnamon is "chinese wood" in many languages). The other is a continuation of the text that was moved to the "labels" line because a plant was blocking the way. An extra argument for this is that it closely resembles one of the preceding words, which is very Voynichy. I marked these words with a blue, vertical "=". The colors have no meaning, I just used different ones for clarity.

[Image: attachment.php?aid=205]

Either way, I do understand why you'd not want to include the labels. They may behave differently than the text, and either way uncertainties like the one you mention certainly don't help.

(By the way, I'm quite happy someone takes at least a little bit of my work seriously... I was getting afraid that everyone thought I was some kind of crazy person :P )

Quote: Either way, I do understand why you'd not want to include the labels. They may behave differently than the text, and either way uncertainties like the one you mention certainly don't help.

On the contrary, my work is supposed to be heavily based on labels, because they are likely to:

- represent distinct words or phrases (as opposed to the main text which we don't know if it's shuffled or otherwise manipulated)
- represent sets of "equal rank" notions - which is the key property that makes them so valuable.

The most valuable set is that of the Voynich "stars" (f68r1 and f68r2). It is a) complete in itself and b) free of repeated labels.

By the way, 47% of all Voynich "star" labels are unique vords. That is what suggests the set is complete: it would make no sense to mention "stars" that are "useless" from the perspective of the contents of the VMS, except for the sake of completeness of the set.

(29-03-2016, 06:09 PM)Wladimir D Wrote: Anton, this example explains my assumption that these labels are the names of the ingredients. From the first and third plants, two ingredients must be taken for the recipe.

I also think of that in a similar way. But for the list of possible ingredients (root, leaf, seed, stem, petal... what else?) the number of these labels is too large, isn't it?

And these labels do not repeat across the pharma folios, if I am not mistaken. Is it probable that from each plant a different ingredient must be taken?
Since I view each character as a word, I explain the variety of labels by the following factors.
a. Besides those listed, there are many more specific plant parts (pistil, stamen, pollen, pedicel, perianth ...) and juices from them. The number of labels is further extended by "o" and "a", which denote the upper and the lower part of these elements.
The gallows K and T indicate size or quantity, and F and P describe shape. The symbol "S" indicates direction (state). For example, "oiS" would be the juice of twisted upper leaves.
b. Some tags themselves denote mixtures of juices ("oram", "aram"), which creates new combinations.
c. Juices may be filtered or with the pulp (marked by a final "y").
d. A tag can, for example, denote dew from the leaves, collected during the full moon or early in the morning. :)


The fact that the "pharma" section does not have many labels I explain by there being no need for preliminary processing of the plant for these recipes (it is used whole).
Quote: What occurs to me is that the distribution of vord frequency count would speak about the type of the underlying message rather than of the underlying language.

I was obviously wrong in this. Grammar rules of different languages, like declension and conjugation, will have a significant impact on the word frequency distribution.
Well, I calculated the distribution using the Voynich Reader tool and Takahashi's transcription, excluding "dubious" words from the calculation.

What I see seems a bit weird at first glance - of 6818 distinct Voynichese words, 6029 (or 88%) occur only 5 times or fewer. 6359 Voynichese words (or 93%) occur 10 times or fewer.

Only 55 Voynichese words occur more than 100 times.

Is that OK for any cohesive text, even if we bear in mind that there might be different word forms?

This looks like a very limited basic vocabulary with a large number of specific terms?!
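
For anyone wishing to repeat this count on another transcription or on a reference text in a known language, a minimal Python sketch of the calculation follows. It is not the Voynich Reader tool's method, which is not described here; it simply counts distinct words with `collections.Counter`. The function name `frequency_summary` and the toy word list are hypothetical placeholders. The last lines only re-derive the percentages quoted above from the reported totals.

```python
from collections import Counter


def frequency_summary(words, thresholds=(5, 10)):
    """Count distinct words and how many of them occur at most t times.

    Returns (number_of_distinct_words, {t: distinct words with count <= t}).
    """
    counts = Counter(words)
    at_most = {
        t: sum(1 for c in counts.values() if c <= t) for t in thresholds
    }
    return len(counts), at_most


# Toy word list with placeholder tokens (NOT real transcription data).
words = ["a"] * 12 + ["b"] * 6 + ["c"] * 3 + ["d", "e"]
n_types, at_most = frequency_summary(words)
print(n_types, at_most)

# Re-deriving the shares from the totals reported in the post.
share_5 = 6029 / 6818
share_10 = 6359 / 6818
print(f"{share_5:.0%} occur 5 times or fewer, {share_10:.0%} occur 10 times or fewer")
```

Feeding a plain-language text of similar length through the same function would give a baseline against which the 88% and 93% figures can be judged.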