RE: Vord frequency histogram as an indicator of the text category
Anton > 29-03-2016, 03:17 PM
I think I may disclose my methodology in advance (and hopefully gather some early critique along the way). The reason I have been looking at the first vords of folios is that I took them as a possible disproving criterion for the assumption that the first vords of botanical folios represent plant names.
Indeed, if the first vords of botanical folios are not plant names but some more commonplace words, they should exhibit a low degree of uniqueness and quite a high frequency count. Conversely, if they are plant names, they would be unlikely to have high frequency counts, because a plant name is a specific term not suited to each and every context, let alone that there are 100+ Voynich plants. (What I mean is the overall picture; of course, certain "important" plants may be mentioned very frequently.)
I have shown that the uniqueness of the first vords of botanical folios is 66%, while for the second, third and last vords of botanical folios uniqueness is about half that, in the range of 33-35%. (Of course I could investigate the fourth, fifth and so on vords, ideally covering all vords in all botanical folios, but that would have been very tedious, since I did everything manually.)
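For anyone who would rather not count by hand, the positional comparison could be automated roughly as follows. This is only a sketch under assumptions: "uniqueness" is taken here to mean the share of vords in a sample that occur exactly once in the whole transcription, which may not match the definition behind the figures above. The function names (`uniqueness`, `vords_at`) and the placeholder tokens are hypothetical, and the toy data is not taken from any real transcription.

```python
from collections import Counter

def uniqueness(sample, corpus_counts):
    """Share of vords in `sample` that occur exactly once in the corpus.

    ASSUMPTION: this definition of "uniqueness" is a guess for illustration.
    """
    if not sample:
        return 0.0
    unique = sum(1 for vord in sample if corpus_counts[vord] == 1)
    return unique / len(sample)

def vords_at(folios, index):
    """Collect the vord at `index` from each folio (0 = first, -1 = last).

    Folios too short to have a vord at that position are skipped.
    """
    picked = []
    for vords in folios.values():
        if -len(vords) <= index < len(vords):
            picked.append(vords[index])
    return picked

# Toy folios with placeholder tokens (NOT real transcription data).
folios = {
    "fA": ["aaa", "bbb", "ccc", "ddd"],
    "fB": ["eee", "bbb", "ccc", "fff"],
    "fC": ["ggg", "bbb", "hhh", "ddd"],
}

# Frequency of every vord across the whole toy corpus.
corpus_counts = Counter(v for vords in folios.values() for v in vords)

for label, index in [("first", 0), ("second", 1), ("third", 2), ("last", -1)]:
    share = uniqueness(vords_at(folios, index), corpus_counts)
    print(f"{label:>6}: {share:.0%}")
```

With a real transcription, `folios` would map each botanical folio to its vords in reading order, and the loop could be extended to every position instead of just the first few.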
Thus the first vord of a botanical folio looks like something specific compared to the rest of the folio. Of course, that does not prove that it stands for a plant's name (especially in each and every case), but, comfortably, it does not disprove it either. I like this result, because the subsequent research (not explained in this post) is based on the assumption that the first vords are indeed plant names.
However, to be consistent, one needs to compare the behaviour of the first vords of botanical folios with the behaviour of the first vords of folios of other sections. If they exhibit a comparable degree of uniqueness, then it may mean one of two things:
a) first vords of folios of other sections are also some specific terms, relevant to those sections;
OR:
b) first vords of folios owe their uniqueness not to their meaning, but to their position in the folio.
So I performed some counts as specified above and found that the first vords of folios in the balneo and recipe sections are also highly unique. (BTW, the first vords of astro folios, where there are explicit paragraphs, do not seem to be highly unique, although I did only a basic screening.)
To investigate option b), we need to distinguish between folios and paragraphs. Indeed, the first vord of a folio is, at the same time, the first vord of the first paragraph of that folio. So which position matters (if any): being the first vord in a paragraph, or only being the first vord in a folio? As mentioned earlier, the 2nd paragraphs of recipes yield high uniqueness of their first vords. So, if position matters, then it is the position in a paragraph, not the position in a folio. But does the position matter, after all? So far I have checked botanical folios up to 10v (I will check them all shortly), and found that, all paragraphs taken together, their first vords exhibit only 50% uniqueness. This means that if we exclude the first paragraphs, then the second and later paragraphs would exhibit a rather low degree of uniqueness of their first vords, in contrast to the first paragraphs, whose first vords are at the same time the first vords of folios.
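The folio-versus-paragraph check described above could be sketched like this. Again this is just an illustration under assumptions: uniqueness is taken to be the share of vords in a sample that occur exactly once in the whole corpus, the helper `paragraph_openers` is hypothetical, and the tokens are placeholders rather than real transcription data. Under this definition the "all paragraphs" figure is a weighted average of the two groups, which is why a lower combined figure implies a low figure for the later paragraphs.

```python
from collections import Counter

def uniqueness(sample, corpus_counts):
    """Share of vords in `sample` occurring exactly once in the corpus.

    ASSUMPTION: this definition of "uniqueness" is a guess for illustration.
    """
    if not sample:
        return 0.0
    return sum(1 for v in sample if corpus_counts[v] == 1) / len(sample)

def paragraph_openers(folios):
    """Split paragraph-initial vords into two groups:
    openers of the first paragraph of a folio, and openers of later paragraphs.
    """
    first_par, later_par = [], []
    for paragraphs in folios.values():
        for i, paragraph in enumerate(paragraphs):
            if not paragraph:
                continue  # skip empty paragraphs
            (first_par if i == 0 else later_par).append(paragraph[0])
    return first_par, later_par

# Toy data: folio -> list of paragraphs -> list of vords (placeholders only).
folios = {
    "fA": [["p1", "x", "y"], ["x", "q"]],
    "fB": [["p2", "y"], ["x", "r"], ["y", "s"]],
}

corpus_counts = Counter(
    v for paragraphs in folios.values() for par in paragraphs for v in par
)

first_par, later_par = paragraph_openers(folios)
print("first paragraphs:", uniqueness(first_par, corpus_counts))
print("later paragraphs:", uniqueness(later_par, corpus_counts))
print("all paragraphs:  ", uniqueness(first_par + later_par, corpus_counts))
```

If the later-paragraph figure comes out close to that of ordinary mid-text vords, position in a paragraph alone would not explain the uniqueness of folio-initial vords.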
To sum this up, if no evidence to the contrary is discovered during the forthcoming checks (I will not make all of them, at least for now, because currently my interest is limited to the botanical section, but maybe someone else kindly will), the following picture presents itself:
1) First vords of folios of botanical, balneo and recipe sections are highly unique, while first vords of astro folios are not.
2) First vords of paragraphs of the recipe section are highly unique.
3) First vords of second, third etc. paragraphs of folios in the botanical section are not highly unique.
What does this mean? Probably this: botanical and balneo folios generally open with some specific term, such as, in the former case, the name of the plant. Each paragraph of the recipe section generally opens with some specific term (I checked only the 1st and 2nd paragraphs; the rest is just my guess, to be checked by someone else).
All this will be incorporated in the forthcoming second part of my "Contextual Analysis of Voynich Objects".
If your question about the circular charts in the astro section is answered, we could draw some conclusions about that as well. The difficulty is that we don't know whether those represent coherent phrases or rather sets of disjoint vords, and also (as you note) that we don't always know where the phrase begins.