(27-12-2016, 03:47 PM)ThomasCoon Wrote: (27-12-2016, 03:31 PM)-JKP- Wrote: I've been compiling some statistics on unique vords. Depending on which transcription system you use (I use my own), there are about 30 unique vords on the first page, but the following plant pages tend to have about 6 to 12 unique vords, depending on how much text is on that page. If it's a long page (e.g., three paragraphs), there might be as many as 20, but the proportion is fairly consistent.
That is highly interesting - I wonder why that is. Do you think it might be a clue into the workings of the encoding mechanism, JKP?
Thomas, there are a number of possible interpretations, but it seems to me that the two most likely reasons are:
1. That it's a clue to the encoding mechanism, or
2. that it's a reflection of the content.
So, I started looking at how unique Vords were distributed and their positions relative to the rest of the text (not just on the same page, but in the manuscript overall) in the hopes that this might offer some answers.
It's not too hard to describe page position, but when you factor in their breakdown components and their relationships to the rest of the manuscript, it becomes a mountain of a project, which is why I haven't finished writing it all up yet.
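For anyone who wants to try this kind of count on their own transcription, here is a minimal sketch of the bookkeeping. It is not my actual tooling. It assumes "unique" means a vord that occurs exactly once in the whole corpus, and it assumes a hypothetical layout where each page is a list of paragraphs and each paragraph is a list of vords. The page labels and tokens are toy data, not a real transcription.

```python
from collections import Counter

def unique_vords_by_page(pages):
    """Map each page to (paragraph_index, position, vord) for every vord
    that occurs exactly once in the whole corpus (assumed definition)."""
    counts = Counter(v for paras in pages.values() for para in paras for v in para)
    result = {}
    for page, paras in pages.items():
        hits = []
        for p_idx, para in enumerate(paras):
            for pos, vord in enumerate(para):
                if counts[vord] == 1:
                    hits.append((p_idx, pos, vord))
        result[page] = hits
    return result

def paragraph_initial_share(pages):
    """Fraction of non-empty paragraphs whose first vord is unique."""
    counts = Counter(v for paras in pages.values() for para in paras for v in para)
    firsts = [para[0] for paras in pages.values() for para in paras if para]
    if not firsts:
        return 0.0
    return sum(1 for v in firsts if counts[v] == 1) / len(firsts)

# Toy data only: hypothetical page labels and EVA-like tokens.
pages = {
    "pageA": [["pchodaiin", "chol", "daiin", "shol"],
              ["chol", "kydainy", "daiin"]],
    "pageB": [["tshol", "chol", "shol", "daiin"]],
}

positions = unique_vords_by_page(pages)
share = paragraph_initial_share(pages)
for page, hits in positions.items():
    print(page, len(hits), hits)
print("share of paragraphs starting with a unique vord:", share)
```

The counts will shift with the transcription system, since different systems split glyphs and spaces differently.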
Without making this post wayyyy too long, I can share some general observations. To keep it short, I'll restrict it to Vords that have not been broken into their components...
So here's a JKP nutshell version of the behavior of the unique VMS word-tokens (Vords), with emphasis on the large-plants section to keep it short enough to fit in a forum post...
As an example, in the large-plants section, the following patterns are evident:
1. Unique Vords have a somewhat regular distribution within the text. They do not tend to fall next to each other.
2. Unique Vords appear more frequently, but not always, at the beginnings of paragraphs. <-- [See point 8.]
3. Unique Vords less commonly fall at the ends of paragraphs than at the beginning (but it does happen).
4. When unique Vords fall at the ends of lines, they are often suffixed by those peculiar constructions that are more frequent at the ends of lines, such as EVA-aj, -oj, or j or EVA-d with straight leg rather than a full figure-8 curve, or those common at the ends of words (e.g., EVA-y, -dy or -ar). <-- [Points worth noting since these are general patterns of the text and apparently not restricted to unique Vords.]
5. Unique Vords are roughly the same length as common Vords. In the VMS, vord length is not necessarily an indication of rarity or uniqueness. Sometimes unique vords are short and sometimes common vords are long.
6. Most of the time, unique Vords tend to show up about 4 to 9 times per paragraph and are somewhat evenly distributed in the sense of being proportional to the rest of the text. Very short paragraphs will sometimes only have a couple of unique Vords.
7. Unique Vords at the beginnings of paragraphs are often prefaced by gallows characters.
8. Important: unique Vords will often break down into two components that show up elsewhere in the text. These atomic units combine in more than one way (some appear to behave like words, some appear to behave like affixes, frequently suffixes). Some are common vords. <-- [This is also worth noting because it raises the question, "Are they compound words as in natural language, are they structural units as in a synthetic language, or are the spaces contrived?".]
9. When the unique Vord is at the beginning of a paragraph and prefaced by gallows characters, removing the gallows will often result in a common component or a combination of two common components.
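Point 8 can be checked mechanically. The sketch below is one possible approach, under stated assumptions: a "component" is taken to be any vord that occurs more than once in the corpus, and splits are made at plain character boundaries in the transcription string. That second assumption is crude, because multi-character EVA glyphs such as ch or sh could be cut in half. The tokens are toy data.

```python
from collections import Counter

def two_part_splits(vord, vocabulary):
    """Return every (left, right) split of vord where both halves
    are members of vocabulary."""
    return [
        (vord[:i], vord[i:])
        for i in range(1, len(vord))
        if vord[:i] in vocabulary and vord[i:] in vocabulary
    ]

def decomposable_uniques(tokens):
    """For each vord occurring exactly once, list its splits into two
    vords that each occur more than once (assumed definition of a
    'common component'). Vords with no such split are omitted."""
    counts = Counter(tokens)
    common = {v for v, c in counts.items() if c > 1}
    found = {}
    for vord, c in counts.items():
        if c == 1:
            splits = two_part_splits(vord, common)
            if splits:
                found[vord] = splits
    return found

# Toy data only: EVA-like tokens, not a real transcription.
tokens = ["chol", "daiin", "chol", "daiin", "choldaiin",
          "shedy", "shedy", "ol", "ol", "sholshedy"]

decomposed = decomposable_uniques(tokens)
print(decomposed)
```

A split that succeeds here does not settle whether the vord is a compound, a stem plus affix, or a product of contrived spacing. It only shows that both halves exist on their own elsewhere.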
Some things I have noticed while studying medieval herbals, compared to the patterns in the distribution of unique Vords...
1. In medieval herbals, there are frequently lists of plant names in a variety of languages. If the unique Vords in the VMS big-plants section were lists of plant names, and if the VMS text followed conventional patterns, then one would expect a higher proportion of unique Vords, and they would probably be closer together rather than evenly distributed. Also, unique Vords are not always at the beginning of paragraphs. So, even if one name were listed rather than several, either the name consists of common words (e.g., in English, the components water and lily might show up as a plant name, but also as separate words in other sections), or the text in each section does not follow a rigid model, or... the text may have nothing to do with plants (or be meaningless). For the record, ancient and medieval plant names tended to be based on unique words rather than compound words (e.g., afodille, androsaema, corcodrillo, pitythalmos, etc.).
2. As examples of pages that are somewhat (although not greatly) different, Plant 4v and 17r have more end-of-line unique Vords, and 20v has fewer unique vords.
3. Following up Point 9, it's important to consider that the preponderance of unique Vords at the beginning of paragraphs might be an artifact. If it turns out that gallows characters are capitula, markers, or modifiers, and are evaluated separately from the following glyphs, then the following glyphs are often not unique. For example, if you have EVA-P xxxxx at the beginning of the paragraph and you remove the P, the rest of the vord is often found elsewhere. This lends support to the possibility that gallows-P behaves differently from other glyphs and also that the Vords at the beginnings of paragraphs are not necessarily unique.
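The gallows test in Point 9 and in the follow-up above is easy to sketch. This is a rough illustration, assuming the four plain EVA gallows are written as lowercase t, k, p and f, that only a single leading gallows is stripped, and that benched gallows are ignored. The function names and the tokens are made up for the example.

```python
from collections import Counter

GALLOWS = "tkpf"  # assumed: plain EVA gallows only, no benched forms

def strip_gallows(vord, gallows=GALLOWS):
    """Drop one leading gallows character, if the vord has one."""
    if len(vord) > 1 and vord[0] in gallows:
        return vord[1:]
    return vord

def initial_uniques_after_stripping(paragraphs):
    """For each paragraph-initial vord that is unique and gallows-prefixed,
    report (vord, remainder, remainder_found_elsewhere)."""
    counts = Counter(v for para in paragraphs for v in para)
    report = []
    for para in paragraphs:
        if not para:
            continue
        first = para[0]
        if len(first) < 2 or counts[first] != 1 or first[0] not in GALLOWS:
            continue
        remainder = strip_gallows(first)
        # Counter returns 0 for missing keys, so this is a safe lookup.
        report.append((first, remainder, counts[remainder] > 0))
    return report

# Toy data only: each inner list is one paragraph of EVA-like tokens.
paragraphs = [
    ["pchol", "daiin", "chol"],
    ["tshedy", "chol", "daiin"],
    ["kodar", "shol"],
]

report = initial_uniques_after_stripping(paragraphs)
for vord, remainder, found in report:
    print(vord, "->", remainder, "(found elsewhere)" if found else "(not found)")
```

If most paragraph-initial unique Vords come back as "found elsewhere" on a real transcription, that would be consistent with the artifact idea, though it would not prove that the gallows is a separate marker.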
I have to run and that's more than enough for one post.