The Voynich Ninja

Full Version: Identifying function words
Continuing with the search for function elements based on the plotting of Voynich stars: for partial matches, one just needs to replace "exa" with "all" in the query quoted above. Unfortunately, I can discern no systematic pattern, and, I'm afraid, not even a repeating one. Perhaps Voynich stars are not suitable for this task, and one should try other objects, for example the pharma section labels, which may be expected to appear in the Recipes.
I've got a new idea of how to find "and" (if it is not a clitic). Given that "and" would be expected to be among the most frequent words, one could check the list of the top-N most frequent vords and see which of them never occur:

a) as a label
b) at the end of paragraphs.

Vords satisfying both conditions at the same time would be "and"-suspects.

This assumes, of course, that there is no shuffling.
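A minimal Python sketch of the two-condition filter above, assuming the transcription has already been parsed into paragraphs (each a list of vords) plus a separate collection of label vords. The parsing step is not shown, and the function name `and_suspects` and the default `top_n=50` are hypothetical choices made for this illustration. Like the filter itself, it relies on paragraph order being genuine (no shuffling).

```python
from collections import Counter


def and_suspects(paragraphs, labels, top_n=50):
    """Return top-N frequent vords that never occur as a label and never
    end a paragraph.

    paragraphs: list of paragraphs, each a list of vords in reading order
                (assumed to come from an already parsed transcription).
    labels:     iterable of vords that occur as labels.
    top_n:      how many of the most frequent vords to screen (hypothetical default).
    """
    counts = Counter(v for para in paragraphs for v in para)
    label_set = set(labels)
    # Condition b): collect the last vord of every non-empty paragraph.
    finals = {para[-1] for para in paragraphs if para}
    top = [v for v, _ in counts.most_common(top_n)]
    # Keep only vords failing both conditions a) and b), in frequency order.
    return [v for v in top if v not in label_set and v not in finals]
```

The result is only a list of suspects to inspect by hand; a vord can pass the filter simply because it is rare in labels and paragraph endings for unrelated reasons.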
I like this avenue of investigation, focusing on finding "and", but depending on the language this may be more difficult, even if "and" is not a clitic. For example, in modern Turkish, "and" is expressed by ve, de, da, ile... There are also constructions with ister and ya, where those words likewise stand in for English "and". And that's just after a superficial Google search. This means that the relative frequency of each of these words will be lower than that of "and" in English.
That's not good. Why would one need so many different words to express the same logical operator?

But "ve" is listed second in the Turkish frequency list.

Anyway, by excluding label vords from a set of vords, we essentially narrow down the set of function-word suspects. And if any of the remaining vords never occurs paragraph-finally, that would immediately flag it for attention.
For example, the Voynich Reader software suggests that, based on the Takahashi transcription, ol is the second most frequent vord, with a count of 535. Job's tool suggests that ol occurs only once as paragraph-final. Unfortunately, I see no way to use voynichese.com to locate that occurrence automatically. It may turn out to be just a transcription error.
But ol is highly repetitive in sequence: ol ol is not rare, and there is even one instance of ol ol ol.
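Sequential repetition such as ol ol ol can be tallied mechanically. The sketch below uses a hypothetical helper `repetition_runs` and assumes the text has been flattened into one list of vords in reading order; whether runs may cross line or paragraph breaks is decided by that flattening, not by the function.

```python
from collections import Counter
from itertools import groupby


def repetition_runs(vords, target):
    """Count maximal runs of consecutive identical `target` vords, by length.

    Returns a Counter mapping run length -> number of runs, so {2: 5, 3: 1}
    would mean "target target" occurred 5 times and "target target target"
    once. Single, unrepeated occurrences are counted under length 1.
    """
    runs = Counter()
    # groupby splits the sequence into blocks of consecutive equal vords.
    for vord, group in groupby(vords):
        if vord == target:
            runs[sum(1 for _ in group)] += 1
    return runs
```

A vord that is "never sequentially repeated" is then one whose Counter has no key of 2 or more.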
(30-03-2018, 05:12 PM) Anton wrote: For example, the Voynich Reader software suggests that, based on the Takahashi transcription, ol is the second most frequent vord, with a count of 535. Job's tool suggests that ol occurs only once as paragraph-final. Unfortunately, I see no way to use voynichese.com to locate that occurrence automatically. It may turn out to be just a transcription error.

I think you are right, Anton. The occurrence should be the one at mid-page, between shecthy and the pool.
It is quite possible that this is just a line break and that the paragraph continues below the horizontal stream.
Next, we have aiin. It is never sequentially repeated (the one repetition found by Voynich Reader is a mistake), and Job's tool states that it ends 8 paragraphs. However, at least one of these (f43r) is a mistake: the last vord there is definitely chokoraiin, not simply aiin. I would locate and verify the other seven alleged occurrences as well.
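To verify alleged paragraph endings one by one, it helps to list where they occur rather than only count them. A minimal sketch, assuming paragraphs have been parsed into (folio, vords) pairs in manuscript order; the helper `paragraph_final_loci` and that input shape are hypothetical and are not the interface of Job's tool or of voynichese.com.

```python
def paragraph_final_loci(paragraphs, target):
    """List the places where `target` is the last vord of a paragraph.

    paragraphs: list of (folio, vords) pairs in manuscript order, where the
                folio ID is whatever the (assumed) parsed transcription provides.
    Returns (folio, paragraph_number_on_that_folio) pairs, numbered from 1.
    """
    loci = []
    per_folio = {}  # running paragraph counter for each folio
    for folio, vords in paragraphs:
        per_folio[folio] = per_folio.get(folio, 0) + 1
        # Whole-vord match: a final "chokoraiin" does not count as "aiin".
        if vords and vords[-1] == target:
            loci.append((folio, per_folio[folio]))
    return loci
```

Each returned locus can then be checked against the scans, which is the manual step the count alone cannot replace.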
One helpful technique may be to rely on the fact that function words, as such, can be expected to be distributed more or less evenly across the corpus, unless, of course, certain parts of the corpus consist of special patterns, such as numbered or unnumbered lists or labeled diagrams, in which case there is not much room for function words to appear.

The count per folio, of course, cannot be a reliable measure here, because folios contain different numbers of vords. A sliding window should be used instead, say of 1000 vords.
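A sketch of the sliding-window check, assuming a flat list of vords in reading order. The 1000-vord window follows the suggestion above; the half-window step and the coefficient of variation (standard deviation of the per-window counts divided by their mean) as an evenness score are assumptions made for this sketch, and any other dispersion measure could be substituted. A function word would be expected to score low, a vord confined to a few sections high.

```python
from statistics import mean, pstdev


def window_counts(vords, target, window=1000, step=None):
    """Count `target` in sliding windows of `window` vords.

    step defaults to half the window (overlapping windows); this default is
    an assumption of the sketch. A text shorter than the window yields a
    single window covering all of it.
    """
    step = step or max(window // 2, 1)
    last_start = max(len(vords) - window, 0)
    return [vords[start:start + window].count(target)
            for start in range(0, last_start + 1, step)]


def dispersion(counts):
    """Coefficient of variation of per-window counts (lower = more even).

    Returns infinity when the target never occurs, so absent vords rank last.
    """
    m = mean(counts)
    return pstdev(counts) / m if m else float("inf")
```

Because every window has the same length, raw counts are comparable without normalising, which is exactly what per-folio counts lack.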
In many languages "and" is a two-letter word (sometimes the same two letters are also found within common words). In Latin, for example, "et" means "and", and "et" also occurs inside many words.

The ampersand symbol (&) was widely used by medieval scribes in many languages, so it is also possible that "and" is expressed by a single glyph, and that the same glyph might also represent letters (as a ligature or a single letter).
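One way to probe the idea that a short "and" might also turn up inside words is to compare how often a candidate, whether a two-glyph sequence or a single glyph, stands alone versus appears inside longer vords. A minimal sketch with a hypothetical helper `standalone_vs_embedded`; it counts vords rather than character positions, and a Latin sample is used only as an illustration.

```python
def standalone_vs_embedded(vords, candidate):
    """Return (standalone, embedded) counts for `candidate`.

    standalone: vords exactly equal to the candidate (e.g. Latin "et").
    embedded:   longer vords containing the candidate at least once
                (e.g. "etiam", "habet"); each such vord counts once.
    """
    standalone = sum(1 for v in vords if v == candidate)
    embedded = sum(1 for v in vords if v != candidate and candidate in v)
    return standalone, embedded


sample = "et caelum et terram etiam habet".split()
print(standalone_vs_embedded(sample, "et"))
```

A candidate with a high standalone count but also many embedded occurrences would fit the Latin "et" pattern described above; a single-glyph "&" analogue would show up the same way with a one-character candidate.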