The Voynich Ninja - Can we say the VMS is a "free word order"?


(01-10-2020, 09:07 PM)Koen G Wrote:
(01-10-2020, 08:53 PM)Emma May Smith Wrote: Or, more practically, "if labels are nouns then ... and ... ."

This sounds like an interesting experiment, has it been done?

And can it be done? There are a few issues beforehand. For example:
* multi-word labels are more likely to contain at least one non-noun.
* what about "labelese"? It is not something I have studied myself, and I've heard everything from "it doesn't exist" to "labels have their own language".
* to determine word order we need to be able to determine phrases, right? How many phrases per line of Voynichese? Do phrases cross lines?

Sounds very tricky...

Very tricky and I doubt it would work, but it's the best answer I can think of.

There has been some work on this before with creating word classes. I think the next step would be generating word class sequence statistics. So like this:

* Classify words into a set number of classes (let's say five) so as to minimise adjacency within each class, which has been done before.
* Assign each class a number and generate a new text in which each word is replaced by the number of its class (or null if unclassified, which many would be).
* Determine whether any sequences are more common than expected. For example, does "1 4" occur more often than "3 2"?

In a way it's a kind of entropy measurement: how random is the word class distribution, and does the word class in one position predict the next?
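
To make those steps concrete, here is a rough Python sketch. It is only an illustration under assumptions: the `word_class` mapping is hypothetical (a real one would come from whichever classification is used), the function names are my own, and "expected" here simply means what independence between neighbouring positions would predict.

```python
from collections import Counter
from math import log2

def class_sequence(words, word_class):
    """Replace each word by its class number, or None if unclassified."""
    return [word_class.get(w) for w in words]

def classified_pairs(seq):
    """Adjacent class pairs, skipping any pair that contains None."""
    return [(a, b) for a, b in zip(seq, seq[1:])
            if a is not None and b is not None]

def bigram_ratios(seq):
    """Observed / expected count for each adjacent class pair.

    The expected count assumes the class in one position is
    independent of the class in the next position.
    """
    pairs = classified_pairs(seq)
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    return {p: observed[p] / (first[p[0]] * second[p[1]] / n)
            for p in observed}

def conditional_entropy(seq):
    """H(next class | current class) in bits, over classified pairs."""
    pairs = classified_pairs(seq)
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    observed = Counter(pairs)
    return -sum(c / n * log2(c / first[a])
                for (a, _), c in observed.items())
```

A ratio well above 1 for "1 4" and well below 1 for "4 1" would hint at an ordering preference, and a conditional entropy close to zero would mean the class in one position strongly predicts the next. With so many unclassified words, the counts may be too sparse to say much either way.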

Essentially, our first step would be to determine whether word classes have any reality in how the text works. It could then be taken further by inputting guesses about word classes. Labels are just one example; word frequency would be another possible way of guessing, since a small class made up mostly of very common words would likely be a "function" rather than a "content" category.
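
The frequency guess could be checked with something like the sketch below. It assumes a hypothetical `word_class` mapping and just reports, per class, how many distinct words it holds and how often they occur. The helper name `class_profile` is made up for this example.

```python
from collections import Counter

def class_profile(words, word_class):
    """Per class: number of word types, tokens, and tokens per type."""
    counts = Counter(w for w in words if w in word_class)
    profile = {}
    for word, count in counts.items():
        entry = profile.setdefault(word_class[word],
                                   {"types": 0, "tokens": 0})
        entry["types"] += 1
        entry["tokens"] += count
    for entry in profile.values():
        entry["tokens_per_type"] = entry["tokens"] / entry["types"]
    return profile
```

A class with few types but a high tokens-per-type figure would be a candidate "function" class, though that is only a heuristic and not proof of anything.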

For example, imagine we analysed an English text and saw that "the" and "cat" were in different word classes. We would then find that "the cat" (determiner noun) phrases are more common than "cat the" (noun determiner) phrases. This case would be clear cut, though I'm sure natural languages can be highly ambiguous.
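
As a toy version of that English case, the snippet below counts ordered class pairs in one made-up sentence. The sentence and the `toy_classes` mapping are invented for illustration only, and words outside the two classes are treated as unclassified.

```python
from collections import Counter

def ordered_pair_counts(words, word_class):
    """Count adjacent (class, class) pairs, skipping unclassified words."""
    classes = [word_class.get(w) for w in words]
    return Counter((a, b) for a, b in zip(classes, classes[1:])
                   if a is not None and b is not None)

text = "the dog chased the cat the boy owned".split()
toy_classes = {"the": "DET", "dog": "NOUN", "cat": "NOUN", "boy": "NOUN"}
counts = ordered_pair_counts(text, toy_classes)

print(counts[("DET", "NOUN")])  # 3
print(counts[("NOUN", "DET")])  # 1
```

Even here the reverse order turns up once ("cat the"), so it is the asymmetry that matters, not the total absence of one order.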
