31-10-2022, 02:50 PM
(31-10-2022, 04:35 AM)ReneZ Wrote: Whether the advanced grammar of Stolfi, or the various slot representations, or the network set up by Torsten is the preferable representation is for me a matter of taste. They all have their shortcomings. There are two more that one can imagine, namely the tree diagram from the start of the word (or backwards from the end of the word), or a character transition diagram. Both are mentioned in Torsten's post.
How well each model predicts the occurrence of words in the Voynich text is not a matter of taste. For instance, the table in the first post of this thread suggests that you can combine any element in the V3 column with any element in the CF column. However, this is not possible. EVA-e in V3 can only be combined with EVA-y in the CF column, whereas EVA-a and EVA-o do not combine with EVA-y. On the other hand, certain combinations are far more likely than others. For instance, EVA-a before EVA-in/iin/iiin is far more likely than EVA-o (94% vs. 5%). Also, EVA-n alone is very uncommon, since 98% of EVA-n occurrences follow EVA-i. Therefore a plain table or slot machine is obviously not able to reproduce what we see in the VMS, since it cannot exclude certain glyph combinations.
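To make the over-generation point concrete, here is a minimal sketch. The two columns and the set of attested endings are hypothetical, heavily reduced toy data chosen only to mirror the examples above; they are not the real table from this thread and not real counts. The function name `overgenerated` is also just an illustration.

```python
from itertools import product

# Hypothetical, heavily reduced slot columns (NOT the real table from the thread).
V3 = ["a", "o", "e"]
CF = ["y", "iin", "l"]

# Toy set of attested endings (assumption, chosen to mirror the examples in the post).
attested = {"ey", "aiin", "oiin", "al", "ol"}

def overgenerated(col_a, col_b, attested_endings):
    """Return the combinations a plain slot table allows but the sample lacks."""
    generated = {a + b for a, b in product(col_a, col_b)}
    return sorted(generated - attested_endings)

print(overgenerated(V3, CF, attested))
```

A plain slot table has no way to mark the returned combinations as forbidden; that information has to live somewhere else in the model.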
It is therefore an improvement to use a model that takes only existing glyph combinations into account. This is what Stolfi tried to achieve with his word grammar. In his grammar he defined different branches for each stage. For instance, there are three different rules for the "Final" stage: "Y", "A.M", and "A.IN". In this way Stolfi's grammar describes a kind of tree, since multiple rules/branches exist for each stage.
However, Stolfi's grammar is not able to explain why <aiiin> is far more common than <oiin>. Stolfi wrote about this question: "It would be nice if the predicted word frequencies matched the frequencies observed in the Voynich manuscript. Unfortunately this is not quite the case, at least for the highly condensed grammar given here" (see Stolfi). It is therefore possible to improve the model further by also attaching a likelihood to each rule. To do so, Stolfi added the number of words for each grammar rule: "The primary purpose of the COUNT and FREQ fields is to express the relative 'normalness' of each word pattern. We think that, at the present state of knowledge, this kind of statistical information is essential in any useful word paradigm" [Stolfi].
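As a rough illustration of what "adding likelihoods" means, the following sketch counts which glyph precedes a given word ending and turns the counts into shares. The token list is a made-up toy sample, not transcription data, and `predecessor_shares` is a hypothetical helper, not Stolfi's method.

```python
from collections import Counter

def predecessor_shares(tokens, ending):
    """Share of each glyph found immediately before `ending` at the word end."""
    counts = Counter(
        t[-len(ending) - 1]
        for t in tokens
        if t.endswith(ending) and len(t) > len(ending)
    )
    total = sum(counts.values())
    return {glyph: n / total for glyph, n in counts.items()} if total else {}

# Toy sample (assumption): three words with "a" before "iin", one with "o".
toy_tokens = ["daiin", "daiin", "okaiin", "oiin", "chedy"]
shares = predecessor_shares(toy_tokens, "iin")
print(shares)
```

Run on a real transcription, the same kind of count is what would give figures such as the 94% vs. 5% mentioned earlier in this post.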
By using the network approach it is also possible to describe the relations between words in the VMS. The model works for any part of the manuscript as well as for the whole manuscript. For example, if <chedy> is used more frequently, the frequency of similar words such as <shedy> or <qokeedy> increases as well. The advantage is that the model covers the complete list of words for the VMS and is able to explain the word frequencies as well as the preference for certain glyph combinations. The model also explains some of the statistics of the Voynich text, like the almost mathematically exact binomial-like word length distribution for word types as well as for word tokens. The explanation is that high-frequency tokens also tend to have a high number of similar words, and that similar words have similar lengths.
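A minimal sketch of such a network, assuming that "similar" means an edit distance of at most 1 (that threshold and the helper names are my assumptions for this example):

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def similarity_edges(words, max_distance=1):
    """Connect every pair of distinct word types within `max_distance` edits."""
    types = sorted(set(words))
    return [(u, v) for u, v in combinations(types, 2)
            if edit_distance(u, v) <= max_distance]

words = ["chedy", "shedy", "qokeedy", "daiin", "aiin", "kaiin"]
edges = similarity_edges(words)
print(edges)
```

With this strict threshold <qokeedy> stays unconnected to <chedy>; linking it would need a larger distance or intermediate word types in the list.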
(31-10-2022, 04:35 AM)ReneZ Wrote: The tree diagram may be used as an example to show the main problem.
If one creates it, then it will quickly diverge in quite a wide tree.
Due to the fact that each of the three parts of the words have their own structure, what will happen is that the stem part will be replicated many times in this tree, under different word starts.
Since the "replicated" parts belong together, there is no replication, e.g. the <aiin> part in <qokaiin>, <okaiin>, <kaiin>, <daiin>, <chaiin> and <aiin> is always the same root. In other words, there is only one tree with one "root" and multiple branches. The starting points for this approach are the most frequently used words, e.g. for the whole manuscript <daiin>, <ol>, and <chedy>. The only question is whether the leaves are also included or whether some threshold is used.
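The idea of one shared root with several branches can be sketched like this. The helper `branches_of` is hypothetical and only strips the shared ending; it is not the full network model.

```python
def branches_of(root, words):
    """Map each word ending in `root` to the part that precedes the root."""
    return {w: w[: len(w) - len(root)] for w in words if w.endswith(root)}

words = ["qokaiin", "okaiin", "kaiin", "daiin", "chaiin", "aiin", "chedy"]
branches = branches_of("aiin", words)
print(branches)
```

The root <aiin> is stored once, and each word only contributes its own prefix as a branch.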
(31-10-2022, 04:35 AM)ReneZ Wrote: I do like the slot system, but it still requires a lot of tuning before it can really help to understand the word structure.
Unfortunately the slot approach doesn't work, since Voynich characters depend on each other. You write yourself: "The picture for the Voynich MS text is much sparser. There are far fewer valid character combinations" [Zandbergen 2018].