The Voynich Ninja

Full Version: Vord paradigm tool
This is the template I am using for exploring vords and the Voynich text. It is based on Thomas Coon's Voynich Vord Verifier, which I have compared and contrasted with various other available paradigms, then expanded and amended. It is a study tool, not an attempt to replicate the exact procedure by which the text was generated. It identifies some persistent patterns and allows the study of vords that do and do not conform. Non-conforming vords, and the ways they deviate from the pattern, are the most interesting.



The basis of the model is that the default vord (QOKEEDY) is tripartite with each of its three parts consisting of a simple consonant-vowel structure. Thus: qo - kee - dy. 

In a default vord there are three parts, prefix, stem and suffix, here marked as compartments A and B and C.  A combination consonant-vowel (CV) is made from the available glyphs in each compartment.

However, in most cases vords are consonant final, so there is an extra class of consonant final glyphs in compartment C. These typically require a word-break after them.

There is also a class of glyphs in compartment A that allow vowel/consonant prefixes. 

We can speak of the first consonant, second consonant, third consonant and final consonant. And the first, second and third vowels, with [y] being a final vowel in this model.

Bench gallows (KTPF) are not shown but can intrude into any of the benched glyphs (in red).

Vords can be made from the compartments A+B+C, A+B, B+C or A+C, or sometimes just one compartment, most often compartment C. A surprising number of vords can be made just from compartment C, daiin for instance.

Often, a vord could be made to conform to the template in several ways. There is then the question as to which of the possibilities is most consistent and viable.

In many cases, non-conforming vords only deviate from the paradigm in a single compartment or in a single glyph, sometimes a single stroke. Aberrations are few.

The objective, though, is not to try to match as many vords as possible. The model works well enough. It is a remarkable fact that it works at all. It is especially useful to observe the behaviour of non-conforming vords and to see what has happened to make them deviate from the flow. 

Here is a star label from pg 68r: DOARO


We see it conforms and is parsed: do - a - ro. It deviates in that compartment B - the middle stem - lacks a consonant, and there is no final consonant in compartment C, although final -o is acceptable. 
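As a rough illustration only (this is not the template itself), the two parses given so far, qo - kee - dy and do - a - ro, can be reproduced with a few lines of Python. The glyph classes are my own simplifying assumptions: a, o, e, y and i are treated as vowels, everything else as a consonant, and a vowel slot holds a run of one repeated glyph. The name parse_vord is hypothetical.

```python
import re

# Assumption: a simplified glyph classification for illustration,
# not the full set of columns in the template.
# One compartment = optional consonant run + optional run of ONE repeated vowel.
COMPARTMENT = re.compile(r"([^aoeyi]*)((?:([aoeyi])\3*)?)")

def parse_vord(vord):
    """Split an EVA string into (consonants, vowels) pairs, left to right."""
    parts = []
    pos = 0
    while pos < len(vord):
        m = COMPARTMENT.match(vord, pos)
        parts.append((m.group(1), m.group(2)))
        pos = m.end()  # always advances: the glyph at pos is a consonant or a vowel
    return parts

print(parse_vord("qokeedy"))  # three pairs: q+o, k+ee, d+y
print(parse_vord("doaro"))    # the middle pair has an empty consonant slot
```

Under these assumptions, a vord that yields more than three pairs does not fit the three compartments and would need a closer look.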

Here is a non-conforming vord, from the red-inked text on 67r: LYSHYKCHY



We can locate the problem. An additional [k] has intruded into the consonants in compartment C. Otherwise, it conforms. (It is an interesting vord with an interesting symmetry. It seems the [k] has been imported into compartment C - against the rules - in order to make the symmetry.) It is parsed in this model: ly - kshy - kchy.

Needless to say the paradigm is a work in progress. It can be improved, but it can only ever be a useful approximation.
This prefix-stem-suffix system has been proposed for several decades, if I remember correctly, but has it allowed any progress?
(29-10-2022, 05:08 PM) Ruby Novacna Wrote: This prefix-stem-suffix system has been proposed for several decades, if I remember correctly, but has it allowed any progress?

It is true, Ruby. Over the decades various researchers, independently of each other and from various lines of attack, have concluded that Voynich words have a tripartite structure and they have proposed three-part models. 

The model I am using is based on the default vord QOKEEDY and the simple breakdown qo- kee - dy, with the further observation that each component is the form consonant/vowel. I take that as the basic pattern of a vord, as simple as it is: CV - CV - CV. 

Has it allowed any progress? No more than anything else, perhaps. I see it as a useful tool. In the first instance, the Voynich text is just a field of glyphs and spaces. But it has patterns and weaves. A vord template, even if only approximate, helps us identify and study some important patterns and weaves. 

I am aware, though, that the structures we observe within vords may be part of larger patterns and that just the category "vord" may be a distorting limitation. 

One of the things revealed by a template like this is the great care taken in the construction of vord endings. It is tempting to think they reflect systems of grammatical mutation.
(29-10-2022, 05:08 PM) Ruby Novacna Wrote: This prefix-stem-suffix system has been proposed for several decades, if I remember correctly, but has it allowed any progress?

I don't believe there's just one such system, but if you take all the proposed systems together as a group, then sure, I'd say they have -- if you count the ability to predict which word structures will or won't occur, with a respectable degree of probability, as "progress."

Of course the resulting paradigms might be frustrating if they don't match a favored hypothesis.
A link to the thread about Thomas Coon's system: [link]

(29-10-2022, 12:02 AM) Hermes777 Wrote: In a default vord there are three parts, prefix, stem and suffix, here marked as compartments A and B and C.  A combination consonant-vowel (CV) is made from the available glyphs in each compartment.
...
Here is a star label from pg 68r: DOARO
...
We see it conforms and is parsed: do - a - ro.

The system described here is partly similar to Emma's [link], though hers is clearer and likely much more effective.

Something I find particularly puzzling is that here 'ar' is regarded as a vowel, while 'a' is labelled as a vowel and 'r' as a consonant.
The "doaro" example also appears to contradict the first sentence in the quote above: if a word conforms even if the second syllable has no consonant, does this mean that all consonants are optional? Are vowels optional as well? If I understand correctly, in the case of Coon's system, everything was optional. This leads to accepting a large number of non-Voynichese words (e.g. 'aeea', 'yyy','eeeee'). The usefulness of word models depends on both the number of actual Voynichese words that are accepted and the number of non-Voynichese words that are rejected.
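The last point can be made concrete with a small sketch. Everything here is illustrative: the two "models" are hypothetical stand-ins (one accepts every string, one demands at least one consonant), the vowel class is a simplifying assumption, and the word lists are only the handful of examples mentioned in this thread.

```python
VOWELS = set("aoeyi")  # assumption: a simplified vowel class

def accepts_anything(word):
    """Stand-in for a fully optional template: every string passes."""
    return True

def needs_a_consonant(word):
    """Hypothetical stricter rule: at least one non-vowel glyph is required."""
    return any(g not in VOWELS for g in word)

def evaluate(model, attested, unattested):
    """Return (share of attested words accepted, share of unattested rejected)."""
    accepted = sum(1 for w in attested if model(w))
    rejected = sum(1 for w in unattested if not model(w))
    return accepted / len(attested), rejected / len(unattested)

ATTESTED = ["daiin", "chedy", "qokeedy", "chol"]  # words cited in this thread
UNATTESTED = ["aeea", "yyy", "eeeee"]             # non-words cited above

print(evaluate(accepts_anything, ATTESTED, UNATTESTED))
print(evaluate(needs_a_consonant, ATTESTED, UNATTESTED))
```

A model that accepts everything scores perfectly on the first number and zero on the second, which is why both numbers need to be reported together.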

The occurrence of 'ch' in the third position is also weird: other models typically allow a maximum of two benches, one before and one after a gallows, if present (but I see that this is [link], who also has this bizarre triple 'ch').
The most persistent and fundamental pattern is simple consonant/vowel. Ultimately, any consonant or any vowel can fill the slots. So, like Coon's model, everything is optional. Rather, the template only indicates likelihoods. It is likely that the glyph that fills the first consonant slot in compartment A, if it is filled, will be either [q] or [d]. Other consonants are possible, but unlikely. There are probabilities, not adamantine rules. 

So each vertical column in the template is really a ranking by probability. Nothing is forbidden, just unlikely or uncommon. There are preferred vowels in certain places, but we always find exceptions, so any vowel is possible in a vowel slot. The double vowel [ai] sometimes appears in the vowel slot of compartment B. 
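To show what "ranking by probability" could look like in practice, here is a minimal sketch that ranks the glyphs opening a token by their share of tokens. It is only an approximation of a column in the template: it looks at the first EVA character rather than a parsed slot, the function name is hypothetical, and the sample is a handful of token counts quoted later in this thread, so the resulting figures say nothing about the manuscript as a whole.

```python
from collections import Counter

# Token counts quoted later in this thread; a tiny sample, not the full text.
SAMPLE = {
    "daiin": 863, "dain": 211, "saiin": 144,
    "qokeedy": 305, "qokedy": 272, "okedy": 118, "otedy": 155,
}

def first_glyph_ranking(freqs):
    """Rank opening glyphs by their share of tokens, most likely first."""
    totals = Counter()
    for word, n in freqs.items():
        totals[word[0]] += n  # assumption: first EVA character stands in for the slot
    grand_total = sum(totals.values())
    return [(glyph, count / grand_total) for glyph, count in totals.most_common()]

for glyph, share in first_glyph_ranking(SAMPLE):
    print(glyph, round(share, 3))
```

Nothing in such a ranking is forbidden; rare glyphs simply sit lower in the list.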

Similarly, consonant forms are quite fluid. Consonant forms with [ch] or [sh] are possible in all three compartments (marked in red in my template.) Often benched gallows [cFh etc.] occur here. The question in each case is: how has the consonant slot been filled if at all?

A deviation from the CV pattern occurs in the first compartment, where the VC form ol- and some variations are found. The variations seem to concern the consonants [l] and [r] as options. Since it doesn't conform, it is hard to depict consistently. I've placed the VC prefixes in the vowel column of compartment A so as to be hard against the boundary of compartment B, but in fact the prefix is VC, contrary to the norm. (It would improve the model if I could find a clearer way to depict this complication.)

The purpose of the tool isn't to verify or deny (sift) vords like the sieve of Eratosthenes. In fact, it is best used on non-conforming cases, where what we witness is the improbable and unusual placement of glyphs within the basic pattern. Because it depicts usual things, we can place the template over the text and it will underline places where unusual things are happening.

To clarify, every column can be empty. The consonant in compartment B is empty in DOARO, parsed: do - a - ro.

Stolfi suggested a similar structure for Voynich words (see [link]). See also Stolfi's three-layer model (see [link]).

However, such models have in common that they also generate numerous words which do not exist in the Voynich text. This happens because the set of glyph combinations actually used is much more restricted than these models suggest. For instance, the glyph after EVA-q is EVA-o in 97.5% of cases, and the resulting sequence EVA-"qo" is most often followed by EVA-k (59% of cases) or another gallows glyph (25%). Similar restrictions apply everywhere. For instance, a group of EVA-e/ee/eee is followed by EVA-d (34%), EVA-y (27%), EVA-o (23%), etc. The glyph before EVA-i/ii/iii is EVA-a in 94% of cases, etc. (Note: The empty slot beside EVA-q and the glyph combinations used in your table, like EVA-ar/al/ol or EVA-e/ee/eee, were probably chosen to describe such restrictions.)
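Percentages of this kind come from counting which glyph follows which. A minimal sketch, assuming a dictionary of token counts as input (the function name and the tiny sample are illustrative, and the sample will not reproduce the manuscript-wide figures quoted above):

```python
from collections import Counter, defaultdict

def next_glyph_probs(freqs):
    """For each EVA character, estimate the probability of each following character."""
    counts = defaultdict(Counter)
    for word, n in freqs.items():
        # Pair every character with its successor, weighted by token count.
        for current, following in zip(word, word[1:]):
            counts[current][following] += n
    return {
        current: {g: c / sum(ctr.values()) for g, c in ctr.items()}
        for current, ctr in counts.items()
    }

# Counts taken from the word lists in this thread; the sample avoids ch/sh,
# which a fuller version would have to treat as single glyphs.
SAMPLE = {"qokeedy": 305, "qokedy": 272, "qotedy": 91, "daiin": 863, "aiin": 469}

probs = next_glyph_probs(SAMPLE)
print(probs["q"])  # in this sample q is always followed by o
print(probs["o"])  # k versus t after o
```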

Therefore it is possible to improve such a model by allowing only certain glyph combinations after each glyph. This results in a tree-like model with a new branch for every newly added glyph. Each path within this model stands for a valid Voynich word. In this model one branch starting from the node EVA-"q-o-k" would, for instance, describe the probability for EVA-"q-o-k-ee", whereas a different branch would stand for EVA-"q-o-k-o", and a third branch would describe the probability for EVA-"q-o-k-y", etc. In this way the model also covers the highly restricted number of different glyph combinations.
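A sketch of such a tree, under stated assumptions: the tree is stored as a flat table of prefix counts rather than as linked nodes, branches are single EVA characters (so "ee" is reached in two steps), and the names prefix_counts and branch_probs are hypothetical. The token counts are those quoted in this thread.

```python
from collections import Counter

def prefix_counts(freqs):
    """Count how many tokens pass through each prefix (each node of the tree)."""
    counts = Counter()
    for word, n in freqs.items():
        for i in range(len(word) + 1):
            counts[word[:i]] += n
    return counts

def branch_probs(counts, prefix):
    """Probability of each next character, given that a token starts with prefix."""
    total = counts[prefix]
    return {
        p[-1]: c / total
        for p, c in counts.items()
        if len(p) == len(prefix) + 1 and p.startswith(prefix)
    }

SAMPLE = {"qokeedy": 305, "qokedy": 272, "qotedy": 91, "okedy": 118}
counts = prefix_counts(SAMPLE)
print(branch_probs(counts, "qo"))    # branches k and t
print(branch_probs(counts, "qoke"))  # branches e and d
```

Only paths seen in the input exist in the table, so unattested combinations get no branch at all. The probability of the word ending at a given node is not modelled in this sketch.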

The reason for this behavior is the network character of the Voynich text (see [link]). One result of the network character is that "high frequency tokens also tend to have high numbers of similar words" [see Timm & Schinner 2019, p. 6]. This means for every frequently used word it is possible to find some similar words.
For instance beside "daiin" (863 times) there is also "aiin" (469), "dain" (211), "saiin" (144), ...
beside "saiin" (144) there is also "aiin" (469), "sain" (68), "saiir" (6), ...
beside "aiin" (469) exists also "ain" (89), "kaiin" (65), "chaiin" (45), ...
...
beside "chol" (396) exists also "chor" (219), "cho" (68), "chal"  (48), ...
beside "cho" (68) exists also "chy" (155), ...
beside "chy" (155) exists also "chey" (311), "chdy" (150), ...
beside "chey" (311) there is also "chedy" (511), "cheey" (174), ...
beside "chedy" (511) there is also "chdy" (150), "cheedy" (95), "cheody" (89), "kedy" (44), ...
beside "kedy" (44) there is also "okedy" (118), ...
beside "okedy" (118) there is also "qokedy" (272), "otedy" (155), ...
beside "qokedy" (272) exists "qokeedy" (305), "okedy" (118), "qotedy" (91) etc.

This way the words of the Voynich text result in a single network, connecting 6796 out of 8026 words (=84.67 %). The longest path within this network has a length of 21 steps, substantiating its surprisingly high connectivity [see Timm & Schinner 2019, p. 4]. For the core network see table 1 in Timm & Schinner 2019, p. 7. Side note: The more restricted tree model would only represent a different view for the Voynich network.
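For readers who want to experiment, here is a small sketch of how such a network can be built. It is not the procedure of Timm & Schinner: as an assumption, two words are linked when they differ by one inserted, deleted or substituted glyph, with ch and sh counted as single glyphs, and the word list is just a subset of the examples above.

```python
import re
from collections import deque

def glyphs(word):
    """Split an EVA string into glyphs, treating ch and sh as one glyph each."""
    return re.findall(r"ch|sh|.", word)

def one_edit_apart(a, b):
    """True if the glyph sequences differ by exactly one insert, delete or substitute."""
    a, b = glyphs(a), glyphs(b)
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # substitution at position i
    return a[i:] == b[i + 1:]          # extra glyph in b at position i

def components(words):
    """Group words into connected components of the similarity network."""
    words = list(words)
    seen, result = set(), []
    for start in words:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from start
            current = queue.popleft()
            for other in words:
                if other not in seen and one_edit_apart(current, other):
                    seen.add(other)
                    component.add(other)
                    queue.append(other)
        result.append(component)
    return result

WORDS = ["daiin", "aiin", "dain", "saiin", "chol", "chor", "cho", "chy",
         "chey", "chedy", "kedy", "okedy", "qokedy", "qokeedy"]
for component in components(WORDS):
    print(sorted(component))
```

On this small subset the daiin group and the chol-to-qokeedy chain come out as two separate components, because the intermediate words that join them in the full text are not in the list.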

BTW: I have tried to describe some core parts of the network by using a 2D-grid (see [attachment=6903]). Please note: since the network is in fact multidimensional it is possible that similar words, like "qokeey" and "qokeedy", do occur in different parts of the grid.
(29-10-2022, 12:02 AM) Hermes777 Wrote: Here is a non-conforming vord, from the red-inked text on 67r: LYSHYKCHY

We can locate the problem. An additional [k] has intruded into the consonants in compartment C. Otherwise, it conforms. (It is an interesting vord with an interesting symmetry. It seems the [k] has been imported into compartment C - against the rules - in order to make the symmetry.) It is parsed in this model: ly - kshy - kchy.

I am unable to find this word. I guess it might be a misreading of [link], at the bottom of the 67r2 circle.

In this case, the presence of two gallows suggests that the anomalous word is the result of the concatenation of two ordinary words. The Zandbergen-Landini transcription has an uncertain space here and the word is transcribed as 'lkshy,kchy'. In general, 'y' has a rather marked preference to appear at the beginning or end of words: an occurrence of 'y' in the middle of a relatively long word immediately suggests that two words are being concatenated.  Indeed, 'kchy' is a common word and (though 'lkshy' never occurs) 'lkchy' appears in the top line of f113r.
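The heuristic in this paragraph (a word-internal 'y' hints at a missed word break) is easy to try out. A minimal sketch, in which the function names are hypothetical and the vocabulary contains only the two attested words mentioned above:

```python
def splits_after_medial_y(word):
    """Return every (left, right) split placed right after a word-internal 'y'."""
    return [
        (word[:i + 1], word[i + 1:])
        for i in range(1, len(word) - 1)  # skip the first and last positions
        if word[i] == "y"
    ]

def describe_splits(word, vocabulary):
    """For each candidate split, report whether each half is an attested word."""
    return [
        (left, right, left in vocabulary, right in vocabulary)
        for left, right in splits_after_medial_y(word)
    ]

# Assumption: a toy vocabulary holding only the attested words named above.
VOCABULARY = {"kchy", "lkchy"}

print(describe_splits("lkshykchy", VOCABULARY))
```

With a full transcription as the vocabulary, the same check could be run over every anomalous word to see how many resolve into two ordinary ones.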

I would also argue that the hypothetical "lyshykchy" looks more like the concatenation of two words than the result of the insertion of 'k': by removing 'k' you get "lyshychy" and this is also anomalous, since the sequence 'shych' never appears in the manuscript and 'BENCHyBENCH' only occurs twice as 'shysho' and 'shyshol' (a single token each).

Stolfi has a great discussion of the [link] detected by his grammar: the most numerous category is "Multiple: words that do not have a properly nested layer structure, and seem to be two more normal words joined together (716 tokens, 55% of the abnormal words)". Of course, he assigns 'lkshykchy' to this category.
This is both an interesting and a complicated subject.

In my personal opinion, the original description by Stolfi of the 3-part word structure with prefix-stem-postfix was the single major advance in our understanding of the text, even though this did not get us any closer to the solution. It explained, however, why essentially all previous translation attempts have failed, and it narrows down the search direction for the solution.

Whether the advanced grammar of Stolfi, or the various slot representations, or the network set up by Torsten is the preferable representation is for me a matter of taste. They all have their shortcomings. There are two more that one can imagine, namely the tree diagram from the start of the word (or backwards from the end of the word), or a character transition diagram. Both are mentioned in Torsten's post.

The tree diagram may be used as an example to show the main problem.
If one creates it, then it will quickly diverge in quite a wide tree.
Because each of the three parts of a word has its own structure, the stem part will be replicated many times in this tree, under different word starts.

While the network has the advantage that it includes only valid words (i.e. only the valid words that we know), it will have a similar issue, in that certain frequent patterns can appear several times in the network.

I do like the slot system, but it still requires a lot of tuning before it can really help to understand the word structure.
(29-10-2022, 10:21 PM) Hermes777 Wrote: The consonant in compartment B is empty in DOARO, parsed:
I don't quite understand this mechanical way of dividing words into three parts: don't known languages have words without prefixes?