(09-05-2024, 12:35 PM)pfeaster Wrote: That's a really interesting question. I suspect different methods would yield different answers, but probably still worthwhile to try. The method I outlined above would give us one way to identify the "most probable" sequence that doesn't actually occur (not necessarily the best way, but a way). Another promising source of likely valid but unattested words is Torsten Timm's paper [link not shown] starting at page 66 -- thinking of all the words marked with (---): [doir], [daiiral], etc. I gather he'd classify all of these as "likely," although I'm not sure he'd have a method for ranking any one of them as "the most likely."
The idea behind this grid was the observation that it is possible to generate word types that exist in the VMS by taking an existing word type and replacing similarly shaped glyphs. For instance, it is possible to take daiin and replace EVA-iin with EVA-in or EVA-iiin to generate dain and daiiin. The table demonstrates that it is indeed possible to describe the relation between word types. For instance, if it is known that chedy is frequent, it is possible to predict that the word shedy is also frequently used, although less frequently than chedy. The general principle for the VMS text is: high-frequency tokens also tend to have high numbers of similar word types, whereas isolated words (i.e. without any other word with edit distance = 1) usually appear just once in the entire VMS [see Timm & Schinner, 2019, p. 6]. Therefore you can count the number of similar word types (word types with ED = 1) to rank the words.
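To make the ranking idea concrete, here is a minimal sketch of counting ED = 1 neighbours per word type. It is only an illustration, not the procedure used in the paper: the function names are hypothetical, the token list is a toy sample built from words mentioned in this post (the counts are invented), and each EVA character is treated as one glyph, which is a simplification.

```python
from collections import Counter

def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by exactly one insertion, deletion or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equally long) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution
    return a[i:] == b[i + 1:]  # one extra character in b

def neighbour_counts(tokens):
    """Return (token frequencies, number of ED = 1 neighbours per word type)."""
    freq = Counter(tokens)
    types = list(freq)
    counts = {w: 0 for w in types}
    for i, a in enumerate(types):
        for b in types[i + 1:]:
            if within_one_edit(a, b):
                counts[a] += 1
                counts[b] += 1
    return freq, counts

# Toy sample, NOT real VMS counts (assumption for illustration only).
toy_tokens = (["daiin"] * 5 + ["dain"] * 2 + ["daiiin"]
              + ["chedy"] * 4 + ["shedy"] * 2
              + ["chol", "chor", "qokeedy"])

freq, counts = neighbour_counts(toy_tokens)
# Rank word types by number of similar word types, then by frequency.
ranking = sorted(counts, key=lambda w: (counts[w], freq[w]), reverse=True)
for word in ranking:
    print(word, counts[word], freq[word])
```

The pairwise loop is quadratic in the number of word types, which is still manageable for roughly 8000 types; an unattested candidate could be scored the same way by counting how many attested types lie at ED = 1 from it.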
The reason behind this observation is the existence of a deep correlation between frequency, similarity, and spatial vicinity of tokens: "A useful method to analyze the similarity relations between words of a VMS (sub-)section is their representation as nodes in a graph. Starting with the most frequent token one can recursively search for other words differing by just a single glyph, and connect these new nodes with an edge" [Timm & Schinner 2019, p. 4]. The resulting network for the entire VMS connects 6796 out of 8026 word types (= 84.67 %). The longest path within this network has a length of 21 steps, substantiating its surprisingly high connectivity [Timm & Schinner 2019, p. 4].
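The graph construction in the quote can be sketched roughly as follows. This is a hedged illustration under my own assumptions, not the code behind the published numbers: the helper names are hypothetical, the tokens are a toy sample with invented counts, EVA characters stand in for glyphs, and "steps" is read here as breadth-first distance from the most frequent token, which is only one possible reading of "longest path".

```python
from collections import Counter, deque

def is_neighbour(a: str, b: str) -> bool:
    """True if a and b differ by exactly one insertion, deletion or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]
    return a[i:] == b[i + 1:]

def build_network(tokens):
    """Word types become nodes; an edge links two types that differ by one character."""
    freq = Counter(tokens)
    types = list(freq)
    graph = {w: set() for w in types}
    for i, a in enumerate(types):
        for b in types[i + 1:]:
            if is_neighbour(a, b):
                graph[a].add(b)
                graph[b].add(a)
    return freq, graph

def reach_from_most_frequent(freq, graph):
    """Breadth-first search from the most frequent token; returns the start and step counts."""
    start = max(freq, key=freq.get)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for other in graph[word]:
            if other not in depth:
                depth[other] = depth[word] + 1
                queue.append(other)
    return start, depth

# Toy sample, NOT real VMS counts (assumption for illustration only).
toy_tokens = (["daiin"] * 5 + ["dain"] * 2 + ["daiiin"]
              + ["chedy"] * 4 + ["shedy"] * 2
              + ["chol", "chor", "qokeedy"])

freq, graph = build_network(toy_tokens)
start, depth = reach_from_most_frequent(freq, graph)
coverage = len(depth) / len(freq)  # share of word types reached from the start token
print(start, sorted(depth), max(depth.values()), coverage)
```

On a full transliteration, `coverage` would be the share of word types connected to the network grown from the most frequent token, and `max(depth.values())` the largest number of steps needed to reach any connected type from it.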
The existence of a single network of similar word types not only allows us to describe the relations between words in the VMS, but also enables us to describe the relationship between different sections: "It seems that the existence of a single network for all word types in the VMS would contradict Currier’s observation that it is possible to clearly distinguish between two different languages, A and B." ... It is possible to distinguish Currier A and B based on frequency counts of tokens containing the sequence <ed>. ... if <chedy> is used more frequently, this also increases the frequency of similar words, like <shedy> or <qokeedy> .... At the same time, also words using the prefix <qok-> are becoming more and more frequent, whereas words typical for Currier A like <chol> and <chor> vanish gradually. Now, reordering the sections with respect to the frequency of token <chedy> replaces the seemingly irregular mixture of two separate languages by the gradual evolution of a single system from 'state A' to 'state B'" [Timm & Schinner 2019, p. 6].
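The reordering step can be sketched in a few lines. Again this is only an illustration under stated assumptions: the section names and token lists are placeholders I made up, and using the relative frequency of chedy (rather than the raw count) as the sort key is my own choice for sections of different length.

```python
def relative_frequency(tokens, target="chedy"):
    """Share of tokens in a section that equal the target word."""
    return tokens.count(target) / len(tokens) if tokens else 0.0

def order_sections(sections, target="chedy"):
    """Sort section names from lowest to highest relative frequency of the target."""
    return sorted(sections, key=lambda name: relative_frequency(sections[name], target))

# Placeholder sections with invented contents (assumption for illustration only).
toy_sections = {
    "section_x": ["chedy", "shedy", "qokeedy", "chedy"],
    "section_y": ["chol", "chor", "daiin", "chol"],
    "section_z": ["chol", "chedy", "daiin", "chor"],
}

order = order_sections(toy_sections)
for name in order:
    print(name, relative_frequency(toy_sections[name]))
```

With real section data, one could then check whether words like chol and chor become rarer, and qok- words more frequent, along this ordering.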
It is no secret that our research results didn't get much attention. For instance, René Zandbergen's response to our paper was to publish a paper with the title "No news about the Voynich manuscript" in which he only wrote "(Examples of people who are doing very different things are Rugg (2004) and Timm and Schinner (2019))" [link not shown]. Even on his latest website about a related topic our research is not even mentioned [link not shown]. I wish to emphasize here that I do not criticize the fact that researchers like René Zandbergen or Claire Bowern obviously see our work as completely irrelevant. No researcher is above the possibility of making fundamental mistakes or over-interpreting their results; thus we will always welcome any serious critical discussion of our viewpoint. However, I do criticize that our research is rejected without even being discussed.