(05-08-2024, 08:13 AM)Koen G Wrote: I think it would be extremely valuable to the community if someone was able to write an "explain like I'm five" version of these talks, trying to focus on the bigger picture and how Emma's, tavie's and Patrick's findings relate to each other. This might be an assignment even the authors themselves struggle with, but it would be an invaluable exercise.
In my opinion, it should be noted that the talks use different reference systems to measure their results. Patrick Feaster uses a repetitive default loop, "qokeedyqokeedyqokeedyqokeedy", as his reference system and finds variations of that loop, such as "qokedy" instead of "qokeedy", interesting. Emma May Smith, on the other hand, uses a purely random distribution as her reference system and therefore finds elements repeated across words, such as the "y.q" in "qokeedy.qokeedy", interesting. In short, they measure the same thing from different viewpoints. Patrick points out that it is possible to find some randomness within the repetitive Voynich text, whereas Emma highlights that the text is less random than rolling dice. (A simpler example would be the sequence "qy qy qy qy qe qy qy qy qy qy". One person might point out that a pattern of repeated "qy" exists, while another might highlight that the single instance of "qe" breaks the pattern and is therefore the most interesting part of the sequence, or possibly a mistake.)
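To make the second viewpoint concrete, here is a minimal Python sketch of what "compared with a random distribution" can mean in practice. It counts how often a word ending in "y" is followed by a word starting with "q" and compares that with the average count after shuffling the word order. The word list is a made-up toy sample, and the function names are my own; this is not the procedure used in either talk, only an illustration of the idea.

```python
import random
from collections import Counter

def boundary_pairs(words):
    """Count (last glyph of a word, first glyph of the next word) pairs."""
    return Counter((a[-1], b[0]) for a, b in zip(words, words[1:]))

def shuffled_baseline(words, pair, trials=200, seed=0):
    """Mean count of `pair` over random shuffles of the word order.

    This plays the role of the 'random' reference system.
    """
    rng = random.Random(seed)
    pool = list(words)
    total = 0
    for _ in range(trials):
        rng.shuffle(pool)
        total += boundary_pairs(pool)[pair]
    return total / trials

# Toy sample (hypothetical, NOT real transliteration data)
words = ("qokeedy qokeedy qokedy qokeedy chol daiin "
         "qokeedy qokeedy chor daiin").split()

observed = boundary_pairs(words)[("y", "q")]
expected = shuffled_baseline(words, ("y", "q"))
print("observed y.q:", observed, "shuffled mean:", round(expected, 2))
```

If the observed count is clearly above the shuffled mean, the "y.q" boundary is more frequent than word order alone would explain. With a repetitive loop as the reference instead, the same sample would be read the other way round: the interesting items would be the words that deviate from "qokeedy".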
I prefer to compare the Voynich text with natural languages. From this perspective, I argue that the Voynich text is very different from natural languages, as it is both more repetitive and contains more variation.
Side note to illustrate how important the reference system is: The 2013 paper by Montemurro et al. is based on the idea that "uninformative words tend to have an approximately homogeneous (Poissonian) distribution" and "the most relevant words are scattered more irregularly, and their occurrences are typically clustered". But they did not check whether words with a homogeneous (Poissonian) distribution actually exist in the Voynich text. Then, in 2021, a paper by Claire Bowern assumed that "Montemurro et al. (2013) use techniques from information theory to identify which words are most likely to contribute to topics in texts. That is, they identify words that are more uniformly distributed throughout the Voynich Manuscript and compare them with those that tend to cluster." Bowern also did not check for uniformly distributed words. However, since uniformly distributed words do not exist within the Voynich text [see Timm & Schinner 2019, p. 6], Montemurro et al. and Claire Bowern both wrongly assume that it is possible to distinguish between uniformly distributed words and clustered or topic words. Based on this assumption, they wrongly conclude that a relation exists between the topics indicated by the illustrations and the distribution of words. See also the review blog post by the linguist Chris Chrisomalis.
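One simple way to check whether a word is spread homogeneously is the variance-to-mean ratio (index of dispersion) of its counts over equally sized chunks of text. For a Poisson-like distribution the ratio is around 1; values well above 1 point to clustering. The sketch below is only an illustration with toy data. It is not the information-theoretic measure used by Montemurro et al., nor the method of Timm & Schinner, and the function name and chunking scheme are my own assumptions.

```python
from statistics import mean, pvariance

def dispersion_index(words, target, chunk_size):
    """Variance-to-mean ratio of per-chunk counts of `target`.

    Around 1: Poisson-like (homogeneous) scatter.
    Well above 1: occurrences are clustered.
    Well below 1: occurrences are more regular than Poisson.
    """
    # Only full chunks are used; a trailing partial chunk is dropped.
    chunks = [words[i:i + chunk_size]
              for i in range(0, len(words) - chunk_size + 1, chunk_size)]
    counts = [chunk.count(target) for chunk in chunks]
    m = mean(counts)
    if m == 0:
        raise ValueError("target does not occur in any full chunk")
    return pvariance(counts) / m

# Toy sequences (hypothetical): same frequency of "a", different scatter
clustered = "a a a a b b b b b b b b b b b b".split()
regular = "a b b b a b b b a b b b a b b b".split()

print(dispersion_index(clustered, "a", 4))
print(dispersion_index(regular, "a", 4))
```

On a real transliteration one would run this for every frequent word type and several chunk sizes. If no word comes out near 1, there is no "uniform" group to contrast the clustered words with.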
There are different levels of variation within the Voynich text. It is well known that Currier A behaves differently from Currier B. Currier A prefers words like daiin, chol, and chor, while on pages in Currier B words like chedy, qokeedy, daiin, and qokaiin are frequent. However, relevant differences also exist between different quires within Currier A and within Currier B. For instance, words like cheol/sheol are more common in the pharmaceutical section than in Herbal A. For Currier B, words containing the sequence "ed" are far more common in the biological section than in Herbal B. Even at the level of individual pages, no obvious rule can be deduced for which words form the top-frequency tokens at a specific location, since a token dominating one page might be rare or missing on the next one (see Timm & Schinner, 2020). Currier also points out that even "the frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally". It would therefore be interesting to check in future research how some of the presented statistics behave for different sections.
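As a rough sketch of what such a per-section check could look like, the following Python snippet lists the top-frequency tokens per section and the share of words containing a given glyph sequence such as "ed". The section names and word lists are invented toy data, not taken from any transliteration file, and the helper names are hypothetical.

```python
from collections import Counter

def top_tokens(words, n=3):
    """Return the n most frequent word types with their counts."""
    return Counter(words).most_common(n)

def share_containing(words, seq):
    """Fraction of word tokens that contain the glyph sequence `seq`."""
    return sum(seq in w for w in words) / len(words)

# Toy word lists (hypothetical, NOT real transliteration data)
sections = {
    "herbal_b": "daiin chedy chol qokaiin daiin shol".split(),
    "biological": "qokeedy chedy shedy qokedy qokaiin chedy".split(),
}

for name, words in sections.items():
    print(name, top_tokens(words), round(share_containing(words, "ed"), 2))
```

The same loop could be run per quire or per page, and any of the statistics from the talks could be put in place of the two helpers to see whether a result holds for the whole manuscript or only for certain sections.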