The Voynich Ninja

An attempt at extracting grammar from vord order statistics.
Pages: 1 2 3 4 5 6 7 8 9
(03-06-2025, 01:20 PM) davidd Wrote: :'( It's bad news. It looks like Voynich is more like random than like Genesis.

I'm not sure it's bad news. Bad news would be if the VMS were much less random, which would imply there is little information there, be it a cipher or not. I would prefer for the manuscript to have some meaningful text, so I'm fine with it being more random :)
(03-06-2025, 01:20 PM) davidd Wrote: bad news. It looks like Voynich is more like random

Not necessarily random. A better way to express it might be to say that the manuscript seems to show signs of being an artificial construction. Torsten Timm has a lot of interesting ideas about this in his paper "How the Voynich Manuscript was created". A copy is available at [link]. I would very much recommend it if you haven't read it already. It does suggest the possibility that there is no linguistic grammar at all.

The paper "Seven Habits of Highly Eccentric Paragraphs" by Tavi Stafford, given at the International Conference on the Voynich Manuscript 2022, University of Malta, is also a good one ([link]). It highlights some of the oddities in the manuscript that any 'translation' of it will have to explain.
(03-06-2025, 01:32 PM) oshfdk Wrote:
(03-06-2025, 01:20 PM) davidd Wrote: :'( It's bad news. It looks like Voynich is more like random than like Genesis.

I'm not sure it's bad news. Bad news would be if the VMS were much less random, which would imply there is little information there, be it a cipher or not. I would prefer for the manuscript to have some meaningful text, so I'm fine with it being more random :)

It doesn't say anything definitive about meaning. What it says is that there isn't as strong a word-order grammar in Voynichese as there is in English, German or other Germanic languages. If there is a grammar, it probably lies more in the case system and less in the word order (or my grammar extraction algorithm needs improvement).
(03-06-2025, 02:01 PM) davidd Wrote: It doesn't say anything definitive about meaning. What it says is that there isn't as strong a word-order grammar in Voynichese as there is in English, German or other Germanic languages. If there is a grammar, it probably lies more in the case system and less in the word order (or my grammar extraction algorithm needs improvement).

I'm not sure whether you are talking about grammar in the computational sense (something like "the preference of certain sequences to follow certain other sequences according to fixed rules") or grammar in the linguistic sense. They are related, but I think linguistic grammar is generally governed by semantics, which can be quite complex and might not be easily detectable from word order alone.

Also, I'm not sure the Bible is a good choice of source text for this task: I suspect it has far too many repeating patterns and similar phrases with personal pronouns compared to, say, a treatise on astrology. Since it doesn't matter whether we take an old text or a relatively new one, maybe some 19th-century scientific book from Project Gutenberg (that is, in the public domain) could be used as another benchmark?
(03-06-2025, 01:20 PM) davidd Wrote: I made an attempt at a score function,

Sorry for being dense, but I am not sure I understand how this score function is defined. Could you please provide a short description and discussion of the results you get?

I agree with what oshfdk said: of course King James Genesis is much simpler than Voynichese. It is useful as a best-case test: a method that doesn't work on King James is not likely to work on Voynichese. But more complex texts (as I mentioned [link]) would provide better parallels. Also, it's important that the number of words in each experiment is the same, or at least not too different (fewer words inevitably mean weaker results).
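To keep the comparison fair in the way suggested above, one could cut every benchmark text to the same number of words before running the grammar extraction, and pair each with a word-shuffled copy as a random-order baseline. A minimal Python sketch; the function name `equal_length_samples`, the dict-of-word-lists input and the fixed seed are assumptions for illustration, not part of anyone's software in this thread:

```python
import random


def equal_length_samples(texts, seed=0):
    """Truncate every text to the word count of the shortest one.

    `texts` maps a label to a list of words (hypothetical input format).
    Returns a dict with each truncated text plus a word-shuffled copy
    labelled "<label> (shuffled)" to serve as a random-order baseline.
    """
    n = min(len(words) for words in texts.values())
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    samples = {}
    for label, words in texts.items():
        head = list(words[:n])
        shuffled = head[:]
        rng.shuffle(shuffled)  # same words, random order
        samples[label] = head
        samples[label + " (shuffled)"] = shuffled
    return samples
```

Truncating to the shortest text throws data away, but it keeps the statistical power of each experiment roughly comparable.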
(03-06-2025, 05:00 PM) MarcoP Wrote:
(03-06-2025, 01:20 PM) davidd Wrote: I made an attempt at a score function,

Sorry for being dense, but I am not sure I understand how this score function is defined. Could you please provide a short description and discussion of the results you get?

I agree with what oshfdk said: of course King James Genesis is much simpler than Voynichese. It is useful as a best-case test: a method that doesn't work on King James is not likely to work on Voynichese. But more complex texts (as I mentioned [link]) would provide better parallels. Also, it's important that the number of words in each experiment is the same, or at least not too different (fewer words inevitably mean weaker results).

[Image: Screenshot_2025-06-03_21-11-36.png]
So, according to Wikipedia, the binomial distribution has a standard deviation of sqrt(p*(1-p)*trials), where "trials" is, in this situation, the number of words in the group, and p is the relative size of the group being transitioned to (the "going to" group).
In my calculation this is called "d".
The software calculates the average of "d" over the group transitions, and also the average of d*d.
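The calculation described above might look roughly like this in Python. This is a reconstruction under assumptions, not davidd's actual code: it assumes d = (observed - expected) / sqrt(p*(1-p)*trials), takes "trials" to be the number of class-a words that have a following word, and takes p to be the relative size of the target class b. The names `transition_scores` and `quality` are illustrative only.

```python
import math
from collections import Counter


def transition_scores(words, word_class):
    """Deviation d (in binomial standard deviations) for every class pair.

    words: the text as a list of tokens.
    word_class: dict mapping each token to its class label.
    """
    classes = [word_class[w] for w in words]
    total = len(classes)
    size = Counter(classes)            # number of words in each class
    trials = Counter(classes[:-1])     # words of each class that have a successor
    observed = Counter(zip(classes, classes[1:]))  # counts of (a, b) transitions
    d = {}
    for a in size:
        for b in size:
            p = size[b] / total        # relative size of the "going to" class
            n = trials[a]
            sd = math.sqrt(n * p * (1 - p))
            if sd == 0:
                continue               # undefined when n == 0 or p is 0 or 1
            d[(a, b)] = (observed[(a, b)] - n * p) / sd
    return d


def quality(d):
    """Mean of |d| over all class pairs, analogous to 'avg(abs(d))'."""
    return sum(abs(v) for v in d.values()) / len(d)
```

For example, with the hypothetical classes `{"the": "D", "father": "N", "son": "N"}` and the text "the father the son the father", the D-to-N transition gets a positive d and D-to-D a negative one.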
As I understand it, the measure of how good the groups are appears at the end of the data and is called "quality scores avg(abs(d))".
So Genesis scores 20, randomly ordered Genesis scores 2, and Voynich scores 10, which is something in between. Curious ;)

Davidd, could you try more texts?
In particular, it would be interesting for me to see how a language other than English behaves. I mean something with loose word order and rich declension.

Maybe you could try Genesis in Latin, taken from the Vulgate? Or some Slavic language, where word order is less rigid?
I could provide the texts if that's a problem.
My knowledge of statistics is what it is, so I couldn’t fully understand the code screenshot at [link] or the 3 lines of text in post #67.

I understand that class transitions are compared against the standard deviation of a random scenario (words randomly scrambled, or random word classes? I am not sure). The distance “d” from this random scenario is expressed as a multiple of the standard deviation, and we would like its absolute value to be high: it’s fine both to have class transitions that are clearly avoided, resulting in low negative values (e.g. “the of” in Genesis), and class transitions that are clearly preferred, with high positive values (“the father”).
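One way to make the "random scenario" explicit, instead of relying on the binomial formula, is an empirical null: shuffle the word order many times and measure how much each transition count varies. This is a swapped-in permutation technique, not necessarily what davidd's software does; the function name `shuffle_z_scores`, the 200 shuffles and the seed are assumptions.

```python
import random
import statistics
from collections import Counter


def shuffle_z_scores(classes, n_shuffles=200, seed=0):
    """d for each class transition, measured against shuffled copies of the text.

    classes: sequence of class labels, one per word, in text order.
    For each pair (a, b) the observed a->b count is compared with the mean
    and standard deviation of the same count over n_shuffles random orders.
    """
    rng = random.Random(seed)
    observed = Counter(zip(classes, classes[1:]))
    labels = sorted(set(classes))
    shuffled_counts = {(a, b): [] for a in labels for b in labels}
    seq = list(classes)
    for _ in range(n_shuffles):
        rng.shuffle(seq)  # destroy word order, keep class frequencies
        counts = Counter(zip(seq, seq[1:]))
        for pair in shuffled_counts:
            shuffled_counts[pair].append(counts[pair])
    z = {}
    for pair, samples in shuffled_counts.items():
        sd = statistics.pstdev(samples)
        if sd > 0:  # skip pairs whose count never varies
            z[pair] = (observed[pair] - statistics.mean(samples)) / sd
    return z
```

As in the post, a strongly negative value means a transition is avoided and a strongly positive one means it is preferred; the shuffle-based version simply makes the baseline explicit.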

A major difference between Voynichese (e.g. QuireM=Q13) and Genesis is that for English the diagonal has consistently negative values (words of the same class rarely appear consecutively), while for Voynichese there is no clear pattern: there are negative values for classes that avoid occurring consecutively (e.g. “shedy” and “ol”), positive values for classes that tend to repeat (“qokedy” and “okedy”), and three classes with values close to 0, i.e. behaving as in the random case.
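The diagonal comparison can be made mechanical. A minimal Python sketch, assuming a dict `d` that maps (from_class, to_class) pairs to deviations in standard deviations; the function name and the cut-off of 2 standard deviations between "avoided", "preferred" and "near random" are arbitrary assumptions.

```python
def diagonal_profile(d, threshold=2.0):
    """Group self-transitions (a -> a) by the sign of their deviation.

    d: dict mapping (class_from, class_to) to a deviation in std devs.
    threshold: assumed cut-off; |d| below it counts as close to random.
    """
    profile = {"avoided": [], "preferred": [], "near_random": []}
    for (a, b), value in sorted(d.items()):
        if a != b:
            continue  # only the diagonal is of interest here
        if value <= -threshold:
            profile["avoided"].append(a)
        elif value >= threshold:
            profile["preferred"].append(a)
        else:
            profile["near_random"].append(a)
    return profile
```

On the pattern described for English, one would expect most classes under `avoided`; the Quire M result reported above would populate all three lists.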

[attachment=10784]

I guess this can be described as “autocorrelation”. Gaskell and Bowern ([link]) discuss something similar about the distribution of word lengths:

Quote:A notable feature of the VMS that has to our knowledge only been discussed by one other publication [20] is positive autocorrelation of word lengths. Word lengths in most meaningful texts are negatively autocorrelated: that is, long words tend to be interspersed with short words (long-short-long-short). By contrast, the VMS exhibits positive autocorrelation (long-long-short-short). Positive autocorrelation is only observed in a limited number of natural languages, but is common in gibberish (Figure 3).
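The word-length autocorrelation mentioned in the quote can be estimated with the standard lag-1 sample autocorrelation. A minimal Python sketch; Gaskell and Bowern may use a different estimator or tokenization, and splitting on whitespace and returning 0 for a constant sequence are assumptions made here.

```python
def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a numeric sequence.

    Negative: high and low values alternate (long-short-long-short).
    Positive: similar values cluster (long-long-short-short).
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values)
    if var == 0:
        return 0.0  # constant sequence: autocorrelation undefined, report 0
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


def word_length_autocorrelation(text):
    """Autocorrelation of word lengths, with words split on whitespace."""
    return lag1_autocorrelation([len(w) for w in text.split()])
```

For example, the lengths [1, 5, 1, 5, 1, 5] give a negative value and [1, 1, 1, 5, 5, 5] a positive one.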

The publication they mention is Matlach et al. [link] (which I don’t think I ever read in full).
It would be interesting to see if the languages that show word-length autocorrelation also show some measure of word-class autocorrelation (but I am not sure in which languages Gaskell and Bowern observed that feature).

Another observation is that, once one has a score function, it is possible to tackle the problem as optimization (e.g. by simulated annealing): this could possibly lead to better results, though it might be computationally complex and it’s possible that the improvement is only marginal.
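The optimization idea could be sketched as a simulated-annealing search over word-to-class assignments that maximizes an avg(abs(d)) quality score. This is a hypothetical Python sketch, not anything run in this thread. It assumes the binomial d described earlier in the thread (observed minus expected, divided by sqrt(p*(1-p)*trials)), k classes, single-word random moves and a linear cooling schedule. `quality_score` and `anneal_classes` are illustrative names. The full score is recomputed at every step, which is fine for a toy text but would be slow on a whole quire.

```python
import math
import random
from collections import Counter


def quality_score(words, assign):
    """avg(abs(d)) over all class pairs, using a binomial d."""
    classes = [assign[w] for w in words]
    total = len(classes)
    size = Counter(classes)
    trials = Counter(classes[:-1])
    observed = Counter(zip(classes, classes[1:]))
    ds = []
    for a in size:
        for b in size:
            p = size[b] / total
            sd = math.sqrt(trials[a] * p * (1 - p))
            if sd > 0:
                ds.append(abs(observed[(a, b)] - trials[a] * p) / sd)
    return sum(ds) / len(ds) if ds else 0.0


def anneal_classes(words, k, steps=2000, t0=1.0, seed=0):
    """Search for a word -> class assignment (k classes) maximising quality_score."""
    rng = random.Random(seed)
    vocab = sorted(set(words))
    assign = {w: rng.randrange(k) for w in vocab}  # random starting classes
    score = quality_score(words, assign)
    best, best_score = dict(assign), score
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9  # linear cooling, kept above zero
        w = rng.choice(vocab)
        old = assign[w]
        assign[w] = rng.randrange(k)  # move one word to a random class
        new_score = quality_score(words, assign)
        # Accept improvements; accept worse moves with probability exp(delta / t).
        if new_score >= score or rng.random() < math.exp((new_score - score) / t):
            score = new_score
            if score > best_score:
                best, best_score = dict(assign), score
        else:
            assign[w] = old  # reject: undo the move
    return best, best_score
```

Updating the transition counts incrementally when a single word changes class would make each step much cheaper on a real transcription.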
Thank you Marco, Rafal.

I have done lots of runs since my last post and improved the output a lot.

Language A sits between 60 and 70.
Language B sits between 90 and 110
Quire T sits around 200

Genesis sits around 300

Genesis with the words shuffled sits around 20


