The Voynich Ninja

Full Version: Various Graphs and Analyses
I've had two stints of Voynich fever, once in 2015 and the second time a year ago. On the second round I produced some graphs and analyses and thought about posting them here, but never got to it. It keeps bugging me, so I'd better post this stuff to get some peace of mind...

Everything is very unpolished, not nice enough to write blog posts about, but maybe it could inspire someone. Most of these analyses examine a different aspect, but there isn't really enough substance to merit a new thread for each. Mistakes are to be expected. I've mostly done the analyses with Python in JupyterLab (Pandas, Matplotlib etc.), using various transcriptions: mostly Takahashi for the older stuff and ZL otherwise. I'm usually interested in paragraph-type text and omit labelese.

I don't claim anything here is new. Sometimes I set out to replicate old results, and I'm well aware that in Voynich research 99.9% of all results have already been thought of by a dozen people before.
Here are some key-like sequences. I've tried to find correspondences between lists of "numbers" in margins, seeking something that could help assign numerical values to them. The blue, green, and red numbers are attempts at assigning values to the symbols. It would be nice to find 1, 2, 3, 4, ... and maybe some astronomically significant sequences like 365 or 12. Nothing really matches very well, as usual. There are, however, some sequences that stand out.

[attachment=12741]

The file is here for anyone who'd like to play with it.

[attachment=12742]
LAAFU PAAFU time. I was interested in how different parts of the paragraphs behave, thinking that comparing vord frequencies between them might reveal correspondences. Assuming the vord distributions in these different paragraph positions are about the same (yes, I realize the top row probably contains a different type of language than the rest), could this reveal some systematic transformation that vords go through at different positions? So did it? You be the judge.

Top left = first vord of first line, top right = last vord of first line, top = the rest of the first line, etc.

(The transcription here is the one used by daiin.net, hence "csh" meaning EVA "sh".)
 

[attachment=12743]

[attachment=12744]

[attachment=12745]
Here are some frequent vord pairs in Currier B. Only pairs that appear 3 or more times are counted. The number tells how many times more often you find the pair than expected; for example, dar + shey occurs 2.53 times more often than expected. Single lit-up cells don't mean much, but when you have a ": :"-like formation, it might mean something. I moved the rows and columns so that these patterns got clumped together, and some patterns emerged: -ain/-aiin vords are followed by she/che + y/dy, -ey/-edy vords by qok- vords, but not all types, etc. Moving the columns/rows by hand is pretty arbitrary; I thought about grouping them automatically but didn't get to it.
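For reference, the observed-to-expected ratio described above can be computed like this. A minimal sketch in plain Python; the toy vord list and the min_count threshold are illustrative, not the actual transcription data:

```python
from collections import Counter

def pair_ratios(vords, min_count=3):
    """Observed/expected counts for adjacent vord pairs.
    The expected count for (a, b) assumes independence:
    n_pairs * P(a in first position) * P(b in second position)."""
    pairs = list(zip(vords, vords[1:]))
    pair_counts = Counter(pairs)
    first = Counter(a for a, _ in pairs)   # how often each vord opens a pair
    second = Counter(b for _, b in pairs)  # how often each vord closes a pair
    n = len(pairs)
    return {
        (a, b): count / (first[a] * second[b] / n)
        for (a, b), count in pair_counts.items()
        if count >= min_count
    }

# Toy example: "dar shey" appears 3 times among 9 adjacent pairs.
toy = "qokeey dar shey qokaiin dar shey ol dar shey chedy".split()
print(pair_ratios(toy))  # {('dar', 'shey'): 3.0}
```

With real data you would feed in the transcription's vord stream instead of the toy list.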

[attachment=12746]

Enough for today.
Would you be able to give a matrix of counts for each pair? If pair counts are low then the ratios of observed to expected might not be meaningful.
(30-11-2025, 07:00 PM)srjskam Wrote: Single lit-up cells don't mean much, but when you have a ": :"-like formation, it might mean something.

Maybe, but the cases when there is no ": : formation" are more interesting -- because that is what happens all the time in natural language texts.

I suggest that you compute the same statistics on a text in English (or, better yet, on a known language that you don't know) and see what conclusions you can draw about the language from them.

All the best, --stolfi
(01-12-2025, 10:05 AM)dashstofsk Wrote: Would you be able to give a matrix of counts for each pair? If pair counts are low then the ratios of observed to expected might not be meaningful.

The bare unarranged pair table looks like this. It seems I set a minimum of 3 pairs for the first table. The counts are very low, yes. (By the way, the way of counting pairs is very naive: it considers the last vord of a paragraph and the first vord of the next a pair. As I said, very unrefined.)
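To make that naivety concrete, here is a sketch of the difference between the boundary-crossing count and a per-paragraph count; the two short paragraphs are made up for illustration:

```python
from collections import Counter

# Hypothetical paragraphs, each a list of vords.
paragraphs = [
    ["daiin", "chedy", "qokaiin"],
    ["shedy", "qokedy"],
]

# Naive count: flatten everything first, so the last vord of one
# paragraph and the first vord of the next get counted as a pair.
flat = [v for p in paragraphs for v in p]
naive = Counter(zip(flat, flat[1:]))

# Refined count: only pairs within the same paragraph.
refined = Counter(pair for p in paragraphs for pair in zip(p, p[1:]))

print(("qokaiin", "shedy") in naive)    # True (spurious boundary pair)
print(("qokaiin", "shedy") in refined)  # False
```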

[attachment=12765]
(01-12-2025, 02:26 PM)Jorge_Stolfi Wrote: I suggest that you compute the same statistics on a text in English (or, better yet, on a known language that you don't know) and see what conclusions you can draw about the language from them.

I had done some comparisons with natural languages. The realization that certain pairs of small groups of common words are very frequent probably was the inspiration for this line of investigation. Like in German you'll very frequently have in/an/von/zu + der/die/dem. Similarly in other European languages that have prepositions and articles.

[attachment=12766]

I don't remember how this table was made, but it shows the ": :" pattern. (What's a sensible way to describe this... Cartesian-product-like behaviour?)

[attachment=12767]
(01-12-2025, 06:57 PM)srjskam Wrote: I had done some comparisons with natural languages. The realization that certain pairs of small groups of common words are very frequent probably was the inspiration for this line of investigation. Like in German you'll very frequently have in/an/von/zu + der/die/dem. Similarly in other European languages that have prepositions and articles.

Thanks for the tables, but I don't see how to read them.  Are there ": : patterns" in them?

Quote:(What's a sensible way to describe this... Cartesian-product-like behaviour?)

I don't know of a good name either, but in linear algebra it would be described as a "2x2 submatrix that has low determinant", or "is nearly singular", or "has a small eigenvalue".

The determinant of a 2x2 matrix M = [[a,b],[c,d]] is D = ad-bc.  When D is zero we say that M is singular.  This happens if and only if one row is a multiple of the other.  Or, equivalently, if and only if one column is a multiple of the other.  Or, equivalently, if and only if there are numbers R,S and X,Y such that  M = [[RX,RY],[SX,SY]].  

The eigenvalues of the 2x2 matrix M are the numbers L1 = T/2 + sqrt(T^2/4 - D) and L2 = T/2 - sqrt(T^2/4 - D), where T is the "trace" of the matrix, T = a + d.  These may be complex numbers.  However, for our purposes the order of rows and columns does not matter, so if the determinant D is positive you can swap the rows (or columns) of M before using those formulas.  Then D will be negative, the thing inside the sqrt() will be positive, and L1, L2 will be real numbers.

Discard the signs of L1 and L2. The ratio R between the smallest and the largest of these numbers is a measure of how close to singular the matrix M is.  

Let the two words on the left be A1 and A2, and the two on the right be B1 and B2.  The maximum value  of R is 1, which occurs if the matrix of frequencies is [[a,0],[0,d]] or [[0,b],[c,0]]; that is, if A1 always pairs with B1 and A2 with B2, or vice-versa. 

The minimum value of R is 0, which occurs if M is singular, namely fits the ": : pattern". This means that the choice between B1 or B2 after A1 or A2 does not depend on which of these was the previous word.  Or, equivalently, that the choice between A1 and A2 before a B1 or B2 does not depend on which of these is the next word.
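The recipe above can be sketched in a few lines of Python. The function name is mine; the row swap and the sign handling follow the description, so the square root always comes out real:

```python
import math

def singularity_ratio(a, b, c, d):
    """Ratio R of |smaller| to |larger| eigenvalue of the 2x2
    pair-count matrix [[a, b], [c, d]].  If the determinant is
    positive, swap the rows first (order doesn't matter here), so
    that the discriminant is nonnegative and both eigenvalues real."""
    if a * d - b * c > 0:
        a, b, c, d = c, d, a, b   # swapping rows flips the sign of D
    det = a * d - b * c           # now D <= 0
    tr = a + d                    # trace T
    disc = math.sqrt(tr * tr / 4 - det)   # real, since D <= 0
    l1 = tr / 2 + disc
    l2 = tr / 2 - disc
    lo, hi = sorted([abs(l1), abs(l2)])
    return 0.0 if hi == 0 else lo / hi

# A1 always pairs with B1 and A2 with B2 -> R = 1 (maximally non-singular)
print(singularity_ratio(5, 0, 0, 5))  # 1.0
# Rows proportional, the perfect ": :" pattern -> R = 0 (singular)
print(singularity_ratio(2, 4, 1, 2))  # 0.0
```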

In natural languages these ": : patterns" seem rare.  Even when German grammar seems to allow "in" or "vor" to pair indifferently with "die" or "der", in any particular text you will find that "in" has a definite tendency to partner with "die" and "vor" with "der".  Or vice-versa.

And you must be aware that many languages (including Classical Latin) make little or no use of prepositions and articles, and use word order or declensions instead.   Articles and prepositions as separate words are a characteristic feature of Romance and Germanic languages.  

And you also must try to account for the uncertainty in the frequencies of words and word pairs that comes from sampling error.  I should know the formulas for that, but can't remember them now.  But the point is that if the matrix of occurrence counts for {A1,A2} x {B1,B2} looks like [[2,1],[2,1]], one cannot count that as an occurrence of the ": : pattern", because those numbers are mostly sampling noise.

All the best, --stolfi
(01-12-2025, 06:43 PM)srjskam Wrote: The bare unarranged pair table looks like this

How do you get 41 for "or aiin"? I get 52 in language B.

[attachment=12781]