The Voynich Ninja


A few samples of vertical patterns within the starting characters of the text (with an example attached).

(20-09-2025, 11:36 PM) SherriMM wrote: A few samples of vertical patterns within the starting characters of the text (with an example attached).

These look interesting, but given the somewhat reduced character set of the line starters, a lot of purely coincidental repetitions are to be expected, and it's hard to tell whether these are intentional or not. Maybe comparing the number of o-q-ch with the number of q-o-ch or ch-q-o can help.

Using the RF1b-er transcription (5383 lines, excluding folio descriptors), counting across folio boundaries and matching EVA c only (I have not checked whether it is c-h or something else), the (count, group) results are: (12, oqc), (4, qoc), (4, cqo).
Some high counts involving q are: (42, qsq), (28, oqo), (25, qdq). The longest pattern with q is an 8-line span: (2, sdqsqsdq), at folio locations <f84r.33,+P0> and <f111r.26,+P0>.
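A minimal sketch of how such counts can be made, assuming the first EVA glyph of each line has already been extracted from the transcription into one string in reading order (parsing RF1b-er itself is not shown). The helper name `count_vertical_patterns` and the toy input are hypothetical, not real manuscript data.

```python
from collections import Counter


def count_vertical_patterns(initials: str, length: int = 3) -> Counter:
    """Count every contiguous run of `length` line-initial glyphs.

    `initials` holds the first EVA glyph of each line, in reading order,
    concatenated across folio boundaries. Overlapping windows are all
    counted, which may differ from how the figures above were obtained.
    """
    return Counter(initials[i:i + length] for i in range(len(initials) - length + 1))


# Hypothetical toy input, not taken from the manuscript.
initials = "toqcpqocdoqcsqoc"
counts = count_vertical_patterns(initials)
for group in ("oqc", "qoc", "cqo"):
    print(group, counts[group])  # Counter returns 0 for groups never seen
```

Passing `length=8` gives the longer spans in the same way.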

Good spot! For me it is not a coincidence. Notice that it happens mostly on neighbouring pages.

For me it is also another argument in support of Torsten Timm's theory that the Voynich is gibberish produced by a "self-citation" method. In this view, the author invents some words ad hoc, but more often he looks at text already written and copies it with alterations, changing some glyphs in a word but not all of them.

And he could copy not only from the previous paragraph but also from the previous page, which would be very natural, as that page would be lying in front of him on the table.

(21-09-2025, 01:11 PM) Rafal wrote: Good spot! For me it is not a coincidence. Notice that it happens mostly at neighbour pages.

I dunno, man, it's tricky.

This would make a good project for those University of Adelaide guys.

If we take the first letter of every line and join them into one long string, then, according to this post on math.stackexchange [1], which I do not fully understand, a repeated 8-character string seems unlikely in a string of ~5383 characters.

In some empirical tests on random strings of 5383 characters drawn from 19 letters, it took 2-3 thousand runs to be fairly certain of finding a repeated string of length 8, so that roughly matches.
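A minimal sketch of that kind of test, assuming "length 8 string" means the longest substring that occurs at least twice (occurrences may overlap) and that letters are drawn uniformly unless weights are supplied. The function names are hypothetical; passing `weights` is one way to mimic the skewed line-initial distribution discussed later in the thread.

```python
import random
import string


def longest_repeat_length(s: str) -> int:
    """Length of the longest substring that occurs at least twice in s.

    Grows the length one step at a time; if no substring of length L
    repeats, no longer one can (its prefix would repeat), so we stop.
    """
    length = 0
    while True:
        target = length + 1
        seen = set()
        found = False
        for i in range(len(s) - target + 1):
            sub = s[i:i + target]
            if sub in seen:
                found = True
                break
            seen.add(sub)
        if not found:
            return length
        length = target


def simulate(runs, n=5383, alphabet_size=19, weights=None, seed=1):
    """Histogram {longest repeat length: number of runs} over random strings."""
    rng = random.Random(seed)
    letters = string.ascii_lowercase[:alphabet_size]
    hist = {}
    for _ in range(runs):
        s = "".join(rng.choices(letters, weights=weights, k=n))
        k = longest_repeat_length(s)
        hist[k] = hist.get(k, 0) + 1
    return dict(sorted(hist.items()))


print(simulate(runs=20))
```

Only 20 runs are shown to keep it quick; matching the 2-3 thousand runs mentioned above is just a matter of raising `runs`.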

I cannot be sure, but I get the feeling that the pattern of occurrences is about right while the counts are too high.
In the random tests, lengths of 4 and 5 were common, then there was a big drop for 6, and another big drop for 7.
The VMS seems to show the same pattern, just with more occurrences, though that is somewhat expected given the positional rigidity of the characters.

If one did a study, then all that kind of stuff would have to be taken into account and compensated(?) for, but it would be interesting to see it done.

[1] "Probability of text in a string and probability that the text is random if I keep seeing the text."
Edit: There is a super long string of 'o's on f70r1; I chose to ignore it.

Maybe, for an easier analysis, the character set (and the line-start words considered) should be limited to the most common ones? And then we could see how likely repeated patterns are.

I added a follow-up post after analyzing 174 pages of the text-heavy folios (ignoring the zodiac section and a few others). Of these pages, I found vertical patterns of length 5 or more in the first characters on 88 pages, a little more than 50% of the text (with considerable overlap between strings). This includes 22 strings of length six and, more notably, two different pairs of length seven.

Using the Poisson distribution, the probability of this occurring by chance is 0.0067%, meaning the text in the manuscript is not random.
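For readers who want to reproduce a figure of this kind, here is a minimal sketch of a Poisson tail probability, assuming the number of repeats under a random model is approximately Poisson with some expected count λ. The λ actually used for the 0.0067% figure is not given in the thread, so the value below is purely hypothetical.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the CDF at k - 1."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


# Hypothetical expected number of length-7 repeats under a random model.
lam = 0.1
print(f"P(X >= 2) = {poisson_tail(2, lam):.4%}")
```

The result depends entirely on λ, which is where the letter frequencies and string lengths enter.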

However, "not random" doesn't really tell us much, other than that it could be a language, a cipher, or, as Rafal suggested above, a scribe who liked to copy from previous pages.

I will add one thing to consider.
You are analysing the first letters of the lines and have noticed that they form a repeating pattern.

Sometimes people deliberately arrange their text so that the first letters of the lines spell something, usually a meaningful word or phrase.
This is called an acrostic.

We cannot exclude it in the case of the VM.
And unfortunately, we cannot exclude a lot of things.

(20-09-2025, 11:36 PM) SherriMM wrote: A few samples of vertical patterns within the starting characters of the text (with an example attached).

The blogpost cited by Rafal is too simplistic. The first letter of each line is not a random letter of the Voynichese alphabet with equal probability. The line-initial letters have a highly skewed distribution. That greatly increases the probability of certain three-letter combinations.

(The skewed probabilities themselves are a feature that requires explanation. I posted some possible explanations somewhere on this forum, a few days ago.)

Let P(x) be the relative frequency of EVA letter x in line-initial position. Namely, if 35% of the lines start with y, then P(y) = 0.35, and so on.

If you have a string S of n letters randomly and independently generated with those probabilities, and you pick any index k in 0 to n-3, the probability that the substring S[k]S[k+1]S[k+2] is qoy is Z = P(q)P(o)P(y). In that string, you should expect to find about (n-2)Z such strings. (A bit less because the strings cannot overlap, but that should be close enough).
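A minimal sketch of this calculation, assuming the line-initial glyphs are already in one string `initials` (the toy string and function names below are hypothetical). P(x) is estimated from the string itself, the expected count is (n-2)Z with Z = P(q)P(o)P(y), and the observed count scans the same windows, k = 0 to n-3.

```python
from collections import Counter


def expected_triple_count(initials: str, triple: str) -> float:
    """(n-2) * Z, where Z is the product of line-initial frequencies P(x)."""
    n = len(initials)
    freq = Counter(initials)
    z = 1.0
    for glyph in triple:
        z *= freq[glyph] / n  # P(x) estimated as a relative frequency
    return (n - 2) * z


def observed_triple_count(initials: str, triple: str) -> int:
    """Number of indices k in 0..n-3 where S[k]S[k+1]S[k+2] equals triple."""
    return sum(1 for k in range(len(initials) - 2) if initials[k:k + 3] == triple)


initials = "pqosyqoytdyqoyso"  # hypothetical toy data, not the manuscript
print(expected_triple_count(initials, "qoy"), observed_triple_count(initials, "qoy"))
```

Applied to the real line-initial string, the ratio of observed to expected counts is one way to judge whether a combination is notable.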

Can you compute the expected number of qoy substrings with this method?

Note that even if the letters are drawn independently, some three-letter combinations will show up more often than expected by the above formulas. A three-letter combination would have to be a lot more frequent than predicted in order to be "notable".

All the best, --jorge

Jorge, I updated my post to focus on strings of length five or more, as repetitions of length three were too frequent. I believe the two different pairs of length seven are notable. Also, my statistics only include the 18 line-initial characters, not arbitrary random letters. For those 18 characters, should I compute based on their frequencies?

New blog post.

(29-09-2025, 09:34 PM) Jorge_Stolfi wrote: The blogpost cited by Rafal is too simplistic. The first letter of each line is not a random letter of the Voynichese alphabet with equal probability. [...]


Posters, in order: SherriMM, oshfdk, RobGea, Rafal, RobGea, Koen G, SherriMM, Rafal, Jorge_Stolfi, SherriMM