The Voynich Ninja

I ran some simple experiments with the 3-table cipher idea. As always, I may have misunderstood something or made errors in the process.

I started from the paragraph text in Q13 (Zandbergen-Landini EVA transliteration).
I split each word into three segments, basically:

the first characters of the word make up the prefix
the stem starts with either one gallows, benched-gallows or 'd'
the suffix is a final sequence made of [oeyainrlsmg]

The prefix and stem can be null (represented as _).
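
A minimal sketch of this segmentation in Python, to make the rule concrete. It is not the script used for the experiment: the gallows set (t, k, p, f), the benched-gallows set (cth, ckh, cph, cfh), the rule of always taking the longest possible suffix, and the name `split_word` are all assumptions. The longest-suffix rule resolves ambiguous words in one way only, so it will not always agree with a hand parse (it gives q,_,ool rather than qo,_,ol).

```python
import re

# Assumed glyph classes (EVA): benched gallows, plain gallows, and 'd'.
STEM_START = re.compile(r"cth|ckh|cph|cfh|[tkpf]|d")
# Longest run of suffix characters anchored at the end of the word.
SUFFIX_RUN = re.compile(r"[oeyainrlsmg]*$")

def split_word(word):
    """Return (prefix, stem, suffix); empty segments are shown as '_'."""
    stem_match = STEM_START.search(word)
    stem_at = stem_match.start() if stem_match else None
    # Look for the suffix only from the stem start onwards, if there is one.
    suffix_at = SUFFIX_RUN.search(word, stem_at or 0).start()
    if stem_at is None:
        prefix, stem = word[:suffix_at], ""
    else:
        prefix, stem = word[:stem_at], word[stem_at:suffix_at]
    return tuple(seg or "_" for seg in (prefix, stem, word[suffix_at:]))
```

For example, `split_word("chetedary")` gives `('che', 'ted', 'ary')` and `split_word("ol")` gives `('_', '_', 'ol')`.
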
I sorted each of the three segment types by frequency and assigned each segment to the English letter having the same frequency rank. The letter frequencies are based on Genesis in the King James Version.
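
The rank matching could be sketched as follows. This assumes a single letter ranking is shared by all three segment positions and that ties are broken by order of first appearance; `rank_table` is a hypothetical helper, not the code used for the experiment.

```python
from collections import Counter

def rank_table(segments, plaintext):
    """Pair the k-th most frequent letter with the k-th most frequent segment."""
    # most_common() keeps first-seen order among equal counts (tie-break assumption).
    seg_ranked = [s for s, _ in Counter(segments).most_common()]
    letters = [c for c in plaintext.lower() if c.isalpha()]
    let_ranked = [c for c, _ in Counter(letters).most_common()]
    # zip stops at the shorter list, so surplus segments stay unmapped.
    return dict(zip(let_ranked, seg_ranked))
```

One table would be built per position, for example `rank_table(all_prefixes, genesis_text)`, where both arguments are placeholders for data that is not reproduced here.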

The result is the following cipher table (sorted both by rank and alphabetically):

[cipher table: link not available]

This is very rough and includes some ambiguity. For instance, the word 'ol' can be parsed both as 'o,_,l' (AEL) and as '_,_,ol' (EED).

I enciphered the English text by first splitting it into 3-letter groups and then mapping each group to a Voynichese word made of prefix+stem+suffix.

in the beginning god created the
INT HEB EGI NNI NGG ODC REA TED THE

she,ckh,ar ch,_,es _,kched,or l,ckh,or l,kched,r che,ted,ary sh,_,ey qo,_,ol qo,t,y
Sheckhar ches kchedor lckhor lkchedr chetedary Shey qool qoty
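
The enciphering step can be sketched like this. The three tables contain only the entries that can be read off the worked example above, so any other letter raises a `KeyError`; the handling of a short final group is an assumption, and the function names are hypothetical.

```python
# Partial tables: only the entries recoverable from the worked example.
PREFIX = {"i": "she", "h": "ch", "e": "", "n": "l", "o": "che", "r": "sh", "t": "qo"}
STEM = {"n": "ckh", "e": "", "g": "kched", "d": "ted", "h": "t"}
SUFFIX = {"t": "ar", "b": "es", "i": "or", "g": "r", "c": "ary",
          "a": "ey", "d": "ol", "e": "y"}

def trigrams(text):
    """Drop spaces and punctuation, then cut the letters into groups of three."""
    letters = [c for c in text.lower() if c.isalpha()]
    return ["".join(letters[i:i + 3]) for i in range(0, len(letters), 3)]

def encipher(text):
    words = []
    for group in trigrams(text):
        # zip pairs each letter with the table for its position; a short
        # final group simply uses fewer tables (an assumption).
        words.append("".join(t[c] for c, t in zip(group, (PREFIX, STEM, SUFFIX))))
    return " ".join(words)
```

With these tables, `encipher("in the beginning god created the")` reproduces the nine words of the example in lower case.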

I applied the same process to a version of the English text where word order was randomly scrambled.

These plots show the % of perfect reduplication vs MATTR 200. For comparison, I also included the text files from Brian Cham's corpus. In the plot on the right, I removed the outlier PML file.

[attachment=5490]

The encoding process has these effects:

MATTR is greatly increased. This is because MATTR is lowered by the regularity of word sequences in grammatical text: Genesis is particularly repetitive and has a particularly low MATTR. The encoding process destroys word patterns, since spaces are re-assigned during the creation of 3-letter groups.
Reduplication is reduced: this is not visible for the original Genesis file (where reduplication is ~0%), but it is illustrated by the scrambled file. The reason is analogous to that for the increased MATTR: reduplication also depends on words, so it too is affected when words are destroyed.

Note that the two measures discussed above do not depend on the mapping table. They are purely an effect of splitting words into 3-letter groups.
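
For reference, the two measures could be computed roughly as below. The exact definitions used for the plots are not stated in the post, so this is a sketch under assumptions: MATTR is taken as the mean type-token ratio over all sliding windows of 200 tokens, and perfect reduplication as the share of adjacent token pairs that are identical.

```python
def mattr(tokens, window=200):
    """Mean type-token ratio over all sliding windows of `window` tokens."""
    if len(tokens) <= window:
        # Fallback for short texts: plain type-token ratio (an assumption).
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

def reduplication_rate(tokens):
    """Share of adjacent token pairs in which both tokens are identical."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)
```

Both functions take a list of word tokens and expect at least one token (two for `reduplication_rate`).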

BTW, this also connects to the problem of labels mentioned by Anton above: with this system, a Voynichese word is typically not enough to encode a plain-text word. But labels appear to be words and have no deep structural difference from words in paragraphs.

The next plot shows measures that depend on the specifics of the table. The X axis shows that the cipher greatly reduces character conditional entropy, making it comparable with the notoriously anomalous values found in the VMS. The Y axis shows that average word length is increased with respect to the original English and the value for the cipher is slightly greater than that for the VMS (~6 vs ~5.2).
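
A sketch of the two quantities on the axes, assuming that "character conditional entropy" means the entropy of a character given the preceding one, estimated from adjacent pairs. Whether word spaces count as characters is also an assumption: here the text is used exactly as passed in.

```python
import math
from collections import Counter

def conditional_entropy(text):
    """H(next char | current char) in bits, from adjacent character pairs."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # how often each char starts a pair
    total = sum(pairs.values())
    return -sum(n / total * math.log2(n / firsts[a])
                for (a, _), n in pairs.items())

def mean_word_length(tokens):
    """Average number of characters per word token."""
    return sum(len(t) for t in tokens) / len(tokens)
```

A fully predictable string such as "abab" scores 0 bits, while "aabb" scores 2/3 of a bit.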

[attachment=5491]

Finally, I checked the distribution of word lengths in the dictionary. As Stolfi observed, Voynichese words are distributed along a binomial curve. For European languages the distribution has a longer right-side tail (due to longer words): this is shown by the green squares, which lie above the fitting curve for words longer than 8 characters. Both Quire 13 and the ciphered text are quite close to their bell curves; here too, one can see that the cipher text results in longer words.
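
The comparison could be sketched as below. The fitting method behind the plot is not described, so this simply matches the mean of a shifted binomial to the mean word length; the parameters `n=9` and `shift=1` are assumptions chosen for illustration, not values taken from the post.

```python
import math
from collections import Counter

def type_length_distribution(tokens):
    """Relative frequency of each word length, counted over distinct words."""
    lengths = Counter(len(w) for w in set(tokens))
    n_types = sum(lengths.values())
    return {k: v / n_types for k, v in sorted(lengths.items())}

def binomial_curve(mean_length, n=9, shift=1):
    """Binomial pmf over lengths shift..shift+n with the given mean."""
    p = (mean_length - shift) / n  # moment matching, not a least-squares fit
    return {k + shift: math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(n + 1)}
```

Plotting the two dictionaries against each other shows where the observed lengths leave the bell curve, such as a long right-side tail.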

[attachment=5489]

Emma pointed out to me that this cipher system could be unable to generate a sufficient number of short words. This simple experiment confirms that the problem exists. The formulation of the table should allow some adjustment, but if the empty prefix and the empty stem must each correspond to a single Latin-alphabet letter, then their frequency cannot be very high, so I am not sure that a solution exists. Similarly, I have not looked at how closely the lexicon of the cipher text matches that of Quire 13: here too, the table could be built in a more sophisticated way that would likely result in a better match.

In conclusion, I confirm that I am impressed by Rene's proposal. The method is simple and it does a good job of imitating Voynichese word structure. As Rene pointed out both in the paper and in this thread, there are other features that do not appear to be explainable by an approach of this kind.

It is great to read of a cipher-oriented idea for the VMS that can be tested and analysed! This could be the first time I have seen such a well-thought-out and well-presented cipher hypothesis.

Many thanks Marco.
I really appreciate your efforts in trying this out.

To respond to the comment about the word length distribution, there are two "problems". The first is that, in order to obtain the Voynich binomial word length distribution, which also includes shorter words, one should have three empty entries in the left column, three in the centre column, and three length-1 items in the right column, corresponding with one of the earlier tables in the paper (Table 4?).

The second problem is that this works if all combinations are equally likely to occur, but this will not be the case in this particular application of the method. The number of different trigrams in English is far smaller than 24 to the power 3 (13,824). It is also (almost certainly) smaller than the number of word types in the MS.

The single character distribution frequency is very uneven, so the final WLD can be skewed either way easily, by making different mappings in the table.
However, I understand that you approached this in an unbiased and statistically reasonable way.

Most of these problems disappear when the method is used not to map trigrams but in a 'nomenclator' style. However, that is not so easy to experiment with.

If the wheels operate independently it would be easy to split the words into independent parts. This way we would be able to determine the number of wheels as well as the states for each wheel.
But in the case of Voynichese the parts depend on each other. Stolfi writes: "In Voynichese daiin, qokeedy and qokaiin are all very popular (866, 305, 266 occurrences, respectivey), while deedy is essentially nonexistent (3 occurrences). Our paradigm fails to notice this assymetry, since it allows independent choices between d- and qok-, and between -aiin and -eedy."
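
The size of this asymmetry can be worked out from the quoted counts. If the choice between d- and qok- were independent of the choice between -aiin and -eedy, the ratio of deedy to daiin would match the ratio of qokeedy to qokaiin. A small sketch (the function name is hypothetical; the counts are those in the quote):

```python
# Occurrence counts quoted from Stolfi above.
counts = {("d", "aiin"): 866, ("qok", "eedy"): 305,
          ("qok", "aiin"): 266, ("d", "eedy"): 3}

def expected_if_independent(counts, start, end, other_start, other_end):
    """Count expected for start+end if starts and ends combined freely."""
    return (counts[(start, other_end)] * counts[(other_start, end)]
            / counts[(other_start, other_end)])

expected_deedy = expected_if_independent(counts, "d", "eedy", "qok", "aiin")
```

Here `expected_deedy` is 866 × 305 / 266, a little under 1000, against 3 observed occurrences.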

Stolfi is splitting common Voynich words like daiin or qokeedy into two parts. Tiltman also splits Voynichese into two parts. Doesn't that mean that we should assume the existence of only two wheels?

(03-05-2021, 01:56 PM)lurker Wrote: If the wheels operate independently it would be easy to split the words into independent parts.

It wouldn't be that easy, among other reasons because of the great uncertainty related to word spaces.
Rare combinations of the parts can correspond to rare words in the plain text.
Typically, for a text of this length, about 15% of all word tokens and over 50% of all word types are hapax legomena, i.e. words that appear only once in the text. (Rough figures.)
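
The two hapax figures are straightforward to check on any tokenised text. A minimal sketch (`hapax_shares` is a hypothetical helper name):

```python
from collections import Counter

def hapax_shares(tokens):
    """Return (share of tokens, share of types) made up of hapax legomena."""
    freq = Counter(tokens)
    hapax = sum(1 for n in freq.values() if n == 1)
    return hapax / len(tokens), hapax / len(freq)
```

The first value is the token share (about 15% in the rough figures above) and the second is the type share (over 50%).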

(03-05-2021, 01:56 PM)lurker Wrote: Stolfi is splitting common Voynich words like daiin or qokeedy into two parts. Tiltman also splits Voynichese into two parts. Didn't that mean that we should assume the existence of only two wheels?

As described in the paper, Tiltman's table generates only 240 different word types, out of ca. 9500 needed.

My point was that we do not observe independent choices.

(04-05-2021, 05:38 AM)ReneZ Wrote: Rare combinations of the parts can correspond to rare words in the plain text.

If the wheels worked independently, every combination would have the same probability.

(04-05-2021, 05:38 AM)ReneZ Wrote: As described in the paper, Tiltman's table generates only 240 different word types, out of ca. 9500 needed.

This just means that your hypothesis is inconsistent with Stolfi's and Tiltman's observations.

(04-05-2021, 09:08 AM)lurker Wrote: My point was that we do not observe independent choices.

(04-05-2021, 05:38 AM)ReneZ Wrote: Rare combinations of the parts can correspond to rare words in the plain text.

If the wheels worked independently, every combination would have the same probability.

The paper addresses two possible uses of the wheels:
1. to generate words "on the fly" and write them down as they appear. This is consistent with the original proposal of Gordon Rugg.
2. to build up a vocabulary of words, to be used to translate some plain text document

What you write is true in the first case, but not in the second case.
Effectively, the second case is much closer to what we see in the MS text.

To be more precise, your proposed 'flat' probability distribution would be true if the text were generated by a computer, but we know, of course, that it was generated by a human. So, it would not be entirely flat. This last point is an important aspect of the arguments of both Gordon Rugg and Torsten Timm.

(04-05-2021, 09:08 AM)lurker Wrote:
(04-05-2021, 05:38 AM)ReneZ Wrote: As described in the paper, Tiltman's table generates only 240 different word types, out of ca. 9500 needed.

This just means that your hypothesis is inconsistent with Stolfi's and Tiltman's observations.

I am not presenting a hypothesis. I am presenting a different way to look at, or analyse the text.

I also don't understand this comment.
Tiltman's observation is fully in line with the method presented. He just said that this covers "many words", not "all words".

Stolfi's observations are extremely important. They were input to Rugg's proposal, but it remains to be seen whether the n-wheels approach could generate his word grammar.
I never said that it could.

Coincidentally, I read his paper a couple of days ago as part of my Cardan grille research for another manuscript (for Marco: [link] and beyond)

and there is more on the paper from Gordon Rugg on his personal Voynich pages, where he also answers many questions.

I was thinking about what the 3 wheel spinning design reminded me of.

Who knows, a medieval one armed bandit designed to generate meaningless text combinations?

(04-05-2021, 04:59 PM)Mark Knowles Wrote: I was thinking about what the 3 wheel spinning design reminded me of.

A disc with several segments would do in practice too ;)

(04-05-2021, 04:59 PM)Mark Knowles Wrote: I was thinking about what the 3 wheel spinning design reminded me of.

Who knows, a medieval one armed bandit designed to generate meaningless text combinations?

Mark Knowles, that's a great example. In this sort of game each reel may have a different number of a given symbol: for example, the first reel can have 1 lemon and, say, 3 stars, but the second reel 2 lemons and 2 stars. Hence different symbol combinations have different probabilities, and these probabilities can be somewhat controlled. Also, since spins are independent, each combination is independent of the previous one.
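
To put numbers on the lemon and star example: with independent reels, the probability of a combination is the product of each symbol's share of its reel. A short sketch using exact fractions (the reel contents are those of the example; the function name is hypothetical):

```python
from fractions import Fraction
from itertools import product

# Reel contents from the example: symbol -> number of positions on the reel.
reels = [{"lemon": 1, "star": 3}, {"lemon": 2, "star": 2}]

def combination_probabilities(reels):
    """Probability of each symbol combination when reels spin independently."""
    probs = {}
    for combo in product(*reels):  # iterates over the symbols of each reel
        p = Fraction(1)
        for symbol, reel in zip(combo, reels):
            p *= Fraction(reel[symbol], sum(reel.values()))
        probs[combo] = p
    return probs
```

Two lemons come up with probability 1/8 and two stars with probability 3/8, so the combinations are far from equally likely even though the reels are independent.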

Moreover, there are machines with different configurations. One such machine has 3 lines. It generally works the same way, but the combinations on lines 1, 2, and 3 are not independent. And since in these games equal symbols are usually placed adjacently on a reel, the combinations on lines 1, 2, and 3 will be pretty similar.

MarcoP

ReneZ

lurker

ReneZ

lurker

ReneZ

Davidsch

Mark Knowles

bi3mw

farmerjohn