The Voynich Ninja - Let there be meaning


(23-10-2023, 02:00 PM)ReneZ Wrote: A criticism of this method included that it was not considered realistic that many different plaintext words would, by chance, result in the same cipher text word (e.g. Eva chedy).

As long as cipher text word boundaries differ from plaintext word boundaries, I guess these frequent chunks of cipher text could correspond to common subword sequences in the plaintext. E.g., the following are the 20 most frequent 4-letter sequences and their counts from Opus Majus; together these patterns make up about 4% of the whole text.

3515 quod, 2785 ibus, 2757 ione, 2562 tion, 2497 enti, 2380 orum, 2244 itat, 2216 atur, 1982 atio, 1913 tate, 1852 sunt, 1841 prop, 1730 itur, 1651 ntia, 1603 ntur, 1597 ndum, 1592 quam, 1518 cund, 1509 ient, 1509 ecun

That's still not nearly as many as in Voynich. Voynichese words are closer to the character bigram distribution for Opus Majus, where the top 20 character bigrams account for 20% of the text:

24249 er, 20968 et, 20924 in, 19784 um, 18400 nt, 18195 qu, 18013 it, 17621 es, 17516 ti, 17453 te, 17274 is, 15926 us, 15749 tu, 15271 en, 14949 ri, 14629 on, 14477 at, 13970 li, 13516 re, 13306 st
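Counts like these can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `top_ngrams` is made up here, and it assumes that n-grams are taken from the letter stream with spaces and punctuation removed, and that "share of the text" means the share of all n-gram occurrences. The figures above may have been computed differently.

```python
from collections import Counter

def top_ngrams(text, n, k=20):
    """Return the k most frequent letter n-grams and their combined share.

    Assumption: non-letters are dropped before counting, so n-grams may
    span word boundaries.
    """
    letters = "".join(ch for ch in text.lower() if ch.isalpha())
    counts = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    total = sum(counts.values())
    top = counts.most_common(k)
    # Share of all n-gram occurrences covered by the top k patterns.
    share = sum(c for _, c in top) / total if total else 0.0
    return top, share
```

Calling `top_ngrams(text, 4)` and `top_ngrams(text, 2)` on the same corpus gives the two lists side by side.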

I'm not sure that this is what happens with Voynich though.

For distance ciphers in particular, sequences like y.okey.qokeedy.qokeedy.qokedy could be parsed as shown, for example, in the picture below, assuming that y/o/qo/d pair among themselves, k pairs separately (it's just one random arrangement, I didn't think it through), and eee are nulls. This encodes a very normal-looking English word sequence.

(If you haven't read my text on distance ciphers: the minimum matching distance is often at least two characters wide to allow for easier interleaving of pairs, so characters next to each other without a space never match up, e.g., dy at the end of the first qokeedy below.)

(23-10-2023, 03:02 PM)oshfdk Wrote: As long as cipher text word boundaries differ from plaintext word boundaries, I guess these frequent chunks of cipher text could correspond to common subword sequences in the plaintext. E.g., the following are the 20 most frequent 4-letter sequences and their counts from Opus Majus; together these patterns make up about 4% of the whole text.

Fully agreed. However, a one-to-many cipher would also break up such correspondences.

(23-10-2023, 11:49 PM)ReneZ Wrote: Fully agreed. However, a one-to-many cipher would also break up such correspondences.

I think it's more that it could break up such correspondences. E.g., the following basic cipher from Vigenère's Traicté des chiffres is a simple one-table substitution cipher that preserves all kinds of statistics of the text. However, it has an alternative table that matches letters with sequences of dots (or spaces of different widths; the author doesn't spell this out, but the cipher is described in the same section as the pure distance/ruler-based ciphers). So every 2-6 letters of the ciphertext, one letter is encoded visually as a word break. Actual word breaks are not encoded at all. I show the decoding process in this picture, which I made to be sure I fully understand how this cipher works in action. This is a general problem with old books on cryptography: they give a somewhat ambiguous description and then one example (usually with errors), leaving it to the reader to figure out the specific details. Trithemius was probably the trend-setter for this habit, or it was assumed that a person who cannot fill in the gaps has no business using ciphers anyway :)

There are way too many quite different ciphers that could cause a mix of regular patterns and irregular overall structure like in VMS. I haven't yet found anything that statistically matches Voynichese, but so far no property of Voynichese that I'm aware of excludes a cipher of this general kind.

The full-resolution image is in this section of my text on distance-based ciphers: [link]

It's also mentioned in the following section, where I talk about some properties of the text on f116v: [link]

(I've noticed that GitHub doesn't behave well with section links when linking from an external page, like here. It would first scroll to the section header correctly, but then load the images and lose the location. Maybe tapping on a link, waiting for the page to load and tapping again can help with this, I'm not sure.)

Well, what you are describing here is a cipher that is very close to a simple substitution cipher. Only one out of every several characters deviates from it. This is one way groups can be preserved, i.e. not broken up.

But all of that isn't too relevant for the Voynich MS text.

To go from a Latin text to Voynichese, the bigram entropy has to be decreased.
The way to do that is NOT by adding more variation (varying ciphers). It is by reducing variation. We also need to create the word patterns in the cipher text, but introducing patterns helps to reduce entropy, so that is good news.
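For readers who want to check this kind of claim on their own transliterations, here is a minimal Python sketch of one way to measure it. The function names are made up for this example, and it is an assumption that "bigram entropy" is read either as the entropy of the bigram distribution or as the conditional entropy of a character given the one before it; both are returned.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of observed frequencies."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def bigram_entropies(text):
    """Return (joint bigram entropy, conditional entropy of the next character)."""
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # first character of every bigram
    h_joint = entropy(bigrams)
    # H(next | previous) = H(previous, next) - H(previous)
    return h_joint, h_joint - entropy(firsts)
```

A strictly alternating string such as "abab" gives a conditional entropy of zero, because each character fully determines the next one. Patterned text pushes the value down in the same way.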

It does not help with the quasi-repetitions and all the stuff Patrick Feaster has presented, and I am not yet aware of any good way to achieve these two things.

Hi Rene,
I totally agree with your observations. Vigenère presented his cipher in a concise, complete and clear way, which makes it possible to fully understand its properties.

I am more puzzled by the distance cipher illustrated here:


[attachment=7798]

The 7 characters example does not show how the whole cipher is supposed to work. Also, T is encoded as yqo1, E is encoded as dqo1. But while y.qo appears consecutively in the cipher text, E is rendered as dy.qo, with the additional symbol y inserted in the sequence. I must be misunderstanding something.

Also, I don't understand how the sequence A S T should be encoded. I am leaving out nulls and representing 'qo' as 'q' (I understand that qo is treated as a single symbol?).

The encoding should be: kk2 kk3 yq1
but I am not sure how this should be handled. Something like:
k k . k y . q k
doesn't work, since one gets kk4 instead of kk3.

[attachment=7797]

The alternative
k k . k y . k q
fixes kk3 but breaks yq1.

In general, this system seems to be quite complex to encode and decode. Anyway, I don't think that distance encoding decreases entropy or makes repeating words more likely. Of course, nulls (if added with fixed criteria, rather than randomly as effective cryptography requires) and even more so verbose elements (e.g. "qo" as a single symbol) do lower entropy. A cipher that makes large use of distance 1 sequences basically is a verbose cipher (e.g. encoding S as y.qo) and typically would reduce entropy, but I don't see the added value of the complexity of a proper distance cipher like this.

(24-10-2023, 01:22 PM)MarcoP Wrote: The encoding should be: kk2 kk3 yq1
but I am not sure how this should be handled.

The encoding should be: kk2, kk3, yqo1.

This could work:
kain chek keey qok...

I'm simply counting the number of 'e' characters between the left and the right letter of each pair, plus 1. Maybe I misunderstood the "distance", and other letters that are not part of any pair, like a, i, n, c, h, should be counted as well? That would force a "kek", which doesn't exist in the VMs. So I suppose these letters are just filler.
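The counting rule can be written down in one line of Python. This is just a sketch of the rule as stated: the function name `e_distance` is made up, and the pair positions are passed in by hand as 0-based indices rather than found automatically.

```python
def e_distance(text, left, right):
    """Distance of a pair: number of 'e' strictly between the two letters, plus 1."""
    return text[left + 1:right].count("e") + 1

sample = "kain chek keey qok"
print(e_distance(sample, 0, 8))    # k ... k in "kain chek"
print(e_distance(sample, 10, 17))  # k ... k in "keey qok"
print(e_distance(sample, 13, 15))  # y ... q(o) in "keey qok"
```

The three calls correspond to the pairs kk2, kk3 and yqo1 in the example string.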

Thank you, interesting idea. I thought that nulls (e-sequences in this case) carried no information, but I am probably wrong.

Note that [link] and [link] don't have any EVA-e so this would be an unlikely way to count distances. :)

Oh well, it's only an example.

The longest chain of qoke+dy, f75r.38:

Code:
qokeedy.qokeedy.qokedy.qokedy.qokeedy
+----+ qod3
  +-------+ kk3
      +--+ yqo1
             +---+ dqo1
              +-----+ yd1
                  +------+ kk2
                     +--+ yqo1
                           +---+ dqo1
                            +------+ yd3

(24-10-2023, 01:22 PM)MarcoP Wrote: The 7 characters example does not show how the whole cipher is supposed to work. Also, T is encoded as yqo1, E is encoded as dqo1. But while y.qo appears consecutively in the cipher text, E is rendered as dy.qo, with the additional symbol y inserted in the sequence. I must be misunderstanding something.

For this particular scheme, there are two groups of characters: (o, q, y) and (k). Starting from the leftmost character in the message, you proceed as follows: find the next character of the same group, assess the distance between the two characters (1, 2 or 3, corresponding to ~ 2, 4 or 6 characters apart), and use the distance and the pair to identify the plaintext character. Discard both characters and proceed. Since the minimum distance for a pair is about 2 characters wide, no immediately adjacent characters form pairs (two characters with no other characters or space between them never pair up; there is no such thing as distance 0 in the cipher). dy.qo is parsed as dqo1 because y goes immediately after d and so can't belong to the same pair. I think I wrote something about this in the original post.
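The pairing step described above can be sketched in Python as follows. Several things here are assumptions for illustration only: the names `GROUPS`, `tokenize` and `pair_up` are made up, the grouping follows the earlier "y/o/qo/d pair among themselves, k separately" arrangement, 'qo' is read as one glyph, and every other character (e, '.', and so on) is treated as filler. The sketch reports the raw character offset of each pair and deliberately does not map it to a distance class of 1, 2 or 3, since those widths are only approximate.

```python
# Assumed grouping: y/o/qo/d pair among themselves, k pairs with k.
GROUPS = {"y": "A", "o": "A", "qo": "A", "d": "A", "k": "B"}

def tokenize(text):
    """Split text into (glyph, start, end) tuples, reading 'qo' as one glyph."""
    tokens, i = [], 0
    while i < len(text):
        width = 2 if text.startswith("qo", i) else 1
        tokens.append((text[i:i + width], i, i + width))
        i += width
    return tokens

def pair_up(text):
    """Pair glyphs left to right; return (left, right, offset) for each pair."""
    tokens = tokenize(text)
    used, pairs = set(), []
    for a, (glyph, start, end) in enumerate(tokens):
        if a in used or glyph not in GROUPS:
            continue  # already discarded, or filler
        for b in range(a + 1, len(tokens)):
            other, other_start, _ = tokens[b]
            if b in used or GROUPS.get(other) != GROUPS[glyph]:
                continue
            if other_start == end:
                continue  # touching glyphs never pair up
            used.update((a, b))  # discard both and move on
            pairs.append((glyph, other, other_start - start))
            break
    return pairs

print(pair_up("kk.ykqok"))
```

For kk.ykqok this gives two k-k pairs and one y-qo pair, in that order. For dy.qo it gives a single d-qo pair and leaves y unpaired, which is the behaviour described above.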

Quote: Also, I don't understand how the sequence A S T should be encoded. I am leaving out nulls and representing 'qo' as 'q' (I understand that qo is treated as a single symbol?).

Many possible ways.
1) Obvious, not very efficient: k...kk.....ky.qo, or k...k.........k.....k.....y.qo or any other way where sequences are just spaced out.
2) A bit more efficient, k...kky.qok
3) Using the property of adjacent characters never pairing up: kk.ykqok

Note that for comfortable reading for a handwritten script some stylistic variations (round/angled/slanted body, longer/shorter extending elements) can be used to let the reader skip terminating characters, without keeping track of them, or to provide visual clues for matching up the characters via specific angles or lengths or character elements. E.g., using italics to mark the terminating characters. Repeating the encodings above:

1) Obvious, not very efficient: k...kk.....ky.qo, or k...k.........k.....k.....y.qo or any other way where sequences are just spaced out.
2) A bit more efficient, k...kky.qok
3) Using the property of adjacent characters never pairing up: kk.ykqok

Or using extra cues (a cedilla here) to mark matching pairs of characters:

1) Obvious, not very efficient: ķ...ķk.....ky.qo, or ķ...ķ.........k.....k.....y.qo or any other way where sequences are just spaced out.
2) A bit more efficient, ķ...ķky.qok
3) Using the property of adjacent characters never pairing up: ķk.yķqok

Note that these adjustments are not required to properly read the cipher, they just help to read it much faster and write it with fewer errors.

We managed to encode 3 letters using only two different pairings (we could replace 'qo' with 'y' in all examples, getting kk.yk.yk for the last encoding) and 8 loci. Generally, this cipher roughly preserves the length of the plaintext in ciphertext, when using the same number of pair marks as the plaintext alphabet size.

Quote: In general, this system seems to be quite complex to encode and decode. Anyway, I don't think that distance encoding decreases entropy or makes repeating words more likely. Of course, nulls (if added with fixed criteria, rather than randomly as effective cryptography requires) and even more so verbose elements (e.g. "qo" as a single symbol) do lower entropy. A cipher that makes large use of distance 1 sequences basically is a verbose cipher (e.g. encoding S as y.qo) and typically would reduce entropy, but I don't see the added value of the complexity of a proper distance cipher like this.

I cannot comment on its complexity, but I expect that reading off the page at a speed of 2-3 characters per second should be possible after some training. Since the maximum pair distance is about 6 characters wide, it's short enough not to require saccades while reading, so each pair can basically be perceived as a single entity.

This encoding does decrease observed character-based entropy, especially if distances (space counts) are not preserved in the transliterations. Compare kk.yk.yk and AST. It is verbose in the sense that at least 2 characters are required to encode one source entity. It is not verbose from a purely information-theoretical standpoint, since its density of encoding is comparable to the plaintext. The 9 codes x 3 distances version that I show in my article can encode one letter of a 27-character source alphabet per code pair. With a script like Voynichese, where characters are easily split into basic elements that support a large proportion of all possible combinations, it could be possible to literally stack pairs on top of each other and produce, on average, approximately 1 character of ciphertext per character of Latin plaintext.

(24-10-2023, 03:38 PM)MarcoP Wrote: Thank you, interesting idea. I thought that nulls (e-sequences in this case) carried no information, but I am probably wrong.

If you mean the picture I posted for qokeedy.qokeedy interpretation, then nulls are nulls. They don't carry any information.

(24-10-2023, 04:53 PM)nablator Wrote: The longest chain of qoke+dy, f75r.38:

Code:
qokeedy.qokeedy.qokedy.qokedy.qokeedy
+----+ qod3
  +-------+ kk3
      +--+ yqo1
             +---+ dqo1
              +-----+ yd1
                  +------+ kk2
                     +--+ yqo1
                           +---+ dqo1
                            +------+ yd2

I didn't mean for my example to literally apply to the Voynich manuscript, but this one looks nice :)
