The Voynich Ninja
[split] Verbose cipher? - Printable Version




[split] Verbose cipher? - geoffreycaveney - 15-09-2020

(15-09-2020, 09:22 PM)geoffreycaveney Wrote:
(14-09-2020, 03:32 PM)RenegadeHealer Wrote:
(05-08-2020, 09:57 PM)-JKP- Wrote: I feel fairly strongly that "ain" should be analyzed without the "d" because there are many glyphs preceding the "ain" sequence. Get a grasp of "ain" first and then look at the letters in front, but not only the "d", the other ones too, so the pattern can be understood in context.
I must say, it was pretty thrilling to read Koen's blog post this summer about his experiment with n-grams, where one of the end results was some solid support for each in the series "a + 0~3i + line glyph" being one grapheme. I've been waiting for an experiment like that, one that identifies n-grams with a high likelihood of being single graphemes, ever since I read your old blog post about Janus pairs. I smell a breakthrough. I digress.
Indeed, Koen's blog post that you link to above is fascinating. (It should be noted that Marco contributed significantly to the results there as well.) Has that particular topic been discussed in its own thread on this forum?
Summary of the main result of Koen's blog post for those who don't have the time or inclination to wade through it:
If one treats each of the following character groups as a single character or letter, the transformed ms text has an "h1" character entropy and an "h2" conditional character entropy that are much more in line with typical h1 and h2 values of many natural language texts, in particular for European languages (the anomalous h2 being one critical problem with the Voynich ms text):
[ch], [sh]
[ain], [aiin], [aiiin]
[air], [ar], [al], [am]
[or], [ol]
[ok], [ot], [od]
[qo], [qok], [qot]

These substitutions -- treating each of the above character groups as a single character or letter -- generate an "h2" conditional character entropy of 3.01 and an "h1" character entropy of 4.12, which are actually within the normal range for many European natural language texts.
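For anyone who wants to try this kind of calculation themselves, here is a minimal sketch in Python. It is not Koen's actual code: the greedy longest-match tokenizer, the choice to keep the word space as its own unit, and the toy input string are all assumptions made for illustration. It splits each word into the merged groups listed above and then computes h1 and h2 over the resulting units.

```python
from collections import Counter
from math import log2

# The merged groups listed above; everything else stays a single EVA character.
GROUPS = ["ch", "sh", "ain", "aiin", "aiiin", "air", "ar", "al", "am",
          "or", "ol", "ok", "ot", "od", "qo", "qok", "qot"]

def tokenize(word, groups=GROUPS):
    """Split one word into units, trying the longest group first (assumed rule)."""
    ordered = sorted(groups, key=len, reverse=True)
    units, i = [], 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                units.append(g)
                i += len(g)
                break
        else:  # no group matched here: keep the single character
            units.append(word[i])
            i += 1
    return units

def tokenize_text(text, groups=GROUPS):
    """Tokenize whitespace-separated words, keeping " " as a unit between them."""
    units = []
    for word in text.split():
        units.extend(tokenize(word, groups))
        units.append(" ")
    return units[:-1]

def entropy(counter):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counter.values())
    return -sum(c / total * log2(c / total) for c in counter.values())

def h1_h2(units):
    """Return (h1, h2); h2 is the entropy of a unit given the previous unit.
    Needs at least two units."""
    h1 = entropy(Counter(units))
    pairs = Counter(zip(units, units[1:]))
    h2 = entropy(pairs) - entropy(Counter(units[:-1]))
    return h1, h2

# Toy input made of typical-looking EVA words, not a real transliteration excerpt.
sample = "daiin qokaiin chol shedy otar daiin chol"
print(tokenize_text(sample))
print(h1_h2(tokenize_text(sample)))
```

On a real transliteration the input would first need its locus tags and comments stripped, and the numbers will depend on which transliteration and which tokenizing rule is used.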

The most surprising consequence of such an analysis of the Voynich ms script is that EVA "[a]" and "[o]" would virtually always occur as part of the above bigrams/trigrams, which account for virtually all occurrences of [a] and [o]. They would thus have no more independent significance than do "[c]", "[h]", "[i]", or "[n]".

EVA [l] and [r] would still have a restricted independent significance primarily where they occur in vord-initial position, without a preceding [o] or [a]. I calculate approximately 1,300-1,350 such examples of [l] and approx. 400-450 such examples of [r].
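A count like this can be checked with a few lines of Python. The sketch below only counts vords that begin with a given glyph, and it assumes the transliteration has already been reduced to plain EVA words separated by whitespace (the function name and the toy sample are made up for illustration).

```python
from collections import Counter

def initial_counts(text, glyphs=("l", "r")):
    """Count the words in `text` that start with each glyph in `glyphs`."""
    counts = Counter()
    for word in text.split():
        if word[0] in glyphs:
            counts[word[0]] += 1
    return counts

# Toy sample, not a real excerpt: [ol] and [ar] do not count as initial l/r.
sample = "lchedy daiin rain ol lkaiin ar raiin"
print(initial_counts(sample))
```

The totals on the full text will vary somewhat with the transliteration used and with how uncertain word breaks are handled.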

I also suppose that according to Koen's schema, [aiir], [oir], and [oiir] should also be treated as single characters or letters. Probably they were simply too rare in the ms text to affect the h1 and h2 values very much, so they weren't worth considering in Koen's initial calculations.

Taking the remaining single characters together with all of Koen's n-grams above, we arrive at a character "alphabet" of about 25 values:

[e], [y]
[d], [k], [t]
[s], [l], [r]
[ch], [sh]
[ain], [aiin], [aiiin]
[air], [ar], [al], [am]
[or], [ol]
[ok], [ot], [od]
[qo], [qok], [qot]

Of course there are a handful of additional rare characters/letters as well. But the above inventory should account for the overwhelming majority of the Voynich ms text.


RE: The daiin - Koen G - 15-09-2020

At the very bottom of the normal range, but yes, much closer to normal than the original EVA. 

I'm not sure whether this actually shows something about how the VM was encoded. But it is interesting to have this information. 

Either way, this approach veers away from a spontaneous, "naive" writing system and dives straight into verbose cipher territory. In earlier posts I was exploring what happens if you merge glyphs that may have been split by EVA, like [ch]. But all the examples involving [o] go further than that. I guess they take [o] as a modifier for the next glyph.

It must also be said that I simply took n-gram frequency as a guide, and did some trial and error to see what worked best (I attempted to reflect this process in the post). But entropy is an unpredictable beast, and it may be that there are more effective options and combinations...

Should I split this into a thread about verbose ciphers?


RE: The daiin - geoffreycaveney - 15-09-2020

Yes, a separate thread about verbose ciphers sounds appropriate at this point, to keep further discussion of this topic in line with the stated thread title.


RE: [split] Verbose cipher? - Koen G - 15-09-2020

One thing I'd like to hear more statistics-savvy people's opinion on is why some glyphs appeared more effective than others. For example EVA-e was always underwhelming...


RE: [split] Verbose cipher? - MichelleL11 - 16-09-2020

(15-09-2020, 11:46 PM)Koen G Wrote: One thing I'd like to hear more statistics-savvy people's opinion on is why some glyphs appeared more effective than others. For example EVA-e was always underwhelming...

I think that Farmerjohn's math is the answer to this -- although I am not enough of a mathematician to be able to explain it. But I'm going to try anyway, and don't laugh too hard, everyone.

It seems to say that the impact of changing two letters into a single letter X depends on the probabilities of the two letters being substituted. If the two probabilities are equal, each having the value p/2, there will be at least some positive impact on entropy (e.g. in Koen's experiments the dot will move up the Y axis).

This increase is maxed out when each letter substituted in takes the place of two letters of equal probability (e.g. subbing out a four-gram with a bigram?); entropy will then increase by 1 (e.g., the dot in the experiment will move up the Y axis maximally).

And to make it even more ideal, if the substitution discussed above is performed AND both probabilities are "independent of context" (not sure what that means) you'll get an equivalent change in (h1-h2) (h1 alone? -- I'm confused on this point)* and thus won't lose your progress by messing with the X axis (e.g., the dot will move up the Y axis but won't move across the X axis).

Finally, trying to apply this to the EVA-e issue -- perhaps the probabilities of this letter are such that it cannot achieve a value of p/2 in relation to the other letters that will be substituted with it -- and thus it does not have a positive effect on entropy?

*I seem to recall that you graphed your results differently at different times although I had trouble finding it on your blog again -- Koen can you comment on this and does this fit into Farmerjohn's discussion?


RE: [split] Verbose cipher? - aStobbart - 16-09-2020

I found this old thread where this idea is discussed, with a comment from Nick Pelling:

[link hidden]


RE: [split] Verbose cipher? - DONJCH - 16-09-2020

Naively I can see how the conversion of o*, ai*n and qo* into bigrams/trigrams could well explain their unusually high frequency and also go some way to explain their positional "rigidity" as well.

So it's not "o" that is out front all the time (is it 60%?) but a variety of other letters, 5 "o*" in this scheme.

Question: could this indicate that the new "o*" "letters" are consonants?


RE: [split] Verbose cipher? - farmerjohn - 16-09-2020

MichelleL11, the fragment you cited describes the situation where two cleartext letters are merged into the same ciphertext letter. So it's not applicable to geoffreycaveney's post about merging EVA-or or EVA-ol, but rather to his earlier post about merging p and b, t and d, ... into the same ciphertext letters (in our case into Voynich letters).
This merging will decrease the entropy (which is logical, since the number of symbols decreases).
However, the difference (h1-h2) behaves differently. If the probabilities of p and b are the same regardless of the previous letter ("independent of context"), then the impact of merging them is the same for both h1 and h2, so (h1-h2) will not change. The world is not ideal, but these are the default properties of merging/splitting operations.
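The ideal case described here can be checked numerically. The Python sketch below works in the splitting direction, which is just the merge read backwards: it takes bigram weights from a toy string (read as a loop so that the first-letter and second-letter frequencies agree), splits one letter into two equally likely variants regardless of its neighbours, and compares h1 and h2 before and after. The function names, the toy string and the variant labels are all made up for illustration.

```python
from collections import Counter
from math import log2

def cyclic_bigrams(text):
    """Bigram counts of `text` read as a loop (last letter is followed by the first)."""
    return Counter(zip(text, text[1:] + text[:1]))

def h1_h2(bigrams):
    """Return (h1, h2) in bits from a table {(first, second): weight}."""
    total = sum(bigrams.values())
    first = Counter()
    for (a, _), w in bigrams.items():
        first[a] += w
    h1 = -sum(w / total * log2(w / total) for w in first.values())
    joint = -sum(w / total * log2(w / total) for w in bigrams.values())
    return h1, joint - h1

def split_symbol(bigrams, sym, parts):
    """Replace `sym` by one of `parts`, equally likely and independent of context."""
    out = Counter()
    for (a, b), w in bigrams.items():
        firsts = parts if a == sym else (a,)
        seconds = parts if b == sym else (b,)
        share = w / (len(firsts) * len(seconds))
        for x in firsts:
            for y in seconds:
                out[(x, y)] += share
    return out

merged = cyclic_bigrams("abracadabra")          # "a" plays the merged letter
split = split_symbol(merged, "a", ("A1", "A2"))  # hypothetical variant labels

h1_m, h2_m = h1_h2(merged)
h1_s, h2_s = h1_h2(split)
print("h1, h2, h1-h2 merged:", h1_m, h2_m, h1_m - h2_m)
print("h1, h2, h1-h2 split: ", h1_s, h2_s, h1_s - h2_s)
```

Both h1 and h2 rise by the frequency of the split letter (5/11 of a bit here), so going from the split table back to the merged one lowers both by the same amount and leaves (h1-h2) where it was. In real text the variants are rarely independent of context, which is where the two values start to move differently.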

As for merging Voynich letters, I think it's just a trial-and-error method. One merged pair can produce a somewhat predictable result, but a combination of merges is highly volatile. Not to mention the logic: why are qok and qot being merged, but not qop and qof?


RE: [split] Verbose cipher? - Koen G - 16-09-2020

(16-09-2020, 07:56 AM)farmerjohn Wrote: As for merging Voynich letters, I think it's just a trial-and-error method. One merged pair can produce a somewhat predictable result, but a combination of merges is highly volatile. Not to mention the logic: why are qok and qot being merged, but not qop and qof?

Yeah, it's not completely logical as a supposed encryption system. But I selected frequency as the primary guiding principle, and with good reason. Anyone who has played around with manipulating entropy will know that for a change to have a decent effect, it needs to affect a decent number of cases. Merging a rare combination of glyphs might in theory be sensible, but it won't show up in your entropy values for the whole text.

So I focused on the big boys, and used trial and error. But the results aren't entirely random: why does [e] want to remain untouched? [ee, ey, eo, ed] were all part of the top 20, but merging them did not have the desired effect.


RE: [split] Verbose cipher? - ReneZ - 16-09-2020

One complication in this type of experimentation is that in plain languages, there will be some bigrams that are more frequent than some individual characters.

In the above sentence, there are five occurrences of 'th' but only one 'w'.
That is perhaps a special case, because 'th' is a single phoneme (actually, two different ones).
However, there are also four 're'.
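These counts are easy to verify with a short Python check. The helper name is made up for illustration; the sentence is the one from the top of this post.

```python
sentence = ("One complication in this type of experimentation is that in plain "
            "languages, there will be some bigrams that are more frequent than "
            "some individual characters.")

def ngram_count(text, ngram):
    """Count case-insensitive, non-overlapping occurrences of `ngram` in `text`."""
    return text.lower().count(ngram.lower())

for g in ("th", "re", "w"):
    print(g, ngram_count(sentence, g))
# prints: th 5, re 4, w 1 (one pair per line)
```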

Furthermore, there are probably many trigrams that are more frequent than some bigrams, and even trigrams that are more frequent than some individual characters.

When one starts combining individual characters in the Voynich text, one can therefore never be sure if one is resolving a verbose cipher element, or compressing a frequent bigram.

Another point is the problem that one ends up with too large a character set. That can be resolved by assuming that certain forms refer to the same letter, e.g. (EVA) o=a, k=t, p=f, r=s etc.
While this may seem a 'dangerous' assumption, I believe it should be a necessary component of a 'verbose cipher' approach.
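As a rough illustration of what such an assumption does to the character set, here is a small Python sketch that collapses the four example pairs onto one representative each. Which member of each pair is kept is arbitrary, and applying the mapping at the level of single EVA glyphs (rather than after combining glyph groups) is an assumption of this sketch.

```python
# Map the second member of each example pair onto the first: o=a, k=t, p=f, r=s.
EQUIVALENT = str.maketrans({"a": "o", "t": "k", "f": "p", "s": "r"})

def collapse(text):
    """Rewrite `text` so that glyphs assumed equivalent share one symbol."""
    return text.translate(EQUIVALENT)

# Toy sample, not a real excerpt.
sample = "otar okar qopchedy qofchedy"
print(collapse(sample))
print(len(set(sample) - {" "}), "->", len(set(collapse(sample)) - {" "}), "distinct glyphs")
```

Every pair merged this way removes one symbol from the inventory, at the cost of making words that differed only in those glyphs identical.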