The Voynich Ninja

Full Version: [split] Verbose cipher?
(16-09-2020, 08:35 AM)ReneZ Wrote: When one starts combining individual characters in the Voynich text, one can therefore never be sure if one is resolving a verbose cipher element, or compressing a frequent bigram.

Definitely, and I think there is no way to solve this issue. If I were to write English "a" as "aµ", there would be no way to distinguish this from English "qu"; however, one is a verbose cipher element while the other is a frequent bigram. Since "qu" is basically the only way "q" appears, it acts very much the same.
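To make the point concrete, here is a minimal sketch (plain Python; the toy strings and the placeholder symbol "X" are invented for illustration). If every plaintext "a" is verbosely written as "aX", then "X" after "a" is exactly as predictable as "u" after "q" in ordinary English, so follower statistics alone cannot separate the two cases:

```python
from collections import Counter

def follower_dist(text, ch):
    """Relative frequencies of the characters that follow ch in text."""
    followers = Counter(b for a, b in zip(text, text[1:]) if a == ch)
    total = sum(followers.values())
    return {c: n / total for c, n in followers.items()}

plain = "a queen and a quick quail"   # toy English sample
verbose = plain.replace("a", "aX")    # hypothetical verbose cipher: a -> aX

print(follower_dist(plain, "q"))      # {'u': 1.0} -- obligatory bigram
print(follower_dist(verbose, "a"))    # {'X': 1.0} -- verbose element, same statistics
```

Both distributions are degenerate, which is exactly the ambiguity described above: from the ciphertext alone, "qu" and "aX" look identical.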

About the character set, I have the same ideas. However, the only test I have done so far is changing [y] into [a]. This increases h2 by a decent amount. That may be counter-intuitive, since you remove variation in characters, but this variation was extremely predictable. So you end up with one new character that appears in more contexts.

The texts in my comparison corpus have had capitalization removed, but if this weren't the case, their h0 would be much higher as well. The solution here would also be to equate characters (upper case to lower case) which I did in pre-processing. 

However, I see no way this can be tested in Voynichese. The exercise would be more "random" than merging n-grams by frequency.
(15-09-2020, 10:05 PM)geoffreycaveney Wrote: [e], [y]
[d], [k], [t]
[s], [l], [r]
[ch], [sh]
[ain], [aiin], [aiiin]
[air], [ar], [al], [am]
[or], [ol]
[ok], [ot], [od]
[qo], [qok], [qot]

Does this mean that (e.g.) [qo] [qok] and [qot] are the same letter, or they are three different letters?
All are different letters. Of course, equating them might be a next step as I discussed above with Rene, but I don't know by which criteria.
(16-09-2020, 07:56 AM)farmerjohn Wrote: MichelleL11, the fragment you cited describes the situation where two cleartext letters are merged into the same ciphertext letter. So it is not applicable to geoffreycaveney's post about merging EVA-or or EVA-ol, but rather to his earlier post about merging p and b, t and d, ... into the same ciphertext letters (in our case, into Voynich letters).
Thank you for pointing out my misunderstandings; I'm going to keep trying to get this!

On the current topic, how frustrating that there is no way to predict the entropy impact of a different organization of the ciphertext.
It would make sense to estimate how much the vocabulary is reduced by such a transformation, and whether the result would look realistic.

Also, grouping stuff like [ot] and [ol] into single letters yields e.g. otol or aror or qokaiin etc. as two-letter words; in short, we'll have a vocabulary with plenty of very short words. So while you bring the entropy closer to European languages, you're moving away from them in terms of average word length.
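The effect on word length is easy to quantify. A minimal sketch (plain Python; the sample words are common EVA transcription tokens, but the bigram-to-symbol table is purely illustrative, not anyone's proposed reading):

```python
def avg_word_len(text):
    """Mean length of space-separated words."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

sample = "qokaiin otol aror chedy qokeedy"  # common EVA word types

# illustrative merges: each bigram becomes a single placeholder symbol
merges = {"qo": "Q", "ot": "T", "ol": "L", "ar": "R", "ai": "A", "ch": "C", "ed": "E"}
merged = sample
for bigram, symbol in merges.items():
    merged = merged.replace(bigram, symbol)

print(avg_word_len(sample))  # 5.4
print(avg_word_len(merged))  # 3.6 -- noticeably shorter than typical European words
```

Even this small, arbitrary merge table drops the average word length by about a third, which illustrates the tension noted above.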
Just as a gut feeling, I'd look into letter sorting. Entropy is just one part of the picture; there's also the positional affinity of glyphs. Letter sorting would reduce entropy without affecting word length. I'm not saying there's nothing besides letter sorting (the gallows' behaviour speaks for itself), but some sorting may be in place.
Could the function of the glyph pair (e.g. verbose cipher component vs frequent bigram) be somehow reflective of the impact it has on entropy? That is, could removing one or the other “kind” of pair be associated with a particular direction of entropy change (or lack thereof)?
In other words, are we completely certain that these two functions would not be distinguishable through the statistics?
(16-09-2020, 02:27 AM)aStobbart Wrote: I found this old thread where this idea is discussed, with a comment from Nick Pelling.

Both Julian's post and Nick Pelling's comment are quite informative.
In particular, Nick points out what was also mentioned by Anton: interpreting Voynichese as a verbose system leads to words that are shorter than in European languages. As far as I know, this can mean three different things, all of which have been discussed by Voynich researchers:

1. Words are abbreviated / truncated / shortened in some way

As Nick says in that comment, words can be abbreviated. I don't think that in this case abbreviation can work as it does in Latin manuscripts (e.g. -y = -us), because this would basically undo what was gained by accepting the verbose cipher assumption: if single cipher symbols are interpreted as multiple plain-text symbols (abbreviation), you do the opposite of interpreting multiple cipher symbols as single plain-text symbols (verbose cipher). I guess that VerboseCipher + LatinAbbreviations would put entropy back below 2.5. From Nick's comment, I'd say he had something different in mind: e.g. -y as truncation could just mean "something is missing", without giving any hint about what is missing. This of course can lead to considerable ambiguity, at least for languages that convey much of their information in suffixes.

2. Voynichese words do not correspond to plain-text words

As Julian suggested, Voynichese spaces can be something different from plain-text word spaces. I think this is also how Rene's experiments work (but I cannot find his explicit mention of the assumed irrelevance of word spaces).
This idea has the problem (quite serious, in my opinion) of suggesting that Voynichese labels are meaningless, or at least cannot be full words. They could be something like letters A, B, C that are used as reference in the text, but many labels are hapax legomena, so this solution has its problems too.
The only example of a historical verbose cipher I am aware of (the text discussed by Derolez) clearly separates the digraphs that correspond to individual characters, but words are also separated, by either a dot or a longer space. It would definitely be interesting to see more examples of actual medieval verbose ciphers, so that we know what people actually did.
[attachment=4770]


3. Voynichese is a monosyllabic language (e.g. Vietnamese, Chinese)

As discussed by Guy and others, the distribution of word lengths in the dictionary could point to a monosyllabic language like Vietnamese. These languages have considerably shorter words than European languages, so the average word length could match the output of Voynichese "compression" based on a verbose system.
(16-09-2020, 12:12 PM)MichelleL11 Wrote: Could the function of the glyph pair (e.g. verbose cipher component vs frequent bigram) be somehow reflective of the impact it has on entropy? That is, could removing one or the other “kind” of pair be associated with a particular direction of entropy change (or lack thereof)?
In other words, are we completely certain that these two functions would not be distinguishable through the statistics?

It probably depends. I suspect something like English qu would be indistinguishable because q is always followed by u. Reducing it to a single glyph would increase entropy, since you'd remove a whole lot of predictability. 

If, however, it is some other frequent glyph pair like [ou], the increase in entropy might be less. This is just a guess though.
What you need is a little programme which lets the user choose which glyphs to combine and automatically recalculates the entropy each time. Everybody could then experiment to find which combinations affect entropy the most or least.
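As a starting point, here is a minimal sketch of such a tool (plain Python; the glyph groups are the user's choice, the sample string is invented, and the digit placeholders are assumed not to occur in the transcription):

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of a frequency table."""
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())

def h1(text):
    """Single-character entropy."""
    return entropy(Counter(text))

def h2(text):
    """Conditional entropy of a character given its predecessor."""
    return entropy(Counter(zip(text, text[1:]))) - entropy(Counter(text[:-1]))

def merge_and_measure(text, groups):
    """Replace each chosen glyph group with a single placeholder symbol
    (digits, assumed absent from the text) and recompute the entropies."""
    placeholders = iter("0123456789")
    for gram in groups:
        text = text.replace(gram, next(placeholders))
    return h1(text), h2(text), text

sample = "qokeedy chedy qokaiin shedy chol"  # invented EVA-like snippet
for groups in [[], ["qo"], ["qo", "ch", "sh"]]:
    new_h1, new_h2, _ = merge_and_measure(sample, groups)
    print(groups, round(new_h1, 3), round(new_h2, 3))
```

Run over an English control text with groups like ["qu"] versus ["ou"], the same loop would also test the guess above about which kinds of bigram merges raise entropy the most.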