Hi Marco,
Quote:if I understand correctly "expansion" will only increase entropy if it maps to different sequences. If you always replace, say o with [ab] the result will be even lower entropy (e.g. all low-entropy qo- sequences will be converted in even lower entropy qab- sequences). Is it so?
Not necessarily.
In the first place, information entropy is a characteristic of the information source, not of any individual pattern that the source produces. In this view, it is not correct to speak of a "low-entropy" "qo"-sequence or of a "high-entropy" one. Entropy is calculated over the whole text - technically, over a sample of considerable length, so that the result of the calculation represents a characteristic of the information source.
Second, h2 is the measure of mean information per character, given that the preceding character is known. So the result will depend on the total set of "expansions", not on any individual expansion considered per se - and also on whether the symbols produced by an expansion do or do not already appear in the original text.
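For illustration, here is a minimal Python sketch of how such an h2 value can be estimated from a text sample. The function name and the direct estimation from raw bigram counts are my own choices, not a reference implementation:

```python
from collections import Counter
from math import log2

def h2(text):
    """Mean information per character (bits), given the preceding character,
    estimated from the bigram and unigram counts of the sample itself."""
    pairs = Counter(zip(text, text[1:]))  # counts of adjacent character pairs
    prev = Counter(text[:-1])             # counts of preceding characters
    n = len(text) - 1                     # total number of bigrams
    # h2 = -sum over all pairs of p(a,b) * log2 p(b|a)
    return -sum(c / n * log2(c / prev[a]) for (a, b), c in pairs.items())
```

With this estimator, a text in which every character fully determines the next one (e.g. "ababab...") gives h2 = 0, as expected.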
As an example, let's take Davidsch's signature right above. The sample is too short for the results to be characteristic of Davidsch's written speech, but the maths are the same regardless of the size of the sample, so we will still be able to observe the changes.
The original text (capitalization and punctuation removed for simplicity) is:
Code:
do with this posting what you want if you simply reply what do you mean i do not understand you i will not respond because then you did not read it well enough
For this text, the h2 calculation yields 2.15.
Let's now "expand" the letter "o" into the sequence [xz], where neither the letter "x" nor the letter "z" is present in the original text:
Code:
dxz with this pxzsting what yxzu want if yxzu simply reply what dxz yxzu mean i dxz nxzt understand yxzu i will nxzt respxznd because then yxzu did nxzt read it well enxzugh
For this text, the h2 value decreases to 1.97.
Now let us introduce another expansion - of the letter "h" into the sequence [xd], where both letters "x" and "d" are present in our preceding revision (and the letter "d" is present even in the original text):
Code:
dxz witxd txdis pxzsting wxdat yxzu want if yxzu simply reply wxdat dxz yxzu mean i dxz nxzt understand yxzu i will nxzt respxznd because txden yxzu did nxzt read it well enxzugxd
For this text, the h2 value increases to 2.04 from the previous 1.97.
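Since both expansions are plain one-to-many substitutions, the experiment is easy to repeat. Below is a Python sketch of my own; the bigram-based h2 estimator in it may differ from my figures in minor conventions (e.g. how word spaces are treated), so the exact numbers may deviate slightly, but the direction of the changes is the point:

```python
from collections import Counter
from math import log2

def h2(text):
    # conditional entropy (bits per character) of a character given the
    # character that precedes it, estimated from the sample itself
    pairs = Counter(zip(text, text[1:]))
    prev = Counter(text[:-1])
    n = len(text) - 1
    return -sum(c / n * log2(c / prev[a]) for (a, b), c in pairs.items())

original = ("do with this posting what you want if you simply reply "
            "what do you mean i do not understand you i will not respond "
            "because then you did not read it well enough")

step1 = original.replace("o", "xz")   # first expansion: o -> [xz]
step2 = step1.replace("h", "xd")      # second expansion: h -> [xd]

print(h2(original), h2(step1), h2(step2))
```

The first expansion can only lower h2 here: every "x" is followed by "z" with certainty, so the total surprisal of the text is unchanged while its length grows. The second expansion reuses letters that already occur, making contexts ambiguous again, which pushes h2 back up.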
Quote:The other point that is unclear to me is why you say that this "moves us from the plain language to cipher". For instance, in Latin abbreviation, it was common to have symbols that expanded into more than one syllable. In the attached verse (from the bottom of [link]), a “crossed p” is used for both “par” and “per”. Is it really necessary to think of a cipher to consider similar possibilities? Could we maybe say that this "moves us from a plain phonetic script to some kind of abbreviation"?
Abbreviation can be considered a cipher in its essence - a set of rules to convert original plain text into its representation.