The Voynich Ninja - k/t gallows reduplication


Hi Marco,

(18-02-2023, 09:40 AM)MarcoP Wrote: A few plots based on the Zandbergen-Landini transliteration (ZL_ivtff_2b.txt)

Did you remove the alternates ([:])? I made a few mistakes in the counts because there are some complicated cases that I failed to remove.

Quote: I would like to compare these figures with actual linguistic texts. We know that these sequences are different from a randomly arranged text, but I am curious to see if they are as incompatible with natural languages as I expect.

The more general question is about all irregularities (inhomogeneities) between paragraphs, pages, sections. Some short patterns (1, 2, 3 EVA-letters) are frequent on some pages and absent on others. I am not sure which statistical metric (Chi² has some limitations) is best suited to measure how (un)likely the deviations are, and then compare to natural languages. Of course more statistical noise is expected on short samples (like paragraphs) so the metric has to take the size of the sample into account somehow.

(18-02-2023, 01:27 PM)nablator Wrote: Hi Marco,

(18-02-2023, 09:40 AM)MarcoP Wrote: A few plots based on the Zandbergen-Landini transliteration (ZL_ivtff_2b.txt)
Did you remove the alternates ([:])? I made a few mistakes in the counts because there are some complicated cases that I failed to remove.

Hi Nablator,
I kept the first alternative, but I did this with a Python script, not ivtt, so it's possible I missed the complicated cases as well. There are several [k:t] and [t:k] ambiguities, and from what I can see they were handled correctly. Anyway, this can only have a visible impact on the third graph.
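For anyone who wants to reproduce the "keep the first alternative" step, here is a minimal sketch. It is not the script used for the plots. It assumes alternates are written as [a:b] or [a:b:c] and that everything else on the line can pass through untouched; the function name keep_first_alternative is made up for illustration. The loop only matters if a bracket group ever contains another one.

```python
import re

# Matches an innermost [a:b:...] group, i.e. one with no brackets inside it.
ALTERNATE = re.compile(r"\[([^\[\]:]*):[^\[\]]*\]")

def keep_first_alternative(line):
    """Replace every [a:b:...] group with its first reading (hypothetical helper)."""
    previous = None
    # Repeat until nothing changes, so nested groups (if any) are also resolved.
    while previous != line:
        previous = line
        line = ALTERNATE.sub(r"\1", line)
    return line

if __name__ == "__main__":
    print(keep_first_alternative("o[k:t]edy"))
```

Comparing the result against ivtt output on the same file would be the easiest way to spot the complicated cases that such a regex misses.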

The following are plots where I compare sequences of different lengths with their expected number in a randomly sorted text. As always, I may have made errors and things must be taken with a grain of salt. Each X value corresponds to sequences whose length is exactly X, e.g. 'kttktk' has 1 t-sequence with length=2, and one with length=1. I rendered a value of 0 as 0.1, so that I could use a logarithmic scale, which I find much more readable in this case.
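A rough sketch of the counting described above, under my own assumptions: only 'k' and 't' are kept from the text, a sequence is a maximal run of the same glyph, and the "expected" value is estimated by averaging over random shuffles of the same k/t string. The function names and the number of trials are illustrative, not taken from the original script.

```python
import random
from collections import Counter
from itertools import groupby

def gallows_sequence(text, symbols="kt"):
    """Keep only the glyphs of interest, in their original order."""
    return "".join(ch for ch in text if ch in symbols)

def run_lengths(seq):
    """Count maximal runs, keyed by (glyph, run length)."""
    counts = Counter()
    for symbol, group in groupby(seq):
        counts[(symbol, len(list(group)))] += 1
    return counts

def expected_run_lengths(seq, trials=200, seed=0):
    """Average run counts over random shuffles of the same glyphs (assumed baseline)."""
    rng = random.Random(seed)
    chars = list(seq)
    totals = Counter()
    for _ in range(trials):
        rng.shuffle(chars)
        totals.update(run_lengths(chars))
    return {key: total / trials for key, total in totals.items()}

if __name__ == "__main__":
    observed = run_lengths(gallows_sequence("kttktk"))
    print(observed[("t", 2)], observed[("t", 1)], observed[("k", 1)])
```

The plotted quantity would then be observed count divided by expected count for each length, with zeros replaced by 0.1 before taking the logarithmic scale.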

Currier A and B (ZL transliteration, considering all text):

[attachment=7206]
[attachment=7205]

In the above plots, the values for length=1 and 2 are visibly lower than for a random distribution: this appears to be related to what Patrick wrote:

Quote: In [link], I presented some statistical evidence that it's more likely overall for a [t] to be followed by another [t] within the next 4-6 glyphs, and for a [k] to be followed by another [k] within the same distance, than it is for a [t] to be followed by a [k] or vice versa. That also seems consistent with a tendency towards "blocks," for whatever it's worth.

I also checked a few texts in different languages, picking consonants that have a vaguely similar number of occurrences to 'k' and 't' in Currier B.

[attachment=7207]

Shakespeare and Plinius (Natural History Book 2, prose) do not show much repetition, and the curves basically oscillate around the expected value of 1. I include the results for randomly shuffling the words in Shakespeare's sonnets, which are not dramatically different from the original.
Dante's Commedia and (much more clearly) the Finnish Kalevala both show longer sequences than expected. Both texts are poems and (contrary to Shakespeare) make use of alliteration; in the Kalevala this feature is particularly apparent. Anyway, not even the Kalevala comes close to the repetition rates in the Voynich manuscript.

(17-02-2023, 06:42 PM)pfeaster Wrote: In [link], I presented some statistical evidence that it's more likely overall for a [t] to be followed by another [t] within the next 4-6 glyphs, and for a [k] to be followed by another [k] within the same distance, than it is for a [t] to be followed by a [k] or vice versa. That also seems consistent with a tendency towards "blocks," for whatever it's worth.

I should qualify this, since it's not quite accurate the way I expressed it here. What I was trying to study was the effect of one glyph on the probability of another glyph appearing a given number of positions ahead of it in the text, ignoring spaces. So, for example, I might count how many times the string [kedyqo] occurs, and then multiply that by the proportion of times the string [edyqo] is followed by [k]. My idea was that the result would predict how many tokens of [kedyqok] should occur if the presence of the first [k] has no effect on the probability of the second [k]. I would then compare this prediction against the actual token count of [kedyqok] to see if this sequence is more or less frequent than it "should" be.
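The prediction described above can be sketched roughly as follows. This is my reading of the description, not the code actually used: it assumes the transliteration has already been reduced to one continuous string with spaces removed, and it counts overlapping occurrences. The helper names are made up for illustration.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def predicted_and_actual(text, first, middle, last):
    """Return (predicted, actual) token counts for first+middle+last.

    predicted = count(first+middle) * count(middle+last) / count(middle),
    i.e. what to expect if the first glyph has no effect on the last one.
    """
    n_first_middle = count_overlapping(text, first + middle)
    n_middle = count_overlapping(text, middle)
    n_middle_last = count_overlapping(text, middle + last)
    predicted = n_first_middle * n_middle_last / n_middle if n_middle else 0.0
    actual = count_overlapping(text, first + middle + last)
    return predicted, actual

if __name__ == "__main__":
    toy = "kedyqokedyqotedyqok"  # toy string, not real Voynich text
    predicted, actual = predicted_and_actual(toy, "k", "edyqo", "k")
    print(round(predicted, 2), actual)
```

The percentage reported for each pattern is then simply actual divided by predicted.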

Cases in which [k] is followed by another [k], or [t] is followed by another [t], were routinely more frequent than predicted:

k>edyqo>k: 109 actual, 94.09 predicted (~116%)
k>eedyqo>k: 165 actual, 156.84 predicted (~105%)
k>eyqo>k: 32 actual, 27.47 predicted (~116%)
k>eeyqo>k: 109 actual, 92.48 predicted (~118%)
k>edyo>k: 62 actual, 42.03 predicted (~148%)
k>eedyo>k: 41 actual, 37.52 predicted (~109%)
t>edyqo>t: 34 actual, 16.75 predicted (~203%)
t>eedyqo>t: 30 actual, 18.77 predicted (~160%)
t>eyqo>t: 3 actual, 2.87 predicted (~105%)
t>eeyqo>t: 13 actual, 9.14 predicted (~142%)
t>edyo>t: 29 actual, 25.29 predicted (~115%)
t>eedyo>t: 17 actual, 14.56 predicted (~117%)

Cases of alternation between [k] and [t] were routinely less frequent than predicted (although not always; the exceptions are k>eedyo>t and t>eyqo>k):

k>edyqo>t: 19 actual, 29.51 predicted (~64%)
k>eedyqo>t: 45 actual, 53.19 predicted (~85%)
k>eyqo>t: 4 actual, 5.73 predicted (~70%)
k>eeyqo>t: 22 actual, 27.43 predicted (~80%)
k>edyo>t: 35 actual, 42.03 predicted (~83%)
k>eedyo>t: 40 actual, 39.76 predicted (~101%)
t>edyqo>k: 42 actual, 53.40 predicted (~79%)
t>eedyqo>k: 47 actual, 55.35 predicted (~85%)
t>eyqo>k: 15 actual, 13.74 predicted (~109%)
t>eeyqo>k: 28 actual, 30.83 predicted (~91%)
t>edyo>k: 18 actual, 25.29 predicted (~71%)
t>eedyo>k: 13 actual, 13.74 predicted (~95%)

This is a little subtler than what I described hastily from memory -- it's not that [t] is necessarily more likely to be followed by [t] than [k] in terms of raw token counts, but that the probabilities are skewed in favor of a repetition of the same gallows glyph.

I also worked out statistics for all intervening strings of particular lengths ending with [o] (treating [ee] as a single glyph), with [*] indicating one intervening glyph, [**] indicating two intervening glyphs, and so forth.

Overall,

o>k = 30.31%

But:

t>**o>k = 17.75%; t>***o>k = 24.93%; t>****o>k = 33.04%
k>**o>k = 32.68%; k>***o>k = 39.18%; k>****o>k = 46.81%

Overall,

o>t = 16.83%

But:

t>**o>t = 29.02%; t>***o>t = 34.03%; t>****o>t = 28.03%
k>**o>t = 19.30%; k>***o>t = 21.86%; k>****o>t = 15.27%

Again, the cases in which the same gallows glyph recurs routinely outperform the cases in which gallows glyphs alternate.
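The wildcard figures above could be computed along these lines. This is only a sketch under assumptions of mine: [ee] is tokenized as a single glyph, any glyph may fill the intervening positions, and positions with nothing after the [o] are left out of the denominator. The function names are illustrative.

```python
import re

def glyphs(text):
    """Split text into glyphs, treating 'ee' as a single glyph."""
    return re.findall(r"ee|.", text)

def follow_rate(seq, first, gap, pivot, last):
    """Share of first + gap glyphs + pivot sequences that are followed by last."""
    matches = hits = 0
    # The last usable start index leaves room for the gap, the pivot and one more glyph.
    for i in range(len(seq) - gap - 2):
        if seq[i] == first and seq[i + gap + 1] == pivot:
            matches += 1
            hits += seq[i + gap + 2] == last
    return hits / matches if matches else float("nan")

if __name__ == "__main__":
    toy = glyphs("teeyokteeyot")  # toy string, not real Voynich text
    print(follow_rate(toy, "t", 2, "o", "k"))
```

In this notation, t>**o>k corresponds to follow_rate(seq, "t", 2, "o", "k"), and the overall o>k figure is the same proportion without any condition on the earlier glyph.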
