The Voynich Ninja


(08-08-2024, 06:00 AM) Torsten Wrote:
(08-08-2024, 12:25 AM) Emma May Smith Wrote: The z-scores show how unusual the occurrence of a feature is in one position of a distribution.

That's the point. The Z-score is just a statistical measure that quantifies the distance between a data point and the mean of a dataset. Z-scores therefore do not support conclusions like "[ey] and [edy] are followed by [qo] with the same preference". What the difference between two Z-scores means depends on the distribution of the dataset. In other words, the difference between 3.5 and 3.7 might mean that [.qo] occurs 1.5 times more often after [edy.] than after [ey.].

I think we're closer on this point than you suggest. So long as we know what kind of pattern/relationship interests us, then z-scores are fine, within their limitations. We could certainly be tighter with how we describe those relationships, but saying that the occurrence of [qo] after [ey] and [edy] is equally unusual still gets us something interesting: the token count in that position is (similarly) raised.

(To note, in the data I'm using, the token counts for [qo] after [ey] and [edy] are 723 and 1388, so an even bigger difference than the one you state. But the likelihood is 0.31 and 0.39 respectively, and the token count for [qo] immediately before the word is only 60% of the one immediately after. So we can see how the raw token count gives us a somewhat false impression, while the z-score picks out how exceptional this distribution is.)
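To make the comparison in the note concrete, here is a small sketch of the arithmetic using only the four figures quoted above. The "implied totals" rest on an assumption that is not stated in the post, namely that the likelihood is the [qo] count divided by the number of tokens ending in [ey] or [edy]; since the likelihoods are rounded, those totals are rough at best.

```python
# Figures quoted in the post above.
count_after_ey, count_after_edy = 723, 1388
likelihood_after_ey, likelihood_after_edy = 0.31, 0.39

# Raw token counts differ by almost a factor of two ...
count_ratio = count_after_edy / count_after_ey

# ... while the likelihoods differ by only about a quarter.
likelihood_ratio = likelihood_after_edy / likelihood_after_ey

# ASSUMPTION (not from the post): likelihood = count / tokens with that ending.
# Under that reading, the gap in raw counts mostly reflects how many
# [ey]-final and [edy]-final tokens there are to begin with.
implied_ey_total = count_after_ey / likelihood_after_ey
implied_edy_total = count_after_edy / likelihood_after_edy

print(f"count ratio:       {count_ratio:.2f}")
print(f"likelihood ratio:  {likelihood_ratio:.2f}")
print(f"implied totals:    {implied_ey_total:.0f} vs {implied_edy_total:.0f}")
```

The count ratio comes out near 1.9 and the likelihood ratio near 1.3, which is the sense in which the raw counts overstate the difference.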

(08-08-2024, 12:25 AM) Emma May Smith Wrote: (Koen, or anybody with the authority, can I add a bigger file to this post? About 305 KB, but all text, so no risk of viruses.)

Yeah, that's no problem. I don't know what the file size limit is though; David manages those.

(08-08-2024, 09:24 AM) Emma May Smith Wrote: I think we're closer on this point than you suggest. So long as we know what kind of pattern/relationship interests us, then z-scores are fine, within their limitations. We could certainly be tighter with how we describe those relationships, but saying that the occurrence of [qo] after [ey] and [edy] is equally unusual still gets us something interesting: the token count in that position is (similarly) raised.

(To note, in the data I'm using, the token counts for [qo] after [ey] and [edy] are 723 and 1388, so an even bigger difference than the one you state. But the likelihood is 0.31 and 0.39 respectively, and the token count for [qo] immediately before the word is only 60% of the one immediately after. So we can see how the raw token count gives us a somewhat false impression, while the z-score picks out how exceptional this distribution is.)

Indeed, both occurrences are unusual, which speaks to their quality. However, the quantity differs significantly: [qo] appears more often after [edy.] than after [ey.] (likelihood 0.39 vs. 0.31). In other words, the Z-score is a specific number that indicates how far something is from the norm, but it is still necessary to examine the observation itself to fully understand it.

(Note: Final-[ey] also occurs in Currier A, whereas final-[edy] is nearly absent there. This discrepancy likely contributes to the differences in the statistics.)
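A toy sketch of the point about Z-scores, tied to the earlier 3.5 versus 3.7 example. All counts, means and standard deviations below are made up for illustration and are not taken from any Voynich transliteration; the only thing assumed is the usual definition z = (count - mean) / std, with each context measured against its own distribution.

```python
def z_score(count, mean, std):
    """Distance of an observed count from the mean, in standard deviations."""
    return (count - mean) / std

# HYPOTHETICAL numbers, chosen only to illustrate the argument.
# Each context has its own mean and spread for [qo] counts by position.
after_ey = {"count": 550, "mean": 200, "std": 100}
after_edy = {"count": 825, "mean": 270, "std": 150}

z_ey = z_score(**after_ey)
z_edy = z_score(**after_edy)
ratio = after_edy["count"] / after_ey["count"]

print(f"z after [ey.]:  {z_ey:.1f}")
print(f"z after [edy.]: {z_edy:.1f}")
print(f"count ratio:    {ratio:.2f}")
```

With these invented inputs the two Z-scores come out at 3.5 and 3.7, yet the second count is 1.5 times the first. Similar Z-scores say the two positions are similarly exceptional relative to their own distributions, not that the underlying frequencies are similar.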


Emma May Smith

Koen G

Torsten