obelus > Yesterday, 12:44 AM
nablator > Yesterday, 01:55 PM
(Yesterday, 12:44 AM)obelus Wrote: An n-gram entropy can be estimated from the corresponding n-gram frequency distribution in a text sample (whatever the tokenization). When the text is reversed, there will be a one-to-one correspondence between n-grams in the forward and reverse directions (each "abc" forward will be represented by one "cba" in reverse, etc).
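As a quick illustration of that correspondence (a sketch with an invented sample string and helper names, not part of the quoted post): the trigram counts of a string and of its reversal form the same multiset once each reversed trigram is flipped back, so the estimated n-gram entropy is identical in both directions.
from collections import Counter
import math

def ngram_counts(text, n=3):
    # Raw character n-gram frequencies of a text sample.
    return Counter(text[i:i+n] for i in range(len(text) - n + 1))

def ngram_entropy(counts):
    # Shannon entropy (bits) of the n-gram frequency distribution.
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

sample = "qokeedy.shedy.daiin.qokeedy"
fwd = ngram_counts(sample)
rev = ngram_counts(sample[::-1])

# Flipping every key of the reversed-text counts recovers the forward counts,
# so the two distributions (and their entropies) coincide.
assert fwd == Counter({k[::-1]: v for k, v in rev.items()})
print(ngram_entropy(fwd), ngram_entropy(rev))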
import math

def ngram_logprob(word, counts, context_counts, tokenizer, n=3):
    """Add-one-smoothed log-probability of a word under an n-gram model."""
    tokens = tokenizer(word)
    logp, V = 0.0, len(counts)          # V: number of distinct n-grams observed
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        context = ngram[:-1]            # the (n-1)-gram context
        count = counts.get(ngram, 0) + 1
        context_total = context_counts.get(context, 0) + V
        logp += math.log(count / context_total)
    return logp
def tokenize_plain_word(word):
    """Neutral: split into characters only, no prefix/suffix rules."""
    return list(word)
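A possible way to drive the two functions above (the build_counts helper and the toy word list are assumptions of mine, not from the post): build one count table from the words as written and one from the reversed words, then compare the per-word log-probabilities in each reading direction.
from collections import Counter

def build_counts(words, tokenizer, n=3):
    # Raw n-gram and (n-1)-gram context counts over a word list.
    counts, context_counts = Counter(), Counter()
    for w in words:
        tokens = tokenizer(w)
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            counts[ngram] += 1
            context_counts[ngram[:-1]] += 1
    return counts, context_counts

words = ["daiin", "qokeedy", "shedy", "chedy", "qokaiin"]
fwd_counts, fwd_contexts = build_counts(words, tokenize_plain_word)
rev_counts, rev_contexts = build_counts([w[::-1] for w in words], tokenize_plain_word)

# Per-word log-probability read left-to-right vs. right-to-left.
print(ngram_logprob("shedy", fwd_counts, fwd_contexts, tokenize_plain_word))
print(ngram_logprob("shedy"[::-1], rev_counts, rev_contexts, tokenize_plain_word))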
nablator > Yesterday, 02:11 PM
(04-09-2025, 06:43 PM)nablator Wrote: John Winstead's method reveals statistical differences between the start and the end of words:
Wikipedia Wrote: In head-marking languages, the adpositions can carry the inflection in adpositional phrases. This means that these languages will have inflected adpositions.
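One simple way to expose that start-versus-end asymmetry (only an illustration of the general idea with an invented word list, not Winstead's actual method): compare the entropy of the character distribution at the first position of each word with the entropy at the last position; a large gap means the two ends of words behave very differently.
from collections import Counter
import math

def position_entropy(words, index):
    # Shannon entropy (bits) of the character distribution at a fixed
    # word position: index 0 is the first character, -1 the last.
    counts = Counter(w[index] for w in words if w)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

words = ["daiin", "qokeedy", "shedy", "chedy", "qokaiin", "okaiin", "chol"]
print(position_entropy(words, 0))    # variability of word-initial characters
print(position_entropy(words, -1))   # variability of word-final characters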
quimqu > Yesterday, 03:37 PM
RobGea > Yesterday, 07:00 PM
magnesium > Yesterday, 07:31 PM
(Yesterday, 03:37 PM)quimqu Wrote: In the case of the Voynich, I think having a lower perplexity in the RTL direction is not really meaningful. Perplexity, which is directly related to entropy, measures how surprised the model is when it sees the actual next character (or word). We already know that the Voynich has unusually low entropy for bigrams compared to natural languages. So, in my opinion, a slightly lower entropy or perplexity in the RTL direction doesn’t add much insight. In fact, it might even be harder to explain why the text would have an even lower entropy than it already does.
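To make the entropy-perplexity link concrete (a sketch on an invented toy string, not quimqu's actual computation): for a conditional bigram model, perplexity is 2 raised to the conditional bigram entropy h2 in bits per character, so a lower h2 in one reading direction is the same observation as a lower perplexity in that direction.
from collections import Counter
import math

def bigram_entropy_and_perplexity(text):
    # Conditional bigram entropy h2 (bits per character) and the
    # corresponding perplexity 2**h2, from raw counts without smoothing.
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])          # each character in bigram-initial position
    total = sum(bigrams.values())
    h2 = 0.0
    for (a, b), c in bigrams.items():
        p_ab = c / total                 # joint probability P(a, b)
        p_b_given_a = c / firsts[a]      # conditional probability P(b | a)
        h2 -= p_ab * math.log2(p_b_given_a)
    return h2, 2 ** h2

text = "qokeedy.shedy.qokeedy.daiin.shedy"
print(bigram_entropy_and_perplexity(text))        # left-to-right
print(bigram_entropy_and_perplexity(text[::-1]))  # right-to-left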