[Article] Is Voynich a right-to-left script? - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: News (https://www.voynich.ninja/forum-25.html)
+--- Thread: [Article] Is Voynich a right-to-left script? (/thread-4909.html)
RE: Is Voynich a right-to-left script? - obelus - 05-09-2025

If "perplexity" is in vogue for language models, fine... an earlier thread explained that perplexity is monotonically related to the Shannon entropy. If so, these directionality results are puzzling.

An n-gram entropy can be estimated from the corresponding n-gram frequency distribution in a text sample (whatever the tokenization). When the text is reversed, there will be a one-to-one correspondence between n-grams in the forward and reverse directions (each "abc" forward will be represented by one "cba" in reverse, etc). Therefore any function that can be calculated from n-gram frequencies alone, such as "absolute" entropy, or conditional entropy to any order, must be the same in either direction. How does perplexity break this symmetry?

If the text samples were not globally reversed, character by character as claimed, then statistical asymmetry may have been introduced when correlations at the word or line level were altered.

RE: Is Voynich a right-to-left script? - nablator - 05-09-2025

(05-09-2025, 12:44 AM)obelus Wrote: An n-gram entropy can be estimated from the corresponding n-gram frequency distribution in a text sample (whatever the tokenization). When the text is reversed, there will be a one-to-one correspondence between n-grams in the forward and reverse directions (each "abc" forward will be represented by one "cba" in reverse, etc).

I wanted to post exactly the same question yesterday... how can there be a (global) "n-gram perplexity asymmetry"? But I did not understand the code well enough to comment. Now I understand it better.

tokens = [t for t in re.findall(r"[A-Za-z0-9]+|[\u0590-\u05FF]+|[\u0600-\u06FF]+", line)]

This extracts word tokens, allowing the Latin alphabet, Arabic numerals and the Unicode ranges for Hebrew (\u0590-\u05FF) and Arabic (\u0600-\u06FF).
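A minimal sketch of what this pattern matches, for anyone who wants to try it. Only the regular expression comes from the quoted code; the helper names `extract_tokens` and `non_letters_matched` and the sample string are made up for illustration. The second helper lists matched characters whose Unicode category is not a letter, since both blocks contain marks and punctuation as well as letters.

```python
import re
import unicodedata

# Pattern copied from the quoted line; everything else here is illustrative.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u0590-\u05FF]+|[\u0600-\u06FF]+")


def extract_tokens(line):
    """Return the runs of Latin/digit, Hebrew-block or Arabic-block characters."""
    return TOKEN_RE.findall(line)


def non_letters_matched(token):
    """Return the characters of a token whose Unicode category is not L* (letter)."""
    return [ch for ch in token if not unicodedata.category(ch).startswith("L")]


# Hypothetical sample: two Hebrew words joined by a maqaf (U+05BE),
# and an Arabic word followed by an Arabic comma (U+060C).
sample = (
    "abc 123 "
    "\u05E9\u05DC\u05D5\u05DD\u05BE\u05E2\u05DC "
    "\u0633\u0644\u0627\u0645\u060C"
)

tokens = extract_tokens(sample)
for tok in tokens:
    extras = [f"U+{ord(ch):04X}" for ch in non_letters_matched(tok)]
    print(repr(tok), "non-letters:", extras)
```

In this sample the maqaf and the Arabic comma end up inside the word tokens, so they would be counted as ordinary characters by anything that consumes `tokens`.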
These Unicode ranges should be improved; they include punctuation.

Code:
def ngram_logprob(word, counts, context_counts, tokenizer, n=3):

Note: tokenizer actually splits words into characters:

Code:
def tokenize_plain_word(word):

If I understand correctly, this sums, for each word, the log of the probability of each ngram in the context of ngram[:-1] (the slice removes the last element, thus the asymmetry). V seems useless (adds 0.0).

RE: Is Voynich a right-to-left script? - nablator - 05-09-2025

(04-09-2025, 06:43 PM)nablator Wrote: John Winstead's method reveals statistical differences between the start and the end of words:

I imagine this method only works because the tested languages use suffixes more than prefixes for [link text not preserved]. There may be some (?) exceptions:

Wikipedia Wrote: In head-marking languages, the adpositions can carry the inflection in adpositional phrases. This means that these languages will have inflected adpositions.

RE: Is Voynich a right-to-left script? - quimqu - 05-09-2025

In the case of the Voynich, I think having a lower perplexity in the RTL direction is not really meaningful. Perplexity, which is directly related to entropy, measures how surprised the model is when it sees the actual next character (or word). We already know that the Voynich has unusually low entropy for bigrams compared to natural languages. So, in my opinion, a slightly lower entropy or perplexity in the RTL direction doesn’t add much insight. In fact, it might even be harder to explain why the text would have an even lower entropy than it already does.

RE: Is Voynich a right-to-left script? - RobGea - 05-09-2025

Never mind

RE: Is Voynich a right-to-left script? - magnesium - 05-09-2025

(05-09-2025, 03:37 PM)quimqu Wrote: In the case of the Voynich, I think having a lower perplexity in the RTL direction is not really meaningful. Perplexity, which is directly related to entropy, measures how surprised the model is when it sees the actual next character (or word). We already know that the Voynich has unusually low entropy for bigrams compared to natural languages. So, in my opinion, a slightly lower entropy or perplexity in the RTL direction doesn’t add much insight. In fact, it might even be harder to explain why the text would have an even lower entropy than it already does.

One potentially useful test would be to assess the perplexity of a Naibbe ciphertext in LTR and RTL directions. In this case, the cipher is very consciously mimicking the VMS word grammar (the driver of the VMS’s entropy) while encrypting a left-to-right Latin or Italian plaintext.

RE: Is Voynich a right-to-left script? - Labyrinthinesecurity - 06-09-2025

I have updated the notebook to calculate the Gini index, and reproduced John W’s findings on English (positive delta Gini) and Hebrew (negative).

For Voynich it is inconclusive (delta zero), meaning Voynich fails the Ashraf & Sinha hypothesis, unlike any other tested language.

So perplexity seems a better way of measuring directionality, for Voynich at least.

(04-09-2025, 05:49 PM)RobGea Wrote: Why have you chosen perplexity as your only statistical measure?
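The symmetry argument obelus raised earlier in the thread can be checked numerically. The sketch below is not the notebook's code: the function names, the sample string, and the choice to estimate conditional entropy as H(n-gram) minus H((n-1)-gram) over sliding windows are all assumptions made for illustration. It reverses a text globally, character by character, and compares the n-gram statistics in both directions.

```python
from collections import Counter
from math import isclose, log2


def ngram_counts(text, n):
    """Count every sliding-window character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    # Sorting makes the summation order independent of insertion order.
    return -sum(c / total * log2(c / total) for c in sorted(counts.values()))


def conditional_entropy(text, n):
    """Estimate H(n-th char | previous n-1 chars) as H(n-gram) - H((n-1)-gram)."""
    return entropy(ngram_counts(text, n)) - entropy(ngram_counts(text, n - 1))


# Arbitrary sample text, chosen only to have some repeated n-grams.
sample = "the quick brown fox jumps over the lazy dog and then the fox rests"
reversed_sample = sample[::-1]  # global, character-by-character reversal

for n in (2, 3, 4):
    forward = conditional_entropy(sample, n)
    backward = conditional_entropy(reversed_sample, n)
    print(n, isclose(forward, backward, abs_tol=1e-12))
```

Under these assumptions the forward and reversed values agree for every order, because each reversed n-gram occurs exactly as often as its forward counterpart. Any asymmetry therefore has to come from something other than global n-gram frequencies, such as scoring each word separately from its first character.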