Need advice for testing of hypotheses related to the self-citation method - Printable Version
The Voynich Ninja (https://www.voynich.ninja) > Forum: Voynich Research (https://www.voynich.ninja/forum-27.html) > Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html) > Thread: Need advice for testing of hypotheses related to the self-citation method (/thread-4765.html)
RE: Need advice for testing of hypotheses related to the self-citation method - nablator - 05-07-2025

(04-07-2025, 06:32 PM)Jorge_Stolfi Wrote: So even if there exists an input bit sequence (coin toss outcomes) that causes SCM to output an exact copy of the VMS (which I doubt, but let's assume it does), in order to show that the VMS "has no meaning" one would have to show that this magical bit sequence is not a meaningful message.

There are two "little" problems with this that make it impossible:
1) Thinking in terms of bits and transducers would be anachronistic at a time when cryptography was at the mono-alphabetic substitution (with homophones and nulls) stage.
2) The bit sequence would need to be reconstructed from the VMS in order to recover the message. With multiple possible sources for each target word of the SCM it can't be done. You can't unscramble eggs.
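To make the "multiple possible sources for each target word" point concrete, here is a minimal sketch of a hypothetical toy generator (not the actual SCM): each coin toss chooses which of the two most recent words to copy. Whenever those two words are identical, both toss outcomes produce the same output, so distinct toss sequences collapse onto one text and cannot be told apart afterwards. The seed words and the copy rule are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product

# Hypothetical seed words (EVA transliterations used only as labels).
SEED = ["qokeedy", "chedy"]

def generate(bits):
    """Toy copy rule (an assumption, not the real SCM): bit 0 copies the
    most recent word, bit 1 copies the word before it."""
    words = list(SEED)
    for b in bits:
        words.append(words[-1 - b])
    return tuple(words[len(SEED):])

def preimages(n):
    """Group all 2**n coin-toss sequences by the text they produce."""
    groups = defaultdict(list)
    for bits in product((0, 1), repeat=n):
        groups[generate(bits)].append(bits)
    return groups

groups = preimages(4)
ambiguous = {text: seqs for text, seqs in groups.items() if len(seqs) > 1}
print(f"{2 ** 4} toss sequences -> {len(groups)} distinct texts")
for text, seqs in ambiguous.items():
    print(" ".join(text), "<-", seqs)
```

Every group with more than one member is a text whose generating tosses cannot be recovered. The same kind of collision arises whenever two candidate sources hold the same word.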
RE: Need advice for testing of hypotheses related to the self-citation method - oshfdk - 05-07-2025

(05-07-2025, 08:06 AM)nablator Wrote: (04-07-2025, 06:32 PM)Jorge_Stolfi Wrote: So even if there exists an input bit sequence (coin toss outcomes) that causes SCM to output an exact copy of the VMS (which I doubt, but let's assume it does), in order to show that the VMS "has no meaning" one would have to show that this magical bit sequence is not a meaningful message.

We can replace bits with coin tosses and transducers with whatever concept similar to transducers existed in the XV century. I don't know what transducers are anyway.

As for whether the cipher can be unscrambled, below is a very simple (although quite verbose) self-citation cipher based on the Voynich script. I think it is readable.

Usual disclaimer: I don't think this is how the VMS is encoded. This scheme is a pain to encode/decode, and I guess it's much more complicated than the actual cipher of the Voynich manuscript, but it shows that in principle it doesn't take advanced tech to encode text via self-citation.

RE: Need advice for testing of hypotheses related to the self-citation method - Jorge_Stolfi - 05-07-2025

(05-07-2025, 08:06 AM)nablator Wrote: There are two "little" problems with this that make it impossible:

People back then were as smart as we are today. (Maybe smarter, since they had no TV or Facebook...) "Bits" and "transducers" are only our modern way of describing the process, just as we would say that the tables of Soyga are traces of a cellular automaton. But people have always understood those concepts intuitively, without using those words.
While complex encryption schemes were not common knowledge in the 1400s, as they would be in the 1500s, individual mathematicians would have been perfectly capable of devising encryption methods as complex as the "self-citation encoding". From Wikipedia, for example:
The SCM as an encryption method is not much more complex than writing a sequence of numbers where each number is the location of a word in the Bible.

Quote: 2) The bit sequence would need to be reconstructed from the VMS in order to recover the message. With multiple possible sources for each target word of the SCM it can't be done.

Indeed, the SCM as described is a many-to-one encoding of a bit sequence into a string of words. But what Torsten and Timm observed is only that the VMS often repeats sequences of words that have occurred before, with variations. They then devised a method (the SCM) that generates random text with that same feature, while imitating a few other statistics such as the Zipf plot. But it does not follow that the VMS was created using the SCM!

Back in my day, Gordon Rugg observed that Voynichese words can be split into prefix/middle/suffix, where each segment is picked from a small set of choices. (This feature had been noted by others before, e.g. Tiltman.) He then devised a method, using a three-column table and masking cards with slots, to generate random text that (by construction) also displayed that feature. And then he too "concluded" that the VMS was random gibberish with no meaning -- a "hoax".

One can easily build a 3rd-order Markov word generator that produces random text very much like Shakespearean English. (That would be basically equivalent to the GPT model discussed in other posts here.) The Markov-generated text will have the same vocabulary size, Zipf plot, word and character entropy, and many other statistics as the real works of Shakespeare. Someone who does not know English would probably be unable to tell them apart. Does it follow that Shakespeare's plays are random gibberish without meaning -- a "hoax" too?

So, if the VMS was an encrypted text, the actual encoding method could have been very different from the SCM, and easily invertible, but still happened to generate repetitions and near-repetitions similar to those observed.
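The last point can be illustrated with a deliberately trivial, hypothetical scheme (not a historical cipher and not a claim about how the VMS was made): each output word is the previous output word with its final glyph rotated one step (bit 0) or two steps (bit 1) along a small assumed cycle of glyphs. The output is full of exact and near repetitions, yet it decodes uniquely, because the step size can always be read off by comparing each word with the one before it.

```python
# Hypothetical cycle of word-final glyphs (an illustrative assumption).
CYCLE = "ynlr"

def rotate_last(word, steps):
    """Move the word's final glyph `steps` places along CYCLE."""
    i = CYCLE.index(word[-1])
    return word[:-1] + CYCLE[(i + steps) % len(CYCLE)]

def encode(bits, seed="qokeedy"):
    """Each bit turns the previous word into a near-copy: 1 step for 0, 2 for 1."""
    words = [seed]
    for b in bits:
        words.append(rotate_last(words[-1], 1 + b))
    return words

def decode(words):
    """Recover each bit from the step between consecutive final glyphs."""
    bits = []
    for prev, cur in zip(words, words[1:]):
        step = (CYCLE.index(cur[-1]) - CYCLE.index(prev[-1])) % len(CYCLE)
        bits.append(step - 1)
    return bits

message = [1, 0, 0, 1, 1, 0]
text = encode(message)
print(" ".join(text))  # qokeedy qokeedl qokeedr qokeedy qokeedl qokeedy qokeedn
print(decode(text) == message)  # True
```

What makes it invertible is that, for every source word, the two bit values yield different outputs. The SCM as described does not guarantee this, since several earlier words can be the source of the same target word.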
All the best, --jorge

RE: Need advice for testing of hypotheses related to the self-citation method - Torsten - 05-07-2025

(05-07-2025, 02:23 AM)ReneZ Wrote: (04-07-2025, 07:14 PM)nablator Wrote: satisfied with an explanation of why "the whole thing cannot work" (dixit ReneZ).

Short answer: The Self-Citation Method was developed by me through systematic reverse-engineering of the word patterns identified in the Voynich Manuscript.

I also asked ChatGPT for a longer answer:

From a structural and statistical perspective, self-citation naturally leads to both low entropy and characteristic word patterns.

1. What is the Self-Citation Method?
It refers to generating new text by:
Entropy, in information theory, is a measure of unpredictability or information content. Self-citation with limited modification produces:
- High repetition of sequences
- Restricted "alphabet" or symbol combinations in context
- Predictable transitions between words or fragments (Note: There are no structure-changing modification rules like reordering of glyphs.)
- A biased distribution of word lengths and glyph patterns

The result: The text statistically mirrors a low-entropy system, similar to a compressed code, repetitive ritual text, or artificially constrained system.

3. Why Does That Produce Word Patterns?
The Voynich manuscript exhibits:
4. Illustrative Analogy
Imagine a program that:
Conclusion: The self-citation hypothesis does explain the most conspicuous aspects of the Voynich manuscript:
RE: Need advice for testing of hypotheses related to the self-citation method - Torsten - 05-07-2025

(04-07-2025, 06:32 PM)Jorge_Stolfi Wrote: I would rather not spend much time studying the "self-citation method" (SCM), but maybe I can still say something useful. I don't believe the 'hoax' conclusion. The main reason I will explain in my 10-minute talk at the conference. The second main reason is that it is impossible to prove that some string "contains no message", or even to provide statistical evidence that would make such a conclusion more likely than not.

I fully agree that proving a string 'contains no message' is practically impossible, especially in the absence of external context or a known key. However, that question is distinct from examining whether the structure of the text can be explained by a formal, self-referential generation process, such as the Self-Citation Method. The SCM is not inherently a 'hoax' theory, but rather a model that reproduces specific statistical and structural features observed in the Voynich Manuscript. It addresses how the text could have been generated, not necessarily why, or whether meaning is present. In that sense, exploring such a model provides insights into the mechanics of the text without requiring us to draw conclusions about its semantic content.

RE: Need advice for testing of hypotheses related to the self-citation method - nablator - 05-07-2025

(05-07-2025, 10:53 AM)Jorge_Stolfi Wrote: But it does not follow that the VMS was created using the SCM!

"It does not follow" is not a valid criticism of any theory... Induction is not deduction. It isn't worthless. When I created this thread I expected a discussion of how to do proper (Bayesian?) hypothesis testing.
We have a lot of evidence (the entire VMS); how do we assess/compare specific generating/ciphering/encoding methods with Bayesian inference? I know the basics, but I've never done model selection / model fitting using Bayesian inference. By "advice" I mean: I'd prefer not to spend a month reading the literature, and would rather use a framework made by someone who knows how to do it. I don't care what ChatGPT "thinks" about it. I'm old-fashioned: I asked Google for pointers but it is not helpful enough; there is too much to read. Yes, I'm lazy.

An introductory tutorial:

RE: Need advice for testing of hypotheses related to the self-citation method - Jorge_Stolfi - 05-07-2025

(05-07-2025, 04:18 PM)nablator Wrote: "It does not follow" is not a valid criticism of any theory... Induction is not deduction. It isn't worthless.

Torsten&Timm's theory (TTT) is "the VMS is a hoax". Their arguments are (A) the VMS has certain repetition patterns that are not seen in a set of other texts they examined, and (B) they devised a probabilistic generator whose output is like Voynichese according to some statistics, including similar repetition patterns. Pointing out that A&B does not imply TTT is a perfectly valid criticism of the TTT. If you prefer in Bayes:

Code:
Prob(TTT|A&B) = Prob(A&B|TTT) * Prob(TTT) / (Prob(A&B|TTT) * Prob(TTT) + Prob(A&B|not TTT) * Prob(not TTT))

If A is true then B is true too, because the kind of repetitions that were detected are such that the SCM (or many other methods) can generate them. So A&B is in fact equivalent to A. The factor Prob(A&B|TTT) is therefore Prob(A|TTT), the probability that a hoax text will have the kind of repetitions that they detected. Not all hoax texts will have them. But let's be generous and say that Prob(A|TTT) = 0.5. The term Prob(A&B|not TTT) is therefore Prob(A|not TTT), the probability that a non-hoax text will have the kind of repetitions that they detected.
Let's be pessimistic and say that it is 0.01 (1%). Then

Code:
Prob(TTT|A&B) = 0.5 * 0.0001 / (0.5 * 0.0001 + 0.01 * 0.9999) = 0.00005 / 0.010049 ≈ 0.005

Said more simply: if one does not believe a priori that the VMS is a hoax, knowing A and B will not convince them.

RE: Need advice for testing of hypotheses related to the self-citation method - Jorge_Stolfi - 05-07-2025

By the way, it is important to keep in mind that repetitiveness (of any sort) is a property of the text, not of the language. One can write a text in perfect English with the same kinds of repetitions that T&T detected in the VMS. In fact one can write a very meaningful text like that,

RE: Need advice for testing of hypotheses related to the self-citation method - oshfdk - 05-07-2025

(05-07-2025, 07:07 PM)Jorge_Stolfi Wrote: Said more simply: if one does not believe a priori that the VMS is a hoax, knowing A and B will not convince them.

I'm never sure how to determine these priors. How should one visualize the meaning of "my probability of the VMS being a hoax is less than 0.01%"? Is it like if we had 10000 different manuscripts showing the strange properties of the VMS, you'd say only one of them is likely to be a hoax? I find this too low. Given that people are known to make hoaxes, even though I find it highly unlikely that the VMS is a hoax, I would still put my a priori probability of a hoax somewhere close to 10%; that is, about 1 in 10 strange, 240-page, unembellished medieval manuscripts with weird drawings, written in an unknown script, that have resisted modern attempts at deciphering them for more than a century might be a hoax. Which then would be:

prob(hoax with T&T) = 0.5 * 0.1 / (0.5 * 0.1 + 0.01 * 0.9) ≈ 0.85

But somehow I'm still not convinced. (I think there is a typo in your post; should it be 0.0001 and 0.9999 in the denominator? It doesn't really change the result. I have no real experience with this formula, so maybe I don't understand something.)
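The arithmetic in the last two posts is Bayes' rule for a yes/no hypothesis. A minimal sketch, using the likelihoods stated in the thread (Prob(A|TTT) = 0.5, Prob(A|not TTT) = 0.01) and two priors: 0.0001, the 0.01% sceptical prior that oshfdk's reply refers to, and oshfdk's own 10%. The function name `posterior` is illustrative.

```python
def posterior(p_e_given_h, p_e_given_not_h, prior_h):
    """Bayes' rule: P(H|E) for a binary hypothesis H given evidence E."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1 - prior_h))

# Likelihoods from the thread: Prob(A|TTT) = 0.5, Prob(A|not TTT) = 0.01.
for prior in (0.0001, 0.1):
    print(f"prior {prior:g} -> posterior {posterior(0.5, 0.01, prior):.3f}")
# prior 0.0001 -> posterior 0.005
# prior 0.1 -> posterior 0.847
```

Equivalently, the posterior odds equal the prior odds times the likelihood ratio 0.5 / 0.01 = 50, so with a fixed likelihood ratio the result is driven by the prior, which is why the two priors lead to such different conclusions.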
RE: Need advice for testing of hypotheses related to the self-citation method - Torsten - 05-07-2025

(05-07-2025, 07:07 PM)Jorge_Stolfi Wrote: Torsten&Timm's theory (TTT) is "the VMS is a hoax". Their arguments are (A) the VMS has certain repetition patterns that are not seen in a set of other texts they examined, and (B) they devised a probabilistic generator whose output is like Voynichese according to some statistics, including similar repetition patterns.

No, that is a misunderstanding of my position. In fact, we explicitly acknowledge that A and B do not imply C (that the VMS is a hoax). As we stated: "Most likely, it is impossible to devise an exact mathematical proof that an arbitrary set of strings is truly meaningless, or not. This would involve a general method to compute upper boundaries to the Kolmogorov complexity".

My theory is that the Voynich text was generated using the Self-Citation Method (B): "In the present work we have shown that a strikingly simple process for random text generation ('self-citation' algorithm) has the potential to resolve all of these seeming contradictions. The proposed text generation method is not only supported by many details of self-similarities uncovered in the VMS text, and is fully compatible with the historical background, but also even quantitatively reproduces the key statistical properties. In particular, we were able to demonstrate that our sample 'facsimile' text fulfills both of Zipf’s laws. Following Occam’s principle, this theory provides the optimal hypothesis available to explain all facts currently known about the VMS. It, however, does not totally dismiss the steganography hypothesis ...".
I argue (A) that context-dependent self-similarity features are a defining characteristic of the Voynich text and (B) that the Self-Citation Method is sufficient to explain these properties of the Voynich text. Pointing out that A and B do not logically imply conclusion C (namely, that the Voynich Manuscript is a medieval hoax) does not constitute a valid criticism of my theory, because my argument is focused on the mechanism of text generation (B) and its ability to account for the structural properties (A). Said more simply: conclusion C does not provide a basis for evaluating the validity of either A or B.
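As a concrete reference point for the entropy claims in the ChatGPT answer quoted earlier in the thread, here is a minimal sketch of a self-citation-style generator together with the per-word Shannon entropy of its output. It is not the published SCM algorithm: the glyph substitution table, the window of recent words to copy from, the mutation probability, and the seed words are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical "similar glyph" swaps (an assumption, not the published rules).
SIMILAR = {"o": "a", "a": "o", "k": "t", "t": "k", "y": "n", "n": "y"}

def mutate(word, rng):
    """Swap one randomly chosen glyph for its 'similar' glyph, if any."""
    positions = [i for i, g in enumerate(word) if g in SIMILAR]
    if not positions:
        return word
    i = rng.choice(positions)
    return word[:i] + SIMILAR[word[i]] + word[i + 1:]

def self_citation(n_words, seed_words, p_mutate=0.5, window=20, seed=0):
    """Each new word copies a random word from the last `window` words and
    is mutated with probability `p_mutate` (all parameters hypothetical)."""
    rng = random.Random(seed)
    text = list(seed_words)
    while len(text) < n_words:
        source = rng.choice(text[-window:])
        text.append(mutate(source, rng) if rng.random() < p_mutate else source)
    return text

def word_entropy(words):
    """Shannon entropy, in bits per word, of the word-frequency distribution."""
    counts = Counter(words)
    total = len(words)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

text = self_citation(2000, ["qokeedy", "chedy", "daiin", "otaiin"])
print(len(set(text)), "distinct words,", round(word_entropy(text), 2), "bits/word")
```

On its own the entropy value means little; it only becomes evidence when compared with the same statistic computed on a transcription and on alternative generators, which is the model-comparison question nablator raised.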