Need advice for testing of hypotheses related to the self-citation method

RE: Need advice for testing of hypotheses related to the self-citation method

Torsten > 12-07-2025, 12:42 AM

(07-07-2025, 09:34 AM)Jorge_Stolfi Wrote: In their argument T&T implicitly or explicitly assume that Prob(A|not H) is practically zero; that is, they assume that a manuscript that is not a hoax cannot have the "context-dependent repetitions" that they observed -- because they did not observe them in a few other non-hoax books that they analyzed. Conversely they claim that Prob(A|H) is much higher, because the hypothetical forger may well have generated the VMS using a method, like the SCM, that accidentally created such repetitions.

First, in our work, context-dependent self-similarity is the title of a chapter in which we describe a range of distinct and interrelated observations. The phenomenon encompasses far more than just "repetitions".

Second, I find your description of the Self-Citation Method (SCM) as having 'accidentally' produced repetitions somewhat misleading. The SCM was intentionally designed to reproduce the context-sensitive, self-similar structures observed in the Voynich Manuscript. These patterns are not incidental byproducts but rather the core objective of the method.

Lastly, could you clarify whether by 'T&T' you are referring to Timm & Schinner? If so, may I ask whether you’ve had the opportunity to read any of our published work in detail?

(07-07-2025, 09:34 AM)Jorge_Stolfi Wrote: I don't see the SCM as a plausible answer to that question. The "self-citation" part is relatively easy to execute, but does not seem to be a natural choice for the hypothetical forger, and would require a non-trivial "warm-up" period to create a stable seed text that could then be used to start the VMS. But the "mutation" part of the SCM would require generating several coin tosses, with non-uniform probabilities, at each word. And these probabilities would have to be finely tuned in order to generate the proper Zipf plot and other "natural" properties.

Exactly — that's one of the key strengths of the Self-Citation Method. The observed statistical regularities, including the Zipfian distribution, arise organically from the method itself, without the need for any parameter tuning.
As we noted in Timm & Schinner (2019):
Quote: "We deliberately did not fine-tune the algorithm to pick an 'optimal' sample for this presentation. Such a strategy is by itself questionable. Nevertheless, an exhaustive scan of the parameter space (involving thousands of automatically analyzed text samples) verified the overall stability of the proposed algorithm. About 10-20% of the parameter space even yields excellent numerical conformity (≤ 10% relative error) with all considered key features of the real VMS text (entropy values, random walk exponents, token length distribution, etc.)."

This demonstrates that the method is not only robust across a wide range of parameters but also capable of reproducing the key statistical features of the VMS without ad hoc adjustments.

RE: Need advice for testing of hypotheses related to the self-citation method

nablator > 18-07-2025, 01:06 PM

(05-07-2025, 10:19 AM)oshfdk Wrote: This scheme is a pain to encode/decode, and I guess it's much more complicated than the actual cipher of the Voynich manuscript, but it shows that in principle it doesn't take advanced tech to encode text via self-citation.

Nice idea!

I took the same criteria for word modification (at most 1 added letter and at most 1 removed letter, so 1 replaced letter counts as 1 added plus 1 removed) and applied them to the "sequential transfer pattern" (preserving the order of words), with source words taken from at most the 2 previous lines of paragraph text on the same page. I expected these constraints to be restrictive enough to yield possibly interesting results with little noise.
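For readers who want to try something similar, here is a minimal Python sketch of the kind of check described above. It is not the script used for the counts in this post. The function names, the `min_words` threshold (how many words must be transferred in order before a line counts as a match), the pooling of the previous lines into a single source sequence, and the toy input are all assumptions made for illustration.

```python
def deletion_variants(word):
    """Return the word itself plus every string made by dropping one letter."""
    return {word} | {word[:i] + word[i + 1:] for i in range(len(word))}


def is_modified_copy(source, target):
    """True if target is reachable from source with <= 1 removal and <= 1 addition."""
    # Both words must share a form that each reaches by removing at most one letter.
    return not deletion_variants(source).isdisjoint(deletion_variants(target))


def transferred_words(source_words, target_words):
    """Largest number of target words matched, in order, to source words."""
    rows, cols = len(source_words) + 1, len(target_words) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            best = max(table[i - 1][j], table[i][j - 1])
            if is_modified_copy(source_words[i - 1], target_words[j - 1]):
                best = max(best, table[i - 1][j - 1] + 1)
            table[i][j] = best
    return table[-1][-1]


def count_matching_lines(lines, lookback=2, min_words=3):
    """Return (matching target lines, total lines) for one paragraph.

    lookback and min_words are assumed parameters, not values from the post.
    """
    split = [line.split() for line in lines]
    hits = 0
    for k in range(1, len(split)):
        # Pool the words of up to `lookback` previous lines, keeping their order.
        source = [w for prev in split[max(0, k - lookback):k] for w in prev]
        if transferred_words(source, split[k]) >= min_words:
            hits += 1
    return hits, len(split)


# Made-up toy paragraph, only to show the call.
toy = ["daiin chedy qokeedy ol", "dain shedy qokedy ar", "otal kor sam dy"]
print(count_matching_lines(toy))
```

The order-preserving match is a longest-common-subsequence computation with the "1 added, 1 removed" rule standing in for exact equality. A stricter reading of the pattern (for example, requiring adjacent words) would need a different matcher.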

Results (number of target lines matching the pattern/total number of lines):
Q13: 39/777* (5%)
Q20: 15/1085 (1%)

* I removed labels and some short lines in and around illustrations

In Torsten Timm's generated_text (with 29 lines per page) this pattern is almost as frequent as in Q13:
53/1200 (4%)

So it is not as unlikely as I thought that this pattern would occur by chance.

Nothing to see there: it's just noise.
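
As a rough way to compare the rates quoted above, the following sketch recomputes the percentages and runs a pooled two-proportion z-test on Q13 versus generated_text. This is an illustration, not part of the original analysis. It assumes each line is an independent trial, which is probably not true for lines within the same paragraph, so the result should be read as indicative only. The function name `two_proportion_z` is made up for this example.

```python
import math


def two_proportion_z(hits_a, total_a, hits_b, total_b):
    """Pooled two-proportion z statistic for the rates hits_a/total_a and hits_b/total_b."""
    rate_a = hits_a / total_a
    rate_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (rate_a - rate_b) / std_err


# Counts as reported in this post: (matching lines, total lines).
counts = {"Q13": (39, 777), "Q20": (15, 1085), "generated_text": (53, 1200)}

for name, (hits, total) in counts.items():
    print(f"{name}: {hits}/{total} = {hits / total:.1%}")

z = two_proportion_z(*counts["Q13"], *counts["generated_text"])
print(f"Q13 vs generated_text: z = {z:.2f}")
```

Under the independence assumption, an absolute z value below about 1.96 would be consistent with the two rates not differing at the usual 5% level.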