The Voynich Ninja

Full Version: Need advice for testing of hypotheses related to the self-citation method
(05-07-2025, 07:07 PM)Jorge_Stolfi Wrote: Said more simply: if one does not believe a priori that the VMS is a hoax, knowing A and B will not convince them.

It might be possible to make Prob(A|not TTT) much lower with a more specific A, but "not TTT" includes an infinite number of theories, known and unknown. There doesn't seem to be a way to estimate the sum of all Prob(t)*Prob(A|t) without modeling all possible theories (t). :(
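Written out, the sum in question looks as follows. This is only a sketch, assuming (the post does not say so) that the alternative theories t are mutually exclusive and together make up all of "not TTT":

```latex
\Pr(A \mid \lnot TTT)
  = \sum_{t} \Pr(t \mid \lnot TTT)\,\Pr(A \mid t)
  = \frac{1}{\Pr(\lnot TTT)} \sum_{t} \Pr(t)\,\Pr(A \mid t)
```

Every term needs both a prior Prob(t) and a likelihood Prob(A|t) for that theory, which is why the sum cannot be evaluated without modeling each t.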
(05-07-2025, 08:59 PM)Torsten Wrote: My theory is that the Voynich text was generated using the Self-Citation Method

That thesis (HS) is even stronger than what I denoted by TTT (just "the VMS is a hoax"), and which you seem to be referring to as C. (Is this correct?)

Your proposition A ("context-dependent self-similarity features are a defining characteristic for the Voynich text") is similar to mine except for the word "defining", but that does not seem to matter. Proposition B seems to be the same: "[the output of the SCM] is like Voynichese according to some statistics, including similar repetition patterns [of the kind that were looked for]". OK.

Quote:[the SCM] also even quantitatively reproduces the key statistical properties. In particular, we were able to demonstrate that our sample 'facsimile' text fulfills both of Zipf’s laws
This is all part of proposition B. It reproduces the "context-dependent self-similarity features" because it was designed to do so. Satisfying Zipf's laws, matching character entropy, etc. is to be expected, since the method would preserve any properties of whatever text it was seeded with, just as the output of a Markov chain would.

Quote:we explicitly acknowledge that A and B do not imply C
 
OK, so we agree on that.  Then, if B is excluded, the argument for your theory HS ("hoax by SCM") seems to be that it 
Quote:is fully compatible with the historical background [...]. Following Occam’s principle, this theory provides the optimal hypothesis available to explain all facts currently known about the VMS.

Theory HS only explains the "context-dependent self-similarity features" of the text. It does not explain a thousand other properties of the book, beginning with why one would create that sort of book, invent that script, draw those figures, etc. And it does not explain why the author would choose a seed text with all the other weird statistical and structural properties of Voynichese. Theory HS implies that there must have been circumstances and motives that account for all that. Once you make explicit the hypothesis that (HS0) "the situation was such that it led the Author to all those thousand choices when he set out to perpetrate his hoax", theory HS clearly fails Occam's test.

Quote:Said more simply: conclusion C does not provide a basis for evaluating the validity of either A or B.
But that is not the point.  We agree that A and B are true.  The question is the reverse: are A and B sufficient to make C (or HS) likely?
Back to the topic of the thread, I would be interested to see some statistics coming out of the planned analysis. There is already a qualitative sense that the biological B section is more repetitive than the others.

If it is possible to come up with a single 'indicator' value for each section, that would turn this into a more quantitative result. However, it may not be easy at all to define such a single indicator.

Again, my suggestion would be to start with some simple approaches (what percentage of words can be explained by n-tuple changes from M previous words). n-tuple just meaning: with Levenshtein distance up to and including 'n'.

This will also depend on the chosen transliteration alphabet, but less so (I suspect) than entropy.
Example comparing Eva and Currier:

chol and shol differ by one, as do SOE and ZOE
dain and daiin differ by one, as do 8AN and 8AM

After the simple approach, one can introduce all sorts of modifications and see what happens.
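The simple approach described above could be sketched as follows. This is a minimal illustration, not an agreed-upon definition of the indicator: the function names, the default window size, the toy word list, and the decision to count exact repeats (distance 0) as "explained" are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def explained_fraction(words, n=1, window=20):
    """Fraction of words within distance n of one of the `window` preceding words."""
    if not words:
        return 0.0
    hits = 0
    for i, word in enumerate(words):
        previous = words[max(0, i - window):i]
        if any(levenshtein(word, p) <= n for p in previous):
            hits += 1
    return hits / len(words)


# Toy word list (hypothetical, not taken from any transliteration file).
sample = ["chol", "shol", "dain", "daiin", "qokeedy"]
# 2 of the 5 words (shol, daiin) are within distance 1 of an earlier word.
print(f"{explained_fraction(sample, n=1, window=5):.0%}")
```

Running the same function per section, for a few values of n and window, would give one candidate 'indicator' per section. Because the distance is counted in transliteration characters, the result still depends on the alphabet chosen.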
(edit) blah, respecting the on-topic recall.
(05-07-2025, 11:20 PM)Jorge_Stolfi Wrote:
Quote:Said more simply: conclusion C does not provide a basis for evaluating the validity of either A or B.
But that is not the point.  We agree that A and B are true.  The question is the reverse: are A and B sufficient to make C (or HS) likely?

My position is confined to demonstrating that A and B hold true. If you accept A and B as true, then I have successfully substantiated my argument.
(06-07-2025, 12:50 AM)ReneZ Wrote: chol and shol differ by one, as do SOE and ZOE
dain and daiin differ by one, as do 8AN and 8AM
The question is how I view it.
In normal usage, for example, e + r = er. Sum in letters. 1+1=2.
But already in normal Latin usage, 8+9=89.
Sum in letters 2+4=3. (de unum = tum).
This is how it works in the dain/daiin systems to 8an/8am. 4/5 to 3/3.
How am I supposed to teach that to a programme?
(05-07-2025, 08:11 PM)oshfdk Wrote: I'm never sure how to determine these priors.
Indeed, that is the weak spot of Bayes's formula. The result is highly dependent on the prior probabilities, and they are inherently subjective.
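To illustrate how strongly the result depends on the prior, here is a small sketch of Bayes's formula for a hypothesis H and one piece of evidence E. The two likelihood values are made-up numbers chosen only for the illustration; they are not estimates for the VMS.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes's formula: P(H|E) from P(H), P(E|H) and P(E|not H)."""
    numerator = p_e_given_h * prior
    denominator = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / denominator


# Hypothetical likelihoods: E is three times as probable under H as under not-H.
P_E_GIVEN_H = 0.9
P_E_GIVEN_NOT_H = 0.3

# Same evidence, different priors (0.01%, 1%, 10%, 50%).
for prior in (0.0001, 0.01, 0.10, 0.50):
    post = posterior(prior, P_E_GIVEN_H, P_E_GIVEN_NOT_H)
    print(f"prior = {prior:.4f}  ->  posterior = {post:.4f}")
```

With the same evidence, a prior of 0.01% leaves the posterior far below 1%, while a prior of 50% yields 75%.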

Quote:How to visualize the meaning of "my probability of VMS being a hoax is less than 0.01%"? Is it like if we had 10000 different manuscripts showing the strange properties of VMS, you'd say only one of them is likely to be a hoax?

OK, you are right, the formula as I wrote is not appropriate for the case of the VMS. I will try to fix it and post it later. 

Quote:I would still put my a priori probability of a hoax somewhere close to 10%, that is, about 1 in 10 strange 240 pages long unembellished medieval manuscripts with weird drawings written in an unknown script that have resisted modern attempts of deciphering it for more than a century might be a hoax.

Actually those features make it less likely to be a hoax. Between two paintings dated ~1500, one that looks like a Leonardo painting, and one that is ugly, weird, badly painted on cheap canvas, and does not look like any other painting ever seen, which one do you think is more likely to have been created with the intent to defraud some rich art collector of the time?

And why do you say "1 in 10"? How many manuscripts similar (in any sense) to the VMS are known, and how many of them turned out to be hoaxes?  Shouldn't it be "0 in 1"? 

Quote:I think there is a typo there in your post, should be 0.0001 and 0.9999 in the denominator?

Indeed, thanks. Fixed.

All the best, --jorge
(06-07-2025, 04:48 AM)Aga Tentakulus Wrote: The question is how I view it.

Please post in your own thread.
(06-07-2025, 12:42 PM)Jorge_Stolfi Wrote: Actually those features make it less likely to be a hoax. Between two paintings dated ~1500, one that looks like a Leonardo painting, and one that is ugly, weird, badly painted on cheap canvas, and does not look like any other painting ever seen, which one do you think is more likely to have been created with the intent to defraud some rich art collector of the time?

Yes, I've specifically listed the features that make it less likely to be a hoax. It is not embellished, it is long, it has no obvious attribution to some celebrity.

(06-07-2025, 12:42 PM)Jorge_Stolfi Wrote: And why do you say "1 in 10"? How many manuscripts similar (in any sense) to the VMS are known, and how many of them turned out to be hoaxes?  Shouldn't it be "0 in 1"?

As far as I know, there is only one manuscript with the same set of defining characteristics:
1) Long form
2) Many illustrations, almost none of which can be identified unambiguously (the only exception is the Zodiac figures, and even there, there is some strangeness in how the sequence starts and why some months are split in halves)
3) Unique unknown script, total lack of unambiguously identifiable inscriptions in known languages, no other known examples of this script

And we don't know if it's a hoax or not, so it's ? in 1.

But we can talk about probabilities in a hypothetical scenario of finding a trove of 10000 assorted medieval manuscripts in an unknown script, resisting decipherment, etc, etc, with all the other characteristics of the Voynich MS. What fraction of this collection would we expect to be hoaxes?

1) There are reasons to make a hoax, and there are historical examples of hoaxes/forgeries
2) While the MS is long, it's possible to find a scenario which would call for a long forgery (say, VMS should have represented the original of a foreign manuscript of roughly known size)
3) If it was to represent a manuscript brought from a faraway land, total lack of inscriptions in European languages or scripts would make sense. Also, this might explain the short form and lack of embellishments, since it could represent a traveller's copy.

So, while none of these gives us an exact number, they don't look extremely unrealistic to me, which is why I think 0.01% (1 hoax out of 10000 manuscripts) is way too low.

My figure is 10%, because folk sociology assumes that only 10% of people will actively pursue fraudulent opportunities (this figure comes from an article whose source is the National Association of State Auditors, Comptrollers and Treasurers (NASACT) and the Oregon State Controller’s Division). I think it's possible to find more reliable sources, but this ballpark figure looks about right to me.

If we assume 10% of people familiar with the production of manuscripts were actively looking for iffy opportunities, it seems reasonable to assume that there might be about 10% of manuscripts with some kind of deception in their contents, made to increase their value. In my view, the specific features of the Voynich MS don't point towards it being a hoax, but they don't strongly point in the other direction either. So, 10% seems to be a good starting point to me.

Edit: I think I need to clarify the last sentence a bit. Yes, making a long forgery is harder and makes it look less plausible, and this should drive the percentage down. On the other hand, the same reasoning applies to other explanations. It's also much less practical to encode a whole long manuscript. And I think it's also not very practical to record foreign speech in an invented alphabet over hundreds of pages. Any theory proposing a careful methodical process for the Voynich MS has to explain why there was no better option for a long manuscript like this. It is possible to provide such explanations for each of these scenarios. So, I don't believe the length of the volume substantially decreases the probability of a hoax in this particular case. As for the embellishments, maybe they were supposed to be there (there are a few empty spaces in the MS reserved for them), but the work just wasn't finished; maybe the original forger left the project for some reason, and a crude paint job was performed instead of the planned embellishments.
(06-07-2025, 12:42 PM)Jorge_Stolfi Wrote:
(05-07-2025, 08:11 PM)oshfdk Wrote: I'm never sure how to determine these priors.

Indeed, that is the weak spot of Bayes's formula. The result is highly dependent on the prior probabilities, and they are inherently subjective.

To apply Bayes one does not need a prior probability at all. Indeed, discriminating between prior and evidence is IMHO unnecessary and misleading. Start from a condition of absolutely zero knowledge, that is to say P(H) = 0.5 and P(notH) = 0.5. This is the only 'prior' one needs, and it's trivial. Then start factoring in every single piece of knowledge you have: they all count as evidence now.
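This procedure can be sketched in the odds form of Bayes's formula: start at even odds and multiply by one likelihood ratio per piece of evidence. The sketch assumes the pieces of evidence are conditionally independent given H and given notH, which the post does not state, and the likelihood values are invented for illustration.

```python
def posterior_from_evidence(likelihood_pairs, start=0.5):
    """Fold evidence into P(H), starting from `start` (0.5 = zero knowledge).

    Each pair is (P(E|H), P(E|notH)) for one piece of evidence; the pieces are
    assumed conditionally independent given H and given notH.
    """
    odds = start / (1.0 - start)  # even odds when start is 0.5
    for p_e_given_h, p_e_given_not_h in likelihood_pairs:
        odds *= p_e_given_h / p_e_given_not_h  # multiply by the likelihood ratio
    return odds / (1.0 + odds)  # convert odds back to a probability


# Hypothetical evidence: the first piece favours H 3:1, the second favours notH 2:1.
evidence = [(0.9, 0.3), (0.2, 0.4)]
print(round(posterior_from_evidence(evidence), 3))
```

In this form, what is usually called "the prior" is just one more likelihood ratio in the list. For example, a 10% prior corresponds to a ratio of 1 to 9.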

But it's true that, many times, the probabilities of the evidence given H and given notH are also hard to pin down, even broadly. There are, however, some methods (e.g. Laplace's rule of succession, the reference class method) that minimize subjectivity, but this is not the place to discuss them (nor Bayesian logic in general).
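For reference only, Laplace's rule of succession estimates the probability of a "success" on the next trial as (s + 1) / (n + 2) after s successes in n trials. The function name below is hypothetical, and the numbers are plain arithmetic, not a claim about the VMS.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate (s + 1) / (n + 2) for the next trial being a success."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


print(rule_of_succession(0, 0))  # no observations at all
print(rule_of_succession(0, 1))  # one observation, zero successes
```

With no observations the rule returns 1/2, and with "0 in 1" it returns 1/3 rather than 0.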