The Voynich Ninja

Full Version: Need advice for testing of hypotheses related to the self-citation method
(04-07-2025, 12:24 AM)ReneZ Wrote: Again, you are adding complications which Torsten never included in his descriptions.
Does this mean that you have already seen that the basic approach does not work?

The approach doesn't have to be basic. The VM is a complex medieval book, not the output of a short computer program. Again, I don't care what settings, limitations and simplifying assumptions Torsten Timm's "basic approach" algorithm and app have, as I don't intend to copy any of them.

The improvements that I propose aim to model (and minimize) the work of a human scribe on the pages. They have measurable and, I hope, important consequences for the statistics. The selection of multiple nearby sources together, and the initialization of a page from other pages, should be modeled: a human would naturally prefer whatever is easier and faster; it is the principle of least effort. The generation process should also not be carried out sequentially: there is no reason to assume that pages were generated one by one (they could have been written in parallel), or from the top to the bottom of each page, because we know that the VM's lines were not always written that way. These "complications" are all necessary. I don't know whether they will result in a better fit to the VM than the output of Torsten Timm's app; I haven't done the work yet. I hope they will: there is definitely room for improvement, and you should not reject the self-citation method because of the shortcomings of the "basic approach".
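To make the "least effort" idea concrete, here is a minimal sketch (my own illustration; the geometric decay and its parameter are assumptions, not anything from Timm's descriptions) of a scribe preferring nearby lines as citation sources:

```python
import random

def pick_source_line(n_lines, current, decay=0.5, rng=random):
    """Choose which earlier line to 'cite' from, preferring lines
    close to the one currently being written (least effort).
    The geometric weighting is a placeholder assumption."""
    weights = [decay ** abs(i - current) for i in range(n_lines)]
    return rng.choices(range(n_lines), weights=weights, k=1)[0]

random.seed(418)
picks = [pick_source_line(20, 10) for _ in range(2000)]
# lines near index 10 are chosen far more often than distant ones
```

Any monotonically decaying weighting would serve; the point is only that a human-plausible model biases source selection toward what is close at hand, which should leave a measurable statistical trace.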

If the set of "seed" words was small and the rules didn't allow new words to be diverse enough on the first page (I don't know how many seed words and rules are needed, or how many generations may coexist on the same page), it may have looked bad. If not, there is no problem at all. So, if we are in the worst-case scenario (a totally speculative situation that we have no idea actually happened), we have the solution that Mauro outlined: everything evens out eventually in a multi-generation mix of generated words. We don't care how the VM started, because if the first page(s) looked very different from what we see now, they ended up in the trash. Pseudo-problem solved.
(04-07-2025, 12:52 AM)ReneZ Wrote: In my considered opinion - and I have always been outspoken about this - the self-citation method is not a reasonable 'solution' of the Voynich MS.

I have expressed my doubts about the self-citation method too.


(04-07-2025, 12:52 AM)ReneZ Wrote: The argument that the initialisation is not a problem, it was just done 'somehow', for me is not satisfactory.

It may not be the biggest problem (because I think that the whole thing cannot work), but it is being consistently swept under the carpet ;-)

Of course the seed string will have some effects, i.e. if one starts with "xx zz bb" it is not a given at all that the x, z and b will completely disappear (there are no rules to preferentially suppress those characters in Torsten's software). But surely, if the seed string conforms to the rules, that is to say, if each of its words could have been generated by applying the rules to a null string, then the output texts will be statistically indistinguishable whichever seed string one actually used (and, in particular, even if one started from a null string).
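This claim is testable empirically. A toy caricature of self-citation (my own, not Timm's actual rules; the alphabet and mutation rate are arbitrary) makes the experiment easy to run: generate long texts from different conforming seed strings and compare the resulting character statistics:

```python
import random

ALPHABET = "odainylkchesr"  # small EVA-like alphabet, chosen arbitrarily

def toy_self_citation(seed_words, n_words, rng):
    """Caricature of self-citation: copy a random past word and,
    on a coin toss, mutate one of its letters."""
    text = list(seed_words)
    while len(text) < n_words:
        w = list(rng.choice(text))            # cite a past word
        if rng.random() < 0.5:                # coin toss: mutate?
            w[rng.randrange(len(w))] = rng.choice(ALPHABET)
        text.append("".join(w))
    return text

def char_freqs(words):
    """Relative frequency of each alphabet character in the text."""
    s = "".join(words)
    return {c: s.count(c) / len(s) for c in ALPHABET}

t1 = toy_self_citation(["daiin", "ol", "chedy"], 2000, random.Random(418))
t2 = toy_self_citation(["okar", "shey", "dal"], 2000, random.Random(418))
# compare char_freqs(t1) and char_freqs(t2): if the seed-independence
# claim holds, the distributions should converge as n_words grows
```

Whether, and how fast, the statistics actually forget the seed under Timm's real rules is exactly the kind of question this setup can answer.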
(04-07-2025, 12:52 AM)ReneZ Wrote: Should they be at the centres around the most frequent words?

I don't see why this should necessarily be the case. Not being able to answer every question is not an argument for or against anything, so why bring it up?

The only interesting question for me is: could a model of how a human would plausibly do it with the self-citation method replicate most or all the known properties of Voynichese? I don't mean replicate the VM itself, so the exact set of seed words and rules is immaterial: the properties should be close, not the generated text. This question was not answered satisfactorily by anyone: the human factor in the process was not taken into account, and many of the known properties were not replicated.
(04-07-2025, 03:03 PM)nablator Wrote: The only interesting question for me is: could a model of how a human would plausibly do it with the self-citation method replicate most or all the known properties of Voynichese?

I would rather not spend much time studying the "self-citation method" (SCM), but maybe I can still say something useful.  I don't believe the 'hoax' conclusion.  The main reason I will explain in my 10-minute talk at the conference. The second main reason is that it is impossible to prove that some string "contains no message", or even to provide statistical evidence that would make such a conclusion more likely than not.

One can only prove that a text contains no message if there is a short and deterministic program that generates the whole text verbatim, without any input, without any coin-tossing.  (And, even then, the text will still contain about as many bits of information -- "meaning" -- as are needed to code that program in some concise language.)
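A concrete illustration of this point (mine, not from the post): the program below deterministically produces a 46,656-letter "text" over a 23-letter alphabet with no input and no coin tosses, so the text carries no more information than the few dozen bytes of the program itself.

```python
def generate():
    """A fully deterministic 'text': no input, no randomness.
    Its information content is bounded by the length of this program."""
    return "".join(chr(ord("a") + (i * 418) % 23) for i in range(46656))

text = generate()
# len(text) == 46656 letters, yet the 'meaning' is at most the
# handful of bytes needed to write this generator down
```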

But that is not what the SCM authors have shown.  The SCM does not do that. IIUC, at every output word it tosses a coin to decide whether to start a new "self-citation" or take one more word from the current one. In the first case, it needs a bunch of bits (~14 on average for a VMS-size text) to decide at which past word it should start the citation. In the second case it needs another bunch of random bits to decide whether and how to mutate the next word.  

Even if this summary is wrong in some details, the important point is that the method is a probabilistic algorithm that needs several coin tosses per output word, on average.  Now, a probabilistic program is equivalent to a deterministic one that reads a stream of bits from an external source and pretends that they are the outcomes of the coin tosses.  Thus the SCM is not really a generator but a transducer that maps a stream of bits to a stream of words.  That is, an encoding method that can be used to encode a message -- by feeding it as the stream of "coin tosses".
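The transducer view can be made concrete with a deterministic caricature of the SCM (the 4-bit source index, the dummy mutation rule, and the word counts here are all my own placeholders): every choice the probabilistic version would make by coin toss is instead read from an explicit input bit stream.

```python
def scm_transducer(bits, seed_words, n_words):
    """Deterministic 'self-citation' transducer: maps a bit stream to
    a word stream.  A different bit stream yields a different text,
    so the stream can carry a message."""
    stream = iter(bits)

    def take(n):
        # read the next n bits as an unsigned integer
        return int("".join(str(next(stream)) for _ in range(n)), 2)

    text = list(seed_words)
    while len(text) < n_words:
        w = text[take(4) % len(text)]        # which past word to cite
        if take(1):                          # 'mutate?' coin toss
            pos = take(3) % len(w)
            w = w[:pos] + "x" + w[pos + 1:]  # dummy mutation rule
        text.append(w)
    return text

bits = [int(b) for b in "0110" * 40]
out = scm_transducer(bits, ["dain", "ol", "chedy"], 8)
# the same bit stream always reproduces exactly the same text
```

Running it twice on the same stream reproduces the output exactly; that is precisely why "the output is meaningless" cannot be established without establishing that the input bit stream is.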

So even if there exists an input bit sequence (coin-toss outcomes) that causes the SCM to output an exact copy of the VMS (which I doubt, but let's assume it does), in order to show that the VMS "has no meaning" one would have to show that this magical bit sequence is not a meaningful message.  But there is no test that will show that a bit sequence that cannot be generated deterministically is "meaningless"...

And this is a problem with any claim that the VMS is "meaningless", not just the SCM.

All the best, --jorge

PS. Whenever my prof of automata theory at the U of São Paulo needed an integer for some example, he always used 418.  He explained that, in his experience, that was the best random number for such purposes.  I respect his authority in the matter, and for the last 50 years I have always tried to use that number whenever I needed a random multiplier, hash table size, test input, etc.; and I recommend it to students and co-workers.  Even in those occasions when that turned out to be a bad choice, I had no reason to suspect that it was not mere bad luck...
418 is an interesting number. But actually it is 33 too low.

To get an answer from automata theory where it belongs, you need 451.  Big Grin
(04-07-2025, 06:32 PM)Jorge_Stolfi Wrote: And this is a problem with any claim that the VMS is "meaningless", not just the SCM.

I didn't agree with the conclusion of the 2019 article either: it's a bit confusing, not actually claiming that the VMS is meaningless. Torsten Timm answered my questions here: [link]

Meaningful or not, I would be satisfied with an explanation of why "the whole thing cannot work" (dixit ReneZ).
(04-07-2025, 07:03 PM)Aga Tentakulus Wrote: To get an answer from automata theory where it belongs, you need 451.  Big Grin
You mean 232.8.  We are metric down here.  Wink
(04-07-2025, 06:32 PM)Jorge_Stolfi Wrote: One can only prove that a text contains no message if there is a short and deterministic program that generates the whole text verbatim, without any input, without any coin-tossing.  (And, even then, the text will still contain about as many bits of information -- "meaning" -- as are needed to code that program in some concise language.)

PPS.  A nice example of the above are the letter tables in the Book of Soyga, which Dee owned and was obsessed with.

Most of the book is about magic, kabbala, gematria, etc.  The last section has 36 tables, each a grid of 36 by 36 squares. Each square contains a single letter of a 23-letter alphabet.  In some tables the letters seem to show puzzling patterns, but overall they look random.  For ~500 years they resisted all attempts at decipherment, and (like other "uncrackable" codes) were often claimed to be random jumbles with no meaning.

But in ~1998 eminent cryptographer and Voynichologist Jim Reeds solved the mystery.  It turns out that the contents of each table are generated by a completely deterministic algorithm, starting with a 6-letter seed word. That word and its reverse are repeated 3 times to fill the first column. Then every square in the other columns is computed by a simple formula from the two letters immediately above it and to its left.  In modern jargon, the columns of the table show the evolution of the state of a 1D cellular automaton with 23 states per element.

Thus the 36 tables together form a 36×36×36 = 46,656-letter text which is demonstrably almost meaningless: its "meaning" is only the 36 six-letter seed words (~121 bytes of information, at log2 23 ≈ 4.5 bits per letter) plus the formula (say 50 bytes in some suitable machine pseudocode).
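The construction can be sketched in a few lines (the alphabet choice and the exact update rule below are placeholders of mine; Reeds' paper gives the actual formula): a 36×36 grid whose first column is the seed word and its reverse, with every other cell a fixed function of its north and west neighbours.

```python
SOYGA_ALPHABET = "abcdefghiklmnopqrstuxyz"  # 23 letters; one plausible choice

def soyga_table(seed, size=36):
    """Fill a size x size letter grid in the manner Reeds describes.
    The update rule is a stand-in, not Reeds' exact formula."""
    idx = {c: i for i, c in enumerate(SOYGA_ALPHABET)}
    pattern = (seed + seed[::-1]) * (size // (2 * len(seed)) + 1)
    grid = [[0] * size for _ in range(size)]
    for r in range(size):
        grid[r][0] = idx[pattern[r]]          # first column: seed + reverse
    for c in range(1, size):
        for r in range(size):
            west = grid[r][c - 1]
            if r == 0:                        # top row: west neighbour only
                grid[r][c] = (2 * west + 1) % 23
            else:                             # north + west (placeholder)
                grid[r][c] = (grid[r - 1][c] + 2 * west + 1) % 23
    return [[SOYGA_ALPHABET[v] for v in row] for row in grid]
```

The point stands regardless of the exact rule: the entire 1,296-letter table follows deterministically from a single 6-letter seed.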

All the best, --jorge

PPPS. Re-reading Reeds paper, I found this interesting tidbit about the handwriting of the letters in the tables:
  • "The writing becomes more even after the first few tables, with greatly diminished use of upper case letters, as if the copyist became accustomed to what must have been an unusually irksome and tedious task of copying completely senseless data which offered no obvious contextual clues for correcting mistakes."
Reminded me of a certain other "encrypted" manuscript...  Big Grin
(04-07-2025, 07:51 AM)nablator Wrote: We don't care how the VM started, because if the first page(s) looked very different from what we see now, they ended up in the trash. Pseudo-problem solved.

Not solved but swept under the carpet (classical example) Wink 

But never mind. 

Modern people with a background in maths, engineering or software are trained to think in (or quickly recognise) systems and structures. I belong to this group, and I think most protagonists here do too.
In fact, most of the suggestions that are being made here clearly demonstrate this: we can see the diagrams and flow charts without having to draw them.

When talking about anything that somehow involves recursion, this system thinking automatically forces me to worry about the initialisation. A description of the system without the initialisation is incomplete.

Using system / structure analysis is useful for analysing the text of the MS, but one has to be very careful when projecting this on the creators of the text. Some things could be conceivable, others not.

Even the agnostic approach - that there was some unknown initialisation which ended up on a sheet that is no longer part of the MS - has the problem that this initialisation is key to the whole method. It cannot have been skipped. It set up the alphabet that was used (mostly) consistently over the entire text.

I'll leave it at that  Smile
If I am the only one thinking about it, so be it. 
Looking forward to any results of the analysis.
(04-07-2025, 07:14 PM)nablator Wrote: You are not allowed to view links. Register or Login to view.satisfied with an explanation of why "the whole thing cannot work" (dixit ReneZ).

Very brief:

The high level of repetition is not the most conspicuous artifact of the text. Perhaps not even the second most conspicuous, but probably the third. This is so subjective that I don't want to argue about it.

Using this as the prime method for the text generation, while it does nothing to explain the most conspicuous aspects (low entropy and the word pattern - really the same thing), is my biggest problem.

But this risks getting us off topic.