The Voynich Ninja

Full Version: More rigorously testing the hoax hypothesis
(10-01-2023, 10:45 PM)degaskell Wrote: we recruited human participants to produce real, handwritten samples of meaningless text and compared them statistically to Voynichese.
If the VMS is a hoax, what was the process of creation? Was it created in a single session, or over a longer period, with something like a learning curve: a process of corrections until the result was considered acceptable? An experiment to reproduce a quick creation may be possible, but if the creation was a longer process, its conditions will be very difficult to recreate.
If the conditions of the experiment differ from the conditions under which the VMS was created, how can the two texts be compared?
(10-01-2023, 10:45 PM)degaskell Wrote: how to more rigorously test whether the Voynich is meaningful or not.

I thought about this, then searched Wikipedia, then my brain exploded.

In conclusion:
    I don't think it is possible.
Except:
     maybe if you use some formal definitions and some bad boy mathematics you could quantify something.
     But even then the Linguistics crowd and the Philosophers would all have something to say about it.
This is a complex question, and I cannot do it justice with just a short post here.

Your (Daniel) question can be interpreted in two ways:
- how could the experiment you did be improved to be more rigorous
- how can one generally decide whether any given text is meaningful

I guess that you are open to both.
As the paper already points out, a main shortcoming of the experiment was the limited length of the generated texts, which is of course more or less inevitable.
 
However, it is an important aspect that the Voynich MS text is very long, and it is consistent in many respects. It is not certain, but I consider it extremely likely, that the alphabet (or the writing system as a whole) was designed specifically for this book. This then shows that there was a significant amount of planning and intention behind it.
Thus, if we cannot yet decide that it has meaning, I dare say that it is the product of intention, whatever that is worth.

A linguist from the earlier days of the internet, Jacques Guy, has given this matter considerable thought, and he came to the conclusion, as did several posters above, that we cannot decide if a text has meaning.

My own brief thoughts have been along thought experiments like the following:

1) Take a known plain text, which can be quite long
2) Create a list of words that appear in this text, according to decreasing frequency
3) Put the list of Voynichese words sorted by decreasing frequency next to it
4) Substitute the plain text words by Voynichese words according to this table.

This results in a text using valid Voynichese words that has a meaning.
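As an illustration (my own sketch; the sample words below are invented, not from the thread), steps 1) to 4) amount to a rank-for-rank substitution:

```python
from collections import Counter

def rank_substitute(plain_words, voynich_words):
    """Map each plain-text word to the Voynichese word of the same
    frequency rank, then substitute through the whole text."""
    plain_ranks = [w for w, _ in Counter(plain_words).most_common()]
    voy_ranks = [w for w, _ in Counter(voynich_words).most_common()]
    table = dict(zip(plain_ranks, voy_ranks))
    # Words whose rank exceeds the Voynichese list are kept as-is.
    return [table.get(w, w) for w in plain_words]

# Toy example: both "vocabularies" are made up for illustration.
plain = "the cat saw the dog and the dog saw the cat".split()
voy = "daiin chedy qokeedy daiin shedy daiin okaiin chedy aiin".split()
print(" ".join(rank_substitute(plain, voy)))
```

The output uses only valid Voynichese words, yet it still carries the meaning of the plain text, hidden behind the substitution table.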

Now the question: how can we detect that this has meaning?  ( = Question 1)

Next, we do the same, but after step 1, we modify the meaningful text by scrambling the words arbitrarily. As a result, this new text has become meaningless.

This process results in a text using valid Voynichese words that has no meaning.

So question 2 is: how can we detect that this text has no meaning?
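The scrambling step is trivial to simulate (a sketch; the seed is fixed only to make runs reproducible):

```python
import random

def scramble(words, seed=0):
    """Destroy the word order (and with it the meaning) while leaving
    the word-frequency distribution exactly intact."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Note that every frequency-based statistic (Zipf curve, vocabulary size, overall type-token ratio) is identical before and after; only order-sensitive statistics can tell the two texts apart.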

And question 3 is probably the better one: what difference between the two texts can we quantify, such that we can detect meaning?

Of course, this is a specific type of meaningless text, which can be converted to a meaningful text by a specific 'unscrambling' of the words.
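One order-sensitive quantity that would separate the two texts (my suggestion, not something proposed in the thread) is the conditional entropy of the next word given the current one: a full scramble pushes it up toward the plain unigram entropy.

```python
import math
from collections import Counter

def conditional_entropy(words):
    """H(next word | current word) in bits, estimated from bigram counts."""
    bigrams = Counter(zip(words, words[1:]))
    unigrams = Counter(words[:-1])
    total = len(words) - 1
    h = 0.0
    for (w1, _), count in bigrams.items():
        p_joint = count / total          # P(w1, w2)
        p_cond = count / unigrams[w1]    # P(w2 | w1)
        h -= p_joint * math.log2(p_cond)
    return h
```

On a meaningful text the value sits well below the unigram entropy; after scrambling it approaches it. The estimate is biased low on short texts, which again points at text length as the limiting factor.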

For me, one way of looking at this question is: is it possible to create a dictionary of Voynichese to "any-known-language", such that a word for word substitution results in a meaningful text?
This is what almost all would-be solvers are tacitly assuming. It implies that the word spaces are indeed word spaces, which is another big question, and which is generally (tacitly) assumed.
Anyway, if the answer to the above question is yes, then the text is meaningful. One can just create the dictionary, and the whole question about word structure and entropy is swept under the carpet.
If the answer is no, it is still possible that the text is meaningful, but it becomes even more difficult.

Back to the earlier thought experiment, instead of fully scrambling the original text, one could do a relatively small number of word swaps over short distances, and the text might still remain meaningful. One may then start to wonder at which point meaning disappears, and come to the (obvious) conclusion that the concept of "meaning" is quite vague, even subjective.
(14-01-2023, 03:33 AM)ReneZ Wrote: My own brief thoughts have been along thought experiments like the following:

1) Take a known plain text, which can be quite long
2) Create a list of words that appear in this text, according to decreasing frequency
3) Put the list of Voynichese words sorted by decreasing frequency next to it
4) Substitute the plain text words by Voynichese words according to this table.

This results in a text using valid Voynichese words that has a meaning.

Now the question: how can we detect that this has meaning?  ( = Question 1)

A possible approach to this problem was described by Tony C. Smith & Ian H. Witten ("Language inference from function words", 1993). The paper is old enough to still be understandable (nothing as powerful and obscure as GPT). Basically, the 1% most frequent word types in a longer text are most likely function words. Function words belong to closed classes, i.e. their number is finite and small (new pronouns, articles, conjunctions etc. only appear over the centuries, in the context of a historical shift from one language to another). One can start from here and iteratively cluster word types into grammatical clusters (e.g. functionWords -> pronouns -> singularPronouns -> masculineSingularPronouns -> accusativeMasculineSingularPronoun [= him]) and at the end maybe define a possible mapping to words in a source language and give a positive answer to Question 1.
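The first step of the Smith-Witten approach can be sketched in a few lines (the 1% threshold is the paper's heuristic; the clustering stages would follow from here):

```python
from collections import Counter

def function_word_candidates(words, fraction=0.01):
    """Return the top `fraction` of word *types* by token frequency.
    In natural language these are overwhelmingly function words."""
    counts = Counter(words)
    n_types = max(1, round(len(counts) * fraction))
    return [w for w, _ in counts.most_common(n_types)]
```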

I know that most people on this forum are aware of this, but maybe it is worth mentioning - the basics of Smith-Witten's method are enough to show that Voynichese does not conform to Rene's thought experiment:

1. Each section has its own set of 1% most frequent words; the change in lexicon appears to be a continuous drift, and the extreme Currier-A and Currier-B sections (HerbalA and Bio/Q13) show only a minimal intersection in their sets of most frequent words (see the linked post), while in all languages function words tend to be uniform across different texts.
2. In all sections, many of the most frequent words often appear consecutively (e.g. "daiin daiin" in HerbalA, "chedy chedy" in BioQ13) and this is not the case for any European language.
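Point 2 is easy to quantify; a sketch of one possible repetition measure (how high a rate counts as "often" is of course a judgment call):

```python
def repeat_rate(words):
    """Fraction of adjacent word pairs that are exact repetitions
    (e.g. 'daiin daiin'), which is vanishingly rare in European languages."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)
```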


Another way to tackle this subject is MATTR (a method amply discussed by Koen and more recently by Luke Lindemann at the Malta conference). The thought experiment does not affect MATTR (just like a simple substitution does not affect conditional character entropy). A text created through this method will perfectly match MATTR values from the source language. But if you take small MATTR windows, Voynichese is not a good fit (see the linked post).
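For reference, MATTR itself is straightforward to compute (a naive sketch; the window size is the parameter everything hinges on):

```python
def mattr(words, window=50):
    """Moving-Average Type-Token Ratio: the mean type-token ratio
    over every sliding window of `window` consecutive words."""
    if len(words) < window:
        return len(set(words)) / len(words)
    ttrs = [len(set(words[i:i + window])) / window
            for i in range(len(words) - window + 1)]
    return sum(ttrs) / len(ttrs)
```

This recomputes each window from scratch (O(n*w)), which is fine for texts of Voynich size.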
Hi Marco, thanks for this. I remember your earlier post, but had not thought of it in this context.

Let me try to understand.
In my two example texts (one derived from a meaningful plain text, the other from a fully scrambled text), the first step, finding likely function words, would lead to exactly the same result. The fact that the second is meaningless is not detected, because it still has some 'meaning' hidden deep inside it.

It then depends on the next step, clustering word types, whether this meaninglessness can be detected. This would require that the method takes the distance between words into account. If it does not, it will still consider the scrambled text just as meaningful as the original one. I don't know the answer to this.

With respect to MATTR, the scrambling should be clearly visible in the result. Ideally the curve should be flat with some random noise on top. However, a "non-flat" MATTR does not indicate meaning, of course.

I really can't remember if this was tested at the time when MATTR was discussed here.

What the experiments presented at the conference show is that human-generated meaningless text does not appear random. This is not unexpected. Any test for 'meaning' should be able to distinguish human-generated meaningless text from computer-generated random text.
In this context, text generated by Torsten's app would be an interesting test object, because its algorithms generate something that (to some extent) follows Zipf's law (which is another inadequate test for meaning).
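A quick Zipf-conformity check along these lines (a rough sketch: under an ideal freq ≈ C/rank law the rank-frequency products stay roughly constant, though, as said, conformity to Zipf is not evidence of meaning):

```python
from collections import Counter

def zipf_products(words, top=10):
    """rank * frequency for the `top` most frequent words; under an
    ideal Zipf distribution these products are all roughly equal."""
    freqs = [c for _, c in Counter(words).most_common(top)]
    return [rank * f for rank, f in enumerate(freqs, start=1)]
```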
The two general approaches outlined above (MATTR and vocabulary-matching) seem promising, but I think they both suffer from the problem that they are fundamentally measures of structure rather than measures of meaning. It's easy enough to take structure as a reasonable proxy for meaning, but as our experiments show, it is quite possible for a text to be both structured and meaningless. To put it another way, it is not much use to test whether the Voynich more closely resembles natural language or randomly scrambled language, because we already know that the Voynich is neither natural language nor pure randomness. What we need is to determine what kinds of structure are unique to meaningful texts and cannot easily be reproduced in human-produced gibberish.

Looking at the MATTR profiles of longer gibberish documents might be helpful here; I'll have to think more about this.
I'm afraid we won't be able to tell if MATTR can be used to profile spontaneously generated nonsense text before we actually have a number of such texts. So that is really the main hurdle.
@Koen: Didn't you collect meaningless texts here in the forum some time ago? Unfortunately, I can no longer remember the corresponding thread.
(15-01-2023, 09:57 PM)bi3mw Wrote: @Koen: Didn't you collect meaningless texts here in the forum some time ago? Unfortunately, I can no longer remember the corresponding thread.

Hmm, Marco worked on this, and I submitted a meaningless text I made to him, but it was relatively short. Moreover, I think people like us are so well acquainted with Voynich statistics that our knowledge might affect the result somehow. For example, if I were to take the task upon myself to write a long meaningless text, I would do it while knowing the tests that will be performed on it. Someone with no or only a passing understanding of Voynichese and its weird behavior would not have this problem.
(15-01-2023, 09:57 PM)bi3mw Wrote: @Koen: Didn't you collect meaningless texts here in the forum some time ago? Unfortunately, I can no longer remember the corresponding thread.



This was the thread, but as Koen says these experiments did not go far. Thankfully, we now have Gaskell and Bowern's paper as a more solid base for discussion.



(15-01-2023, 07:45 PM)degaskell Wrote: The two general approaches outlined above (MATTR and vocabulary-matching) seem promising, but I think they both suffer from the problem that they are fundamentally measures of structure rather than measures of meaning. It's easy enough to take structure as a reasonable proxy for meaning, but as our experiments show, it is quite possible for a text to be both structured and meaningless.

Personally, I don't see structure as a proxy for meaning, but as something that can be discussed on objective ground and can be more easily agreed upon. If one looks at ancient texts, much of their content is meaningless gibberish: for instance, the Alchemical Herbal is a collection of unidentifiable plants with unheard-of names and unbelievable properties. Is this meaningful?

Is the Book of Revelation meaningful?

Quote:And I beheld, and, lo, in the midst of the throne and of the four beasts, and in the midst of the elders, stood a Lamb as it had been slain, having seven horns and seven eyes, which are the seven Spirits of God sent forth into all the earth. And he came and took the book out of the right hand of him that sat upon the throne. And when he had taken the book, the four beasts and four and twenty elders fell down before the Lamb, having every one of them harps, and golden vials full of odours, which are the prayers of saints.

I think that focussing on structure and grammar is easier. We can all agree that the Alchemical Herbal is written in grammatical Latin (or Italian) and that the King James Bible is written in grammatical English. I would like to know if the Voynich manuscript follows a grammar: it would be great if that was the grammar of a natural language, but identifying any grammar would be a huge step forward. I think this step must be taken before we can discuss meaning.