The Voynich Ninja

More rigorously testing the hoax hypothesis
As seen at the conference last year, Claire Bowern and I have recently published a paper examining the statistical properties of meaningless text. Interested parties are referred to the full paper (now available online), but briefly: we recruited human participants to produce real, handwritten samples of meaningless text and compared them statistically to Voynichese. Contrary to what has often been assumed, we found that real human gibberish actually tends to be highly non-random, and may even explain some of the more unusual features of Voynichese (such as its low entropy) better than meaningful text does.

I'll take the cautious scientific approach here and not try to over-analyze what this actually means, but I do want to start a conversation about how to more rigorously test whether the Voynich is meaningful or not. As we argue in the paper, many existing approaches have implicitly operated from the assumption that "meaningless" = "random", so if we find non-random patterns in the text (in word and character frequencies, word placement in sections, etc.), these are often taken as evidence that the text encodes meaningful content. However, our experiments generally contradict this assumption. When we actually sit real humans down and say "write me something that looks meaningful but isn't" - even people without much background in linguistics or the Voynich manuscript - we end up with an explosion of different texts and approaches, many of which are surprisingly non-random. On the whole, this makes me very cautious about assuming almost anything about what a group of hoaxing scribes might have been capable or incapable of doing. To borrow a line from a colleague of mine, "I don't know, man, people are weird."
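(To make "non-random patterns" concrete, here is a minimal sketch, in Python, of the kind of character-level statistic at issue. This is my illustration for this thread, not our actual analysis code, and the sample string is a toy stand-in rather than real transcription data.)

Code:
import math
from collections import Counter

def entropies(text):
    """Return (h1, h2): character entropy and conditional character
    entropy in bits. Low h2 is the oft-cited unusual feature of Voynichese."""
    chars = [c for c in text if not c.isspace()]
    n = len(chars)
    unigrams = Counter(chars)
    h1 = -sum(c / n * math.log2(c / n) for c in unigrams.values())
    bigrams = Counter(zip(chars, chars[1:]))
    m = sum(bigrams.values())
    h12 = -sum(c / m * math.log2(c / m) for c in bigrams.values())
    return h1, h12 - h1  # H(X1), and H(X2|X1) = H(X1,X2) - H(X1)

h1, h2 = entropies("qokeedy qokeedy dal qokedy qokedy")  # toy sample
print(f"h1 = {h1:.2f} bits, h2 = {h2:.2f} bits")

A highly repetitive gibberish sample scores low on both measures, much like Voynichese does, which is exactly why low entropy alone cannot settle the question.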

But again, if this is true, how might we more rigorously test if the text is meaningful or not? I think one major outstanding gap is in our understanding of how small-scale characteristics of gibberish might propagate over larger-scale documents like the Voynich, but there are undoubtedly others as well. We suggest in the paper that computer simulations might be one way to approach this, but I'm very interested to hear other ideas.

P.S. Torsten Timm may be interested to note that our experiment broadly seems to support his idea of "self-citation", at least in the sense that some of our participants did actually report doing this.
What immediately came to my mind when I started reading your post is: "we don't know yet what happens when subjects are asked to produce much longer texts over various sessions." How consistent will it remain? Is there drift? How quickly do they go insane and/or resent the assignment? Knowing the implications of scale would be very interesting, but probably hard to test, since it's asking a lot of participants. It is something I might try myself (writing for 20 minutes a day over a couple of months, no big deal), but I believe that my knowledge of Voynichese statistics would make the exercise worthless.

The fact that results were so variable is interesting, but also points out the difficulty of this line of investigation. How can we ever move beyond "it's not impossible"? How do we differentiate this experiment from the hypothetical infinite monkeys on typewriters who will eventually produce Voynichese but also any other language?
(10-01-2023, 10:45 PM)degaskell Wrote: When we actually sit real humans down and say "write me something that looks meaningful but isn't" - even people without much background in linguistics or the Voynich manuscript - we end up with an explosion of different texts and approaches, many of which are surprisingly non-random.
That many texts are not random is a psychological phenomenon. People are geared to find meaning, even where there is obviously none (or none should exist). So I'm not surprised that many participants ultimately fail in the task of creating completely meaningless text. It would be interesting to compare the "systems" developed "on the fly" with each other, to check whether, and if so in what form, there are similarities. For that purpose, one could of course interview the participants.
Koen - I think these are exactly the challenges. The variability in particular is a major problem, since even if you did get one person to produce a text long enough for analysis, there is no guarantee that another person would produce a text that looks anything like it. (The one thing I am relatively comfortable saying is that you're not guaranteed to go insane doing this - the existence of lengthy nonsense texts such as the Codex Seraphinianus seems to disprove that, and IDK, people are weird, man. In interviewing our participants I was struck by how some of them reacted with "this is the most boring thing I have ever done", while others reported enjoying it, like doodling or meditation.)

I suspect the best way out is to think probabilistically. We tried to do this in our experiment by treating text samples as a distribution of values and looking at the degree of overlap between categories; with enough statements of the form "there is a 32% chance of seeing this effect under one hypothesis, and a 68% chance under the other," you can start to build up an aggregate picture of which of several options is most likely. But the challenge is how to do this with longer texts, where the sample size is liable to be limited. The computer approach would be to develop some set of generative algorithms thought to be plausible, and then iterate them over a range of plausible parameter values to create a simulated distribution of long texts; but the challenge is how to minimize unknowns enough to say anything persuasive. If robust indicators of meaning or non-meaning could be identified, of course, that would simplify matters considerably.
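(To sketch what I mean by the computer approach - purely a toy, with one made-up generator and an arbitrary parameter range, nothing like a serious model of a scribe:)

Code:
import random

def toy_gibberish(n_words, reuse_p, rng):
    """With probability reuse_p, copy an earlier word and mutate one letter
    (crude self-citation); otherwise invent a new word from random letters."""
    letters = "abcdefghijklmnop"
    words = []
    for _ in range(n_words):
        if words and rng.random() < reuse_p:
            w = list(rng.choice(words))
            w[rng.randrange(len(w))] = rng.choice(letters)
            words.append("".join(w))
        else:
            words.append("".join(rng.choice(letters)
                                 for _ in range(rng.randint(3, 8))))
    return words

def repetition_rate(words):
    # Fraction of tokens that repeat an earlier type; one possible statistic.
    return 1 - len(set(words)) / len(words)

rng = random.Random(0)
# Sweep the free parameter over a plausible range; each run contributes one
# sample to a simulated distribution of the statistic for long texts.
simulated = [repetition_rate(toy_gibberish(5000, p / 10, rng))
             for p in range(1, 10) for _ in range(20)]
print(f"simulated range: {min(simulated):.2f} to {max(simulated):.2f}")

An observed text's statistic could then be located within (or outside) such simulated distributions, one per candidate generator - the hard part, as I say, is justifying the generators and parameter ranges in the first place.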

(10-01-2023, 11:25 PM)bi3mw Wrote: People are geared to find meaning, even where there is obviously none (or none should exist). So I'm not surprised that many participants ultimately fail in the task of creating completely meaningless text.
I find the phrase "fail in the task of creating completely meaningless text" fascinating, because from my perspective, the participants didn't fail at all. They wrote text that had no recoverable linguistic meaning, but that did not necessarily mean that it was random, just that any nonrandomness it exhibited did not encode linguistic meaning. Consider the texts "jqvdy fls mcbdkpfkrfbak zzaa", "quodliar sae cauerliar foenibecc", and "the mome raths outgrabe." The first two are equally linguistically meaningless, but differ in randomness; the third is linguistically meaningful, but appears to be nonsense until explained. I think this highlights the distinction between being mathematically random (which the Voynich clearly is not) and being linguistically random (which the Voynich may still be).

(10-01-2023, 11:25 PM)bi3mw Wrote: It would be interesting to compare the "systems" developed "on the fly" with each other, to check whether, and if so in what form, there are similarities. For that purpose, one could of course interview the participants.
For background, we collected these data several years ago, and one of my main regrets was not designing in a more comprehensive post-exercise questionnaire for the participants. We did interview them, but analyzing the data raised a lot more questions I wished we had asked!

More anecdotally, however, it seemed to be common to mix and match one or more approaches such as: 1) writing whatever "felt" realistic without thinking very hard about why; 2) informally inventing (or adapting from a known language) a set of morphemes and then using them to construct plausible-sounding words; and 3) copying and modifying previously-invented words to maintain a consistent vocabulary. Were the participants to have invented custom writing systems as well, I expect a fourth category of "what characters looked pretty when juxtaposed" might also have emerged.
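(If it helps to see strategies 2 and 3 in code form, here is a rough sketch. The morpheme inventory is entirely my invention for this post, loosely Voynichese-flavored, and our participants of course did this informally by hand:)

Code:
import random

rng = random.Random(42)
prefixes = ["qo", "che", "da", "ol"]   # invented onsets
stems    = ["ke", "iin", "ar", "ed"]   # invented cores
suffixes = ["y", "dy", "ain", "or"]    # invented endings

def make_word():
    # Prefix and suffix are optional, giving varied word lengths (strategy 2).
    return ((rng.choice(prefixes) if rng.random() < 0.7 else "")
            + rng.choice(stems)
            + (rng.choice(suffixes) if rng.random() < 0.8 else ""))

vocab = []
def next_word():
    # Strategy 3: with some probability, reuse a previously-invented word
    # to maintain a consistent vocabulary.
    if vocab and rng.random() < 0.4:
        return rng.choice(vocab)
    w = make_word()
    vocab.append(w)
    return w

print(" ".join(next_word() for _ in range(12)))

Even this trivial scheme produces text with strongly non-random character and word statistics, despite encoding nothing.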
To reduce the psychological influence of any single writer on the statistics of a meaningless text, one could take, say, 20 people and form a single text by arranging the "words" proposed by the participants in an arbitrary sequence.
Another thing to consider is that the VM scribes may have been working with a writing system that was not natural for them. It may be interesting to induce something similar in participants, for example by restricting the letters they are allowed to use to a defined number of vowels and consonants. This would change the assignment from "come up with anything you want" to "come up with something within these constraints".
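(Something like this toy sketch, say - the specific inventories are arbitrary, the point is only the shape of the constraint:)

Code:
import random

rng = random.Random(7)
vowels = "aoe"        # deliberately restricted inventories;
consonants = "ktdlr"  # which letters go in them is arbitrary here

def constrained_word():
    # Strict consonant-vowel syllables, one to three per word.
    return "".join(rng.choice(consonants) + rng.choice(vowels)
                   for _ in range(rng.randint(1, 3)))

print(" ".join(constrained_word() for _ in range(15)))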
(11-01-2023, 10:18 AM)Koen G Wrote: This would change the assignment from "come up with anything you want" to "come up with something within these constraints".
Who knows how many constraints the writers of the VM had and how they were implemented... some are very consistent and some seemed to change (variable "settings" or "preferences" creating "dialects" at different scales: paragraph, page, bifolio, and larger). Any gibberish generated without the same evolving constraints will not be statistically comparable.

There is no test for pure gibberish (no meaning) unfortunately.
(10-01-2023, 10:45 PM)degaskell Wrote: But again, if this is true, how might we more rigorously test if the text is meaningful or not? I think one major outstanding gap is in our understanding of how small-scale characteristics of gibberish might propagate over larger-scale documents like the Voynich, but there are undoubtedly others as well. We suggest in the paper that computer simulations might be one way to approach this, but I'm very interested to hear other ideas.
Information makes sense only when a person knows what to do with it, how to apply it. In cases where a person does not understand the meaning of this information, it becomes completely useless to him.
Abraham Maslow
I first had to translate the letter and study it properly.
How I understood it:
So far, one can say that individual VM words can be translated into many different languages. Let's just take one language and call it "xyz".
Now I have many words from "xyz", but they still don't make sense when strung together as text.

We had the same thing on page 116. Those Latin-like words didn't make sense either. It took a lot of time to form a comprehensible story without straying from the words. The basis of the dictionaries was classical Latin, but it could only be applied to the dialect with difficulty. The German text, on the other hand, although a little strange, is clear. (At least to me, since the dialect is familiar.)
What does it look like when the whole book is written like that, and now encoded?

Moreover, I have to say that the analyses were done partly in EVA and partly in Stolfi's transcription. Here you have to know that EVA is blind and other transcriptions are short-sighted.
Quote: Volunteers were given an instruction sheet telling them to “create a ‘document’ by filling three pages with fake, meaningless text in a ‘language’ that you make up as you go. Ideally, this ‘language’ should not actually mean anything, but should appear realistic enough that most observers would not be able to distinguish it from a real language they simply did not know.”

I assume that each of the participants tried, in his or her own way, to give meaning to the task they were set. By "meaning" I mean the spontaneous development of a system that transforms the text from random to non-random. That such a text can nevertheless be linguistically meaningless has already been shown by @degaskell. In this context, it would be interesting to know whether the texts vary in their randomness across the pages, i.e. whether they are perhaps more random at the beginning and then become less and less random.
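(One could check this quite simply, e.g. by sliding a window over a transcription and tracking a randomness proxy such as the type-token ratio - the window size and the choice of statistic are arbitrary here, and the file name is only a placeholder:)

Code:
def windowed_ttr(words, window=200, step=100):
    """Type-token ratio per window; a curve that falls from the first
    window to the last would indicate growing word reuse, i.e. the text
    becoming less 'random' as it proceeds."""
    return [len(set(words[i:i + window])) / window
            for i in range(0, len(words) - window + 1, step)]

# usage: words = open("sample_text.txt").read().split()
#        print(windowed_ttr(words))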

As for the system: it would be conceivable for a participant to think up meaningful sentences and then rephrase them into meaningless "sentences." In this way, he or she would not run out of ideas and could produce something that sounds like a text. Of course, this is only one possibility.

Unfortunately, it is not known how much time the participants had for the exercise, or whether they were allowed to draft on scratch paper first.