10-01-2023, 10:45 PM
As seen at the conference last year, Claire Bowern and I have recently published a paper examining the statistical properties of meaningless text. Interested parties are referred to the full paper (now available online), but briefly, we recruited human participants to produce real, handwritten samples of meaningless text and compared them statistically to Voynichese. Contrary to what has often been assumed, we found that real human gibberish actually tends to be highly non-random and may even explain some of the more unusual features of Voynichese (such as its low entropy) better than meaningful text does.
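For anyone who wants to poke at this themselves, here is a minimal sketch of the kind of measurement involved: second-order (conditional) character entropy, which is one of the statistics usually cited as unusually low for Voynichese. The function name and the toy strings are just mine for illustration; a real comparison would of course use a proper EVA transliteration and longer, matched samples.

```python
import math
from collections import Counter

def conditional_entropy(text: str) -> float:
    """H(X2 | X1) = H(X1, X2) - H(X1), in bits per character."""
    bigrams = Counter(zip(text, text[1:]))
    unigrams = Counter(text[:-1])  # marginal over the conditioning character

    def H(counts: Counter) -> float:
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    return H(bigrams) - H(unigrams)

# Toy comparison only -- real work would use transliterated manuscript text
# and comparable samples of natural language and human-produced gibberish.
print(conditional_entropy("daiin chedy qokeedy shedy daiin qokaiin " * 100))
print(conditional_entropy("the quick brown fox jumps over the lazy dog " * 100))
```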
I'll take the cautious scientific approach here and not try to over-analyze what this actually means, but I do want to start a conversation about how to more rigorously test whether the Voynich is meaningful or not. As we argue in the paper, many existing approaches have implicitly operated from the assumption that "meaningless" = "random", so if we find non-random patterns in the text (of word and character frequencies, word placement in sections, etc.), these are often taken as evidence that the text encodes meaningful content. However, our experiments generally contradict this assumption. When we actually sit real humans down and say "write me something that looks meaningful but isn't" - even people without much background in linguistics or the Voynich manuscript - we end up with an explosion of different texts and approaches, many of which are surprisingly non-random. On the whole, this makes me very cautious about assuming almost anything regarding what a group of hoaxing scribes might or might not have been capable of doing. To borrow a line from a colleague of mine, "I don't know, man, people are weird."
But again, if this is true, how might we more rigorously test if the text is meaningful or not? I think one major outstanding gap is in our understanding of how small-scale characteristics of gibberish might propagate over larger-scale documents like the Voynich, but there are undoubtedly others as well. We suggest in the paper that computer simulations might be one way to approach this, but I'm very interested to hear other ideas.
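To make the simulation idea a bit more concrete, here is the sort of toy model I have in mind (purely illustrative; the reuse probability, word lengths, and alphabet are arbitrary choices of mine, not anything from the paper): a "scribe" who mostly copies an earlier word with a small change and occasionally invents a new one, so that small-scale habits accumulate into document-scale statistics.

```python
import random

def generate_gibberish(n_words: int, alphabet: str = "ocheaidyklrsqt", seed: int = 0) -> list[str]:
    """Toy reuse-and-mutate generator: each new word is either invented
    from scratch or copied from earlier in the text with one character changed.
    An illustrative stand-in only, not anyone's actual proposed model."""
    rng = random.Random(seed)
    words = ["".join(rng.choice(alphabet) for _ in range(rng.randint(3, 7)))]
    for _ in range(n_words - 1):
        if rng.random() < 0.7:
            # reuse an earlier word with a single-character mutation
            w = list(rng.choice(words))
            w[rng.randrange(len(w))] = rng.choice(alphabet)
            words.append("".join(w))
        else:
            # occasionally invent a brand-new word
            words.append("".join(rng.choice(alphabet) for _ in range(rng.randint(3, 7))))
    return words

text = generate_gibberish(5000)
print("distinct word types:", len(set(text)))
print("type/token ratio:", len(set(text)) / len(text))
```

Even this crude process produces vocabulary growth and repetition structure that are very far from random, which is exactly the kind of propagation from small-scale behavior to large-scale structure I think we need to understand better before drawing conclusions from document-level statistics.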
P.S. Torsten Timm may be interested to note that our experiment broadly seems to support his idea of "self-citation", at least in the sense that some of our participants did actually report doing this.