The Voynich Ninja

Full Version: Review on "The Linguistics of the Voynich Manuscript"
There is an unpublished paper by Torsten Timm and Dr. Andreas Schinner.

In this paper we address severe deficiencies of a recent publication by Claire L. Bowern and Luke Lindemann, "The Linguistics of the Voynich Manuscript" [Annual Review of Linguistics 2021 (7): 285-308].

Quote: During the last years, research on the Voynich Manuscript, a (most likely) medieval codex written in unique "cipher" script, has drifted into a problematic direction. Even articles in peer-reviewed academic journals sometimes neglect the basic principles of scientific methodology. A recent publication appears symptomatic for this questionable development.
Funny, I was just rereading Bowern and Lindemann today. I'm not convinced that the VMs's text is (and always has been) meaningless. But I agree with you B&L don't present nearly strong enough evidence to justify how firmly they reject the possibility of it being meaningless. I think I would have been a bit more parsimonious, and put it something like this: "We deem it reasonable to continue to entertain the possibility that the [VMs's text] contains retrievable meaningful content."

This is really all they must establish in order to justify their endeavor. They've given the meaninglessness hypothesis its due respect, if they merely show that it does not render their study completely pointless. But they were more outspokenly dismissive of the meaninglessness hypothesis than was warranted, which makes me suspicious of bias.

Your point about formatting and markup making the Wikipedia text specimens an apples-to-oranges comparison to VMs text specimens is pretty important. Marco took the authors to task on this methodological error in the thread about their paper, and the authors conceded that this is an area for improvement. As you lament in your paper, peer review systems are fallible. In my limited experience with the process of peer review (medical journals, medical student and resident research fairs, and the electrical engineering industry through my wife's family business), peer review is only as good as the least impartial peer reviewing the study. Fortunately, from what I've observed recently, voynich.ninja still provides a coarse and informal but fairly rigorous vetting for anyone who wishes to put their idea to the test.

Like you, I tire of semantic sideshows about the conflation of "random", "hoax", or "gibberish" with "not linguistically meaningful". The first is patently false, and the second two are value judgements, not statements of fact. I'm reminded of lawyers purposely wasting a court's time by splitting hairs over others' wording. Again, what are they avoiding dealing with? I suspect the elephant in the room is that the hypothesis that the VMs's text never contained linguistic meaning is not so easily waved away.

I'd like to have a look at that study B&L cite (apparently their own?), in which producing "more than 100 words" of meaningless language-like text proved exceedingly difficult. Not that I doubt them, but let's see the details of their methodology. If they really wanted to reject your autocopy hypothesis, they should have set it up and run it exactly as you describe, using only tools available to a medieval scribe.

Here's the bottom line, as far as I'm concerned: What statistical tests can reliably distinguish long strings of symbols arranged to communicate a specific linguistic meaning (but whose meaning is not apparent to the experimenter), from long strings of symbols arranged without any such intention? Until someone much more versed in computational linguistics can suggest some tests with demonstrably good sensitivity and specificity for "language-ish gibberish" and/or "undeciphered message", then as far as I'm concerned the whole argument over either real language or fake language being ruled out is mostly posturing.
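To make "sensitivity and specificity" concrete, here is a minimal sketch of how any proposed test could be scored, assuming one had a set of text samples whose status (meaningful or not) is already known. The function name and the toy labels are my own placeholders, not anything taken from B&L or T&S.

```python
def sensitivity_specificity(labels, predictions):
    """Score a binary test; True means "carries linguistic meaning".

    labels      -- ground truth for each sample
    predictions -- what the statistical test said for each sample
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one sample of each kind")
    sensitivity = tp / (tp + fn)  # meaningful texts correctly flagged
    specificity = tn / (tn + fp)  # meaningless texts correctly flagged
    return sensitivity, specificity


# Toy example: three meaningful samples, two meaningless ones.
truth = [True, True, True, False, False]
test_says = [True, True, False, False, True]
sens, spec = sensitivity_specificity(truth, test_says)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

The hard part is of course not this arithmetic but assembling labelled samples of "language-ish gibberish" that everyone accepts as fair controls.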
Let's please stay on topic, that is discussion of the paper by Timm & Schinner
(11-02-2021, 12:54 PM) bi3mw Wrote: The idea that it is meaningless text raises a fundamental question: would anyone go to such lengths only to end up writing nothing meaningful? I think this is rather unlikely.
Personally, I've never found this line of argument appealing. If you look at the immense ingenuity displayed in fakes and hoaxes over the centuries, the amount of effort devoted to them is the least surprising thing. On the other hand, the stumbling block to these theories is how it was done. But then again, how it was done is the stumbling block to every theory about the MS.

To stay on topic, the most disappointing aspect of the paper that T&S are responding to is the oversight of their work, because it is relatively more plausible than the other hoax alternatives and a good paper ought to dispose of its strongest alternatives, not its weakest. That said, I'm not sure that linguists would have any particular insight into the potential defects of the T&S proposal.
(15-02-2021, 12:31 AM) Stephen Carlson Wrote: To stay on topic, the most disappointing aspect of the paper that T&S are responding to is the oversight of their work, because it is relatively more plausible than the other hoax alternatives and a good paper ought to dispose of its strongest alternatives, not its weakest. That said, I'm not sure that linguists would have any particular insight into the potential defects of the T&S proposal.

I must emphasize that we do not criticize the fact that Bowern and Lindemann obviously see our work as completely irrelevant. In our eyes this doesn't make any difference. But we think that it is important to also cover counter-arguments. Bowern and Lindemann share this view, since they criticize other authors for "omitting any information that does not fit the theory they propose." Therefore we criticize places where the authors only state what they believe and places where important counter-arguments are overlooked.

Since it was necessary to prioritize the flaws I would like to share some additional points here:

1) "There are some exceptions, in particular with the word 'daiin', which is common in every position except paragraph-initially"
Since paragraph-initial words usually start with a gallows glyph, it is no surprise that words not starting with a gallows glyph are uncommon there.

2) On p. 4 the paper rules out a polyalphabetic cipher since such a cipher "would lead to identical words being encoded differently in different parts of the manuscripts".
However, when it comes to a statistical outlier the paper argues in the opposite direction: "It is possible that chedy and shedy represent the same word as they are distinguished only by whether there is a plume stroke over the bench character. If we make this assumption Voynich B is less of an outlier." Later it is even argued "A comprehensive linguistic analysis needs to take seriously the possibility that, for example, paiin, saiin, aiin, and am are all positional variants of the same word." But if 'chedy'/'shedy' or 'paiin'/'saiin'/'aiin' represent the same word, what does this mean for word pairs like 'cheedy'/'sheedy', 'chdy'/'shdy', 'taiin'/'daiin' or 'chaiin'/'shaiin'?

3) The paper refers to Tiltman (1967), Stolfi (2000) and Reddy & Knight (2011) for the hypothesis that "Voynich words consist of three separate 'fields,' with particular symbols occurring at the beginning, middle, or end of the word".
* However, Tiltman in fact "divided words into ... 'roots' and 'suffixes'" (Tiltman 1967, p. 7).
* Stolfi, on the other hand, tried to parse Voynich words into prefix, midfix and suffix (see Stolfi 1997).
But the outcome of this experiment was "that there is a surprisingly small number of prefixes and suffixes with significative frequency." Later Stolfi published a model using three nested layers for parsing Voynich word tokens (see Stolfi 2000).
* Reddy and Knight used an MDL-based algorithm to segment "words into prefix+stem+suffix, and extracts ‘signatures’, sets of affixes that attach to the same set of stems."
This way Reddy and Knight illustrate that some glyph sequences like 'ol+', 'ot+' and '+dy', '+y' occur predominantly at the beginning or end of a word.
Reddy and Knight concluded "that stems in the same signature tend to have some structural similarities." (Reddy & Knight 2011, p. 82).
* Moreover, besides the common word 'aiin', words like 'daiin', 'kaiin' and 'taiin' also exist. For some unknown reason 'd' is listed as a prefix but 'k' and 't' are listed as midfixes. In the same way the common glyph groups 'ol'/'or'/'al'/'ar' are split into 'o'/'a' as midfixes and 'l'/'r' as suffixes.

4) On p. 18 the paper concludes "All of these observations lead to generalizations which appear to be typographical rather than linguistic in nature."
But later the paper argues that "the word and line level metrics show it to be regular natural language."

5) "Syntax describes the ways in which words fit together in a hierarchical structure, and generalizations about word and phrase combinations can explicate this structure. Syntax has been studied less systematically than character- and word-level patterns in the Voynich Manuscript."
The lack of repetitive phrases is in fact the most often used argument against the natural language hypothesis. Even some of the papers cited do in fact cover the observation that phrases are missing.
* Tiltman wrote in 1976 "My analysis, I believe, shows that the text cannot be the result of substituting single symbols for letters in the natural order. Languages simply do not behave in this way. ... And yet I am not aware of any long repetitions of more than 2 or 3 words in succession, as might be expected for instance in the text under the botanical drawings" (Tiltman 1976).
* D'Imperio wrote in 1978: "Also the strange lack of parallel context surrounding different occurrences of the 'same' word as shown by word indexes. In the words of several researchers 'the text just doesn't act like natural language'" (D'Imperio 1978, p. 30).

6) The paper argues that the Voynich manuscript is encoded using a cipher that preserves language like structure "and in addition create predictability in the writing system."
Ciphers normally use randomness to hide language structures. Therefore simple ciphers do not change the predictability, whereas more advanced ciphers decrease the predictability.
This means a cipher can either preserve the text structure (for instance by using a dictionary cipher) or it can add predictability (for instance by omitting information or by encoding each plain text letter with a whole cipher word). In the first case it would be possible to detect typical phrases, and in the second case the hypothesis that a Voynich word represents a word in a natural language must be wrong.
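Points 2) and 6) can be illustrated with a small sketch. It compares the conditional character entropy (how predictable a character is given the one before it) of a plain text, of a simple substitution of that text, and of a polyalphabetic (Vigenère) encoding. The sample sentence, the key "lemon" and all function names are assumptions made only for this illustration; this is not the method used in either paper.

```python
import math
import random
import string
from collections import Counter


def conditional_entropy(text):
    """Entropy in bits of a character given the preceding character."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        h -= (n / total) * math.log2(n / firsts[a])
    return h


def substitute(text, seed=0):
    """Simple (monoalphabetic) substitution with a fixed letter permutation."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return text.translate(str.maketrans(dict(zip(letters, shuffled))))


def vigenere(text, key):
    """Polyalphabetic cipher over a-z; spaces pass through unchanged."""
    out, i = [], 0
    for ch in text:
        if ch in string.ascii_lowercase:
            shift = ord(key[i % len(key)]) - ord("a")
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical plain text, repeated so that the statistics are not trivial.
sample = "the herb is cut and the root is dried and the herb is ground " * 20
encoded = vigenere(sample, "lemon")

h_plain = conditional_entropy(sample)
h_subst = conditional_entropy(substitute(sample))
h_vig = conditional_entropy(encoded)
print(f"plain {h_plain:.3f}  substitution {h_subst:.3f}  vigenere {h_vig:.3f}")

# All the different cipher words that stand for the plain word "herb".
herb_forms = {c for p, c in zip(sample.split(), encoded.split()) if p == "herb"}
print(sorted(herb_forms))
```

The simple substitution only relabels the characters, so its entropy is the same as that of the plain text, while the polyalphabetic cipher maps the same plain word to several different cipher words and makes the text less predictable. Neither kind of cipher makes the text more predictable than the plain text.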

Also some basic facts are wrong:
7) The paper already starts with a mistake: "The Voynich Manuscript has 116 folios (i.e., 232 pages)".
Since some folios are missing there are fewer than 116 folios, and since the Voynich manuscript has foldouts there are not twice as many pages as folios. Actually the Voynich manuscript contains only 102 folios and "including blank pages and pages without text there are 240 pages. 225 pages include text" (Reddy & Knight 2011, p. 78).

8) "The sequence qo is represented as qo in large part because q never appears in the text except before o" (p. 291).
There are numerous counter examples. There are for instance 66 instances of 'qe', 23 instances of 'qc', and 7 instances of 'qa'.

9) "even though 'h' is never found separately" (p. 291).
There are 85 instances of 'hh' like in 'chcphhy' on folio 7v and 'chckhhy' on folio 15r. There is 'tohedy' on folio 39r, 'okehdar' on folio f67r1, and 'olohy' on folio f83v. And there is also 'qoiheey' on folio 73r and 'chcthihy' on folio 85r.

10) "For example, p/f p/f are never followed by e" (p. 302).
At least <shefeeedy> on folio 48v and <qopeeedar> on folio 50r exist.

11) The paper refers multiple times to Sterneck & Bowern (2020). But no such paper has been published to date.
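Claims like those in 8), 9) and 10) are easy to check against any transliteration. A minimal sketch, assuming the text is available as a list of EVA words; the short word list below contains only words quoted in this post, so it does not reproduce the counts given above, and the function names are placeholders.

```python
import re
from collections import Counter

# Only words quoted in this post; a real check needs the full transliteration.
WORDS = [
    "daiin", "chedy", "shedy", "chcphhy", "chckhhy", "tohedy", "okehdar",
    "olohy", "qoiheey", "chcthihy", "shefeeedy", "qopeeedar",
]


def successors(words, glyph):
    """Count which glyph directly follows `glyph` inside a word."""
    counts = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            if a == glyph:
                counts[b] += 1
    return counts


def words_with_separate_h(words):
    """Words where an 'h' remains after removing ch, sh and the benched
    gallows cth/ckh/cph/cfh (assumed here to be the only regular uses of h)."""
    bench = re.compile(r"c[tkpf]h|[cs]h")
    return [w for w in words if "h" in bench.sub("", w)]


def words_with_pf_before_e(words):
    """Words in which p or f is directly followed by e."""
    return [w for w in words if re.search(r"[pf]e", w)]


print(successors(WORDS, "q"))
print(words_with_separate_h(WORDS))
print(words_with_pf_before_e(WORDS))
```

Run over a complete transliteration file, `successors(words, "q")` would list every glyph that follows 'q' together with its count, which is the kind of check behind the numbers in 8).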
Torsten Wrote: "There are some exceptions, in particular with the word 'daiin', which is common in every position except paragraph-initially"

Since paragraph-initial words usually start with a gallows glyph, it is no surprise that words not starting with a gallows glyph are uncommon there.

Agreed. It does occur following gallows. Which means that if gallows in initial position is positioned as though it were pilcrow/capitulum (which I am moderately sure it is), then aiin occurs in initial position for the main body of text.

Examples: f35r (2nd paragraph) and others. Also possibly ones like f33r (line 4).
(19-02-2021, 02:55 PM) Torsten Wrote: 8) "The sequence qo is represented as qo in large part because q never appears in the text except before o" (p. 291).

There are numerous counter examples. There are for instance 66 instances of 'qe', 23 instances of 'qc', and 7 instances of 'qa'.

I agree with this also. There are numerous counter-examples. Here are some examples (I just grabbed a screenful; I did not include all of them). There are a number of cases where 4'o occurs (there is a macron between 4 and o). There are some where the 4 is constructed with a long stem. There are quite a few where 4 is followed by a variety of other glyphs. And there are some that I think of as a "soft-4" where the loop is so rounded it almost looks like y except that it has a straight descender.

[Image: Examplesof4o.png]

In the past, I have suggested that one possible explanation for 4o being common is that 4 is frequently placed before o-tokens. This is different from 4o being a pair (it might be a ligature but that is different from a linguistic or symbolic pair). Or... there may be both.
In this thread mainly statements about other subjects were discussed. However, a paper should be judged on the facts and the line of argumentation it provides.

The central point in the paper of Bowern and Lindemann is their conclusion: "The higher structure of the manuscript itself is completely consistent with natural language and is very unlikely to be manufactured." In detail the authors argue that "For measures that look above the word to line and paragraph, as well as in the distribution of words across the manuscript, it looks like a natural language." 
However, it is well known that the Voynich manuscript shows language-like features as well as non-language-like features. Language-like features are, for instance, that the distribution of words follows both of Zipf's laws and that the TF-IDF values demonstrate that words depend on the page (see Reddy & Knight 2011). It seems as if pages "do have topics, but are not independent of one another" (Reddy & Knight 2011).
A non-language-like feature is, for instance, the observation that the text contains only a few repeated word bigrams and trigrams (that is, repeated phrases are missing). Several researchers have pointed to this fact as evidence against the natural language hypothesis (see Tiltman 1976, D'Imperio 1978, Reddy & Knight 2011, Ito 2002 and Timm 2015). Other counter-arguments are the shift from Currier A to Currier B, the fact that some herbal pages are written in Currier A whereas others are written in Currier B, and the fact that words co-occur with similar ones. For instance, words ending in '-edy' are typical for Currier B but almost non-existent in Currier A, and words starting with 'cho-' are frequently used in Currier A and often occur repeated, but occur less frequently in Currier B (see Currier 1976).
To draw a conclusion it is therefore necessary to weigh the top language-like features as well as the top non-language-like features.
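The "missing repeated phrases" observation can be measured in a few lines. A minimal sketch, assuming the text is given as a list of word tokens in reading order; the function names and the toy token sequence are placeholders, not data from the manuscript.

```python
from collections import Counter


def repeated_ngrams(tokens, n):
    """Word n-grams that occur more than once, with their counts."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}


def repetition_rate(tokens, n):
    """Share of all n-gram positions that belong to a repeated n-gram."""
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    return sum(repeated_ngrams(tokens, n).values()) / total


# Toy sequence only; replace with the tokens of a transliteration or of a
# comparison text of similar length.
toy = "a b c a b d a b c".split()
print(repeated_ngrams(toy, 2), repetition_rate(toy, 2))
print(repeated_ngrams(toy, 3), repetition_rate(toy, 3))
```

Comparing `repetition_rate` for the Voynich text with the rate for natural language texts of similar length and vocabulary size is one way to put a number on this feature.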

Bowern and Lindemann also argue that the text is unlikely to be manufactured since "gibberish is by nature random". It seems that Bowern and Lindemann expect that they could easily identify gibberish, since someone generating pseudo-text would avoid language-like structures. Such an argument is not comprehensible.
Not only is no one seriously claiming that the text is pure noise, it is even impossible to generate longer random sequences manually: "It is a dramatic observation that when human subjects asked to generate random sequences, they normally cannot produce sequences that satisfy accepted criteria for randomness" (Treisman and Faulkner 1987, p. 338; see also Tune 1964 and Wagenaar 1972). There are even experiments demonstrating that humans usually fail to discriminate between random and nonrandom sequences, the bias being in the direction of negative recency (see Wagenaar 1970).
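Negative recency means that people avoid repeating the previous symbol, so hand-made "random" sequences tend to alternate too often. A minimal sketch of one simple measure, the alternation rate, which is close to 0.5 for a fair coin; the function name and the example sequences are made up for illustration and are not taken from the cited experiments.

```python
import random


def alternation_rate(sequence):
    """Fraction of adjacent positions where the symbol changes."""
    if len(sequence) < 2:
        raise ValueError("need at least two symbols")
    changes = sum(a != b for a, b in zip(sequence, sequence[1:]))
    return changes / (len(sequence) - 1)


# Simulated fair coin as a reference.
rng = random.Random(1)
coin = [rng.randint(0, 1) for _ in range(10_000)]
print(f"simulated coin: {alternation_rate(coin):.3f}")

# A made-up sequence that switches almost every time.
print(f"over-alternating example: {alternation_rate('0101101010010101'):.3f}")
```

A rate well above 0.5 in a long binary sequence is a sign of this bias.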
(15-03-2021, 08:53 PM) Torsten Wrote: Bowern and Lindemann also argue that the text is unlikely to be manufactured since "gibberish is by nature random". It seems that Bowern and Lindemann expect that they could easily identify gibberish, since someone generating pseudo-text would avoid language-like structures. Such an argument is not comprehensible.


Hi Torsten: I agree with everything in your post, and have often... even on this thread... pointed out several of your points. Of course you are familiar with Hélène Smith, but:

"Flournoy concluded that her "Martian" language had a strong resemblance to Ms. Smith's native language of French and that her automatic writing was "romances of the subliminal imagination, derived largely from forgotten sources (for example, books read as a child)." He invented the term cryptomnesia to describe this phenomenon."

I have and have read Flournoy's book, and this is a recurring observation of his. This would be "automatic writing".

I think it is reasonable to consider it would. And if so, then there is really little way to know, from any of these studies, whether the Voynich has real meaning or not. What do you think?

I had come across many instances of the same thing... I think in fact in every case I found, there was underlying structure of language. What I would love to see is one of these randomly generated written strings being tested by the same methods as Voynichese, to see if they show similar comparisons to the meaningful language samples that are always used.

This test might settle this issue one way or the other.
Can some moderator please intervene and move posts around. There haven't been any on-topic posts for quite a while now.