26-05-2019, 11:18 AM
I'm going to stick mostly with the observations in section 2, about word co-occurrence, as that's the meat of this paper. I've said enough about the "self-citation" theory to make it clear that I reject it as being really unsatisfactory. But I think the observations might be useful to other researchers. At the very least they deserve exploring and understanding more.
I'm most interested in this claim:
Quote: when we look at the three most frequent words on each page, for more than half of the pages two of three will differ in only one detail.
This is quite a strong claim and yet has the greatest implications. It's certainly nothing I've ever seen before, though it is perhaps not surprising given the general word structure of the Voynich text.
If we look at only the ten most common words in the text we can see some such pairs/groups: [daiin, aiin], [chedy, shedy], [chol, ol, or, ar], [dar, ar]. There are many, many more the further down the wordlist we go. This would suggest that, if we allowed ourselves to look at the ten most common words on a page (which is not what they claim), we would always find such a similar pairing. Figure 3 (page 6) shows just how many pairings some words can have.
Moreover, it would suggest that the similar pairs should tend to be the same words from page to page. If the common word pairs on any page tend to be common overall, then the import of the claim is reduced. It is almost like claiming that "a" and "an" (or "an" and "and" for that matter) occur frequently in an English language text.
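To make the pairings above concrete, here is a minimal sketch that finds every pair of words one edit apart. It is only my illustration, not the authors' procedure. Two things are assumptions: I read "differ in only one detail" as a Levenshtein distance of 1, and I count the EVA digraphs "ch" and "sh" as single glyphs (without that, [chol, ol] would be two edits apart). The word list is just the words quoted above, not a verified top ten.

```python
from itertools import combinations


def glyphs(word):
    # Assumption: treat the EVA digraphs "ch" and "sh" as single glyphs.
    return word.replace("ch", "C").replace("sh", "S")


def edit_distance(a, b):
    # Levenshtein distance using a rolling dynamic-programming row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similar_pairs(words):
    # All unordered pairs whose glyph-level edit distance is exactly 1.
    return [(a, b) for a, b in combinations(words, 2)
            if edit_distance(glyphs(a), glyphs(b)) == 1]


# The words quoted in the post, in the order they appear there.
common = ["daiin", "aiin", "chedy", "shedy", "chol", "ol", "or", "ar", "dar"]
pairs = similar_pairs(common)
print(pairs)
```

On these nine words the sketch reports six one-edit pairs, which chain together into the groups listed above (chol, ol, or, ar and dar all link up).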
When we look at the linked resource, this is indeed what we find. Under section 1.3 you can look at the manuscript page by page, and the most common word pairs are listed. From a casual browse we can see that in general the same words do occur as pairs time and again. Sometimes we see some word pairs which are uncommon, but they are not the majority. We also see an obvious change from Currier A to Currier B, though some word pairs are still shared.
My worry is that this observation about the occurrence of similar word pairs is a very simple and obvious fact arising from the word structure, rather than anything deeper. It illustrates the way in which the rigid word structure in the Voynich text, coupled with most "structurally valid" words actually occurring, creates lots of similar words in general. It doesn't provide useful support for the "autocopying" theory presented later on in the paper.
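One way to test this worry would be to compute the quoted statistic on the real page assignment and again after shuffling the same words across pages. If the shuffled text scores about the same, the observation says more about the vocabulary than about anything page-specific. The sketch below is hypothetical: the function names are mine, "one detail" is assumed to mean a character-level Levenshtein distance of 1, and the two pages are made-up toy data rather than a transcription.

```python
import random
from collections import Counter
from itertools import combinations


def edit_distance(a, b):
    # Levenshtein distance using a rolling dynamic-programming row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def top3_has_similar_pair(page_words):
    # True if any two of the page's three most frequent words are one edit apart.
    top = [w for w, _ in Counter(page_words).most_common(3)]
    return any(edit_distance(a, b) == 1 for a, b in combinations(top, 2))


def fraction_of_pages(pages):
    # Share of pages for which the top-three test succeeds.
    return sum(top3_has_similar_pair(p) for p in pages) / len(pages)


def shuffled_pages(pages, seed=0):
    # Baseline: pool every word, shuffle, and refill pages of the same sizes.
    pool = [w for page in pages for w in page]
    random.Random(seed).shuffle(pool)
    out, start = [], 0
    for page in pages:
        out.append(pool[start:start + len(page)])
        start += len(page)
    return out


# Toy pages (invented for illustration only).
toy_pages = [
    "daiin daiin daiin aiin aiin chol".split(),
    "okeey okeey okeey qotal qotal dam".split(),
]
observed = fraction_of_pages(toy_pages)
baseline = fraction_of_pages(shuffled_pages(toy_pages, seed=0))
```

On a real transcription one would average the baseline over many seeds. A baseline close to the observed value would support the view that the pairs are a by-product of word structure.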
I think it's best now to turn to Figure 4 (page 8), as a lot of the argument hinges on it. The graph shows that the edit distance between words is lower the nearer to each other the words occur. Words are more similar by up to 0.3 of an edit distance within 30 lines (Herbal A) or 60 lines (Quire 20), before the edit distance levels off beyond those numbers of lines. A comparison with a text in English (Alice in Wonderland) is made.
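For anyone who wants to reproduce a curve of this kind, here is a rough sketch of one way to compute mean edit distance as a function of how many lines apart two words sit. I don't know the exact procedure behind Figure 4, so this is an assumption on my part: it averages character-level Levenshtein distance over all word pairs drawn from two lines a given number of lines apart. The function name and the toy input are hypothetical.

```python
from collections import defaultdict


def edit_distance(a, b):
    # Levenshtein distance using a rolling dynamic-programming row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def mean_distance_by_line_gap(lines, max_gap):
    # lines: list of lines, each a list of words, in reading order.
    # Returns {gap: mean edit distance over all word pairs `gap` lines apart}.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, line_a in enumerate(lines):
        last = min(i + max_gap, len(lines) - 1)
        for j in range(i + 1, last + 1):
            for a in line_a:
                for b in lines[j]:
                    totals[j - i] += edit_distance(a, b)
                    counts[j - i] += 1
    return {gap: totals[gap] / counts[gap] for gap in sorted(counts)}


# Toy input: three one-word lines that drift apart step by step.
toy_lines = [["aa"], ["ab"], ["bb"]]
curve = mean_distance_by_line_gap(toy_lines, max_gap=2)
print(curve)  # {1: 1.0, 2: 2.0}
```

Run over a section's lines in order, this ignores page boundaries, so a drop at small gaps could equally reflect words sharing a page.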
I don't feel that the comparison text is really fair, as a coherent story with a single topic is likely to be much more "flat" in terms of word choice. The Voynich text is likely to switch topic on a regular basis, and would have more profitably been compared with other herbals. Indeed, I note that many pages in the Herbal A section have about 10 to 15 lines of text, which could account for much of the difference in edit distance if the similar words are actually different morphological forms of the same word. Likewise, Quire 20 pages have around 30 to 60 lines, meaning that some kind of thematic ordering could influence the similarity of the vocabulary.
I will leave my commentary here, though there's much more to be said, to allow others to join the discussion and perhaps the authors to respond.