The Voynich Ninja

Full Version: The Linguistics of the Voynich Manuscript (Bowern et al. 2020)
As far as I recall, when Lisa mentioned Bowern's work to me, this would be about looking for text behaviour that matches different scribal hands. This sounded very interesting, but I don't know if it was abandoned or still in progress.
Thanks to everybody for the interesting comments!
I agree with Emma about the importance of Zipf's law and entropy for Bowern and Lindeman's arguments. The fact that these linguists make extensive reference to quantitative measures is one of the strengths of the paper.  As I said, I hope they will share their corpus so that people can replicate their experiments.

Some more ideas about Zipf's law:
One year ago, I looked into [link]: Liber Loagaeth and the Enochian Calls. One of these two languages is also mentioned by Bowern and Lindeman; I guess they are referring to the Enochian Calls, which are more clearly a pseudo-language, coming with an English translation. Both languages were recorded in Dee's diaries in the 1580s and are presented as communications from angels with Kelley acting as a medium.
According to my fitting measure NRMSD (which admittedly may not be very reliable) [link].
Some argue that these must be complex ciphers that nobody has been able to detect as such, but I have never seen any evidence to support this speculation. My opinion is that, until something solid is put forward, we must take what the authors wrote at face value: these texts document the esoteric experiments carried out by the two men. If you are interested in the subject, please read Laycock's essay "Angelic language or mortal folly?".


A few months later, [link] asking people to contribute hand-made meaningless text. I never shared the results because there was little interest and the only two who actually contributed samples were Koen and myself. Both samples are about 1000 words long.
While my attempt turned out to be mostly hapax legomena, Koen wrote some gibberish that nicely fits Zipf's law. Since Koen's text is certainly meaningless, I regard it as empirical proof that Zipf's law is not necessarily the result of language or an algorithm. At the time, I also checked MATTR200, and Koen's text is comparable with language (while mine produces a measure close to the maximum value of 1).
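
For anyone who wants to repeat the MATTR200 check on their own sample, here is a minimal sketch of a moving-average type-token ratio. It assumes the text is already split into word tokens and that "MATTR200" means a sliding window of 200 tokens; the function name `mattr` is only illustrative and this is not the script used for the figures above.

```python
from collections import Counter


def mattr(tokens, window=200):
    """Mean type-token ratio over every sliding window of `window` tokens."""
    n = len(tokens)
    if n == 0:
        raise ValueError("empty token list")
    if n <= window:
        # Text shorter than one window: fall back to the plain type-token ratio.
        return len(set(tokens)) / n
    counts = Counter(tokens[:window])
    total_types = len(counts)  # number of distinct words in the first window
    for i in range(window, n):
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[tokens[i]] += 1
        total_types += len(counts)
    n_windows = n - window + 1
    return total_types / (window * n_windows)
```

A text made only of hapax legomena gives exactly 1, which is the "close to the maximum value" case; heavy repetition pushes the value towards 0.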

I attach Zipf log plots for:
1: Liber Loagaeth and the Enochian Calls
2: VMS Currier A and Quire 20 (a subset of Currier B)
3: Koen's asemic text and mine
4: Virgil's Aeneid (Latin) and King James' Genesis (English)
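
A Zipf log plot of this kind can be summarised with a straight-line fit in log-log space. The sketch below is only one possible way to do it: it fits log frequency against log rank by ordinary least squares and reports an NRMSD, here assumed to be the root-mean-square deviation of the fit divided by the range of the log frequencies. The exact NRMSD definition used for the plots is not given in the post, so that normalisation is an assumption, as is the function name `zipf_fit`.

```python
import math
from collections import Counter


def zipf_fit(tokens):
    """Fit log(frequency) = slope * log(rank) + intercept.

    Returns (slope, intercept, nrmsd). A text following Zipf's law
    closely has a slope near -1 and a small nrmsd.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Root-mean-square deviation of the fitted line from the data (log space).
    rmsd = math.sqrt(
        sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n
    )
    # Assumed normalisation: divide by the range of the log frequencies.
    spread = max(ys) - min(ys)
    nrmsd = rmsd / spread if spread > 0 else float("nan")
    return slope, intercept, nrmsd
```

Note that a text consisting only of hapax legomena has no spread in frequency at all, so the normalised measure is undefined (NaN) in this sketch.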

[Attachment: Zipf log plots]


(08-09-2020, 06:35 AM)Koen G Wrote: As far as I recall, when Lisa mentioned Bowern's work to me, this would be about looking for text behaviour that matches different scribal hands. This sounded very interesting, but I don't know if it was abandoned or still in progress.

Hi Koen,
there's a reference to what I guess is another forthcoming paper (not yet available, it seems):
Quote: Sterneck, Rachel and Claire Bowern. 2020. Topic modeling in the Voynich manuscript. URL arxiv.org.
I did contribute a sample as well, but I found it extraordinarily difficult to generate meaningless text. It was only about 300 words.
(08-09-2020, 07:16 AM)MarcoP Wrote: One year ago, I looked into [link]: Liber Loagaeth and the Enochian Calls. One of these two languages is also mentioned by Bowern and Lindeman; I guess they are referring to the Enochian Calls, which are more clearly a pseudo-language, coming with an English translation.

Marco, I suspect that they address this text because they also refer to the Daruka paper of 2020, where this text is analysed.
Do you have a copy?
(Note: I cannot recommend this paper)
Quote: we assume that there are two “languages” in the manuscript, referred to as Voynich A and Voynich B. More precisely, there are two methods of encoding at least one natural language.

The latter statement is a weak point. It should be shown why "two different methods of encoding" is to be preferred over "two different languages encoded by the same method" or "same language encoded by the same method, but the subject and hence vocabulary is different".
Thanks for this! I'll try to answer the main questions (quick note that I'm relatively new to Voynich.ninja and haven't extensively browsed prior discussions).

(07-09-2020, 07:52 AM)Stephen Carlson Wrote: Now that I've had a chance to read the paper, I'd say that it is clear and generally well-written but it is best viewed as setting forth the status quaestionis of the VM rather than advancing a key new insight.


Yes - that was its purpose. Luke and I were asked to write an overview of the current state of work on Voynich linguistics, which included a historical overview, some citation and discussion of others' work, and some of our own recent work. We had a strict word limit and so unfortunately were unable to cite everything we would have liked to.

(07-09-2020, 07:52 AM)Stephen Carlson Wrote: The paper does not consider Schinner and Timm's work, which in my view is a major oversight since their auto-citation proposal, if I understand it correctly, can generate some of the supposed linguistic features discussed in the paper (like the Zipfian behaviors).

I'm less convinced that Zipfian distributions are crucial evidence one way or the other. Many things follow power law distributions.


(07-09-2020, 07:52 AM)Stephen Carlson Wrote: They are also probably too quick to dismiss a late medieval/early modern fake here:
Quote (Bowern et al. 2020:5): We also consider it unlikely that the Voynich Manuscript is an ancient hoax. The cost associated with the production of such a manuscript and the number of people involved make it unlikely that it was created purely to deceive. A much smaller hoax would have served the same purpose with much less expense. Moreover, people who assume that the manuscript is a medieval gibberish hoax massively underestimate the amount of effort required to produce sustained language-like nonsense.

We wrote that before the information about Scottish Wikipedia came up (not exactly a hoax but not real) ([link]). I'm not fully convinced, though, when something on a smaller scale would have been as effective and cheaper.


(07-09-2020, 07:52 AM)Stephen Carlson Wrote: I'd like to know more about this undergraduate experiment--as well as others in asemic writing--especially on the nature of local repetitions (how are they comparable to the VM's repetitiousness?). Again, here, some engagement with the Schinner & Timm proposal may be enlightening. Can undergraduates, briefly trained in auto-citation, simulate features of the text?
Next time I teach the class (next year most likely) we'll try it out.
(08-09-2020, 07:16 AM)MarcoP Wrote: Thanks to everybody for the interesting comments!
I agree with Emma about the importance of Zipf's law and entropy for Bowern and Lindeman's arguments. The fact that these linguists make extensive reference to quantitative measures is one of the strengths of the paper.  As I said, I hope they will share their corpus so that people can replicate their experiments.

We will definitely share all code and corpus materials in the next month or so. We are currently finalizing both the corpus paper and two other papers (one on TF/IDF and one on script inference). Unfortunately, COVID-related closures and online teaching have really slowed this work down.

(08-09-2020, 07:16 AM)MarcoP Wrote:
(08-09-2020, 06:35 AM)Koen G Wrote: As far as I recall, when Lisa mentioned Bowern's work to me, this would be about looking for text behaviour that matches different scribal hands. This sounded very interesting, but I don't know if it was abandoned or still in progress.

Hi Koen,
there's a reference to what I guess is another forthcoming paper (not yet available, it seems):
Quote: Sterneck, Rachel and Claire Bowern. 2020. Topic modeling in the Voynich manuscript. URL arxiv.org.

We need to do some edits to the paper (mostly to explain in more detail what we did). It should be up soon. We were looking at whether we could disentangle linguistic features that might be represented in different scribal hands versus differences in subject matter (which was a comment above too). I think the answer in brief is that TF/IDF keyword analyses like we did can recover differences between scribes when we also take subject matter (as inferred by the illustrations) into account.
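
To make the idea of a TF/IDF keyword analysis concrete, here is a small sketch that ranks the words of each text section by term frequency times inverse document frequency. It is not the code from the forthcoming paper: the grouping of pages into sections (for example one section per scribe and illustration topic), the plain `log(N / df)` weighting, and the function name `tfidf_keywords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(sections, top=3):
    """Return the `top` highest-scoring words for each section.

    sections: dict mapping a section name to its list of word tokens.
    A word scores highly when it is frequent in one section but
    occurs in few of the other sections.
    """
    n_docs = len(sections)
    doc_freq = Counter()
    for tokens in sections.values():
        doc_freq.update(set(tokens))  # count each word once per section
    keywords = {}
    for name, tokens in sections.items():
        term_freq = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[name] = ranked[:top]
    return keywords
```

With sections defined by scribe alone, by topic alone, or by both, comparing the resulting keyword lists is one way to see which grouping separates the vocabulary more cleanly.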
(07-09-2020, 10:19 PM)Emma May Smith Wrote: So, having read to the end, I should slightly revise my impressions. It's not clear until the summary that they favour a "cipher" solution rather than a simple "language" solution. I think this is fair, though it seems that the conclusion rests mostly on the entropy problem.

Yes, that's right. In a world where I have to choose "cipher" vs "language", right now I'd choose cipher. There are arguments in favor of both, so consider it a preference rather than dogma.
(07-09-2020, 08:45 PM)Ruby Novacna Wrote: As I am not a linguist, I did not understand what the share of the authors' own analyses is. Do the graphs reflect their own results? On what documents and from what period do they base their analyses? Is it marked somewhere?

It's mostly a recap (that's what the Annual Review publishes - 'state of the art' type articles). But Luke and I are also working on the manuscript and included a few recent findings where we thought they would be of interest.
Quote: The question we should really be asking is, "How likely is it that someone in the early 15th century would use a method which resulted in text which followed Zipf's Law?" The constraint they would have been working to is that it had to look like a natural language. The bare minimum for that is that the choice of the current glyph depends on the previous one, and I think that would have been obvious even at the time the Voynich Manuscript was written.

Certainly this is a reasonable way of thinking about it. But it's not so easy to go from "had to look like a natural language" to actually defining that and then implementing it. I accept that Zipf's law is not a marker of language (that is one of the key points about the law) but it is simply one thing they a) had to achieve to make the text "look" like a language, and b) simply weren't aware of. There are many other things they likewise had to manage to achieve.

Without knowing the method by which the text was constructed we can only propose increasingly sophisticated methods which are plausible, but unproven. There is a problem with many Voynich models that they simply model what we know about the text when the models were created and not the text itself. Every new discovery outdates them and will continue to do so. The proposed methods are only as sophisticated as our knowledge, yet we also need to believe that the creators can be as sophisticated as we need them to be.

While we can accept that any method would produce any number of patterns, there's no reason why many or most of those patterns would be consistent with language unless intentionally planned. Of course, not everything in the text is consistent with language, entropy being the most important exception. Yet the problem with entropy is that the words are too regularly structured and the glyphs too predictable. How did the creators not realise that the individual words were unnatural, even though they could see them, yet manage to create the other patterns they were unaware of? The creators must have possessed an odd mixture of luck and incompetence.
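
The "glyphs too predictable" point is usually measured as conditional character entropy: how many bits are needed for a glyph once the previous glyph is known. The sketch below computes it as the entropy of adjacent pairs minus the entropy of the first member of each pair. It assumes the transcription is passed as a plain string with one character per glyph, which is a simplification (in real transcription alphabets the result depends on how glyphs are split), and the function names are only illustrative.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(text):
    """Bits needed for a character given the character before it.

    Computed as H(pair) - H(first character of the pair).
    Lower values mean each glyph is more predictable from its predecessor.
    """
    pairs = Counter(zip(text, text[1:]))
    if not pairs:
        raise ValueError("need at least two characters")
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)
```

A strictly alternating string such as "abababab" gives 0 bits, since every character is fully determined by the one before it; ordinary language text gives clearly higher values.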