The Voynich Ninja

Decoding Anagrammed Texts Written in an Unknown Language and Script
Pages: 1 2 3 4 5 6 7 8
Usually the first thing I want to look at is the code (and unfortunately, I don't have time to look at their code, at least not any time soon, but I was happy to see it's available for perusal).

Many coders claim "AI technology" when in fact it's not. If it learns and self-modifies, then we're getting into AI territory. If it's a set of rules and a lot of brute-force lookups and processing, not so much.
(30-01-2018, 05:32 AM) -JKP- Wrote: Usually the first thing I want to look at is the code (and unfortunately, I don't have time to look at their code, at least not any time soon, but I was happy to see it's available for perusal).

Many coders claim "AI technology" when in fact it's not. If it learns and self-modifies, then we're getting into AI territory. If it's a set of rules and a lot of brute-force lookups and processing, not so much.

The algorithm is described in their paper and it isn't AI in any sense of the word, nor is it claimed to be in the paper. The authors work on natural language processing/computational linguistics. There's an overlap with AI, but not all NLP is AI. (Source: I used to work on, and have given lecture courses in, AI. I also worked on search engines, which are NLP but usually aren't AI.)

AI doesn't necessarily involve learning, though if a system learns, it's usually AI.
I've done AI programming with languages like LISP, Prolog, and others (not so much Python, but that is simply due to lack of time), and didn't actually expect that the researchers had implemented AI algorithms in Perl (which is a very flexible language and very good for text processing, but not inherently suited to AI applications unless combined with other software).

I had to glance through the paper again, because the media keep saying "AI" and after a while the hype and the actual research blur together; indeed, the researchers make no claim that this is artificial intelligence.


Unfortunately, the press is loading the headlines (and articles) with emotional buzzwords ("solved", "what no human could accomplish", "AI") and making no effort to represent the work accurately.
I still don't understand why some people are mildly positive about this paper. I'm not a coding expert, but I do know that abjad anagramming is a one-way cipher. And that plonking your results into Google Translate is an absolutely embarrassing thing to write about in a scientific paper. Am I missing something?
(31-01-2018, 11:19 AM) Koen Gh. Wrote: I still don't understand why some people are mildly positive about this paper. I'm not a coding expert, but I do know that abjad anagramming is a one-way cipher. And that plonking your results into Google Translate is an absolutely embarrassing thing to write about in a scientific paper. Am I missing something?


I have been developing software since the early 1980s and have worked on various things, including Good Old-Fashioned AI (i.e. small data, programming in Lisp and Prolog), speech processing, search engines, and data mining, including collaborative filtering. I have also carried out a […], the reception of which has been mixed: almost entirely negative here (which has discouraged me from pursuing my ideas further), but more positive from a few others, including academics. We all have our own personal Overton windows, and limited time and knowledge. I'm aware of this and act accordingly. Sometimes a theory is irredeemable, but often all that's needed are some minor alterations.

There's a good paper hidden inside the published paper if you read it carefully. They have shown that their method can recover meaningful text in a known but unidentified language (i.e. one in the corpus of candidate languages) that has been encrypted using any combination of three steps: vowel removal, anagramming, and simple substitution encipherment. This is an interesting and perhaps surprising result.
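To make the three encryption steps concrete, here is a minimal Python sketch of the encryption side only. It is not the paper's code and does nothing to recover plaintext; it assumes lowercase Latin text, "aeiou" as the vowel set, a random per-word shuffle as the anagramming step, and a random one-to-one substitution key. All function names are hypothetical.

```python
import random
import string

# Assumption: plaintext is lowercase Latin text and "aeiou" are the vowels.
VOWELS = set("aeiou")
ALPHABET = string.ascii_lowercase


def remove_vowels(word: str) -> str:
    """Step 1: abjad-style vowel removal."""
    return "".join(ch for ch in word if ch not in VOWELS)


def anagram(word: str, rng: random.Random) -> str:
    """Step 2: shuffle the letters within a single word."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)


def make_key(rng: random.Random) -> dict[str, str]:
    """Step 3 setup: a random one-to-one substitution key over a-z."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def encrypt(text: str, seed: int = 0) -> list[str]:
    """Apply all three steps word by word; words left empty are dropped."""
    rng = random.Random(seed)
    key = make_key(rng)
    cipher_words = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch in ALPHABET)  # keep letters only
        word = anagram(remove_vowels(word), rng)
        if word:
            cipher_words.append("".join(key[ch] for ch in word))
    return cipher_words


print(encrypt("the cat sat on the mat", seed=1))
```

Because the key is one-to-one, two occurrences of the same plaintext word always produce the same multiset of cipher letters, even when the shuffle orders them differently; that invariant is the kind of regularity a decoder can exploit.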

The problems start when they apply it to the Voynich Manuscript. If the Voynich Manuscript is meaningless, their method will still find a spurious closest match. Even if it's meaningful and encrypted using only those three steps, using a modern language dataset (the […]) will also result in a spurious match. What they should have done is use a dataset of substantial texts in languages known in the Old World in the early 15th century and, in addition to the Voynich Manuscript (but not all of it; see below), apply their method to texts known to be meaningless (e.g. […]) to establish a baseline for acceptance. A match between the Voynich Manuscript and a candidate language can only be taken seriously if it scores higher than every match between the bogus texts and their candidate languages.
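The acceptance rule proposed above can be written down directly. This is a minimal Python sketch, assuming each text already has a score per candidate language and that a higher score means a closer match; the language names and numbers are invented placeholders, not results from the paper.

```python
def best_match(scores: dict[str, float]) -> tuple[str, float]:
    """Return the highest-scoring candidate language and its score."""
    language = max(scores, key=scores.__getitem__)
    return language, scores[language]


def accept_match(vms_scores: dict[str, float],
                 control_scores: list[dict[str, float]]) -> bool:
    """Accept the VMS best match only if it beats every control text's best match.

    control_scores holds one score table per text known to be meaningless.
    """
    if not control_scores:
        raise ValueError("at least one meaningless control text is required")
    _, vms_best = best_match(vms_scores)
    baseline = max(best_match(table)[1] for table in control_scores)
    return vms_best > baseline


# Invented numbers, purely to show the shape of the comparison.
vms = {"language_a": 0.71, "language_b": 0.64}
controls = [{"language_a": 0.69, "language_c": 0.73}, {"language_b": 0.60}]
print(accept_match(vms, controls))
```

Here a control text's best match (0.73) beats the manuscript's best match (0.71), so the match would be rejected as no better than what meaningless text achieves.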

If they did all that and identified a plausible language, they should then get someone (a real person who knows the language) to attempt to translate several randomly selected pages from the parts of the manuscript they didn't use to identify the language. If that person can't produce a meaningful translation, they should conclude that the Voynich Manuscript is either meaningless text, or not in one of the candidate languages, or encrypted using a different method. That would itself be a very useful result.

I have nothing positive to say about the reporting of their paper in the popular press.
(31-01-2018, 11:19 AM) Koen Gh. Wrote: I still don't understand why some people are mildly positive about this paper. I'm not a coding expert, but I do know that abjad anagramming is a one-way cipher. And that plonking your results into Google Translate is an absolutely embarrassing thing to write about in a scientific paper. Am I missing something?

Hi Koen,
I am not an expert either and I possibly only understood 50% of the paper. Still, I am positive about it. Here are some totally subjective reasons:

* VMS hobbyists tend to focus almost exclusively on manuscript images. This forum is no exception: in the last month there were 13 active threads in "Imagery" and only 2 in "Analysis of the text". As you know, I discovered the ms through the work of Prof. […], so I have always been interested in the language as well as the illustrations. I am positive about all contributions that seriously examine the Voynichese language.

* Real experts seldom contribute to Voynich research. It's great that someone like Prof. Kondrak took an interest in the ms. I hope to read more from him.

* My impression is that a lot of hard work has been put into this research. Sometimes you also read a blog post into which a fair amount of work has been put, but this is rare, and typically the ratio of work hours to words is much lower than in a scientific paper like this.

* The histogram in Fig. 4 is the kind of thing I find interesting. The authors compared Voynichese with 380 different languages and applied a quantitative distance measure (resulting in Hebrew clearly being the best candidate). There is something interesting going on here. Of course, without fully understanding the details (which I don't), it's impossible to put this information to good use. The crap press simply concludes that "the ms is written in Hebrew", but this is quite uncertain and very different from what the authors say.

* I learned from this paper that "Knight et al. (2011) describe a successful decipherment of an eighteenth century text known as the Copiale Cipher". I could have learned about this elsewhere, but I haven't seen Knight's decipherment of the Copiale Cipher discussed much in the amateur Voynich community. Making the good work of other researchers known is a major contribution. I am grateful to Hauer and Kondrak for this reference. [Knight's paper is available online; I haven't read it yet.]
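The thread doesn't spell out the distance measure behind the Fig. 4 histogram, so the following is only a generic illustration of how a text in an unknown script can be compared with many candidate languages at once: compare sorted letter-frequency profiles, which ignore which symbol stands for which letter and are therefore unchanged by simple substitution. This is an assumption for illustration, not necessarily the authors' measure; the function names, the L1 distance, and the profile length of 30 are hypothetical choices.

```python
from collections import Counter


def frequency_profile(text: str, size: int = 30) -> list[float]:
    """Relative symbol frequencies, sorted high to low, padded/truncated to `size`.

    Sorting discards symbol identity, so a simple substitution leaves the profile unchanged.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    freqs = sorted((n / total for n in counts.values()), reverse=True)
    freqs += [0.0] * (size - len(freqs))  # pad short alphabets with zeros
    return freqs[:size]


def profile_distance(a: list[float], b: list[float]) -> float:
    """L1 distance (sum of absolute differences) between two profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_languages(cipher: str, samples: dict[str, str],
                   size: int = 30) -> list[tuple[str, float]]:
    """Rank candidate languages by profile distance to the ciphertext, closest first."""
    target = frequency_profile(cipher, size)
    scored = [(lang, profile_distance(target, frequency_profile(sample, size)))
              for lang, sample in samples.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy samples; real use would need substantial text per candidate language.
print(rank_languages("xyzxy", {"lang_a": "abcab", "lang_b": "aaaa"}))
```

A ranking like this always produces a "closest" language, even for meaningless input, which is why a baseline from control texts matters when reading such a histogram.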

Is Hauer and Kondrak's paper perfect? Of course not. Rene is right in pointing out that one cannot rely on a single VMS transcription (even if Currier seems to me a relatively good choice), and of course "Google Translate" has nothing to add to the paper. But in my eyes these limitations are amply compensated for by all the experiments with new approaches that are discussed.

Possibly others look down on this paper because they have done better: they write great Natural Language Processing software, they have read all the literature, and they have published better work in more respected venues. But where is this stuff?
(31-01-2018, 11:19 AM) Koen Gh. Wrote: I'm not a coding expert, but I do know that abjad anagramming is a one-way cipher.

That depends on whether you are anagramming according to a deterministic reversible algorithm or not.
I know it seems like a bad idea for the putative 15th-century Voynich author to apply a one-way (irreversible) cipher, since it would make the result meaningless. However, since what we have is effectively meaningless after countless years of analysis, it is hard to argue that the text cannot be the result of an irreversible cipher.
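The distinction between reversible and irreversible anagramming can be illustrated with two toy rules, both hypothetical: sorting a word's letters alphabetically (deterministic but many-to-one, so irreversible) and rotating the letters by a fixed, known offset (deterministic and one-to-one, so it can be undone).

```python
def sort_anagram(word: str) -> str:
    """Alphabetic anagramming: deterministic, but many words share one output."""
    return "".join(sorted(word))


def rotate_left(word: str, k: int) -> str:
    """A reversible rule: move the first k letters to the end."""
    if not word:
        return word
    k %= len(word)
    return word[k:] + word[:k]


def rotate_right(word: str, k: int) -> str:
    """Inverse of rotate_left: move the last k letters to the front."""
    if not word:
        return word
    k %= len(word)
    cut = len(word) - k
    return word[cut:] + word[:cut]


# Sorting collapses distinct words, so the original cannot be recovered.
print(sort_anagram("listen"), sort_anagram("silent"))  # both "eilnst"
# Rotation by a known offset is undone exactly.
print(rotate_right(rotate_left("voynich", 3), 3))  # "voynich"
```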

I agree with Donald Fisk about the paper. However, there are a few rather important mistakes. I am writing a commentary that I will put up in the coming days.

Still, the strange and very close correspondence in Figure 4 is hard to explain. It is one of those statistics that really 'stand out' (just like Figure 5, for that matter), and it deserves further analysis, as it may give an important clue.
Rene: the cipher might be one-way, in which case perhaps a real text was used as the seed to generate something language-like without meaning. If that were the case, we'd never be able to prove or disprove it (see the Torsten discussions). My point, though, is that Voynichese being the result of a one-way cipher is not compatible with linguistic solutions like the ones we see proposed here.

Anton: of course, if anagramming is done according to certain rules, it can be reversed. Is this what the authors argue? Either way, if we used anagramming to explain Voynichese, there would likely be restrictions on letter order in the ciphertext, similar to alphabetic anagramming, as pointed out before. Wouldn't this increase the likelihood of information loss as well? And that's not even touching on the combination with abjads.
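The information-loss question can be checked on any word list: apply vowel removal followed by alphabetic anagramming and count how many distinct words land on the same output. This is a minimal Python sketch, assuming "aeiou" as the vowel set; the sample words are arbitrary English, not taken from the paper or the manuscript, and the function names are hypothetical.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # assumption: Latin-script vowels only


def abjad_sorted(word: str) -> str:
    """Drop vowels, then sort the remaining letters alphabetically."""
    return "".join(sorted(ch for ch in word.lower() if ch not in VOWELS))


def collisions(words: list[str]) -> dict[str, set[str]]:
    """Group words by their encrypted form; keep only forms shared by 2+ words."""
    groups: dict[str, set[str]] = defaultdict(set)
    for word in words:
        groups[abjad_sorted(word)].add(word.lower())
    return {form: group for form, group in groups.items() if len(group) > 1}


sample = ["tap", "pat", "pit", "tip", "top", "apt", "cat"]
print(collisions(sample))
```

On this sample, six distinct words collapse to the single form "pt". Vowel removal alone would give two groups of three ("tp" and "pt"), and sorting alone groups of at most three, so combining the two steps compounds the ambiguity.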

Marco and Donald: good points; just because some things about it are dubious doesn't mean the whole thing is worthless. I will have another look at the points you mention.