The Voynich Ninja

Decoding Anagrammed Texts Written in an Unknown Language and Script
Pages: 1 2 3 4 5 6 7 8
I was intrigued when I saw [link].

The story's all over the Canadian press. Here's another article: [link].

The paper they published is: [link].

The authors have a general method, which they applied to the Voynich Manuscript; the closest match was Hebrew, with words anagrammed in a particular way. They decrypted the first line into Hebrew, which they then translated into English (using Google Translate) as “She made recommendations to the priest, man of the house and me and people.”
The Globe and Mail has it, as well:

[link]

"In a published paper, Greg Kondrak of the University of Alberta says he's used powerful artificial intelligence to open a sliver of daylight in the murk.

He says the text is written in medieval Hebrew, with the letters of each word scrambled in a precise way and all the vowels dropped.
He says the first sentence begins 'She made recommendations to the priest ...'"
He dismissed the skeptics: "I don't think they are friendly to this kind of research," he said. "People may be fearing that the computers will replace them."

I think he's completely misinterpreting the skepticism.

It has nothing to do with fear of computers replacing people... the problem is people who claim solutions without showing the details of how they got from A to C. Without that information, it's an empty claim.


The technology he's developing is very interesting and potentially useful for historians and linguists but, unfortunately, there's very little information about how the first sentence was "decoded". How much is revealed and how much was subjectively interpreted? It's quite an awkward sentence.
When I was trained as a programmer, a long time ago, one of the first things I was taught was GARBAGE IN, GARBAGE OUT. That goes for the AI even more than for other IT things. Why do you think Hebrew scholars did not touch it? The same reason I don't touch the Latin things, because it is an expletive I can't use here.
(25-01-2018, 11:17 AM) Helmut Winkler Wrote:
When I was trained as a programmer, a long time ago, one of the first things I was taught was GARBAGE IN, GARBAGE OUT. That goes for the AI even more than for other IT things. Why do you think Hebrew scholars did not touch it? The same reason I don't touch the Latin things, because it is an expletive I can't use here.

Out of curiosity, I entered the first line of my [link] into Google Translate. The result was:

Hindi - detected
shokchy okchokchaiin shokeeor ain chol tchotor schotchy shol l cthechy ky cthy 
शोकच्य ोकचोकचाईं शौकीऔर अं चोल चोटोर स्कॉटची शोल ल कथेच्य की स्थ्य

English
The tragedy of mourning and hatred of Scott Choke scooters

Despite citing Gordon Rugg's paper, it didn't seem to occur to the authors to test their method on text known to be meaningless.
I had a quick look at the paper; it certainly deserves attention, and the "minimum alphagram distance" is an interesting concept.

Briefly, they explore the hypothesis that the VMS is a monoalphabetic substitution of an abjad, with the letters subsequently reordered within individual vords.
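To make the hypothesis concrete, here is a minimal Python sketch of the kind of pipeline being described. It is not the authors' code: vowel-dropping is a crude stand-in for an abjad, and the random substitution key, the per-word random shuffle and all function names are assumptions for illustration. It also shows why an alphagram (a word's letters in sorted order) is a natural tool here: the scrambling step changes the letter order but never the alphagram.

```python
import random

VOWELS = set("aeiou")
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def to_abjad(word: str) -> str:
    """Drop vowels: a crude stand-in for writing the text in an abjad."""
    return "".join(ch for ch in word if ch not in VOWELS)


def make_key(seed: int) -> dict:
    """Hypothetical monoalphabetic substitution key: a random one-to-one letter mapping."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def alphagram(word: str) -> str:
    """The word's letters in sorted order; every anagram of a word has the same alphagram."""
    return "".join(sorted(word))


def encipher(text: str, key: dict, seed: int) -> list:
    """Abjad -> substitute each letter -> shuffle the letters inside each word."""
    rng = random.Random(seed)
    cipher_words = []
    for word in text.lower().split():
        letters = [key[ch] for ch in to_abjad(word)]
        rng.shuffle(letters)  # arbitrary within-word reordering (the anagramming step)
        cipher_words.append("".join(letters))
    return cipher_words


if __name__ == "__main__":
    key = make_key(seed=1)
    text = "she made recommendations to the priest"
    for plain, cipher in zip(text.split(), encipher(text, key, seed=2)):
        unscrambled = "".join(key[ch] for ch in to_abjad(plain))
        # The shuffle changes the order of letters but never the alphagram.
        print(plain, cipher, alphagram(cipher) == alphagram(unscrambled))
```

Every line of the demo prints True: undoing the substitution alone recovers the alphagram of each abjad word, but not its letter order.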

Of course, the stupid media ride before the hounds (as always): the paper makes no claim to have "cracked" or "decoded" the VMS. Moreover, the authors explicitly state (in respect of the cited Hebrew phrase):

Quote: According to a native speaker of the language, this is not quite a coherent sentence.

Also, they do not claim it's Hebrew; it's only that Hebrew turned out to be the most appropriate language according to their method, and even then it was chosen from a very limited pool of candidate languages.
(25-01-2018, 12:41 PM) DonaldFisk Wrote: ...

Despite citing Gordon Rugg's paper, it didn't seem to occur to the authors to test their method on text known to be meaningless.


This is a very good point.

Part of the method should be testing the algorithms on meaningless text to see how the software evaluates something that is not in any specific language. Even meaningless text can be generated according to a wide variety of algorithms... and still be meaningless.


It would not be easy to do this. How does one algorithmically distinguish hundreds of different languages AND, at the same time, filter out those with no useful content? This will probably keep the developers busy for some time.
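A toy illustration of the difficulty, assuming nothing about how the authors' software works: a nearest-profile classifier built on letter frequencies. The two reference texts and the threshold are placeholders. The point is that picking the closest candidate language always names some language, even for gibberish; answering "none of these" needs an explicit rejection threshold, and choosing that threshold sensibly is the hard part.

```python
import math
from collections import Counter


def letter_profile(text: str) -> dict:
    """Relative frequency of each letter in a text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def distance(p: dict, q: dict) -> float:
    """Euclidean distance between two letter-frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def best_language(sample: str, profiles: dict) -> tuple:
    """Nearest reference profile. Note: this ALWAYS names some language."""
    sample_profile = letter_profile(sample)
    scores = {lang: distance(sample_profile, prof) for lang, prof in profiles.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def classify(sample: str, profiles: dict, threshold: float):
    """Like best_language, but answers None when even the best match is poor."""
    lang, score = best_language(sample, profiles)
    return lang if score <= threshold else None


# Hypothetical, far-too-small reference texts, for illustration only.
REFERENCE = {
    "english": "the quick brown fox jumps over the lazy dog",
    "latin": "gallia est omnis divisa in partes tres",
}
PROFILES = {lang: letter_profile(text) for lang, text in REFERENCE.items()}

if __name__ == "__main__":
    gibberish = "qzxj vvkq zzpq xjjq"
    print(best_language(gibberish, PROFILES))            # still names a language
    print(classify(gibberish, PROFILES, threshold=0.1))  # None
```

Real language identification uses far richer features than single-letter frequencies, but the argmin-versus-threshold issue remains the same.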

----------
Even so, if it is known that a sample of text is meaningful, but the underlying language is not known, having a computer algorithm presort it and give some rational guesses as to what languages it might be would be a great tool for historical research.
"A text known to be meaningless" is not something well defined in respect of the method used by the authors. E.g. a text with randomized letters would produce results quite different from a text with randomized words.

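As a sketch of that difference (hypothetical helper names, plain Python), here are two ways of making a "meaningless" control text from a meaningful sample. Shuffling letters across the text keeps the letter frequencies and word lengths but destroys the vocabulary; shuffling words keeps the vocabulary and word frequencies but destroys word order. A method that scores within-word letter statistics would be expected to react very differently to the two.

```python
import random
from collections import Counter


def shuffle_letters(text: str, seed: int = 0) -> str:
    """Meaningless control 1: shuffle letters across the whole text, keeping word lengths."""
    rng = random.Random(seed)
    letters = [ch for ch in text if ch != " "]
    rng.shuffle(letters)
    refill = iter(letters)
    return "".join(" " if ch == " " else next(refill) for ch in text)


def shuffle_words(text: str, seed: int = 0) -> str:
    """Meaningless control 2: keep every word intact but randomize the word order."""
    rng = random.Random(seed)
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def letter_counts(text: str) -> Counter:
    """Letter frequencies, ignoring spaces."""
    return Counter(ch for ch in text if ch != " ")


if __name__ == "__main__":
    sample = "she made recommendations to the priest man of the house and me and people"
    for control in (shuffle_letters(sample), shuffle_words(sample)):
        same_letters = letter_counts(control) == letter_counts(sample)
        same_vocab = Counter(control.split()) == Counter(sample.split())
        print(control, same_letters, same_vocab)
```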
Apart from that, they start from the assumption that the text is meaningful (relying on Landini and on Montemurro & Zanette). I wonder, however, why computational linguistics professionals still argue, over and over, that Zipf's law is an indicator of meaningful text.
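It is a long-known observation that randomly "typed" text, in which the space is just one more character, already produces a roughly power-law rank-frequency curve. The sketch below assumes a five-letter alphabet and a 20% chance of a space (both arbitrary choices) and fits the log-log slope by ordinary least squares; the result is a clearly negative slope of the same general order as the -1 associated with Zipf's law, even though the text means nothing.

```python
import math
import random
from collections import Counter


def monkey_text(n_chars: int, letters: str = "abcde", p_space: float = 0.2, seed: int = 0) -> str:
    """Random 'typing': each character is a space with probability p_space, else a random letter."""
    rng = random.Random(seed)
    return "".join(" " if rng.random() < p_space else rng.choice(letters) for _ in range(n_chars))


def zipf_slope(text: str, min_count: int = 5) -> float:
    """Least-squares slope of log(frequency) against log(rank) over word types."""
    freqs = sorted(Counter(text.split()).values(), reverse=True)
    points = [(math.log(rank), math.log(f))
              for rank, f in enumerate(freqs, start=1) if f >= min_count]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


if __name__ == "__main__":
    gibberish = monkey_text(200_000)
    print(gibberish[:60])
    print("rank-frequency slope:", round(zipf_slope(gibberish), 2))
```

The min_count cut-off just keeps very rare word types (whose counts are mostly noise) out of the fit.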
I finally downloaded the paper and I'm trying to get through it on my (brief) lunch break.

Here are some excerpts from the paper:
  • "We assume that symbols in scripts which contain no more than a few dozen unique characters roughly correspond to phonemes of a language, and model them as monoalphabetic substitution ciphers."
  • "We further allow that an unknown transposition scheme could have been applied to the enciphered text, resulting in arbitrary scrambling of letters within words (anagramming)."

Read that second assumption very carefully. Read it again.

Within that assumption is a big problem...
  • If anagramming is applied in an algorithmic way (following some kind of system) when enciphering text, it can be read back: it can be deciphered by the original creator and possibly by someone decrypting it.
  • However, if anagramming is applied in an arbitrary way, it becomes a one-way cipher. The person who wrote it would have to devote a great deal of time and trouble to reading it again, AND a person trying to decrypt it might end up creating words that are not actually there through the process of arbitrary de-scrambling. If you permit yourself to de-scramble letters arbitrarily, you can CREATE meaning (quite possibly the wrong meaning) out of ciphertext, AND you can create meaning out of nonsense text.
For example:

The software might arbitrarily unscramble words to create the following decryption options from the same text:

a pelt minuet, let pa minuet, eat it plenum, I lumpen teat, I peel mutant, tip en amulet, pi ten amulet, lineup at met, I melt peanut, lit me peanut, I temp lunate, eat multi pen, nee multi apt, net multi ape, I a tent plume, pit ate lumen, I pet lumen at, me ate tulip n, a pet until me, and more...

All from the word "penultimate"... and that's only in English. They are all anagrams of the same word.
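Two quick numbers make the point. The count of distinct letter orders is the multinomial n!/(k1! k2! ...); for "penultimate" (11 letters, with e and t each appearing twice) that is 11!/(2!2!) = 9,979,200. The second function is a small brute-force search for multi-word anagrams against a hypothetical 16-word vocabulary, written for this illustration; a real dictionary would return far more.

```python
from collections import Counter
from math import factorial


def distinct_orderings(word: str) -> int:
    """Number of distinct letter orders: n! divided by k! for each letter repeated k times."""
    total = factorial(len(word))
    for k in Counter(word).values():
        total //= factorial(k)
    return total


def find_anagrams(letters: str, vocabulary: list, max_words: int = 4) -> list:
    """Every combination of vocabulary words that uses exactly the given letters."""
    target = Counter(letters)
    # Keep only words that could fit inside the target at all.
    vocab = sorted({w for w in vocabulary if not (Counter(w) - target)})
    results = []

    def search(remaining: Counter, start: int, chosen: list) -> None:
        if not remaining:
            results.append(tuple(chosen))
            return
        if len(chosen) == max_words:
            return
        for i in range(start, len(vocab)):  # start at i, not 0, to skip reordered duplicates
            word_count = Counter(vocab[i])
            if not (word_count - remaining):  # the word fits in the letters still unused
                search(remaining - word_count, i, chosen + [vocab[i]])

    search(target, 0, [])
    return results


# Hypothetical toy vocabulary, for illustration only.
VOCAB = ["a", "amulet", "ate", "dog", "en", "i", "lumen", "me", "minuet",
         "mutant", "peel", "pelt", "pet", "pit", "tip", "until"]

if __name__ == "__main__":
    print(distinct_orderings("penultimate"))
    for combo in find_anagrams("penultimate", VOCAB):
        print(" ".join(combo))
```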


The problem becomes worse if the text is assumed (or detected) to be an abjad.

Now the problem of arbitrarily anagramming it to decrypt the text is compounded by the subjective insertion of many different vowels in many different positions. Instead of 10 possibilities for interpreting a short word, there might be 30.
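A rough sketch of how fast this grows, under loudly simplifying assumptions: five vowels, at most one vowel in each gap (before, between and after the consonants), any order of the consonants, and no dictionary filtering. Even a three-consonant skeleton then has 3! x 6^4 = 7,776 readings, and the hypothetical English skeleton "lmn" already yields both "lemon" and "melon".

```python
from itertools import permutations, product


def vocalizations(skeleton: str, vowels: str = "aeiou") -> set:
    """All readings of a consonant skeleton: any order of the consonants, and
    zero or one vowel in each gap (before, between and after the consonants)."""
    gap_fillers = [""] + list(vowels)
    readings = set()
    for order in set(permutations(skeleton)):
        for fill in product(gap_fillers, repeat=len(order) + 1):
            parts = [fill[0]]
            for consonant, vowel in zip(order, fill[1:]):
                parts.append(consonant + vowel)
            readings.add("".join(parts))
    return readings


def count_vocalizations(skeleton: str, vowels: str = "aeiou") -> int:
    """Closed form for the same count (valid when the skeleton contains no vowels)."""
    orderings = len(set(permutations(skeleton)))
    return orderings * (len(vowels) + 1) ** (len(skeleton) + 1)


if __name__ == "__main__":
    readings = vocalizations("lmn")
    print(len(readings), count_vocalizations("lmn"))
    print(sorted(w for w in readings if w in {"lemon", "melon"}))
```

Real vocalization rules (long vowels, diphthongs, matres lectionis) would change the exact numbers; the point is only how quickly the candidate space grows before any dictionary is consulted.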


I'll accept the possibility of anagrammed text (it was not uncommon for ciphers to be anagrams), but arbitrary transposition codes are, for the most part, one-way ciphers and deciphering requires a great deal of subjective picking and choosing that may result in translations that have nothing to do with the original content.
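The difference between systematic and arbitrary anagramming can be sketched in a few lines. The rule below (letters in even positions first, then letters in odd positions) is a made-up example of a systematic transposition, not anything proposed for the VMS: it has an exact inverse, so whoever knows the rule reads the text back unambiguously. An arbitrary per-word shuffle leaves only the set of all orderings; for the six distinct letters of "priest" that is 720 candidates, among them the perfectly good English words "stripe", "sprite" and "ripest".

```python
import random
from itertools import permutations


def rule_transpose(word: str) -> str:
    """A hypothetical systematic anagram: letters in even positions, then odd positions."""
    return word[0::2] + word[1::2]


def undo_rule_transpose(scrambled: str) -> str:
    """Exact inverse of rule_transpose: anyone who knows the rule can read the word back."""
    n_even = (len(scrambled) + 1) // 2
    evens, odds = scrambled[:n_even], scrambled[n_even:]
    return "".join(evens[i // 2] if i % 2 == 0 else odds[i // 2] for i in range(len(scrambled)))


def arbitrary_transpose(word: str, rng: random.Random) -> str:
    """An arbitrary anagram: a fresh random order for every word, with no record kept."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)


def candidate_readings(scrambled: str) -> set:
    """Without a rule, every distinct ordering of the letters is a possible reading."""
    return {"".join(p) for p in permutations(scrambled)}


if __name__ == "__main__":
    print(rule_transpose("priest"), undo_rule_transpose(rule_transpose("priest")))
    scrambled = arbitrary_transpose("priest", random.Random(7))
    readings = candidate_readings(scrambled)
    print(scrambled, len(readings), sorted({"priest", "stripe", "sprite", "ripest"} & readings))
```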

[Unfortunately, my break is over, I have to read the rest of it this evening.]
(25-01-2018, 02:03 PM) Anton Wrote: ...

Of course, stupid media ride before the hounds (as always): in the paper there are no claims of having "cracked" or "decoded" the VMS, more than that, the authors explicitly state that (in respect of the cited Hebrew phrase):

...


I quickly read through the rest of the paper and you are absolutely right.

The media spun it as a "solution" but that is not what the researchers are claiming.