I finally downloaded the paper and I'm trying to get through it on my (brief) lunch break.
Here are some excerpts from the paper:
- "We assume that symbols in scripts which contain no more than a few dozen unique characters roughly correspond to phonemes of a language, and model them as monoalphabetic substitution ciphers."
- "We further allow that an unknown transposition scheme could have been applied to the enciphered text, resulting in arbitrary scrambling of letters within words (anagramming)."
Read that second assumption very carefully. Read it again.
Within that assumption is a big problem...
- If anagramming is applied in an algorithmic way (following some kind of system) when enciphering text, it can be read back: the original writer can decipher it, and so, possibly, can someone else trying to decrypt it.
- However, if anagramming is applied in an arbitrary way, it becomes a one-way cipher. The person who wrote it would have to devote a great deal of time and trouble to reading it again, AND a person trying to decrypt it might end up creating words that are not actually there through the process of arbitrary de-scrambling. If you permit yourself to de-scramble letters arbitrarily, you can CREATE meaning (quite possibly the wrong meaning) out of ciphertext, AND you can create meaning out of nonsense text.
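Here is a rough Python sketch of that distinction. It assumes, purely for illustration, that a "system" means a letter permutation rebuilt from a shared key; the function names and the key-as-random-seed scheme are my own hypothetical choices, not anything from the paper.

```python
import random

def keyed_scramble(word: str, key: int) -> str:
    """Systematic anagramming: a letter order regenerated from a shared key."""
    order = list(range(len(word)))
    random.Random(key).shuffle(order)  # same key -> same permutation every time
    return "".join(word[i] for i in order)

def keyed_unscramble(scrambled: str, key: int) -> str:
    """Invert keyed_scramble by rebuilding the same permutation from the key."""
    order = list(range(len(scrambled)))
    random.Random(key).shuffle(order)
    original = [""] * len(scrambled)
    for pos, i in enumerate(order):
        original[i] = scrambled[pos]  # the letter at pos came from index i
    return "".join(original)

def arbitrary_scramble(word: str) -> str:
    """Arbitrary anagramming: a fresh random order that is never recorded."""
    letters = list(word)
    random.shuffle(letters)  # no key is kept, so there is nothing to invert later
    return "".join(letters)

secret = keyed_scramble("penultimate", key=42)
print(keyed_unscramble(secret, key=42))  # penultimate
print(arbitrary_scramble("penultimate"))  # some ordering; the original order is gone
```

With the key, reading back is mechanical. Without it, the only way back from the arbitrary version is to guess which of the possible orderings was intended.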
For example, the software might arbitrarily unscramble words to create the following decryption options from the same text:
a pelt minuet, let pa minuet, eat it plenum, I lumpen teat, I peel mutant, tip en amulet, pi ten amulet, lineup at met, I melt peanut, lit me peanut, I temp lunate, eat multi pen, nee multi apt, net multi ape, I a tent plume, pit ate lumen, I pet lumen at, me ate tulip n, a pet until me, and more...
Every one of these is an anagram of the single word "penultimate"... and that's only in English.
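To put a number on it: a word with n letters has n! orderings, divided by k! for each letter that repeats k times. "Penultimate" has 11 letters, with e and t each appearing twice, so that is 11!/(2!·2!) = 9,979,200 letter strings before anyone even checks which of them split into real words. The sketch below computes that count and searches a small word list for multi-word readings. The word list is a hypothetical toy; a full dictionary can only make the list of readings longer.

```python
from collections import Counter
from math import factorial

def count_arrangements(word: str) -> int:
    """Distinct letter orderings: n! divided by k! for each letter repeated k times."""
    total = factorial(len(word))
    for k in Counter(word).values():
        total //= factorial(k)
    return total

def multiword_anagrams(letters: Counter, lexicon: list[str], prefix: tuple = ()) -> list[tuple]:
    """Every combination of lexicon words that uses up exactly the given letters."""
    if not letters:  # every letter has been used
        return [prefix]
    results = []
    for i, w in enumerate(lexicon):
        need = Counter(w)
        if all(letters[c] >= n for c, n in need.items()):
            # recurse on lexicon[i:] so each combination is found only once
            results += multiword_anagrams(letters - need, lexicon[i:], prefix + (w,))
    return results

# Hypothetical toy word list; a real system would use a full dictionary.
LEXICON = ["a", "pelt", "minuet", "eat", "it", "plenum", "tip", "en", "amulet", "pi", "ten"]

print(count_arrangements("penultimate"))  # 9979200
for reading in multiword_anagrams(Counter("penultimate"), LEXICON):
    print(" ".join(reading))
```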
The problem becomes worse if the text is assumed (or detected) to be an abjad.
Now the problem of arbitrarily anagramming it to decrypt the text is compounded by the subjective insertion of many different vowels in many different positions. Instead of 10 possibilities for interpreting a short word, there might be 30.
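A rough sketch of why vowel restoration compounds things, under simplifying assumptions of my own: English vowels a, e, i, o, u, and at most one vowel in each gap before, between, or after the consonants. A two-consonant skeleton then already gives 6³ = 216 vowel fillings per consonant order, and the word list below is a small hypothetical one.

```python
from itertools import permutations, product

VOWELS = "aeiou"  # assumption: five English vowels, at most one per gap

def vowel_fillings(skeleton: str) -> list[str]:
    """Every string made by leaving each gap empty or inserting one vowel."""
    options = [""] + list(VOWELS)
    gaps = len(skeleton) + 1  # before, between, and after the consonants
    results = []
    for choice in product(options, repeat=gaps):
        word = choice[0]
        for consonant, vowel in zip(skeleton, choice[1:]):
            word += consonant + vowel
        results.append(word)
    return results

def abjad_readings(skeleton: str, lexicon: set[str]) -> set[str]:
    """Dictionary words reachable by reordering the consonants and adding vowels."""
    readings = set()
    for order in set(permutations(skeleton)):
        for candidate in vowel_fillings("".join(order)):
            if candidate in lexicon:
                readings.add(candidate)
    return readings

# Hypothetical toy word list.
LEXICON = {"pat", "pet", "pit", "pot", "put", "apt", "opt", "tap", "tip", "top", "tape", "cat"}

print(len(vowel_fillings("pt")))  # 216
print(sorted(abjad_readings("pt", LEXICON)))
```

Even with this tiny word list, the two consonants "pt" yield 11 different readings once reordering and vowel insertion are both allowed.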
I'll accept the possibility of anagrammed text (it was not uncommon for ciphers to use anagrams), but arbitrary transposition schemes are, for the most part, one-way ciphers, and deciphering them requires a great deal of subjective picking and choosing that may produce translations with nothing to do with the original content.
[Unfortunately, my break is over; I'll have to read the rest of it this evening.]