Dear Marco and hello everyone,
First, thank you for taking the time to read the article and write the post. As it is a preprint, I was sure there would be many things to fix, and I hoped for some feedback. Paradoxically, I did not know about this forum before (I found a back-link today).
I've fixed the wrong date and the missing citations for the figure 1 photos (and several other minor details).
I see that the steganographic/substitution ("encoding") system is causing the confusion. It is simpler than it looks, and the article perhaps did not explain it well. So I will try to show the algorithm and its properties. Many of those properties also become clearer when you try the system yourself and write a page or two of encoded text by hand.
Let me show you the process:
Let's encode the word "voynich" with the proposed encoding method (which is a slight, pragmatic modification of the Trithemius code from Gaines; I will return to Trithemius below).
Step 1
First we need a letter-to-digits substitution table that we agree on (original Trithemius):
Code:
a ... 111, b ... 112, c ... 113, d ... 121, e ... 122, f ... 123, g ... 131, h ... 132, i ... 133, j ... 211, k ... 212, l ... 213,
m ... 221, n ... 222, o ... 223, p ... 231, q ... 232, r ... 233, s ... 311, t ... 312, u ... 313, v ... 321, w ... 322, x ... 323,
y ... 331, z ... 332
The word "voynich" is thus substituted as: 321 223 331 222 133 113 132.
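For anyone who wants to try this on a computer, here is a minimal Python sketch of Step 1. It assumes the table above follows a regular pattern (the position of the letter in the alphabet written in base 3, with the digits 0-2 shifted to 1-3); the names build_table and to_digits are only illustrative and do not come from the article.

```python
import string

def build_table():
    """Rebuild the Step 1 table: a=111, b=112, ..., z=332."""
    table = {}
    for index, letter in enumerate(string.ascii_lowercase):
        # Position in the alphabet (a=0 ... z=25) as three base-3 digits.
        digits = (index // 9, (index // 3) % 3, index % 3)
        # Shift each digit from 0-2 to 1-3, as in the table above.
        table[letter] = "".join(str(d + 1) for d in digits)
    return table

TABLE = build_table()

def to_digits(word):
    """Step 1: replace each letter by its three-digit group."""
    return [TABLE[ch] for ch in word.lower()]

print(" ".join(to_digits("voynich")))  # 321 223 331 222 133 113 132
```

The printed groups match the hand substitution above.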
That is the first substitution part. Now comes the second step.
Step 2
We replace each digit with an arbitrary symbol chosen from a set that we define for each of the 3 digits, e.g.:
Code:
1 can be replaced by any of: "A, E, I"
2 can be replaced by any of: "Q, O, Z"
3 can be replaced by any of: "M, R, K"
(... instead of the Latin letters AEI, QOZ, MRK you can imagine, as we propose, the simplest Voynich symbols such as <i>, <l>, <e>, ...).
The single letter "v" (321) could thus take any of the following forms:
MQA, MOA, MZA, MQE, MQI, RQA, ROA, ROE, ROI, ..., KQA, KOA, KOE, KOI, KZA, KZE, KZI
... in summary, the single letter "v" has 3 * 3 * 3 = 27 possible variants. The same number of possibilities applies to every letter.
The word "voynich" can thus be encoded as any of 27 * 27 * ... * 27 = 27^7 = 10 460 353 203 distinct strings; one of them is:
KZA ZQR MKA QZZ ARM AIM IKZ.
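Steps 1 and 2 together, plus the reading direction, can be sketched in Python as follows. This is only an illustration under the assumptions of this post: the Latin stand-in sets AEI / QOZ / MRK, one space between letter groups, and a uniform random pick for each digit. The names encode and decode are hypothetical.

```python
import random
import string

# Step 1 table (position of the letter in base 3, digits shifted to 1-3).
TABLE = {
    letter: f"{i // 9 + 1}{(i // 3) % 3 + 1}{i % 3 + 1}"
    for i, letter in enumerate(string.ascii_lowercase)
}

# Step 2 symbol sets, taken from the example above.
SYMBOLS = {"1": "AEI", "2": "QOZ", "3": "MRK"}

# Reverse lookups used for reading the text back.
DIGIT_OF = {sym: digit for digit, syms in SYMBOLS.items() for sym in syms}
LETTER_OF = {code: letter for letter, code in TABLE.items()}

def encode(word, rng=random):
    """Encode a word; every digit gets a freely chosen symbol from its set."""
    groups = []
    for ch in word.lower():
        groups.append("".join(rng.choice(SYMBOLS[d]) for d in TABLE[ch]))
    return " ".join(groups)

def decode(text):
    """Map each symbol back to its digit, then each digit triple to a letter."""
    return "".join(
        LETTER_OF["".join(DIGIT_OF[s] for s in group)] for group in text.split()
    )

print(encode("voynich"))                      # differs on every run
print(decode("KZA ZQR MKA QZZ ARM AIM IKZ"))  # voynich
print(27 ** len("voynich"))                   # 10460353203
```

Reading needs no key beyond the two tables: whichever of the 27 spellings the writer picked for a letter, it decodes to the same letter.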
Step 3
The resulting string is long, 3 times as long as the original text (which is rather catastrophic).
But if, instead of AEI, QOZ, MRK, we directly use the simplest possible Voynich glyphs, we can combine many of the adjacent symbols into larger glyphs. If the idea proposed in the article is right, the gallows <p> could be a compound of up to 5 of the simplest Voynich symbols. In this way, we could possibly shrink the encoded, 3-times-longer text back to roughly its original size just by a "smart" tendency to pick substitutions (in step 2) that easily form compounds.
When you try this encoding system by hand, it quickly becomes mentally exhausting to keep picking the arbitrary substitutions in step 2 "randomly". This leads to substitution habits, which let you spend less energy and write faster. Such habits then show up as symbol/word repetitions, reducing the TTR.
(A TTR of 1, i.e. a hapax-only text, could only be a very fortunate product of a truly random process, e.g. using radioactive decay as the entropy source in step 2. Even with such a true random picker we would not statistically expect TTR = 1, but I understand your point that the TTR should be high, and it is not. However, this applies only to truly random picks. Human beings tend to create patterns even when directly asked for a random sequence, as studied elsewhere. Regarding the entropy and the Zipfian distribution: that is a good point. We do not currently possess a manually encoded text of comparable length. That is, unfortunately, the next target: encoding a real text of comparable length with this system by hand, with a mental entropy source, to check these properties. I'll mention this question in the discussion.)
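Until such a hand-encoded text exists, the effect of habits on the TTR can at least be played with in a toy simulation. The Python sketch below is not evidence for anything: the "habit" model (with probability habit the writer simply takes the first symbol of a set, otherwise a uniform random one), the sample sentence, and the names encode_word and ttr are all assumptions made up for this illustration.

```python
import random
import string

TABLE = {
    letter: f"{i // 9 + 1}{(i // 3) % 3 + 1}{i % 3 + 1}"
    for i, letter in enumerate(string.ascii_lowercase)
}
SYMBOLS = {"1": "AEI", "2": "QOZ", "3": "MRK"}

def encode_word(word, rng, habit):
    """Encode one word. With probability `habit` take the first symbol of the
    set (a crude stand-in for a writing habit), otherwise pick at random."""
    out = []
    for ch in word:
        for digit in TABLE[ch]:
            choices = SYMBOLS[digit]
            out.append(choices[0] if rng.random() < habit else rng.choice(choices))
    return "".join(out)

def ttr(tokens):
    """Type-token ratio: number of distinct words divided by number of words."""
    return len(set(tokens)) / len(tokens)

# Made-up sample plaintext with some repeated words.
PLAIN = ("the root of the plant is boiled and the water of the root "
         "is given with the leaf of the plant").split()

rng = random.Random(0)  # fixed seed only so that a run can be repeated
for habit in (0.0, 0.5, 0.9, 1.0):
    encoded = [encode_word(w, rng, habit) for w in PLAIN]
    print(f"habit={habit:.1f}  TTR={ttr(encoded):.2f}")
print(f"plain text TTR={ttr(PLAIN):.2f}")
```

Two things hold by construction: the encoded TTR can never fall below the plaintext TTR, because different words always get different encodings, and with habit = 1.0 the encoding is deterministic, so the two TTRs are equal. Everything in between depends on how strong the habits are, which is exactly what the planned hand-encoding experiment should measure.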
"What is the point of looking for patterns as evidence of specific encoding methods as the paper does?"
This is a good question that leads to the question the article raises. It seems that the Voynich manuscript, despite its unknown nature, has a rule set that applies to the whole manuscript. The proposed rule set says that there could be a set of simple glyphs which, when they immediately follow one another, form a compound ("ligature") that appears as a graphically more complex glyph. That is practically all the article wanted to say. Critically, the assumed glyph compounding is independent of the proposed encoding system (and I think this caused the confusion, as the transition from ligatures to the encoding system is a little abrupt in the article). The manuscript could thus be, for example, gibberish generated by the interesting self-citation method described by Torsten Timm. The proposed "encoding" is thus a kind of alternative (the article does not intend to directly assume the Voynich manuscript /is/ encoded by this particular system), showing that there exists a simple-to-use, simple-to-read but hard-to-crack system that could produce many of the observed properties of the Voynich text, still carry a meaningful text, and still look like gibberish.
Regarding the spaces: the observation is that the spaces between Voynich words do not necessarily have to be "spaces", and we assign that role to the symbol <o>.
Regarding Trithemius: yes, you are right that Trithemius's book on steganography was published half a century after the Voynich manuscript. However, Trithemius fully explained and demonstrated the steganographic idea in that book, which goes against the practice of keeping working encryption methods secret. This suggests that Trithemius need not be the author of the system and that it could have existed for a longer period. It is, after all, just a substitution of a substitution.
I hope this sheds some more light on the article.
I'll be happy to receive any further feedback, to discuss, or to explain more of its parts.
With all the best,
Vladimir