The Voynich Ninja

Voynich Manuscript: Numeric Enigma?
by Vladimír Matlach, Barbora Anna Janečková and Daniel Dostál

Link to preprint:

Abstract: 
Quote:In this article, we employ simple descriptive methods in order to explore the peculiar behavior of the symbols in the Voynich Manuscript. Such analysis reveals a group of symbols which are further analyzed for the possibility of being compounds (or ligatures), using a specifically developed method. The results suggest the possibility that the alphabet of the manuscript is a lot smaller, and steganographic type of encoding is proposed to explain many of the already known and newly revealed properties.
Thank you for pointing out this paper, lurker!

At the end of page 1, they misquote Rene as saying that
Quote:
thanks to the carbon-dating method, an approximate date of its creation has been established: 1405—1435 (Zandbergen 2016).

Of course, the correct date range is 1404-1438.

I am delighted to see [Trinity] MS O.2.48 mentioned as a relevant parallel for the Voynich herbal (Figure 1). However, the authors fail to mention both Trinity College and Rene Zandbergen, who pointed out the Trinity herbal on this forum.

Page 3:

Quote:Before we step into the analysis, first we formalize preprocessing of the Takeshi Takahashi’s transliteration (Takahashi 1999) of the Voynich Manuscript, which we used for the analysis and which we found to be the most thorough
I would like to know why they think that Takahashi's transliteration is better than Zandbergen and Landini's. ZL adds the text on the Rosettes diagram and distinguishes between certain and uncertain spaces.

I am very perplexed about the cipher system they propose:
  • some of the characters are seen as abbreviations (they call them "ligatures") and stand for multiple characters
  • once the "ligatures" have been expanded, sequences of three cipher characters (which they reduce to 10 and interpret as "digits") should be interpreted as single plain-text characters (basically a verbose cipher)
  • each plain-text character can be represented by several (e.g. 27) "digit" sequences
  • spaces are not significant

The system is "based on a slightly modified Trithemius steganographic and numeric code" (devised more than half a century after the VMS).

This complex cipher results in a huge number of possible ways to encode a single word:
Quote:the word “hello” could be rewritten into 27×27×27×27×27=27^5=14,348,907 variants

One would expect this to result in a Type/Token Ratio TTR=1, i.e. a text made only of hapax legomena. So why do Voynichese words behave similarly to actual words, with more or less "normal" TTR, word entropy, and Zipfian distribution?

Answer:

Quote:...the digit substitutes are picked arbitrarily by coders’ will. This may explain the observed autocorrelations of all the symbols and repeated words. First, the autocorrelations might be a consequence of a habit of picking specific substitutes for single digits or for larger blocks – like letters or words: picking the substitutes randomly each time is mentally exhausting, building a habit accelerates the coding work. Once such habit is established, it boosts the frequency of the substitute symbols locally. Such habits may also naturally fade off as the coder notices the repetition and builds up new habits. Such habits may also explain the proposed languages A and B, as the two main coders could encode the text by their own habits before the text was rewritten by individual scribers (Currier 1976).

It seems contradictory to speculate that the arbitrary choices of the scribes were systematic. If all observable patterns (low character entropy, repeated words, Currier languages...) are explained away as arbitrary preferences, what is the point of looking for patterns as evidence of specific encoding methods as the paper does? The idea that encoders tended to have systematic preferences is also historically inaccurate. Homophones in diplomatic ciphers were picked arbitrarily, but in such a way that cipher characters tended to appear with more or less even frequencies: that is why homophones were introduced in the first place.

Finally, we already discussed that a verbose cipher in which spaces are not significant would be extremely hard to solve. The authors add that the method they propose could also be rendered largely unreadable by small encoding (or decoding) errors made in the process:

Quote:Another interesting property of this code is its fragility: misreading just one symbol shifts the whole code, making the result of the text unreadable (until hitting new line, paragraph, or any resetting sequence – which may be related to Currier’s statement that a voynichese line might be a functional entity (1976: 23) – or until making another mistake repairing the reading frame). In combination with the probable fact that we do not read the Voynich symbols right, we may easily fail to read the code even when knowing it is present.
Dear Marco and hello everyone,

First, thank you for taking the time to read the article and write the post. As it is a preprint, I was sure there would be many things to fix and I hoped for some feedback. Paradoxically, I didn't know about this forum before (I found a back-link today).

I've fixed the wrong date and the missing citations for the Figure 1 photos (and several other minor details).

I see that the steganographic/substitution ("encoding") system is causing the perplexity. It is simpler than it looks, and the article perhaps didn't explain it well. So I will try to show the method's algorithm and its properties. Also, many of its properties become clearer when you try to use this system and write a page or two of encoded text by hand.

Let me show you the process:

Let's encode the word "voynich" with the proposed encoding method (which is a slight, pragmatic modification of the Trithemius code as given in Gaines; I will return to Trithemius below).

Step 1

First we need a letter-to-digits substitution table that we agree on (the original Trithemius table):

Code:
a ... 111, b ... 112, c ... 113, d ... 121, e ... 122, f ... 123, g ... 131, h ... 132, i ... 133, j ... 211, k ... 212, l ... 213,
m ... 221, n ... 222, o ... 223, p ... 231, q ... 232, r ... 233, s ... 311, t ... 312, u ... 313, v ... 321, w ... 322, x ... 323,
y ... 331, z ... 332


the word "voynich" is thus substituted as: 321 223 331 222 133 113 132.

That is the first substitution part. Now comes the second step.

Step 2

We substitute each digit arbitrarily with any symbol we like, drawn from a set we define for each of the 3 digits, e.g.:


Code:
1 could be substituted for any of the: "A, E, I"
2 could be substituted for any of the: "Q, O, Z"
3 could be substituted for any of the: "M, R, K"

(Instead of the Latin AEI, QOZ, MRK, you can imagine -- as we propose -- the simplest Voynich symbols <i>, <l>, <e>, ...).

The single letter "v" (321) could thus take any of the following forms:

MQA, MOA, MZA, MQE, MQI, RQA, ROA, ROE, ROI, ..., KQA, KOA, KOE, KOI, KZA, KZE, KZI

In summary, the single letter "v" alone has 3 × 3 × 3 = 27 possible variants, and the same number of possibilities applies to each letter.

Word "voynich" could thus be encoded into 27 * 27 ... * 27 = 27^7 = 10 460 353 203 various unique strings, e.g. one of these is:
KZA ZQR MKA QZZ ARM AIM IKZ.
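(Again a minimal Python sketch, assuming the illustrative AEI / QOZ / MRK sets above; the random picks here only stand in for the coder's arbitrary choices:)

Code:
import random

# Step 2: replace each digit by an arbitrarily chosen symbol from its set.
SUBSTITUTES = {"1": "AEI", "2": "QOZ", "3": "MRK"}

def disguise(triplet):
    """Pick one substitute per digit; a human coder chooses, here we choose at random."""
    return "".join(random.choice(SUBSTITUTES[d]) for d in triplet)

digits = "321 223 331 222 133 113 132"  # "voynich" after step 1
print(" ".join(disguise(t) for t in digits.split()))
# e.g. KZA ZQR MKA QZZ ARM AIM IKZ -- one of 27**7 = 10,460,353,203 possibilities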

Step 3
The resulting string is long: three times the length of the original text (which is rather catastrophic).

But if -- instead of AEI, QOZ, MRK -- we directly use the simplest Voynich glyphs, we could combine many of these occurrences into larger glyphs. If the idea proposed in the article is right, the gallows <p> could be a compound of up to 5 of the simplest Voynich symbols. In this way, we could possibly shrink the encoded, three-times-longer text back to roughly its original size just by a "smart" tendency to pick substitutions (in step 2) that easily form the compounds.
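(To make the compounding concrete, here is a small Python sketch. The compound table in it is invented purely for this Latin-letter illustration -- the article's actual proposal concerns simple Voynich glyphs merging into complex ones such as the gallows -- and greedy longest-match merging is just one plausible way to do it:)

Code:
# Step 3 (illustration only): greedily merge runs of simple symbols into compounds.
COMPOUNDS = {"KZA": "#", "ZQ": "@", "MK": "&"}   # hypothetical ligature table
MAX_LEN = max(len(k) for k in COMPOUNDS)

def ligate(symbols):
    """Longest-match, left-to-right merging of symbol runs into compound glyphs."""
    out, i = [], 0
    while i < len(symbols):
        for n in range(min(MAX_LEN, len(symbols) - i), 0, -1):
            chunk = symbols[i:i + n]
            if n > 1 and chunk in COMPOUNDS:
                out.append(COMPOUNDS[chunk])
                i += n
                break
        else:
            out.append(symbols[i])
            i += 1
    return "".join(out)

print(ligate("KZAZQRMKA"))   # "#@R&A" -- 9 simple symbols shrink to 5 glyphs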

When trying this encoding system by hand, it quickly becomes mentally exhausting to keep picking the arbitrary substitutions "randomly" in step 2. This leads to building substitution habits, which let you spend less energy and write faster. Such habits then manifest as symbol/word repetitions, reducing the TTR.

(A TTR of 1, i.e. a hapax-only text, could be a (very fortunate) product of a true-random process, e.g. using radioactive decay as a source of entropy in step 2; however, even with such a true-random picker we would not statistically expect TTR = 1. I understand your point that the TTR should be high and it is not, but this applies only to truly random picks: human beings tend to create patterns even when directly asked for a random sequence, as studied elsewhere. Regarding the entropy and the Zipfian distribution -- that is a good point. We are currently not in a situation where we possess a manually encoded text of comparable length; that is, unfortunately, the next target: encoding a real text of comparable length with this system, by hand and with a mental entropy source, to check the said properties. I'll mention this question in the discussion.)
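If it helps, this is how the two measures could be computed in Python on a hand-encoded sample; the toy string below is just the step-2 output from above, not real data:

Code:
import math
from collections import Counter

def ttr(words):
    """Type/Token Ratio: distinct word forms divided by total word count."""
    return len(set(words)) / len(words)

def char_entropy(text):
    """Shannon entropy (bits per symbol) of the character distribution."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

sample = "KZA ZQR MKA QZZ ARM AIM IKZ".split()
print(ttr(sample), char_entropy("".join(sample)))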

"What is the point of looking for patterns as evidence of specific encoding methods as the paper does?"
This is a good question that leads to the question the article raises: it seems that the Voynich manuscript, despite its unknown nature, has a rule set that applies to the whole manuscript. The proposed rule set says that there could be a set of simple glyphs which, when they immediately follow one another, form a compound ("ligature") manifested as a graphically more complex glyph. That is practically all the article wanted to say. Critically, the assumed glyph compounding is independent of the proposed encoding system (and I think this caused the perplexity, as the transition from ligatures to the encoding system is a little harsh in the article). The manuscript could thus still be, for example, gibberish generated by the interesting auto-citation method described by Torsten Timm. The proposed "encoding" is some kind of alternative (the article does not intend to directly assume that the Voynich manuscript /is/ encoded by this particular system), showing that there exists a simple-to-use, simple-to-read but hard-to-crack system that could produce many of the observed properties of the Voynich text and still carry a meaningful text while looking like gibberish.

Regarding the spaces -- the observation is that the spaces between Voynich words do not necessarily have to be "spaces", and we assign that role to the symbol <o>.

Regarding Trithemius -- yes, you are right that Trithemius's book about steganography was published half a century after the Voynich manuscript -- however, Trithemius fully explained and demonstrated the steganography idea in that book, which goes against the practice of keeping working encryption methods secret. This raises the assumption that Trithemius need not be the author of the system and that it could have existed for a longer period -- it is a substitution of a substitution.

I hope that this sheds some more light on the article.
I'll be happy for any other feedback, to discuss or to explain more of its parts.
With all the best,
Vladimir
Hi Vladimir,
welcome to the forum and thank you for the additional examples!
I still don't see that this method explains any feature of Voynichese: e.g. both the low character entropy and the "normal" word entropy/TTR are explained as resulting from "habits" of the scribe(s), not from the method itself.

I find Bowern and Lindemann's argument in favour of Voynichese words corresponding to plain-text words more convincing, but everything is possible of course.
All right, I absolutely understand your concerns about the encoding -- but let me return (for a moment) to the primary point of the article.

It seems (from this discussion) that the main point of the article is the proposed encoding system, but it is not, and I'm quite afraid that the encoding has "kidnapped" or overshadowed the primary points of the article (in fact, the encoding part could even be deleted from the article without any harm, as it belongs to the discussion).

The main point the article tries to show is that -- despite all the unknowns in the manuscript -- there seems to be a rule set for combining some of the glyphs when they co-occur. Also, the analysis suggests that the glyphs which can create larger compounds are graphically very simple, and that the assumed resulting compounds retain the graphical traits of those simple glyphs (therefore they could be understood as ligatures). (But if this is indeed true, many of the glyphs would disappear from the alphabet, leaving it quite small -- which raises a question for any possible explanation.)

This stemmed from the observation that the Voynich "letters" behave differently from anything we've seen yet (except Torsten Timm's artificially created texts), and even that some of the Voynich "letters" behave differently within the context of the manuscript itself. (I've edited the part "What is the point ..." above to reflect this a little more, I hope, for anyone reading from the beginning.)

That is the point of the article, so I'm a little bit sad and happy at the same time, as the only negative feedback was about the encoding and not about the ligatures part.

The aim of the proposed encoding should have been explained better; its point was to propose an explanation of /why/ the auto-citation phenomenon does not necessarily mean the text is a gibberish hoax. I will make this clearer, I guess (if the proposed encoding part remains). At the same time, it explains how a small alphabet can be used to effectively encode a plain text that still looks like gibberish.

The current plan -- and it is one of the most horrible things in my plan -- is to indeed write a full-length manuscript in the encoding system and test Zipf's law, entropy and other properties, so we can see whether it fits or not.
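(As a sketch of what such a check could look like once a long hand-encoded text exists, the snippet below estimates the slope of log frequency against log rank -- roughly -1 would be expected for Zipf-like text. The helper and the simple least-squares fit are just an assumption about how we would do it:)

Code:
import math
from collections import Counter

def zipf_slope(words):
    """Least-squares slope of log(frequency) vs log(rank) over word frequencies."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# usage (once a long hand-encoded text exists): zipf_slope(encoded_text.split())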

(Just for a laugh: of course we wrote a brute-force program that went through a few billion combinations, testing all possible ways of forming ligatures and their substitutions, and it found a Latin plain text produced by this system with the default Trithemius table shown above. However, the plain text was not long enough to prove anything, so I discarded the result and cannot find it anywhere now.)