I have purchased Bennett's book myself, and I have already provided a general review of it here. Now I would like to share what is actually written there about the VMS.
The VMS is dealt with in Chapter 4, "Languages". The chapter is large, accounting for more than 20% of the eight-chapter book. However, not all of the chapter is dedicated to the Voynich MS. It deals with various issues: the occurrence of valid language patterns in randomly generated texts, identification of the authors and languages of unknown texts, compression in message transmission, and encryption and decryption of text messages.
Only one section is dedicated to the VMS: the last section of the chapter, 4.22. It is only nine pages long, with one page dedicated to problems for students and two pages to scans of two folios of the VMS. That is not what I expected from what I had read about this book on the Internet; I expected much more space to be dedicated to the VMS. However, the actual state of things is reasonable: the whole book is about solving tasks with a computer, so the VMS is just one interesting illustration or application. It is not the subject of any dedicated focus, either in the book as a whole or even in Chapter 4.
Of those nine pages, four are dedicated to a brief history of the VMS (with focus on the names of Dee and R. Bacon) and the attempts to analyse it (Newbold, Brumbaugh). The names of Yardley, Friedman and Tiltman are mentioned, as well as articles by O'Neill (spelled "Oneil" there, sic!), Friedman and Tiltman.
So only three pages are left for the discussion of the statistical properties of the VMS, which is much less than I expected.
The alphabet that Bennett uses is as follows (p. 192). He considers a, i, l, o, e, h, p, f, t, k, r, n, q, d, y, v and x as standalone characters. He treats composite (benched) gallows the same way as they are treated in EVA transcriptions: as sequences of the elementary characters outlined above, with h as the final character and t counted twice in the case of "gallows coverage". He does, however, treat iin as a single character and in as another single character. Last, he distinguishes two variants of s: one is the s in sh or in benched gallows, the other is s encountered on its own. The former variant he renders as e with an apostrophe. However, he does not count the apostrophe as a separate character (on the grounds that "the apostrophe appears only to follow c <that is, e> throughout the entire manuscript", which, of course, is wrong). Hence he treats these two variants of s as two different characters.
It is not clear whether Bennett adopted any characters other than those listed above. It is also not clear whether Bennett included the space as a character in the calculation, but this seems likely, since the space is recognized as such throughout all the preceding material of Ch. 4.
The bulk of the text analyzed was the first ten pages of the VMS. No intermediate counts are provided, only the final result (p. 193), in bits per character here and hereinafter:

h1 = 3.66
h2 = 2.22
h3 = 1.86

where the subscript index stands for the order of the entropy.
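For readers who want to reproduce such figures on their own transcription, here is a minimal sketch of how entropies of this kind are commonly computed. This is my own illustration, not Bennett's code; I take the n-th order entropy as the n-gram block entropy minus the (n-1)-gram block entropy, which for n = 1 reduces to the plain character entropy:

```python
from collections import Counter
from math import log2

def block_entropy(text, n):
    """Shannon entropy (bits) of the empirical distribution of
    n-character blocks, counted over a sliding window."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def h(text, n):
    """n-th order entropy per character: block entropy of n-grams
    minus block entropy of (n-1)-grams; h(text, 1) is the plain
    single-character entropy."""
    if n == 1:
        return block_entropy(text, 1)
    return block_entropy(text, n) - block_entropy(text, n - 1)

# Toy example; substitute a real transliteration of the VMS here.
sample = "the quick brown fox jumps over the lazy dog " * 50
for n in (1, 2, 3):
    print(f"h{n} = {h(sample, n):.2f}")
```

For any structured text the values decrease with the order, as they do in Bennett's table.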
Earlier in the book (p. 140), Bennett provides some results for natural languages:
- English contemporary (cited from Shannon 1951)
- English Chaucer
- English Shakespeare
- English Poe
- English Hemingway
- English Joyce
- German Wiese
- French Baudelaire
- Italian Landolfi
- Spanish Cervantes
- Portuguese Coutinho
- Latin Caesar
- Greek Rosetta Stone
- Japanese Kawabata
All tests, except the first and the last, used the 28-character Latin alphabet (letters, space and apostrophe). Shannon seems to have omitted the apostrophe, and the Japanese test was based on a 77-character set (76 kana and the space). For German, characters with umlauts were substituted with the respective letter followed by the letter "e".
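For what it is worth, the umlaut substitution described for the German sample is a one-line normalization. A minimal sketch in Python; the uppercase mappings are my assumption, and since the book only mentions umlauts, I leave the eszett alone:

```python
def fold_umlauts(s: str) -> str:
    """Replace each umlauted letter with the base letter followed by
    "e", as in the normalization described for the German sample.
    Uppercase mappings are my own assumption; "ss" is left untouched."""
    table = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})
    return s.translate(table)

print(fold_umlauts("Müller, Köln, Gefühl"))
```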
The results can be summarized as follows (my summary differs slightly from that on Rene's website, and Rene also seems to omit the results for Japanese):
h1 = 3.91 ... 4.81
h2 = 3.01 ... 3.63
h3 = 2.12 ... 3.1
The lowest value of the 1st order entropy is observed for Portuguese, and the highest - for Japanese.
The lowest value of the 2nd order entropy is observed for Spanish, and the highest - for Japanese.
The lowest value of the 3rd order entropy is observed for English Chaucer, and the highest - for English contemporary.
However, Bennett notes that Shannon's calculation of the 3rd order entropy was approximate and based on inaccurate data. With that excluded, the highest value would be for English Poe (2.62). Also, almost half of the cases (Japanese included) do not have the 3rd order entropy calculated at all.
Note that all calculations were made for texts in a "narrative" style; no heavily condensed or heavily abbreviated texts were put under test.
Note, also, how the 77-character Japanese set yields higher entropies than the 27- or 28-character alphabets. Actually, it does not look very practical to me to compare directly the entropies of languages with different alphabet sets. Rather, redundancies should be compared. The maximum character entropy is obtained when all characters are equally probable to occur (a mathematically proven fact). Thus, the maximum possible 1st order entropy is given by log2(N), where N is the number of characters in the alphabet. This value is also sometimes called the "0th order entropy". So, e.g., comparing Japanese with English Joyce (the case with the 1st order entropy closest to that of Japanese, namely 4.144), we get log2(77) - 4.809 = 6.267 - 4.809 = 1.46 for Japanese and log2(28) - 4.144 = 4.807 - 4.144 = 0.663 for Joyce. This means that, notwithstanding the fact that the character entropy of Japanese is notably higher, the Japanese alphabet in question is notably more redundant than the one used by James Joyce for his English works, because for the latter the character entropy is closer to the maximum possible value.
For Voynichese, assuming Bennett's 22-character alphabet (space included), we get log2(22) - 3.66 = 4.46 - 3.66 = 0.8, which is far better than Japanese and close to that of Joyce.
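The redundancy comparison above is trivial to reproduce; a sketch using the figures already quoted (log2(N) minus the measured first-order entropy):

```python
from math import log2

def redundancy(h1, alphabet_size):
    """Difference between the maximum possible first-order entropy
    (log2 of the alphabet size, all characters equiprobable) and the
    measured first-order entropy, in bits per character."""
    return log2(alphabet_size) - h1

# Figures quoted above from Bennett's measurements:
print(round(redundancy(4.809, 77), 2))  # Japanese, 77-character kana set -> 1.46
print(round(redundancy(4.144, 28), 2))  # English (Joyce), 28 characters -> 0.66
print(round(redundancy(3.66, 22), 2))   # Voynichese, Bennett's 22 characters -> 0.8
```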
It is strange that Bennett does not say anything about this matter (although he does discuss ratios such as h1/h2), while Stallings at least recognizes it, making use of "differential" entropies. The latter also shows that h1 - h2 of Voynichese is considerably higher than in natural language samples. One will find that this is also true with respect to Hawaiian.
Returning to Bennett, he offers the interesting observation that the n-th order entropy of Voynichese is approximately the same as the (n+1)-th order entropy of Western European languages.
It is often asserted that ciphers tend to increase the character entropy of the text. While stating essentially the same on p. 194, Bennett, however, readily provides two examples from Poe where this is not the case; one being an "extreme" type of cipher where all the message is effectively contained in the key, the other being a multiple-substitution cipher. Neither of these two would directly generate text comparable to Voynichese, but Bennett discusses only entropies, leaving Voynich "morphology" and "grammar" out of scope.
Finally, Bennett states that there are natural languages with low entropies, providing the example of Hawaiian with entropy values as follows:

h1 = 3.20
h2 = 2.45
h3 = 1.98
This calculation was performed on a source text from a 19th-century book, in the 13-character (12 letters plus space) alphabet introduced by missionaries in the mid-1800s. "It has been estimated", says Bennett, "that only about 100 people still" used this language in daily communication at the time of his writing.
Calculating h0 - h1 for this variety of Hawaiian, one gets log2(13) - 3.20 = 3.70 - 3.20 = 0.5. Good work by the missionaries.
