The Voynich Ninja
Character entropy of Voynichese - Printable Version



RE: Character entropy of Voynichese - -JKP- - 06-11-2016

(06-11-2016, 12:13 PM)Sam G Wrote:
(02-11-2016, 09:14 PM)Anton Wrote: Only one section is dedicated to the VMS - that is, the last section 4.22 of the chapter. It is only nine pages long, with one page dedicated to problems for students and two pages to scans of two folios of the VMS. That is not what I expected from what I read about this book on the Internet - I expected much more volume to be dedicated to the VMS. However, the actual state of things is reasonable - the whole book is dedicated to solving tasks with a computer, so the VMS is just one interesting illustration or application. It is not the subject of any dedicated focus, either in the book as a whole or even in chapter 4.

That said, four pages are dedicated to the brief history of the VMS (with focus on the names of Dee and R. Bacon) and the attempts to analyse it (Newbold, Brumbaugh). The names of Yardley, Friedman and Tiltman are mentioned, as well as articles of Oneil (sic!), Friedman and Tiltman.

So only three pages are left for the discussion of the statistical properties of the VMS, which is much less than I expected.

Any chance you could scan these pages (or at least the three most relevant ones) and upload them somewhere?  I know they're probably still copyrighted but I think you could call it "fair use".

Sam, for your info... quoting a short paragraph, with attribution, might be considered "fair use" but uploading three out of four pages (a substantial portion of the original) is not.


RE: Character entropy of Voynichese - Anton - 06-11-2016

To be honest, I'm not a big fan of modern capitalist copyright, but at the same time I don't want David to be sued (or otherwise complained with) by Prentice-Hall.

To be honest, again, I bought this book specifically because I could not find it online. As I said above, at Amazon it is currently ridiculously cheap.

I tried to summarize in my post as much as possible, so you can be quite sure that all essential information is there.


RE: Character entropy of Voynichese - Sam G - 06-11-2016

(06-11-2016, 12:27 PM)-JKP- Wrote:
(06-11-2016, 12:13 PM)Sam G Wrote: Any chance you could scan these pages (or at least the three most relevant ones) and upload them somewhere?  I know they're probably still copyrighted but I think you could call it "fair use".

Sam, for your info... quoting a short paragraph, with attribution, might be considered "fair use" but uploading three out of four pages (a substantial portion of the original) is not.

It would be three pages out of the entire book, for a non-commercial purpose, and the material is largely of historical interest and obviously would not be harming sales of the book, which is out of print anyway... I'm not an expert in these matters but I think it would be justified under "fair use".


(06-11-2016, 12:50 PM)Anton Wrote: I tried to summarize in my post as much as possible, so you can be quite sure that all essential information is there.

Okay, that's fine.  I appreciate your summary and realize there's probably nothing there of great importance.


RE: Character entropy of Voynichese - -JKP- - 07-11-2016

(06-11-2016, 12:50 PM)Anton Wrote: To be honest, I'm not a big fan of modern capitalist copyright, but at the same time I don't want David to be sued (or otherwise complained with) by Prentice-Hall.

...


It seems that many people are not fans of copyright.

I used to be an off-the-shelf software developer. When software piracy became rampant on the Web, I was no longer able to earn a living that way. Millions of copies of my software were online, but no one was paying for it. As a direct consequence of piracy, my income dropped by 99%.

So I switched professions and three years later, people started pirating that and my income dropped by 90%. I switched professions again and then what I was creating began being pirated also; my income dropped by 93% and once again I had to switch professions.


I respect other people's copyrights. They do not respect mine. That puts me in a rather awkward position and I am really tired of retraining and retooling every three years. Personal feelings aside, as long as it's the law, it would set a good example for the forum to respect copyrights. Even a book that is out of print may still have content that belongs to someone, or might be re-released at some later date.

Fair use usually only applies to a paragraph or two, not usually to several pages.

What Google Books does has already been challenged in court and they were told not to show so much of the books. They reduced the content for a while and then began increasing it again, in flagrant violation of the law. Pinterest also flouts the law. It was supposed to be only thumbnails, but the images, which do not belong to them, are getting bigger and bigger again, and they now force you to sign up before you can click through to the original content. Google (which probably holds shares in Pinterest) also uses Pinterest to index images, instead of leading directly to the original sites, which deprives the original content creators of their Google positions.


All this "free" content has consequences. Those who create the content don't get compensated, and when you don't get compensated, jobs disappear and you can't pay the bills.



Sorry, I just realized this is off-topic but I don't particularly want to start a thread on this either because it's a topic that tends to get hammered to death.

I am very interested in the entropy discussions and have been working on a blog about it for some time. I don't know if I can clean up the wording and pictures quickly enough to post it this weekend, but I'll try to get it posted soon.


RE: Character entropy of Voynichese - Koen G - 07-11-2016

JKP: looking forward to your post!


RE: Character entropy of Voynichese - Anton - 07-11-2016

As a side note, all the problems with the modern copyright model arise because it is not in accord with the modern technological foundations of our society. Trying to enforce copyright these days is much like trying to force folks to ride horses when automobiles have been around for more than a century. A new society needs new models.

But copyright issues are of course off-topic in this thread, so anyone interested in this discussion please open a separate thread.

Returning to the book, I checked the concept of "fair use" in US law (the US being the place of copyright of this book). It includes the notion of "research" purposes, and it also matters whether or not the reproduction of the copyrighted material affects its market value. In this case the book is long out of print, the content (teaching people to use BASIC) is obsolete, and the prices at Amazon speak for themselves. So in my understanding, taking photos of three pages and sending them in private for research purposes does fall under "fair use". But to avoid any mischief, it is common in science and engineering to just procure permission for reproduction from the copyright holders.


RE: Character entropy of Voynichese - davidjackson - 10-11-2016

Quote: To be honest, I'm not a big fan of modern capitalist copyright, but at the same time I don't want David to be sued (or otherwise complained with) by Prentice-Hall.

Most considerate of you Anton! :)
Although I should say that anyone wanting to pass around pirated copies of things should set up a Dropbox; they'll be deleted if posted on here.

But let's stick to the topic, we seem to have gone off-piste again.


RE: Character entropy of Voynichese - Anton - 10-11-2016

I'm moving on to checking Bennett's results, brand new transcription of [link] already done... nine pages remain. 8-)


RE: Character entropy of Voynichese - Anton - 05-01-2017

....While my plans to check Bennett's calculations are, as we Russians say, put off to the long box (~ dallied off), I decided to explore a simple example of a cipher of my own invention which decreases character entropy, as an illustration.

For plain text, I took the beginning of my entropy tutorial (draft). To keep matters simple, I excluded the section numbering, converted the rest of the numbers to words, and excluded punctuation and hyphens. Thus I am left with 26 English letters, plus spaces and apostrophes - 28 characters in total. Also, there are no line breaks for simplicity. The upper/lower case is there, but it is not taken into account.

The raw text is here:

[link]
For the 1st order entropy calculation I used the following online calculator: [link] (check "ignore case", uncheck "ignore space")

We see that for the plain text the figure is 4.07.
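
For readers who want to reproduce figures of this kind offline, here is a minimal sketch of a 1st order (single-character) entropy calculation in Python. It is not the online calculator mentioned above; the function name and the two flags are assumptions chosen to mirror the "ignore case" and "ignore space" checkboxes, and results on a given text may differ slightly depending on how that calculator treats characters.

```python
import math
from collections import Counter

def first_order_entropy(text, ignore_case=True, ignore_space=False):
    """Shannon entropy in bits per character, from single-character frequencies.

    The flag names are hypothetical; they only imitate the calculator's checkboxes.
    """
    if ignore_case:
        text = text.lower()
    if ignore_space:
        text = text.replace(" ", "")
    counts = Counter(text)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over all distinct characters
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(round(first_order_entropy(sample), 2))
```

With "ignore case" on and "ignore space" off, the space counts as an ordinary character, which matches the 28-character alphabet described above.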

Now on to encryption. First of all, spaces are left as they are for simplicity. Now let's take a radical step towards lowering entropy. Each letter is encoded as a number, beginning with 1, so A = 1, Z = 26, and apostrophe = 27. Since we may have, say, "BA" as well as "U" in the plain text, we need some means to distinguish between "21" and "21" in the encrypted text. In the former case it is the sequence of numbers 2 and 1, while in the latter case it is a single number 21. So, for 11 and beyond, a dot is introduced in between the two digits to mark that the pair comprises a single number, and not a sequence of two numbers. Thus, "BA" would be "21", while "U" would be "2.1". Note that we don't need the dot in "10" and "20", since there are no standalone zeros - if a zero is encountered, it is only as an element of "ten" or "twenty". Since we don't have anything like thirty and beyond, it is easy to see that, say, "53" is simply "EC".
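
The digits-and-dots step can be sketched in Python roughly as follows. This is only an illustration of the rules as described (the original was done with "search & replace" in a word processor); the names `ALPHABET`, `number_token`, `encode_digits` and `decode_digits` are made up for the example, and the input is assumed to contain only the 26 letters, apostrophes and spaces.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"  # a = 1 ... z = 26, apostrophe = 27

def number_token(n):
    """1-9 stay single digits, 10 and 20 stay as they are,
    every other two-digit number gets a dot between its digits."""
    s = str(n)
    if n < 10 or n in (10, 20):
        return s
    return s[0] + "." + s[1]

def encode_digits(plain):
    out = []
    for ch in plain.lower():
        # Spaces are kept; any character outside ALPHABET raises ValueError.
        out.append(" " if ch == " " else number_token(ALPHABET.index(ch) + 1))
    return "".join(out)

def decode_digits(cipher):
    """Left-to-right decoding with one character of lookahead."""
    out = []
    i = 0
    while i < len(cipher):
        ch = cipher[i]
        if ch == " ":
            out.append(" ")
            i += 1
            continue
        nxt = cipher[i + 1:i + 2]
        if nxt == ".":            # dotted pair, e.g. "2.1" = 21
            n = int(ch + cipher[i + 2])
            i += 3
        elif nxt == "0":          # "10" or "20": zero never stands alone
            n = int(ch + "0")
            i += 2
        else:                     # plain single digit
            n = int(ch)
            i += 1
        out.append(ALPHABET[n - 1])
    return "".join(out)

if __name__ == "__main__":
    print(encode_digits("BA"), encode_digits("U"), encode_digits("EC"))
```

The decoder shows why the scheme is unambiguous: looking one character ahead (a dot, a zero, or anything else) is enough to decide how many characters belong to the current number.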

The result of the encryption is as follows:

[link]
The 1st order character entropy is 3.18 which is way lower than even Voynich.

But this looks suspicious and betrays a cipher. We now need to give it the look of an unknown language. I would not invent a new alphabet, but just encode the numbers with pairs of letters using a table where A, O, I, D, S are rows and R, N, L, M, Y, G are columns, and our numbers from 1 to 27 fill this table sequentially from left to right and from top to bottom. For example, 1 = AR, because A is the first row and R is the first column. Likewise, 2.3 = DY because number 23 happens to fit the cell in the fourth row, fifth column. (I'm too lazy to type the table, but surely you understand the principle.) The choice of the number of rows/columns, as well as of letters, is arbitrary - I just tried to mimic the EVA bigram patterns, but I did not dedicate more than five minutes to that, so the result in this respect is not very close. However, it is good in terms of entropy - it is 3.42 - higher than with digits and dots, but lower than plain text and (still) lower than Voynich.
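
The table lookup can be written out as a short Python sketch. It goes straight from plain text letters to bigrams, filling the 5 x 6 table row by row as described; the names `ROWS`, `COLS`, `bigram` and `encode_bigrams` are invented for this example, and the last three cells of the table (28 to 30) are simply left unused.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"  # a = 1 ... z = 26, apostrophe = 27
ROWS = "AOIDS"    # first letter of each bigram
COLS = "RNLMYG"   # second letter of each bigram

def bigram(n):
    """Bigram for number n (1..27), table filled left to right, top to bottom."""
    row, col = divmod(n - 1, len(COLS))
    return ROWS[row] + COLS[col]

def encode_bigrams(plain):
    out = []
    for ch in plain.lower():
        # Spaces are kept; any character outside ALPHABET raises ValueError.
        out.append(" " if ch == " " else bigram(ALPHABET.index(ch) + 1))
    return "".join(out)

if __name__ == "__main__":
    # Print the whole table, one row per line.
    for r in range(len(ROWS)):
        cells = range(r * len(COLS) + 1, min((r + 1) * len(COLS), 27) + 1)
        print("  ".join(f"{n}={bigram(n)}" for n in cells))
    print(bigram(1), bigram(23))  # AR DY, as in the examples above
```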

The text is this:

[link]
Decryption is straightforward, because the cipher blocks are fixed in length (two letters each), and the process is run sequentially from left to right.
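
A matching decryption sketch, under the same assumptions as the table described above (rows A, O, I, D, S; columns R, N, L, M, Y, G; numbers 1 to 27 filled row by row). The function name `decode_bigrams` is hypothetical. Because every block is exactly two letters, the decoder just walks each word in steps of two.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"  # a = 1 ... z = 26, apostrophe = 27
ROWS = "AOIDS"
COLS = "RNLMYG"

def decode_bigrams(cipher):
    """Decode space-separated words made of fixed-length two-letter blocks."""
    words = []
    for word in cipher.split(" "):
        letters = []
        for i in range(0, len(word), 2):
            row = ROWS.index(word[i])        # first letter selects the row
            col = COLS.index(word[i + 1])    # second letter selects the column
            # Cell number minus one is the index into ALPHABET;
            # the unused cells 28-30 would raise IndexError here.
            letters.append(ALPHABET[row * len(COLS) + col])
        words.append("".join(letters))
    return " ".join(words)

if __name__ == "__main__":
    print(decode_bigrams("ANAR DL"))
```

A word with an odd number of letters, or a letter outside the table, raises an error rather than being silently skipped, which is a deliberate choice for a sketch like this.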

Note that the second step (digits and dots) is not really necessary. One could fill the encoding table with plain text characters at once. I just used that step to exclude ambiguities when I did "search & replace" in MS Word.

One may note the peculiar "morphology" and "grammar" of the ciphertext - some characteristic repetitiveness of the inner structure and restricted nature of word endings. We may introduce some false spaces to keep words shorter.

Of course this is not Voynichese yet - you see that all words are forced to an even number of characters, there are no gallows or gallows coverage, or some other features of the Voynich, so this is just an example.

But I have a gut feeling that something like this is going on there...


RE: Character entropy of Voynichese - Koen G - 05-01-2017

Wonderful example, Anton. It shows how entropy can be decreased and also rather effectively why some people believe spaces must be fake.

I have a feeling though, that making it mimic more properties of Voynichese will be, as we say in Dutch, a whole other pair of sleeves: much more difficult.