entropy splits in parts of lines?
geoffreycaveney > 01-10-2020, 01:22 AM
It is well known that certain structures and patterns are typical of the beginnings of lines, and very different structures and patterns are typical of the ends of lines, in the Voynich ms text. But has anyone studied the relative entropy levels of just the beginnings of lines, and just the ends of lines, compared with the whole text? For example, what is the entropy of just the first halves of all lines of the ms text? And of just the second halves of all lines? What about just the first three vords of all lines, and just the last three vords of all lines?
I suppose the expectation would be that all such entropy levels would be even lower than that of the whole ms text, since line beginnings taken by themselves, and line endings taken by themselves, would each be expected to be more internally homogeneous than the whole text is. But I wonder if that has ever been tested. There is also plenty of repetition within each line, from beginning to end, so I wonder whether separating line beginnings from line endings would really lower entropy that much. I also wonder if there is any significant difference between the entropy of just the beginnings of lines and that of just the ends of lines.
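To make the question concrete, here is a minimal Python sketch of how such a comparison could be set up. It is only an illustration under my own assumptions: the function names are made up for this post, the input is taken to be a list of plain strings with one ms line per string and vords separated by spaces, and "entropy" is taken to mean single-character entropy (h1) and the conditional entropy of a character given the previous one (h2).

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of observed symbols."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def h1(fragments):
    """Single-character entropy over a list of text fragments."""
    return entropy(Counter(ch for frag in fragments for ch in frag))

def h2(fragments):
    """Conditional entropy of a character given the previous character.

    Pairs are counted inside each fragment only, so no artificial
    pairs are created across line boundaries.
    """
    pairs = Counter()
    for frag in fragments:
        pairs.update(zip(frag, frag[1:]))
    firsts = Counter()
    for (a, _b), n in pairs.items():
        firsts[a] += n
    return entropy(pairs) - entropy(firsts)

def first_last_words(lines, n=3):
    """Return (beginnings, endings): the first and last n words of each line."""
    beginnings, endings = [], []
    for line in lines:
        words = line.split()
        beginnings.append(" ".join(words[:n]))
        endings.append(" ".join(words[-n:]))
    return beginnings, endings

def halves(lines):
    """Return (first halves, second halves), split by word count.

    With an odd number of words, the extra word goes to the second half.
    """
    firsts, seconds = [], []
    for line in lines:
        words = line.split()
        mid = len(words) // 2
        firsts.append(" ".join(words[:mid]))
        seconds.append(" ".join(words[mid:]))
    return firsts, seconds
```

One would then compare h1 and h2 of the whole list of lines with h1 and h2 of each list returned by first_last_words or halves. If the transliteration uses a different vord separator, the split() calls would need adjusting. Entropy estimates from small samples tend to come out too low, so the fragments should probably be compared against whole-text samples of about the same size.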
The reason I ask is that I have come across certain groups of lines in my research where I find it easier to interpret certain parts if I only read the first few vords of each line and continue with the first few vords of the next line, ignoring the rest of each line. But this may well be an illusion on my part, which is why I am curious about the entropy statistics of just the beginnings of lines and just the ends of lines.
It would be possible to encrypt a message by only making the first three words of each line meaningful, and padding out the rest of each line with nonsense nursery rhyme repetition of the sounds of the first three words:
meet me at fleet be mat sleet we vat
the back door he lack moor we sack poor
of jons house off cons mouse scoff nons louse
monday at noon sunday cat moon runway sat loon
This doesn't seem like a very secure level of encryption, but now combine it with a simple substitution cipher, or even better a mysterious invented script that no one else knows. Then it would become rather difficult to decipher. By the standards of the early 15th century, it would probably have been quite secure. Both ideas were simple enough to have been known and possibly employed in the time period of the Voynich ms: the concept of simple substitution (possibly incorporating elements of a verbose cipher, as we have recently been discussing elsewhere on this forum), and the steganographic concept of hiding the words of a meaningful message within a larger nonsensical message.
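The first layer of this scheme, before any substitution or invented script is applied, can be sketched in a few lines of Python. The function name extract_message and the parameter n_words are just names I chose for illustration; the cover lines are the four example lines from this post.

```python
def extract_message(lines, n_words=3):
    """Keep only the first n_words of each line and join them together."""
    return " ".join(" ".join(line.split()[:n_words]) for line in lines)

cover_text = [
    "meet me at fleet be mat sleet we vat",
    "the back door he lack moor we sack poor",
    "of jons house off cons mouse scoff nons louse",
    "monday at noon sunday cat moon runway sat loon",
]

print(extract_message(cover_text))
# prints: meet me at the back door of jons house monday at noon
```

The recipient only needs to know the value of n_words (and, in the combined scheme, the substitution or script); everything after the third word of each line is discarded.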
Also, in filling out the nonsensical parts of each line, the author could very well have followed some of the principles of the "auto-copying" or gibberish theory that has also been discussed recently on this forum. That could have been deliberate, or it even could have happened unconsciously as the author thought of nonsense rhyming words and phrases to fill out each line. I found myself doing it as I wrote the lines above, at first accidentally and then deliberately after I noticed I was doing it. It's only natural to take "inspiration" from the other nonsense words and phrases that are already in the immediate vicinity of the line that one is filling out. And for the author and the intended recipient, it doesn't matter what those words and phrases are anyway.
If the principle of filling out the nonsense parts was based on choosing rhyming words and phrases, as in my example above, then one would also expect the middles and ends of words to have much lower entropy (more predictability) than the beginnings of words, a statistical feature that we also observe to be present in the Voynich ms text.
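As a rough check of this point on my own toy example, the sketch below compares the entropy of the first letters of words with the entropy of their last letters. It is a simplification and an assumption on my part: it measures only the letter distribution at one position in the word, not the conditional entropy usually quoted for the ms text, and the function names are invented for this post.

```python
import math
from collections import Counter

def letter_entropy(letters):
    """Shannon entropy in bits of an iterable of single letters."""
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def positional_entropy(lines):
    """Return (entropy of word-initial letters, entropy of word-final letters)."""
    words = [w for line in lines for w in line.split()]
    first = letter_entropy(w[0] for w in words)
    last = letter_entropy(w[-1] for w in words)
    return first, last

first, last = positional_entropy(["meet me at fleet be mat sleet we vat"])
print(first > last)
# prints: True
```

In the first example line the word-final letters are only t and e, while the word-initial letters are spread over seven different letters, which is the kind of asymmetry that rhyming filler would be expected to produce.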
I am aware that the beginnings of lines also tend to be repetitive in the Voynich ms text, so I do not at all expect the idea I raise here to solve all the problems inherent in the difficult structures and patterns of the ms text. But I'm curious if the entropy breakdowns by parts of lines may give us some clues and leads to follow for further and more sophisticated examination of these ideas.