Quote: which shows that the initial and final word parts of gallows words are independent and likewise unlikely to be in a natural language
This is a convincing argument to me. Like "autocitation", it supports the meaningless-Voynich hypothesis.
Of course, there could be some encryption that destroys natural-language features. For example, simple substitution ciphers are easy to break because symbol frequencies in the ciphertext match letter frequencies in natural text; more advanced ciphers equalize these frequencies.
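To illustrate the point about substitution (a sketch of my own, not from the post): a simple substitution merely relabels the alphabet, so the sorted frequency profile of the ciphertext is identical to the plaintext's, which is exactly what frequency analysis exploits.

Code:
from collections import Counter

def letter_frequencies(text):
    # Relative frequency of each letter, ignoring non-letters.
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    return {c: n / total for c, n in Counter(letters).items()}

plain = "attack the gallows at dawn " * 40
# Hypothetical key: any permutation of the alphabet behaves the same.
key = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                    "qwertyuiopasdfghjklzxcvbnm")
cipher = plain.translate(key)

# The sorted frequency profiles match exactly, betraying the cipher.
assert (sorted(letter_frequencies(plain).values(), reverse=True)
        == sorted(letter_frequencies(cipher).values(), reverse=True))

A homophonic or verbose cipher flattens or reshapes this profile, which is the "equalizing" mentioned above.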
And it could be, as Mark says, that the Voynich is mostly meaningless but that some information is still hidden among the noise.
(16-11-2025, 01:14 PM)Rafal Wrote: And it could be, as Mark says, that the Voynich is mostly meaningless but that some information is still hidden among the noise.
I have been studying the text of the Voynich in detail, research I will present as soon as it is ready. I believe two things to be true:
1) There are some meaningless or null words.
2) Not all words are meaningless or null.
It follows that words fall into two categories: meaningless and meaningful. The percentage of words in each category is hard to estimate. It seems most plausible to me that the majority of words are meaningless, leaving a minority that are meaningful. My best guess is about 80% meaningless and 20% meaningful, but it could be 90/10 or 60/40.
(16-11-2025, 03:49 PM)Mark Knowles Wrote: My best guess is that about 80% are meaningless and 20% are meaningful
80% of the text is bogus filler words? Why so much, when fewer would have been more economical? So many extra sheets of expensive parchment, just to bloat the manuscript with meaningless words?
The HerbalA1 text runs to 95 pages and 8086 words, an average of 85 words per page. If only 20% of those are real words, that leaves an average of about 17 meaningful words per page. Is that really all there is to each of those pages? Hardly enough to make a meaningful sentence.
(16-11-2025, 04:53 PM)dashstofsk Wrote: (16-11-2025, 03:49 PM)Mark Knowles Wrote: My best guess is that about 80% are meaningless and 20% are meaningful
80% of the text is bogus filler words? Why so much, when fewer would have been more economical? So many extra sheets of expensive parchment, just to bloat the manuscript with meaningless words?
Cards on the table: I share your intuition about verbose cipher theories, though I will look at Mark's theory with an open mind if he ever feels confident enough to share it, not least because I don't think it's that hard to imagine someone paying for such an inefficient ruse. One possible reason for the expense? It has turned out to be a damned fine cipher, one that has foiled technology and code-breaking know-how and resources a 15th-century writer could hardly have conceived of. Whatever the person behind this cipher paid to keep people from reading it, I think it's fair to say they got what they paid for.
And yes, I understand this kicks the football down the field to "why the desire for so much secrecy?", and I would hope that a decipherment could explain the extreme lengths supposed here. But in a world without academic or religious freedom, and with no copyright protection, a measure of furtiveness is hardly inconceivable.
(16-11-2025, 04:53 PM)dashstofsk Wrote: (16-11-2025, 03:49 PM)Mark Knowles Wrote: My best guess is that about 80% are meaningless and 20% are meaningful
80% of the text is bogus filler words? Why so much, when fewer would have been more economical? So many extra sheets of expensive parchment, just to bloat the manuscript with meaningless words?
The purpose of using filler words (steganography) is to confuse the decryptor trying to break the cipher; fillers serve the same purpose as null characters or symbols. Fewer filler words would be more economical, but more filler words make it harder for a decipherer to spot the real words. You are effectively questioning the purpose of steganography itself. Yet steganography was recognised as a useful technique by fifteenth-century cryptographers, as the references in Leon Battista Alberti's and Trithemius's works show.
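A toy illustration of the idea (my sketch; the decoy vocabulary is invented, and the 4:1 ratio simply mirrors the 80/20 guess discussed above):

Code:
import random

def pad_with_nulls(real_words, null_vocab, null_ratio=4, seed=0):
    # Interleave each real word with null_ratio decoys drawn from
    # null_vocab, preserving the order of the real words.
    rng = random.Random(seed)
    out = []
    for w in real_words:
        out.extend(rng.choices(null_vocab, k=null_ratio))
        out.append(w)
    return out

message = ["meet", "me", "at", "dawn"]
decoys = ["qokedy", "qokeedy", "chedy", "shedy", "daiin"]
print(" ".join(pad_with_nulls(message, decoys)))

A reader who knows the rule (every fifth word is real, say, or real words have a distinctive shape) recovers the message easily; everyone else faces 80% noise.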
There is evidence that Voynichese wastes available space. Low-entropy text is inefficient, which is why it is very unlikely the text would have been written like that even if the language sounded like that: extremely frequent patterns could easily have been abbreviated.
My attempt, if it means anything, at reverse-engineering it from Latin with a variable-length "zigzag path" verbose cipher had problems with language B. My idea was that the low entropy is caused by optimization, so I generated only the shortest (or close to shortest) possible ciphertext for a given key. I ran hill-climbing programs to find the key, trying to minimize the size of the ciphertext and improve the coverage of the Voynichese vocabulary. Maybe I should have counted null words as successes (in the coverage): definitely something I should try.
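For what it's worth, the skeleton of such a search looks something like this (a generic sketch; mutate and cost are placeholders for the actual zigzag-path encoder, which is not reproduced here):

Code:
import random

def hill_climb(initial_key, mutate, cost, steps=10000, seed=0):
    # Accept a mutated key whenever it does not increase the cost
    # (e.g. ciphertext length minus a bonus for vocabulary coverage).
    rng = random.Random(seed)
    key, best = initial_key, cost(initial_key)
    for _ in range(steps):
        cand = mutate(key, rng)
        c = cost(cand)
        if c <= best:
            key, best = cand, c
    return key, best

# Toy demonstration: the "key" is a bit vector and the cost counts ones.
def flip_one(k, rng):
    k = k[:]
    k[rng.randrange(len(k))] ^= 1
    return k

print(hill_climb([1] * 16, flip_one, cost=sum))  # converges to all zeros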
Q13 and Q20 are where the qokedy / qokeedy loops are found, so I suspect that these two words are meaningless. I have no idea which other words might be nulls. It would be funny if all the common words typical of language B were nulls, making the A and B statistics much closer once they are removed. Also something to try.
If one is producing a filler text for a cipher, how would one do it? What is the best way to design a filler text? This is not an easy question to answer.
What I would observe is that producing meaningless words takes more mental energy (i.e. thought) the more original and imaginative they are, and when producing a very long document the mental energy and time required to produce the optimal set of filler words might be too costly. So a simpler way to produce filler words would be to copy and modify the real words one has already written, and to use standard word structures or patterns to generate common filler words (see the sketch below). This is what I think explains the types of filler words we see in the Voynich. It is also why I think words with more distinctive and original spellings are more likely to be real words than words with more repetitive spellings. I hope at some stage to present my research into distinctive words in the Voynich.
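A minimal sketch of that "copy and modify" strategy (the three edit operations here are my own assumption, purely for illustration):

Code:
import random

def make_fillers(real_words, n, seed=0):
    # Copy a real word and apply one cheap edit: drop, double,
    # or swap a letter.
    rng = random.Random(seed)
    fillers = []
    while len(fillers) < n:
        w = list(rng.choice(real_words))
        i = rng.randrange(len(w))
        op = rng.choice(("drop", "double", "swap"))
        if op == "drop" and len(w) > 2:
            del w[i]
        elif op == "double":
            w.insert(i, w[i])
        elif op == "swap" and i + 1 < len(w):
            w[i], w[i + 1] = w[i + 1], w[i]
        fillers.append("".join(w))
    return fillers

print(make_fillers(["qokedy", "chedy", "daiin"], 5))

Fillers made this way are near-copies of real words, which would fit the observation that Voynichese words tend to resemble their neighbours.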
(16-11-2025, 10:22 AM)dashstofsk Wrote: People have noted within the text long distance effects, vertical pair effects, and other anomalies that cannot be explained if the text were a continuous narrative in a natural language.
"Cannot be explained", or just "no one has presented a good explanation" (while still allowing each parag to be continuous prose in natural language)?
When an anomaly is observed in the VMS text (like anomalous glyph frequencies at start-of-line), one should check whether a similar anomaly also occurs in other running texts in natural languages. This control experiment is often lacking.
I just posted a quick test with a novel that shows anomalous word frequencies at start-of-line, caused by the trivial line-breaking algorithm. That and other mundane distortions must be happening in the VMS too.
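The measurement is easy to reproduce (a sketch assuming a plain greedy line-breaker; the input file name is hypothetical):

Code:
import textwrap
from collections import Counter

def start_of_line_bias(text, width=60):
    # Wrap with a trivial greedy line-breaker, then compare the
    # frequency of line-initial words with their overall frequency.
    lines = textwrap.wrap(text, width=width)
    firsts = Counter(line.split()[0].lower() for line in lines)
    overall = Counter(w.lower() for w in text.split())
    n_lines, n_words = len(lines), sum(overall.values())
    for w, c in firsts.most_common(5):
        print(f"{w}: {c / n_lines:.3f} line-initial"
              f" vs {overall[w] / n_words:.3f} overall")

# start_of_line_bias(open("novel.txt", encoding="utf-8").read())

Longer words are more likely to be bumped to the next line by a greedy breaker, so line-initial words are skewed even though the underlying prose is perfectly ordinary.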
(16-11-2025, 10:22 AM)dashstofsk Wrote: the initial and final word parts of gallows words are independent and likewise unlikely to be in a natural language
The first claim is true; it has been noticed since the 60s, and it is the justification for the various word-structure models.
But the second claim needs to be proved. There are many natural languages where the words have a strong prefix/core/suffix structure with a relatively small set of choices in each slot. Natural-language evolution should lead to the three parts being somewhat independent, because that increases the efficiency of the language (bits per word). But there will be deviations from independence. And looking at your table, I in fact see evidence that the prefix and suffix of gallows words are not really independent, only somewhat so.
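"Somewhat independent" can be quantified: the mutual information between prefix and suffix is zero under full independence and grows with dependence. A sketch (the sample pairs below are invented; real input would be word parts split at the gallows glyph):

Code:
from collections import Counter
from math import log2

def mutual_information(pairs):
    # I(prefix; suffix) in bits, from the empirical joint and
    # marginal distributions of the observed pairs.
    joint = Counter(pairs)
    pre = Counter(p for p, _ in pairs)
    suf = Counter(s for _, s in pairs)
    n = len(pairs)
    return sum(c / n * log2((c / n) / (pre[p] / n * suf[s] / n))
               for (p, s), c in joint.items())

sample = [("qo", "dy"), ("qo", "dy"), ("o", "dy"), ("qo", "in"),
          ("o", "in"), ("ol", "dy")]
print(f"{mutual_information(sample):.3f} bits")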
All the best, --stolfi
(16-11-2025, 07:34 PM)nablator Wrote: Low entropy text is inefficient
But the entropy of Voynichese is not particularly low. The entropy per word, IIRC, is about 10 bits, quite in the normal range.
As for the entropy per character, it depends on the spelling system, not on the language. Iiff yyoouu ssppeellll EEnngglliisshh bbyy ddoouubblliinngg every letter, its entropy per character will drop by half. If instead you insert after each letter a totally random null letter, lower or upper case, the entropy will be halfway between the original value and the maximum value of log2(52) ≈ 5.7. If you map each letter randomly to upper or lower case, the entropy will increase by 1 bit per character.
In the case of Voynichese, the entropy per character depends on whether one counts each EVA character as a separate letter, or considers each of Ch, Sh, ee, iin, etc. to be a single letter.
And one must look at higher-order entropy, at least order 3 or 4 for characters.
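Both points are easy to check numerically (a sketch; the input file name is hypothetical). Doubling every letter leaves the order-1 (unigram) entropy unchanged; it is the conditional entropies that fall, approaching half the original rate, which is why the higher orders matter:

Code:
from collections import Counter
from math import log2

def entropy(seq):
    # Shannon entropy in bits per symbol.
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in Counter(seq).values())

def conditional_entropy(text, order):
    # H(next char | previous order-1 chars), estimated as
    # H(n-grams) - H((n-1)-grams).
    grams = [text[i:i + order] for i in range(len(text) - order + 1)]
    return entropy(grams) - entropy([g[:-1] for g in grams])

text = open("sample.txt", encoding="utf-8").read().lower()  # any long text
doubled = "".join(c + c for c in text)
print(entropy(text), entropy(doubled))                  # identical unigrams
print(conditional_entropy(text, 3), conditional_entropy(doubled, 3))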
All the best, --stolfi
(17-11-2025, 12:10 AM)Jorge_Stolfi Wrote: (16-11-2025, 07:34 PM)nablator Wrote: Low entropy text is inefficient
But the entropy of Voynichese is not particularly low. The entropy per word, IIRC, is about 10 bits, quite in the normal range.
Are we allowed to make a thread just talking about the mathematics involved in entropy?