The Voynich Ninja
Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - Printable Version



Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - eggyk - 06-01-2026

This post became very long when I wrote it back in March of last year. I never dared to post it, assuming it was probably obviously wrong in some way, but I'm posting it now in case something here sparks ideas in others. I am completely open to this being nothing or irrelevant. I've tried to format the post as nicely as possible (given its length), with some sections spoilered to make it easier to read, but if you don't want to read the whole post I've added a Tl;dr at the end.

Hello, while researching the history of the VM, I came across "Polygraphie" by Johannes Trithemius. After auto-translating page after page, I found an interesting technique that he describes as "enn'agrammaton". It consists of splitting the alphabet across a 9-square grid and allocating letters to each square. A letter is then encoded as the symbol for its square, with 1, 2, or 3 dots under it to show which letter of that square is meant.

   

Clearly, if this were the case with the VM, each symbol would denote one letter, and the text would therefore be a simple substitution cipher. As we all know by this point, a simple substitution cipher cannot accurately describe the VM text in any known language. I had a thought, however: what if the dots were missing?

If you were to write using this simple cipher system, but without the dots (or other clear marker), leaving just the square symbol itself, what characteristics would the encoded text have?

To illustrate this, here is a sentence in English that I encoded in this way:


"This is undoubtedly due to the fact that English uses more combinations of two or more letters to represent single phonemes than Latin does"
"7336 36 75257172248 272 75 732 2117 7317 2534363 7626 4562 154135173556 52 775 56 4562 4277266 75 625626257 635342 53552426 7315 41735 2526" (abc=1,def=2 etc)
   
With dots:

As you can see, with the dots under the symbols and knowing the system/key, the text is very easy to decipher. However, without a key (dots) this becomes much harder, even with the known system. This is mainly because instead of a 1-1 substitution, it is now a 3-1 substitution (1 cipher symbol represents 3 plaintext letters). Thus, when deciphering, a 1-3 substitution must occur somehow. This encoded text has some characteristics which may be relevant to the VM:

- It looks and feels like natural language (because it is encoded plaintext), with spaces and word lengths being conserved
- Converting plaintext to these symbols is very easy and can be done with only a few minutes of practice, even if you do not understand what the plaintext means
- Symbols can repeat shortly after each other, such as in the word "represent" ("625626257")
- Some common words/letter groups are represented using the same symbols (to/up/un , lo/mo , th/ti , ile/ime/ike , ne/nd/oe/of/pe)
- Entropy has changed dramatically from the plaintext (I'll discuss this later as it's important)
- Normal frequency analysis fails to detect an obvious plaintext
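
The digit stand-in used for the example can be sketched in a few lines of Python. This is only a sketch: the eight letter groups are inferred from the example ciphertext above (w shares a group with t and u, y falls in group 8), and treating j as i and v as u is my assumption about the 24-letter alphabet, not something taken from Trithemius.

```python
# Letter groups inferred from the example ciphertext (abc=1, def=2, ...).
GROUPS = ["abc", "def", "ghi", "klm", "nop", "qrs", "tuw", "xyz"]
LETTER_TO_DIGIT = {ch: str(i) for i, group in enumerate(GROUPS, start=1) for ch in group}
# Assumption: j is written as i and v as u, as in a 24-letter alphabet.
LETTER_TO_DIGIT["j"] = LETTER_TO_DIGIT["i"]
LETTER_TO_DIGIT["v"] = LETTER_TO_DIGIT["u"]


def encode(text):
    """Replace each letter by its group digit; keep spaces, drop everything else."""
    out = []
    for ch in text.lower():
        if ch in LETTER_TO_DIGIT:
            out.append(LETTER_TO_DIGIT[ch])
        elif ch.isspace():
            out.append(" ")
    return "".join(out)


PLAIN = ("This is undoubtedly due to the fact that English uses more "
         "combinations of two or more letters to represent single phonemes "
         "than Latin does")

print(encode(PLAIN))
# Words that collapse onto the same symbols:
print({word: encode(word) for word in ["to", "up", "un", "ne", "nd", "oe", "of", "pe"]})
```

Running it on the sentence above reproduces the digit string, and the second print shows the collisions from the list (to/up/un, ne/nd/oe/of/pe).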

Note that a simple substitution cipher would likely provide nonsense here, with something like "renrerent" or "sepsesept". If you do what many solvers do, and pick out likely common words and apply a mono-alphabetic substitution based on them, the rest of the sentence becomes nonsense.

Using the substitution for common English words:
THE, THAT, TO (732, 7317, 75)
(7=T, 3=H, 1=A, 2=E, 5=O)
and then the most frequent English letters for the others:
(4=L, 6=S, 8=Y) you get:

"THHS HS TOEOTATEELY ETE TO THE EAAT THAT EOHLHSH TSES LOSE AOLAHOATHOOS OE TTO OS LOSE LETTESS TO SEOSESEOT SHOHLE OHOOELES THAO LATHO EOES"
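
The same mono-alphabetic guess can be applied mechanically. A minimal sketch, using only the digit string and the guessed table from above (the function name is just for illustration):

```python
CIPHER = ("7336 36 75257172248 272 75 732 2117 7317 2534363 7626 4562 "
          "154135173556 52 775 56 4562 4277266 75 625626257 635342 53552426 "
          "7315 41735 2526")

# One letter per symbol, guessed from THE / THAT / TO plus letter frequency.
GUESS = {"7": "T", "3": "H", "1": "A", "2": "E", "5": "O", "4": "L", "6": "S", "8": "Y"}


def apply_guess(cipher, table):
    """Substitute each symbol by its single guessed letter; leave spaces alone."""
    return "".join(table.get(ch, ch) for ch in cipher)


print(apply_guess(CIPHER, GUESS))
```

Because every symbol really stands for three letters, any one-letter-per-symbol table can get at most a third of the alphabet right, which is why the output only looks sensible on the words used to build the guess.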

Having some fun, I'll now interpret this in the same way many do.

Clearly, with this many degrees of freedom, you can make literally any string of letters into any word in any language you choose.

Decoding back into plaintext reliably

My first intuition was that, with 3 possible letters per symbol, you would be presented with an overabundance of word choices, leading to the same issues as above and too much room for interpretation. The number of candidate readings grows as 3^n, where "n" is the word length: 6,561 for an 8-letter word and 531,441 for a 12-letter one.

Method 1
Use a program to output every single possible permutation, 1 word at a time and sift out the possible solutions
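
A minimal sketch of this brute-force approach, assuming the same digit groups as in the example (three letters per symbol) and a tiny hypothetical word list in place of a real dictionary:

```python
from itertools import product

# Three candidate letters per symbol, as in the digit example.
GROUPS = {"1": "abc", "2": "def", "3": "ghi", "4": "klm",
          "5": "nop", "6": "qrs", "7": "tuw", "8": "xyz"}

# Hypothetical stand-in for a real dictionary.
WORDS = {"the", "tie", "to", "up", "fact", "that"}


def candidates(code):
    """Yield every letter string that encodes to `code` (3 ** len(code) strings)."""
    for letters in product(*(GROUPS[digit] for digit in code)):
        yield "".join(letters)


def readings(code, words=WORDS):
    """Keep only the candidates that are real words."""
    return sorted(c for c in candidates(code) if c in words)


print(readings("732"))           # ['the', 'tie']
print(3 ** len("154135173556"))  # 531441 candidates for a 12-symbol word
```

This is fine for short words but gets slow for long ones, which is what the other methods try to avoid.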

Method 2
Manually write down the first two letters, do the permutations for those, remove incorrect permutations, continue to next letter

Method 3
Use an online dictionary with filters for word length and excluded characters

Method 4
Turn every word in the dictionary into a number set and crosscheck against that
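
A rough sketch of such a number-dictionary, again assuming the digit groups from the example (with j treated as i and v as u, which is my assumption) and a small hypothetical word list:

```python
from collections import defaultdict

GROUPS = ["abc", "def", "ghi", "klm", "nop", "qrs", "tuw", "xyz"]
DIGIT = {ch: str(i) for i, group in enumerate(GROUPS, start=1) for ch in group}
DIGIT.update({"j": DIGIT["i"], "v": DIGIT["u"]})  # assumption: 24-letter alphabet


def encode_word(word):
    """Turn a word into its digit string."""
    return "".join(DIGIT[ch] for ch in word.lower() if ch in DIGIT)


def build_index(words):
    """Map each digit string to every dictionary word that produces it."""
    index = defaultdict(list)
    for word in words:
        index[encode_word(word)].append(word)
    return index


# Hypothetical word list; a real run would load a full dictionary.
INDEX = build_index(["the", "tie", "is", "to", "up", "that", "letters", "represent"])


def decode(cipher, index=INDEX):
    """Return the list of possible words for each cipher word."""
    return [index.get(code, []) for code in cipher.split()]


print(decode("732 36 75"))  # [['the', 'tie'], ['is'], ['to', 'up']]
```

The index is built once, so each lookup is a single dictionary access instead of 3^n trials.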

If you don't know which language the ciphertext is in, can it be deciphered?

A way to do this may be as follows. First, create a number-dictionary for all likely languages. Enter the ciphertext and have the program determine whether or not each word had a possible variation. Each language can then be listed from "most likely" to "least likely". 

For example, using my example result from earlier, the program gave at least one candidate for 100% of the words (no word had zero options). Skimming the text with some proficiency in Dutch, it's also obvious that words such as "is/letters/of" would have been considered a hit in Dutch. In fact, I will manually do the process now in Dutch to check:

"???? is ??????????? duf to wie dabt ???? ??????? tres korf ???????????? of vwo ns korf letters to ????????? single ???????? ???? latin does"

Doing this gets a match rate of about 66.7% (16 of the 24 words). Frankly, many of these are not really Dutch words, but are either English words (single instead of singel) or acronyms like VWO (Voorbereidend Wetenschappelijk Onderwijs) or NS (Nederlandse Spoorwegen). The dictionary I used was from woordvinder.com, and it is generous to say the least. Either way, it is clear that even with a generous word pool to draw from, there is no grammatically correct or natural-sounding Dutch sentence to be found here. This is great!

I'm sure there would be multiple hits in many languages. I would be surprised however, if there were grammatically correct and natural sounding sentences in multiple languages. Please feel free to do the same process in another language to see if it spits out anything correct!
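
The "most likely to least likely" ranking described above could be sketched as follows. The word lists here are tiny hypothetical samples, and scoring by the share of cipher words with at least one dictionary hit is just one possible measure; as the Dutch test shows, a high hit rate alone does not mean a coherent sentence exists.

```python
GROUPS = ["abc", "def", "ghi", "klm", "nop", "qrs", "tuw", "xyz"]
DIGIT = {ch: str(i) for i, group in enumerate(GROUPS, start=1) for ch in group}
DIGIT.update({"j": DIGIT["i"], "v": DIGIT["u"]})  # assumption: 24-letter alphabet


def encode_word(word):
    return "".join(DIGIT[ch] for ch in word.lower() if ch in DIGIT)


def coverage(cipher, words):
    """Share of cipher words that match at least one dictionary word."""
    codes = {encode_word(word) for word in words}
    tokens = cipher.split()
    return sum(token in codes for token in tokens) / len(tokens)


def rank_languages(cipher, dictionaries):
    """Sort languages from highest to lowest coverage."""
    scores = {lang: coverage(cipher, words) for lang, words in dictionaries.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical mini-dictionaries, for illustration only.
DICTIONARIES = {
    "english": ["the", "fact", "that", "is", "to"],
    "dutch": ["wie", "is", "of", "letters"],
}

print(rank_languages("732 2117 7317", DICTIONARIES))
```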


What about entropy?

This is something I could certainly use advice and input on. It seems to me that encoding in this way should have a significant impact on the entropy. Asking "if the first letter is R, what is the second letter likely to be?" clearly has many possibilities. In any case, the upper bound on the number of possible answers for any letter is the number of letters in the alphabet.

When asking the same for one of the symbols on this grid, the upper bound is 8. There are only 8 symbols in use, so there are only 8 possible answers. Therefore, the chance of guessing correctly is far higher. In reality, the chance is even greater because of the constraints of English spelling.

Let's ask the same question for R(q,r,s):
Rq,Rr,Rs / Rk,Rl,Rm are both unlikely letter combinations. This leaves a likely spread of 6 possible symbols. I do recognise that this does not hit the same level of predictability seen in the VM, but it's definitely different to plaintext.

But what if a grid was used that wasn't set up as the in-order Latin alphabet? If a grid grouped all vowels together (just an example) into a single glyph, the chance that this "vowel glyph" comes after a consonant is higher than the chance of a second consonant. For a symbol representing (b,c,d), there would be a relatively high chance that the next glyph is the vowel glyph (a,e,i,o,u). And then for the 3rd glyph, a decently high chance of either a vowel or a consonant. After 2 consecutive vowels? Almost no chance of a 3rd vowel. After 2 consecutive consonants? Almost no chance of a 3rd consonant.

This isn't even to mention systems that may have a glyph per 2 letters, per 4 letters, or a mix. The entropy of the same plaintext would vary per system used.
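
To put numbers on this rather than guess, one could measure character entropy before and after encoding. Below is a minimal sketch: `h1` is the plain single-character entropy and `h2` the conditional entropy of a character given the one before it (I believe these are the usual meanings of h1 and h2 in Voynich discussions, but treat that as my assumption). Spaces are counted as a character, and one sentence is far too short a sample for the values to mean much.

```python
from collections import Counter
from math import log2

GROUPS = ["abc", "def", "ghi", "klm", "nop", "qrs", "tuw", "xyz"]
DIGIT = {ch: str(i) for i, group in enumerate(GROUPS, start=1) for ch in group}


def encode(text):
    """Digit stand-in for the grid; characters outside the groups are kept as-is."""
    return "".join(DIGIT.get(ch, ch) for ch in text.lower())


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def h1(text):
    """Entropy of single characters."""
    return entropy(Counter(text))


def h2(text):
    """Entropy of a character given the previous one: H(pairs) - H(first of pair)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)


PLAIN = ("this is undoubtedly due to the fact that english uses more "
         "combinations of two or more letters to represent single phonemes "
         "than latin does")

for label, text in [("plaintext", PLAIN), ("encoded", encode(PLAIN))]:
    print(label, round(h1(text), 2), round(h2(text), 2))
```

Swapping in a different `GROUPS` list (for example one that puts all vowels in one group) is enough to compare how different grids change both values.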


Biggest Issues with this idea

There are more than 8 voynichese characters
This certainly appears to be the case. In a system such as this however, multiple characters would be variants of the same square glyph. Tentatively looking at the 3rd ring on f57v, there is a possibility that multiple characters are actually variants of one another (k and m for example). I'm sure this has been discussed and brought up many times. This would mean that although there are more than 8 or 9 characters, there may be 8 or 9 groups of characters, with each group representing a single square glyph character. 

If there are multiple characters per square glyph -and there is no 1-1 substitution happening- why do the characters vary? How would the author know which one to write?
The assumption would have to be either that:
1) The writer had to follow a set of predetermined rules
2) The writer chose one of the symbols in the group based off personal preference

If 1) is the case, how many rules are needed to produce Voynichese-like text, and how easy is it to do?
I tried this a few times by taking voynich text, transcribing it into square glyphs with the grid system (with groupings I chose), and then writing back into voynich from the grid with a few basic rules. With simple rules that fit voynichese, I had relatively good success at accurately expressing the words correctly.

Here is an example ruleset and grid that I used. To be very clear, I am NOT saying this is a solution. This is simply a test to see if this type of voynichese -> square glyph -> voynichese can work without complicated rules.

Here is the example ruleset/guidelines: 
And the process from the 2nd line of f58v 
Does this work 100%? No. Does it work a lot better than random chance? Yes. Sometimes it works really well, and sometimes less well. Mind you, these were not an extensively thought out and analysed set of rules/guidelines, but rather my attempt based on some basic patterns I saw in the text.

There are essentially an infinite number of ways this could be constructed; how do we know which one to use?

We don't. Maybe someone smarter than me has a way to construct this type of system that fits the VM, but I can't think of one beyond long-winded computer programs, luck, or some kind of narrowing process. We would need to agree on a transliteration (or test every permutation of every possible grouping of every variation of every transliteration... YIKES) while taking into account that the text may be a mix of languages or use incorrect spelling (also YIKES). That's a lot of effort if the VM doesn't use this type of system.

It may be a simple ruleset and character set, but without a key or something to tell us which sets are correct, it's simply one of millions of possibilities. The only bright side to this is that it may explain why no one has cracked it yet. I suppose another bright side may be that only a few systems will provide meaningful text, if the Dutch example above holds across more languages.

Conclusion/Reasons to continue research in this area

As mentioned earlier, text presented in this way has some very relevant and promising parallels to Voynichese.

- It looks and feels like natural language, with spaces and word lengths being conserved
- Writing plaintext into Voynichese would have been quite easy
- The system could have been known and used in the presumed time period of writing
- Aspects such as repeated letter clusters can be reasonably explained, unlike normal substitution
- Many common letter groups can be represented using the same glyphs, explaining why many words/word endings appear the same 
- There are few degrees of freedom in the interpretation of the text into plaintext for a given system (there is either a coherent sentence, or not)
- Entropy is lowered due to this system, dependent on the exact system used
- The potential key/system making decoding reasonable would be easily demonstrable using a couple of pages (which may have been removed at some point)

The main issues being:

- There are an infinite number of systems that could have been used and we don't know which one
- Different hands may have had different keys/systems, or may have been writing from a different plaintext language
- For this system to be deciphered into plaintext, the reader of the text would likely need either a dictionary, another key, or many years of free time

Tl;dr There is a potential system of encoding text which would have been possible at the time, would be difficult to crack, and shares some properties with Voynichese text. I don't exactly know why such a system would be used, or whether such a system was intended (even if it was used). This was tested a little by encoding an English sentence and attempting to decode it. Decoding to English was reliable, but decoding to Dutch yielded no relevant results or coherent sentences. Entropy and other aspects were discussed but could very much use input from others.


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - nablator - 06-01-2026

This is a substitution with polyphones: some ambiguity is perfectly acceptable, a lot of ambiguity is a pain, unlikely to be worth the time, especially if you move the spaces, to account for the strong positional preferences of Voynichese glyphs in words. But it would result in (too) many short words: every time you are not allowed some combination of glyphs, you have to insert a space.


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - oshfdk - 06-01-2026

I think the idea looks similar to [link].

I'm not sure I understand how easy it would be to read this cipher. It's possible to perform a simple experiment, for example we can reduce the English alphabet to 13 characters by replacing each of the 13 less frequent characters with one of the 13 more frequent ones. How easy is it for you to read the below?

OILTRID IICHAEL ROSNICH ACDEIRED THE TAIOES ROSNICH IANESCRINT TROI THE LESEITS AT THE RILLA IONDRAHONE THE TACILITS IN NEED OT TENDS OAS DISCREETLS SELLINH SOIE OT ITS HOLDINHS ROSNICH NERCHASED  IANESCRINTS ONE OT OHICH OAS LATER TO CE ANOON AS THE ROSNICH IANESCRINT THOEHH THE OORA ITSELT NERNORTEDLS DATES TO THE EARLS TH CENTERS THERE IS HOOERER DISSENT AIONH RESEARCHERS AS TO ITS ORIHIN

I'd say the mental strain is quite high and in a large book there will be some cases where several readings would be possible.

Here's the mapping: 
B -> C
F -> T
G -> H
J -> L
K -> A
M -> I
P -> N
Q -> D
U -> E
V -> R
W -> O
Y -> S
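
For anyone who wants to try this experiment on other text, the reduction can be reproduced with a short Python sketch. It uses only the twelve pairs listed above (X and Z are not in the list, so they are left unchanged here), and the function name is just for illustration.

```python
# The merge table as posted: each less frequent letter is replaced by a more frequent one.
MERGE = {"B": "C", "F": "T", "G": "H", "J": "L", "K": "A", "M": "I",
         "P": "N", "Q": "D", "U": "E", "V": "R", "W": "O", "Y": "S"}
TABLE = str.maketrans(MERGE)


def reduce_alphabet(text):
    """Upper-case the text and apply the merge table; other characters are kept."""
    return text.upper().translate(TABLE)


print(reduce_alphabet("Wilfrid Michael Voynich acquired"))
# OILTRID IICHAEL ROSNICH ACDEIRED
```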

EDIT: here's another sample

HER RENETATION AS A SONHISTIAATET ARSITER OF TASTE RAS ALSO FRELEENTLF LEOERAIET FOR NOLITIAAL INFLEENAE RITH IIFTIIOINI ESET TO RIN THE FAOOER OF THOSE ASOOE HER ANT INTEAE A TESIRE TO SEROE HER NEETS IN THOSE SELOR ISASELLAS NERFEHET ILOOES SEEH TO HAOE SEEN A NARTIAELARLF NOTENT SOERAE OF INFLEENAE RITH THE LEEEN OF FRANAE TESNERATE TO OSTAIN A NAIR RHAT FOE RANT IF FOERE IOINI TO SEROIOE THE ITALIAN RARS IS TO SE NOSITIOELF ON THE HINT OF THE KINI OF FRANAE ANT RHATS HORE INTIHATE THAN SEINI ON THE LEEEN OF FRANAES HANT SAFS AOAKRAH

(06-01-2026, 10:04 AM)eggyk Wrote: - Entropy is lowered due to this system, dependent on the exact system used

I'm not sure this is compatible with Voynichese. I think I read that for longer sequences of glyphs the entropy of Voynichese approaches that of natural languages. If we just reduce the character set, wouldn't the entropy drop for all the lengths?


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - eggyk - 06-01-2026

(06-01-2026, 11:12 AM)oshfdk Wrote: I'm not sure I understand how easy it would be to read this cipher. It's possible to perform a simple experiment, for example we can reduce the English alphabet to 13 characters by replacing each of the 13 less frequent characters with one of the 13 more frequent ones. How easy is it for you to read the below?

OILTRID IICHAEL ROSNICH ACDEIRED THE TAIOES ROSNICH IANESCRINT TROI THE LESEITS AT THE RILLA IONDRAHONE THE TACILITS IN NEED OT TENDS OAS DISCREETLS SELLINH SOIE OT ITS HOLDINHS ROSNICH NERCHASED  IANESCRINTS ONE OT OHICH OAS LATER TO CE ANOON AS THE ROSNICH IANESCRINT THOEHH THE OORA ITSELT NERNORTEDLS DATES TO THE EARLS TH CENTERS THERE IS HOOERER DISSENT AIONH RESEARCHERS AS TO ITS ORIHIN

I'd say the mental strain is quite high and in a large book there will be some cases where several readings would be possible.

Here's the mapping: 
B -> C
F -> T
G -> H
J -> L
K -> A
M -> I
P -> N
Q -> D
U -> E
V -> R
W -> O
Y -> S

Yes, except that instead of the original plaintext letter being used a different new symbol would be used to represent both.

So for example:
BC=1 
FT=2
GH=3
and so on. 

If such a system were being used, you would expect that the person decoding the text would not even attempt to read and understand from the page, but would have a key/dictionary to decode the enciphered text. In the same way, someone attempting to read a Caesar cipher would not be able to simply read from the page.

Using method 3 from my post (which is quite strenuous, obviously), there is only one 8-letter English word which fits the fourth word ACDEIRED from your text (ACQUIRED)

   

If I had a dictionary to hand that listed 8 letter words in that number system, it would have taken me a few moments to search and then transcribe back into plaintext. It would be surrounded by:

51896054 - Acquiral
51896098 - Acquired
51896099 - Acquiree

(06-01-2026, 11:12 AM)oshfdk Wrote: I think the idea looks similar to [link].

This seems to be based on the position of the letter within a word, making the process far more complicated for both encoding and decoding.


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - oshfdk - 06-01-2026

(06-01-2026, 02:08 PM)eggyk Wrote: Using method 3 from my post (which is quite strenuous, obviously), there is only one 8-letter English word which fits the fourth word ACDEIRED from your text (ACQUIRED)

If I had a dictionary to hand that listed 8 letter words in that number system, it would have taken me a few moments to search and then transcribe back into plaintext. It would be surrounded by:

51896054 - Acquiral
51896098 - Acquired
51896099 - Acquiree

This means that it wouldn't be possible to "read" the book (as in, look at the page and convert the cipher into plaintext words in one's mind at a comfortable pace), the reader would have to decipher the text word by word using some external dictionary or by trial and error with a pencil and paper.

There is nothing wrong with this in principle, but my expectations of the cipher solution for a book is for it to be compatible with reading. Complex decoding schemes are good for short messages, I'm not sure they are practical for a volume (or collection) the size of the Voynich Manuscript. But these are just my preferences.


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - eggyk - 06-01-2026

(06-01-2026, 10:59 AM)nablator Wrote: This is a substitution with polyphones: some ambiguity is perfectly acceptable, a lot of ambiguity is a pain, unlikely to be worth the time, especially if you move the spaces, to account for the strong positional preferences of Voynichese glyphs in words. But it would result in (too) many short words: every time you are not allowed some combination of glyphs, you have to insert a space.

I don't understand why you would get combinations of glyphs that are "not allowed" using this system. The original plaintext -> Cipher/square alphabet -> voynichese process would be the source of those rules, and decoding them back would provide the original plaintext. 

In my Voynich example, along with some very basic rules, I converted from Voynichese -> cipher/square -> Voynichese with relative success. To actually decode it, you would have to go Voynichese -> cipher/square -> original plaintext.

Both the voynichese and original plaintext would have to be translatable to a 3rd intermediary square/cipher "text" for this to work.


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - eggyk - 06-01-2026

(06-01-2026, 02:17 PM)oshfdk Wrote: There is nothing wrong with this in principle, but my expectations of the cipher solution for a book is for it to be compatible with reading. Complex decoding schemes are good for short messages, I'm not sure they are practical for a volume (or collection) the size of the Voynich Manuscript. But these are just my preferences.

I suppose it depends on how important the information was, how quickly it was designed to be read (or if it was designed to be transcribed at the end location, only once) and whether the speed of encoding was important. 

With the right key or ruleset included, I can imagine that translating a page would not take all that long. Not much longer than a 1-1 substitution of letters. Encoding is not too slow using this method and is quite pleasant. With an 8-9 symbol ciphertext it doesn't take long at all to remember how to encode each letter.


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - nablator - 06-01-2026

(06-01-2026, 11:12 AM)oshfdk Wrote: If we just reduce the character set, wouldn't the entropy drop for all the lengths?

For your first example (OILTRID IICHAEL ROSNICH...) It reduces h1 (4.1 -> 3.6) but increases h2 (2.7 -> 2.8).



RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - nablator - 06-01-2026

(06-01-2026, 02:33 PM)eggyk Wrote: I don't understand why you would get combinations of glyphs that are "not allowed" using this system. The original plaintext -> Cipher/square alphabet -> voynichese process would be the source of those rules, and decoding them back would provide the original plaintext. 

I meant: if you want to mimic the positional nature of Voynichese glyphs and lower the 2nd order character entropy (h2) to match the h2 of Voynichese, you need more constraints: enforcing many rare or forbidden combinations one way or another is a way to achieve this. Inserting spaces is a way but it would create (too) many short words to look like Voynichese.

Using one character (letter or symbol) for several plaintext letters makes the next character less predictable after any given character, because there are more possibilities. This is the wrong way to go: if you want a ciphertext that is more Voynichese-like, you need to make the next character more predictable.


RE: Voynichese-like Characteristics of a keyless "Enn'agrammaton" encoding system - kckluge - 06-01-2026

Unless I'm misreading this somehow, this is identical to the type of encryption scheme Robert Brumbaugh proposed in the 70's. There's a Ninja thread on this here: [link]. His specific assignment of glyphs to digits came from his reading of the marginal (Voynichese) text on one of the pages as cryptarithmetic problems. Back in the early days of the reading list, Jim Reeds worked to analyze his paper "deciphering" the labels on a couple of Zodiac pages and I looked at the "deciphered" Pharma page labels in another paper, as he didn't give the full details necessary to replicate his work. It goes without saying that his solution didn't go anywhere.

Could it work with a different assignment of letters to digits to glyphs? I don't know. You'll probably run into the same problem he did when he tried extending his technique from labels to the running text, which resulted in repetitive pseudo-Latin gibberish (which led him to the conclusion that the text was a hoax [by Dee & Kelly], with the decipherable labels intended to hook a potential buyer [Rudolph II]).