The Voynich Ninja

A Cipher Thought Experiment
(22-03-2025, 04:56 PM) oshfdk Wrote: It's not an escape character; the whole cipher is a many-to-many mapping: a few characters (usually 1 or 2) of the ciphertext correspond to 1 or 2 characters of the plaintext. With "/" the mapping is very simple: you replace any occurrence of "/X" with "x", except that "/K" makes a "c".

My (better) understanding now is that the cipher is a polyalphabetic substitution of single plaintext letters (except "th", which is treated as one unit) by 6 letters (K, O, P, R, S, T), with 6 alphabet selectors (/, 4, G, L, M, N) inserted in the ciphertext to signal a change of alphabet. The default alphabet selector (/) is omitted at the start of words.
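A minimal Python sketch of this selector model, for illustration only. The selector set and the "/" alphabet come from the description above; the other cells are the mappings found later in the thread (NO → i, LO → a), plus NR → n, LT → d and MO → e, which follow under this model from reading LONRLT as "and" and LO/RMO as "are". The table layout and function names are hypothetical, every unknown cell decodes as "?", and this is not the code anyone in the thread actually ran.

```python
# Hypothetical partial decoder for the selector model described above.
SELECTORS = set("/4GLMN")  # alphabet selectors

# (selector, cipher letter) -> plaintext; only cells mentioned in the thread.
TABLE = {
    ("/", "K"): "c", ("/", "O"): "o", ("/", "P"): "p",  # "/X" -> "x",
    ("/", "R"): "r", ("/", "S"): "s", ("/", "T"): "t",  # except "/K" -> "c"
    ("N", "O"): "i", ("N", "R"): "n",
    ("L", "O"): "a", ("L", "T"): "d",
    ("M", "O"): "e",
}


def decode_word(word, table=TABLE):
    alphabet = "/"  # default selector, omitted at the start of each word
    out = []
    for ch in word:
        if ch in SELECTORS:
            alphabet = ch  # stays active until the next selector
        else:
            out.append(table.get((alphabet, ch), "?"))
    return "".join(out)


def decode(text, table=TABLE):
    return " ".join(decode_word(w, table) for w in text.split())


print(decode("NO/T NO/S LO LONRLT LO/RMO"))  # it is a and are
```

With the same table, decode_word("LTMOLTNO/KLO/TMOLT"), the raw form of "LTMOLTi/KLOtMOLT" in the first listing, gives "dedicated".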
I went through my chat logs with Claude; the following is a short summary of how the cipher was cracked. I'm omitting a number of failed attempts; this is the winning path only. Whenever I write "the languages" below, I'm referring to English, Latin and Ancient Greek, as indicated by the original author.

1) I asked Claude to compute the word length distributions of the ciphertext and to evaluate whether they are compatible with the languages, and how likely it is that the ciphertext word breaks correspond to actual word breaks. It produced some charts (example below) and estimated that there is nothing odd about the spacing and that the length distributions match English and Greek better than Latin, assuming that on average it takes 1.8 ciphertext characters to encode one plaintext character.

[attachment=10206]
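Step 1 can be approximated in a few lines of Python. This is a rough sketch, not Claude's code: it assumes you supply a reference list of plaintext words for each language (none is included here), divides each ciphertext word length by the assumed 1.8 ratio, rounds, and reports the total variation distance between the two length distributions (0 means identical, 1 means no overlap).

```python
from collections import Counter


def length_distribution(lengths):
    """Relative frequency of each word length."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def compare_lengths(cipher_words, plain_words, ratio=1.8):
    """Total variation distance between the plaintext word-length
    distribution and ciphertext word lengths scaled down by `ratio`."""
    est = length_distribution(round(len(w) / ratio) for w in cipher_words)
    ref = length_distribution(len(w) for w in plain_words)
    keys = set(est) | set(ref)
    return 0.5 * sum(abs(est.get(k, 0) - ref.get(k, 0)) for k in keys)


print(compare_lengths(["NO/T", "NO/S"], ["it", "is"]))  # 0.0
```

A single scaling ratio only tests the average expansion, since individual plaintext letters take 1 or 2 ciphertext characters; comparing the distances for each language over a range of ratios is more informative than one number.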

2) So I decided to treat ciphertext words as plaintext words and investigate multi-character substitution. I asked Claude to produce frequency charts of the prefixes and suffixes of the ciphertext and compare them to the 10 most common words of each language. The result was inconclusive, so I decided that we were most likely dealing with a many-to-one scheme, where there are several ways of representing a single character. (I'm not sure this was correct; @nablator can probably comment. My main focus was on investigating the use of AI to break ciphers, not the specifics of this particular cipher.)
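The prefix and suffix tally from step 2 might look like the following sketch. The comparison with each language's most common words is left out because no word lists are given in the thread; the sample words are taken from the first ciphertext listing further down, and the function name is hypothetical.

```python
from collections import Counter


def affix_counts(words, n=2, top=10):
    """Most common length-n prefixes and suffixes of the given words."""
    long_enough = [w for w in words if len(w) >= n]
    prefixes = Counter(w[:n] for w in long_enough)
    suffixes = Counter(w[-n:] for w in long_enough)
    return prefixes.most_common(top), suffixes.most_common(top)


words = "MTO LO MTO PLO/RMTONR/ONR".split()
prefixes, suffixes = affix_counts(words, n=2, top=3)
print(prefixes)  # [('MT', 2), ('LO', 1), ('PL', 1)]
print(suffixes)  # [('TO', 2), ('LO', 1), ('NR', 1)]
```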

3) Then I asked Claude to produce the top 100 short common word combinations for the languages (the likes of "this is", "most of", "has been", etc., for English, Latin and Greek) and to filter out those that have no repeated letters ("it has", etc.). Then I asked it to write code that attempts to identify similar sequences in the ciphertext. Here I had to ask it to write some tests for the code, since Claude couldn't produce the right algorithm initially, but after a few attempts it succeeded.
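Claude's matching code isn't shown in the thread. As a hypothetical and much cruder stand-in, the sketch below keeps only phrases with a repeated letter (the filter described in step 3) and then looks for adjacent ciphertext words that begin with the same characters, the way "it is" repeats its first letter. The two-character prefix length is an assumption.

```python
from collections import Counter


def has_repeated_letter(phrase):
    """True if any letter occurs twice ("it is": yes, "it has": no)."""
    letters = [c for c in phrase.lower() if c.isalpha()]
    return len(letters) != len(set(letters))


def shared_prefix_pairs(cipher_words, prefix_len=2):
    """Count adjacent ciphertext word pairs sharing their first
    `prefix_len` characters, most frequent first."""
    pairs = Counter()
    for a, b in zip(cipher_words, cipher_words[1:]):
        long_enough = len(a) >= prefix_len and len(b) >= prefix_len
        if long_enough and a[:prefix_len] == b[:prefix_len]:
            pairs[(a, b)] += 1
    return pairs.most_common()


phrases = ["it is", "it has", "has been"]
print([p for p in phrases if has_repeated_letter(p)])  # ['it is', 'has been']
print(shared_prefix_pairs("NO/T NO/S LO".split()))  # [(('NO/T', 'NO/S'), 1)]
```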

4) After running this code, Claude identified a few plausible ciphertext sequences corresponding to common word combinations. The top match was a Greek phrase, but Claude found that applying that substitution to the whole text gave an implausible result.

5) The second-best match was "it is" for "NO/T NO/S". I found it interesting that "/T" maps to "t" here and "/S" to "s", but I would have followed this match even without that. So I asked Claude to write code replacing every "NO" with "i", every "/T" with "t" and every "/S" with "s", which produced the following text (the beginning of the second block):

Code:
MTO iMRLONSMO LOP/ONPMO is LO MTO PLO/RMTONR/ONR LO TMO/PLRMO LTMOLTi/KLOtMOLT TO LOMTONRLO iR LOMTONRs it is LPGOiLRt OMP PMONRtMOLRi/K MRLO/RLP/LMO GKiMT i/RONR LONRLT GK/OOLTMONR 4S/OiRtS it is LO GTSLP/RiLT TMOR/PLRMO GOsiGR

This looked very promising, especially with "it is LO" and "is LO", so I asked Claude to adjust the mapping to add "LO" => "a". 
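The replacement step amounts to applying an ordered list of literal substitutions. A minimal sketch (the function name is hypothetical): because it replaces raw substrings, it only catches a unit when its selector is written immediately before it, so a fragment like "NO/ONRO/K" becomes "i/OnO/K", exactly the kind of leftover visible in the second listing below.

```python
def apply_mapping(text, mapping):
    """Apply (cipher, plain) substring replacements in order; later
    pairs operate on the output of earlier ones, so order matters."""
    for cipher, plain in mapping:
        text = text.replace(cipher, plain)
    return text


mapping = [("NO", "i"), ("/T", "t"), ("/S", "s")]
print(apply_mapping("NO/T NO/S LO", mapping))                  # it is LO
print(apply_mapping("NO/T NO/S LO", mapping + [("LO", "a")]))  # it is a
print(apply_mapping("NO/ONRO/K", [("NO", "i"), ("NR", "n")]))  # i/OnO/K
```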

Then I found two repeated short sequences, "aNRLT" and "a/RMO", and guessed that one was probably "and" and the other "are". After adding these mappings, the whole second block turned into:

Code:
MTO iMRaNSe aP/ONPe is a MTO ParMTOn/On a Te/PLRe dedi/Kated TO aMTOna iR aMTOns it is LPGOiLRt OMP PenteLRi/K MRarLP/Le GKiMT irOn and GK/OOden 4S/OiRtS it is a GTSLPrid TeR/PLRe GOsiGR LP/OMT 4TO d/ORi/K ORder and MTO i/OnO/K ORder MTO OGOter KOLRGOMRns are d/ORi/K arRanSed iR MTO PeriMPOraR STGSLRe GOnLRi/Ke a TRaTiMSi/OnaR PeriMPOraR Te/PLRe 4TO 4PrOnt MP/eatGOres eiGKTt KOLRGOMRns iRsTeaT OMP Si/KS MTO 4ROtOPes de/Pi/KT a 4TiMPPOrent MRGSMT/OLR/ONSO/K/aR KOnMPLRi/KT On eatMS MPase Re/PResentiGR MTO NPO/KTORGS OMP ORder ONPer KGTa/OS MTO iRRer KOLRGOMRns are iR MTO i/OnO/K ORder MTO MPrieLSe de/Pi/KTS MTO PanaMTOnai/K PROSeSi/On a n/OTaPRe LPrea/K MPrOMR 4TO TGS/Pi/KaR MRGSMT/OLR/ONSO/K/aR SGOLPNSe/KTS KarNPed iRtO MR/OST TeR/PLRes MTO GTSLPrid natGOre OMP 4TO TeR/PLRe RePLRe/KTS aMTens KOnRe/KTi/On GKiMT LP/OMT MRaiRLROnd 4KreOse

At this point you can already read parts of the message, and the rest was very easy. Claude and I got stuck trying to identify the exact workings of the cipher (nablator ≫ Claude), so after a few more attempts I just gave ChatGPT the result, with roughly 80% of the plaintext revealed, and asked it to deduce the missing pieces.
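Spotting repeated short sequences such as "aNRLT" and "a/RMO" in step 5 can be partly automated by counting fixed-length substrings that start with an already decoded letter. A sketch with a made-up toy input rather than the real ciphertext; the length k = 5 and the starting letter are parameters you would vary.

```python
from collections import Counter


def repeated_substrings(text, k=5, start=None, min_count=2):
    """Length-k substrings within words (never across spaces), optionally
    only those beginning with `start`, seen at least `min_count` times."""
    counts = Counter()
    for word in text.split():
        for i in range(len(word) - k + 1):
            chunk = word[i:i + k]
            if start is None or chunk.startswith(start):
                counts[chunk] += 1
    return [(s, c) for s, c in counts.most_common() if c >= min_count]


toy = "iRaNRLT ParMTOn aNRLT a/RMO GKiMT a/RMO"
print(repeated_substrings(toy, k=5, start="a"))  # [('aNRLT', 2), ('a/RMO', 2)]
```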
It's actually a very promising outcome that I was able to reconstruct the plaintext while being wrong about the basic workings of the cipher. The plaintext of most languages has more than enough redundancy, so it may be possible to extract meaningful information without fully understanding how it was encoded. To some extent this mirrors the approach I'm trying with the Voynich Manuscript.
(24-03-2025, 10:14 AM) oshfdk Wrote: It's actually a very promising outcome that I was able to reconstruct the plaintext while being wrong about the basic workings of the cipher.

If you have time for this, would you like to try solving [link]? It's much more relevant to Voynichese, with positional patterns clearly apparent in words.
I can give it a try, but I think the sample could be too small. I'll ask Claude to compute some stats, maybe starting with the Shannon information content. If this is a verbose cipher with 3 ciphertext characters per plaintext character, then the resulting plaintext is far too short to yield enough repeated patterns to use in deciphering.
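For the statistics mentioned here, a hedged sketch of two plug-in estimates: single-character entropy and the conditional (next-character) entropy often discussed for Voynichese. Both are biased low on short texts, which is exactly the sample-size concern raised above; with 3 ciphertext characters per plaintext letter, a ciphertext of N characters carries only about N/3 plaintext letters. Spaces, if present, are counted as symbols.

```python
import math
from collections import Counter


def unigram_entropy(text):
    """Plug-in Shannon entropy of single characters, in bits per character."""
    counts = Counter(text)
    n = len(text)
    return sum(c / n * math.log2(n / c) for c in counts.values())


def conditional_entropy(text):
    """Plug-in estimate of H(next char | current char), in bits."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = len(text) - 1
    return sum(c / n * math.log2(firsts[a] / c) for (a, _), c in pairs.items())


print(unigram_entropy("abab"), conditional_entropy("abab"))  # 1.0 0.0
```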

If trajan117 hadn't provided an additional large chunk of ciphertext, I'm not sure Claude and I would have cracked it.
(24-03-2025, 10:33 AM) nablator Wrote: If you have time for this, would you like to try solving [link]? It's much more relevant to Voynichese, with positional patterns clearly apparent in words.

I thought this text had been deciphered since then.
(24-03-2025, 12:15 PM) Ruby Novacna Wrote: I thought this text had been deciphered since then.

No, and byatan disappeared from the forum without giving any additional clues or ciphertext samples. Even if this cipher is not solvable because the sample is too small (I agree with oshfdk), I would be very much interested in a good algorithm or software for solving this type of cipher (a prime suspect for Voynichese).

I can generate longer ciphertext samples from a possible decomposition of the words of byatan's cipher into 3 strings (one per plaintext letter). It's probably not exact enough for solving byatan's cipher itself, but that doesn't matter if we want to solve this type of cipher in general and just need a longer ciphertext sample to test the software.
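A generator for such samples could look like this. It is a hypothetical sketch of one reading of the scheme: plaintext letters are taken three at a time, the letter in slot 0, 1 or 2 is replaced by a slot-specific string, and the three strings form one ciphertext word. The toy tables are made up, plaintext spaces are dropped (an assumption), and nablator's actual decomposition is not reproduced here.

```python
def encrypt(plaintext, tables):
    """Hypothetical positional verbose cipher: every 3 plaintext letters
    become one cipher word, slot j encoded with tables[j][letter]."""
    letters = [c for c in plaintext.lower() if c.isalpha()]
    words = []
    for i in range(0, len(letters), 3):
        chunk = letters[i:i + 3]
        words.append("".join(tables[j][c] for j, c in enumerate(chunk)))
    return " ".join(words)


# Made-up tables for two plaintext letters, one dict per slot.
TABLES = [
    {"a": "q", "b": "k"},
    {"a": "o", "b": "e"},
    {"a": "dy", "b": "n"},
]
print(encrypt("ab ba", TABLES))  # qen q
```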

I wrote a solver to optimize the decomposition of the words into 3 parts, but it very often gets stuck in a sub-optimal state, and I can never tell how far it is from the optimum (a common problem with hill climbing).
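A common mitigation, not necessarily what this solver does, is hill climbing with random restarts: run the climb from many random starting points and keep the best result. If many restarts converge on the same best score, that is some evidence (not proof) of being near the optimum. Simulated annealing, which occasionally accepts worse moves, is the other usual remedy. A generic sketch, where `initial`, `neighbor` and `score` are placeholders for a random decomposition, a small change to it and a fitness measure; the toy problem only shows the mechanics.

```python
import random


def hill_climb(initial, neighbor, score, steps=10_000, restarts=20, seed=0):
    """Maximize `score` by accepting non-worsening moves, restarting from
    fresh random states and keeping the best result over all restarts."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = initial(rng)
        current_score = score(current)
        for _ in range(steps):
            candidate = neighbor(current, rng)
            candidate_score = score(candidate)
            if candidate_score >= current_score:  # allow sideways moves
                current, current_score = candidate, candidate_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


# Toy problem: find the integer in 0..20 closest to 7.
def start(rng):
    return rng.randint(0, 20)


def step(x, rng):
    return min(20, max(0, x + rng.choice((-1, 1))))


def fitness(x):
    return -(x - 7) ** 2


print(hill_climb(start, step, fitness, steps=200, restarts=5))  # (7, 0)
```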