The Voynich Ninja

Full Version: A Cipher Thought Experiment
(22-03-2025, 11:12 PM)asteckley Wrote: trajan117, I am just curious why you chose to use the particular character set of G, K, L, M, N, O, P, R, S, T, /, +, and 4?

In creating a test cipher like this -- where the symbols are arbitrary -- it would be more natural to use the characters A through M, or something like that.

I don't know the right answer, but since there is nothing from trajan117 so far, here's what I think: I suspect the character set is a clever way of making the ciphertext look like a language of sorts. For this it's important that most combinations of characters look like something one can pronounce. L, M, N, R, S can easily combine with most other consonants to create pronounceable sequences. GLROPSTONMLKO could be a word, if a strange one; BDCADEFDBC, on the other hand, looks like an unpronounceable non-word sequence. If one takes A-M (ABCDEFGHIJKLM), then only L, M and F can seamlessly combine with other consonants. I'm not sure about the slash, but maybe it makes it visually easier to decode the cipher, given that it's quite easy for a human decoder to lose track of code boundaries in longer words.

I'm not sure I'm right about this, because it would have been just as easy to make any of the basic code sequences a complete syllable (GO, RE, FI, TO, FAN, etc), but maybe this would make the way to split the ciphertext into codes too obvious.
(22-03-2025, 04:56 PM)oshfdk Wrote: I'm not sure I understand all the details, since some of the combinations behave weirdly sometimes, could be some extra mechanic or could be typos, this is not really important as long as it's possible to reconstruct the message.

I see. There is a clever use of a binary state, either "copy mode" or "simple letter substitution mode", that resets to "copy mode" at each word start and after "/", and switches to "simple letter substitution mode" after any bigram substitution. Bigram substitutions are done first, left to right, otherwise there could be some priority issues.

Also some typos, for example:
MT/ORNO/K (thoric) should be LT/ORNO/K (doric)
(23-03-2025, 11:25 AM)nablator Wrote: I see. There is a clever use of a binary state, alternating between copy mode and a simple letter substitution mode, that resets to copy mode at each word start and after "/", and switches to the simple substitution mode after any bigram substitution.

Also some typos, for example:

KLRO/SSNO/KLOR (clessical) should be KLRLO/SSNO/KLOR (classical)
MT/ORNO/K (thoric) should be LT/ORNO/K (doric)
NO/ONRO/K (ionec) should be NO/ONRNO/K (ionic)

Claude AI was struggling with these too, because I was asking it to write code to decode the ciphertext instead of letting it decode it directly. But I sort of cheated in the end: after identifying about 80% of the plaintext letters, I just switched to ChatGPT and asked it "to restore this partially scrambled text", and this is how I got the final results. This last part I could also do by hand, because it's quite easy to reKonstruKt an EnglNOsh tMOxt wNOth 20% of lMOttMOrs mNOssNOng.

I have to say the AI was especially helpful in the beginning, where I had to run all experiments for three potential plaintext languages. I basically told it to run various statistics for prefixes/suffixes and repeated substrings and try to interpret them separately for each of English, Latin and Ancient Greek. I think that, when decoding manually, this would have been the most tedious part, taking a whole day at the very least, even for someone familiar with all three languages, instead of 40-50 minutes (of which I was actually engaged for maybe 15 minutes, working on other things the rest of the time while waiting for Claude to complete each task).
I suspect that the AI (LLM) took advantage of the large parts that are in "copy mode" and the codes that include the letter that they represent (4T = th, LR = l, MR = m, NR = n) to guess the rest without figuring out exactly how the cipher works. LLMs are very good at filling the gaps and fixing typos.

This could be improved easily. Also the spaces should be substituted by multiple codes, as any frequent letter. Then frequency analysis would be impossible.
This is really cool stuff. Well done!

It will not help in understanding the Voynich MS text though....
(23-03-2025, 11:49 AM)nablator Wrote: This could be improved easily. Also the spaces should be substituted by multiple codes, as any frequent letter. Then frequency analysis would be impossible.

This depends on the size of the corpus, I think. For example, if there are two variants for each letter or space, but the corpus is, say, four times as large, I think it still should be possible to extract all useful statistics. I would expect the required size of the corpus to grow polynomially with the average number of variants for each character. I didn't do the math on this one, just my intuition. To break the multi character substitution it's enough to decode the single weakest sequence and then progress from there towards more ambiguous things.
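To make the two positions above concrete, here is a minimal Python sketch of what giving a frequent character several codes does to simple symbol counts. Everything in it is an assumption made for illustration: the sample sentence, the variant names (`_1`, `E1`, etc.) and the round-robin choice of variant do not come from the actual cipher.

```python
from collections import Counter
from itertools import cycle


def homophonic_encode(text, variants):
    """Encode text as a list of symbols, cycling through each character's variants."""
    cyclers = {ch: cycle(codes) for ch, codes in variants.items()}
    return [next(cyclers[ch]) if ch in cyclers else ch for ch in text]


# Illustrative sample and variant table; neither comes from the actual cipher.
sample = "the temple of athena on the acropolis"
variants = {" ": ["_1", "_2", "_3"], "e": ["E1", "E2"]}

plain_counts = Counter(sample)
cipher_counts = Counter(homophonic_encode(sample, variants))

# Compare the most frequent symbols before and after encoding.
print(plain_counts.most_common(3))
print(cipher_counts.most_common(3))
```

In this toy case the space is the most frequent plaintext symbol, but after encoding its count is split evenly over three codes, so none of them stands out. The sketch says nothing about how fast the required corpus grows with the number of variants; it only shows that each variant's count scales with the corpus, so a larger sample would bring the per-variant statistics back.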
(23-03-2025, 01:59 PM)oshfdk Wrote: This depends on the size of the corpus, I think...

It seems there is also the issue of having the plaintext solution available.  
Your use of ChatGPT (and Claude) was really well done! (Your solution isn't complete, but it is apparent that it is pretty close.) How far do you think you might have gotten without having the plaintext solution available?

The other interesting statistical question (although we can only surmise the answer for this particular cipher and not in general) is just how many crib words would suffice to crack the code.
I thought these were typos but I was mistaken. These are examples of code compression, avoiding the repetition of the prefix (first letter) of a 2-letter code when it is the same in the next code:
GTMOLRR/O = GTMOLRLR/O (hello)
GKT/O = GKGT/O (who)
KLRO/SSNO/KLOR = /KLRLO/S/SNO/KLOLR (classical)
NO/ONRO/K = NO/ONRNO/K (ionic)
KORNORMTNOLONR = /K/O/RNONRMTNOLONR (corinthian)

That's really tricky and I'm not sure how it works in some cases.
For example why should 4TO (7th word) not be interpreted as 4T4O (thi)?
Maybe 4TO is a typo: it should be MTO.

The state is not binary; it is simply the current prefix (/, 4, G, L, M, N).

All codes are two-letter codes, but the first letter (the prefix) of a code is omitted when it is the same as the prefix of the previous code.
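As a rough check of this prefix rule, here is a minimal Python sketch that restores the omitted prefixes. It assumes that the prefix characters are exactly the six listed above, that a prefix character always starts a new code, and that the state at the start of a word is "/" (which fits the examples, but is an assumption). The names `expand` and `decode` are made up for this sketch, and the `CODES` table is only the partial mapping that can be read off the examples in this thread, not the full key.

```python
# Prefix characters as listed in the thread: /, 4, G, L, M, N.
PREFIXES = set("/4GLMN")

# Partial code table inferred only from the examples quoted in this thread
# (an assumption, not the cipher author's full key).
CODES = {
    "GT": "h", "MO": "e", "LR": "l", "/O": "o", "GK": "w",
    "/K": "c", "LO": "a", "/S": "s", "NO": "i", "NR": "n",
    "/R": "r", "MT": "th", "4T": "th", "LT": "d", "MR": "m",
}


def expand(word, initial_prefix="/"):
    """Restore omitted prefixes, returning a list of two-letter codes."""
    codes = []
    prefix = initial_prefix  # assumed state at the start of each word
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in PREFIXES:
            if i + 1 >= len(word):
                raise ValueError(f"dangling prefix {ch!r} at end of {word!r}")
            # A prefix character starts a new code and becomes the state.
            prefix = ch
            codes.append(ch + word[i + 1])
            i += 2
        else:
            # A bare suffix reuses the most recent prefix.
            codes.append(prefix + ch)
            i += 1
    return codes


def decode(word):
    """Decode a word, marking codes missing from the partial table as '?'."""
    return "".join(CODES.get(code, "?") for code in expand(word))


print("".join(expand("GTMOLRR/O")))  # GTMOLRLR/O
print(decode("KORNORMTNOLONR"))      # corinthian
```

With these assumptions the sketch reproduces all five expansions listed above. It also expands 4TO to 4T4O, so it does not resolve the question about the 7th word; that still looks like either a typo or an extra rule.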
(23-03-2025, 03:41 PM)asteckley Wrote: It seems there is also the issue of having the plaintext solution available.
Your use of ChatGPT (and Claude) was really well done! (Your solution isn't complete, but it is apparent that it is pretty close.) How far do you think you might have gotten without having the plaintext solution available?

The other interesting statistical question (although we can only surmise the answer for this particular cipher and not in general) is just how many crib words would suffice to crack the code.

I don't think I had the plaintext solution available; what do you mean exactly? I don't think I used any crib words either. As I said before, I didn't use the information that the text is related to the Parthenon, nor did I disclose it to Claude/ChatGPT. I was just pleasantly surprised when the text turned out to be about the Parthenon.
(23-03-2025, 04:01 PM)oshfdk Wrote: ...I don't think I used any crib words either. As I said before, I didn't use the information that the text is related to the Parthenon, nor did I disclose it to Claude/ChatGPT. I was just pleasantly surprised when the text turned out to be about the Parthenon.

Oh, sorry -- I misunderstood. I thought you did.  Well done. (I also meant to say 'topic', when I said 'plaintext', thinking that you relied on the presence of assumed words. But I see you did not even do that.)