A simple substitution experiment

ReneZ > 17-09-2025, 04:08 AM

As I was running some promising yet unsuccessful attempts to decode the Voynich MS text, I ran into the question: how difficult will it be to detect being 'close to' the solution? I decided to try something I had been thinking of on several occasions:

Take a known plain text, and substitute vowels with other vowels and consonants with other consonants. Will the result be recognisable? To try this out, I took a part of Dante's Inferno, and changed the vowels and consonants from the Italian frequency distribution to the Latin frequency distribution. For these distributions, I used the ones obtained empirically on this page: [link] for the Dante and Mattioli source texts. Following is the conversion table:

Code:
E A I O U N R L T S C D M P V G H F B Q Z X J K
i e a u o t s r n m c l d p q b v f g h x y z k

Note that the Italian text had two fewer consonants than the Latin text, so I added J and K to make them equal.
Surprisingly, even though Italian and Latin are closely related, the conversion completely mixes up both vowels and consonants.
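For readers who want to reproduce the conversion, here is a minimal sketch in Python. It assumes the table is read positionally (each upper-case Italian letter is replaced by the lower-case Latin letter in the same column) and that the input is lower-case, unaccented text; anything outside the table, such as spaces, passes through unchanged. The names `encode` and `decode` are made up for this illustration.

```python
# Conversion table from the post, read column by column:
# the Italian letter on the top row becomes the Latin letter below it.
ITALIAN_ORDER = "EAIOUNRLTSCDMPVGHFBQZXJK"
LATIN_ORDER = "ieauotsrnmcldpqbvfghxyzk"

# str.maketrans builds a one-to-one character mapping between the two rows.
ENCODE = str.maketrans(ITALIAN_ORDER.lower(), LATIN_ORDER)
DECODE = str.maketrans(LATIN_ORDER, ITALIAN_ORDER.lower())


def encode(plain: str) -> str:
    """Apply the Italian-to-Latin substitution to lower-cased text."""
    return plain.lower().translate(ENCODE)


def decode(cipher: str) -> str:
    """Undo the substitution (Latin-to-Italian)."""
    return cipher.lower().translate(DECODE)


line = "nel mezzo del cammin di nostra vita"
print(encode(line))  # tir dixxu lir ceddat la tumnse qane
```

The printed line matches the first line of the sample quoted further down, which suggests the positional reading of the table is the intended one.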
The two questions I had were:
- would the text be somehow recognisable?
- would the text show any indication of being meaningful?

The answer to the first is a definite no. The second is a bit more subjective, but I would also argue that the answer is 'rather not'.
As the text is a fully grammatical known text with only simple substitution applied, and it largely follows a reasonable single character frequency distribution, this means that 'just looking' isn't sufficient to decide whether one is close to a solution or not.
A proper simple substitution solver (tool) should be used to test the result.
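As a rough illustration of why single-character frequencies cannot settle the matter: a simple substitution only relabels letters, so the sorted frequency profile of the text is the same before and after. The sketch below assumes lower-case, unaccented input and uses the conversion table from this post; `frequency_profile` is a hypothetical helper name.

```python
from collections import Counter


def frequency_profile(text: str) -> list[int]:
    """Letter counts sorted from most to least frequent, ignoring letter identity."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    return sorted(counts.values(), reverse=True)


# Conversion table from this post (Italian letter -> Latin letter, column by column).
TABLE = str.maketrans("eaiounrltscdmpvghfbqzxjk", "ieauotsrnmcldpqbvfghxyzk")

plain = "nel mezzo del cammin di nostra vita"
cipher = plain.translate(TABLE)

# The two profiles are identical: only the labels on the counts have changed.
print(frequency_profile(plain))
print(frequency_profile(cipher))
```

Any test that looks only at the shape of the single-letter distribution will therefore rate the plain text and the substituted text the same.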
Here follows a moderately short part of the resulting text. If anyone wants a longer sample to play with, let me know.

Quote:
tir dixxu lir ceddat la tumnse qane
da sansuqea pis ote mirqe umcose
cvi re lasanne qae ise mdessane
eva hoetnu e las hoer ise i cume lose
imne mirqe mirqebbae i empse i fusni
cvi tir pitmais satuqe re peose
netn i edese cvi pucu i pao dusni
de pis nsennes lir git cv a qa nsuqea
lasu li r ernsi cumi cv a q vu mcusni
au tut mu git salas cud a q atnsea
netn ise pait la muttu e hoir potnu
cvi re qiseci qae eggetlutea
de pua cv a foa er pai l ot curri baotnu
re luqi nisdateqe hoirre qerri
cvi d eqie la peose ar cus cudpotnu
boeslea at ernu i qala ri moi mperri
qimnani bae li sebba lir paetine
cvi dite lsannu ernsoa pis ubti cerri
errus fo re peose ot pucu hoine
cvi tir rebu lir cus d ise losene
re tunni cv a pemmea cut netne paine
i cudi hoia cvi cut rite effettene

Jorge_Stolfi > 17-09-2025, 07:07 AM

Well, the rhythm and rhyme are still OK...

tavie > 17-09-2025, 07:34 AM

Beautiful. Have you considered it might be Proto Romance?

ReneZ > 17-09-2025, 08:09 AM

(17-09-2025, 07:34 AM)tavie Wrote: Have you considered it might be Proto Romance?

That is high on one of my lists....

oaken > 17-09-2025, 09:49 AM

I might recognize that this looked somewhat language-like at the line ends, if I was paying attention, but I wouldn't be able to identify the language. If the VMS is a cypher, I'm not even sure how a programmer trying a new approach would know they've decoded it if the plain text dropped out of a decipherment attempt, unless they were familiar with that language, especially if the plain text, as has sometimes been argued, is itself anomalous (e.g. a unique example of a language or orthographic system, an unusual dialect, or something extremely idiosyncratic).

oshfdk > 17-09-2025, 11:03 AM

(17-09-2025, 09:49 AM)oaken Wrote: I might recognize that this looked somewhat language-like at the line ends, if I was paying attention, but I wouldn't be able to identify the language. If the VMS is a cypher, I'm not even sure how a programmer trying a new approach would know they've decoded it if the plain text dropped out of a decipherment attempt, unless they were familiar with that language, especially if the plain text, as has sometimes been argued, is itself anomalous (e.g. a unique example of a language or orthographic system, an unusual dialect, or something extremely idiosyncratic).

You can collect samples of plaintext languages and use statistical and dictionary based scoring to estimate how close the result is to some plaintext language, but for this to work you need to have the right ballpark estimation of the set of languages you are trying to identify. I'm running some decoding attempts and if the plaintext language is far from what I expect it to be (say, it turns out to be Vietnamese or Armenian), then you are absolutely right, even if I get the correct cipher scheme, I won't be able to decode anything. In practice I think one needs some balance between the number of languages to try and the number of false positives one is willing to tolerate.
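A rough sketch of the statistical side of such scoring, for illustration only: it assumes a character-bigram model with add-one smoothing trained on one reference sample per candidate language, and rates a decoding attempt by its average log-probability per bigram. The names `train_bigram_model`, `score` and `best_language` are hypothetical and not taken from any actual decoding tool; dictionary-based scoring is not covered here.

```python
import math
from collections import Counter
from typing import Callable

LogProb = Callable[[str, str], float]


def _normalise(text: str) -> str:
    """Lower-case the text and collapse all whitespace to single spaces."""
    return " ".join(text.lower().split())


def train_bigram_model(sample: str) -> LogProb:
    """Build a smoothed bigram model from a non-empty reference sample."""
    text = _normalise(sample)
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    alphabet_size = len(set(text))

    def log_prob(a: str, b: str) -> float:
        # Add-one smoothing: bigrams never seen in the sample still get a small probability.
        return math.log((bigrams[(a, b)] + 1) / (firsts[a] + alphabet_size))

    return log_prob


def score(candidate: str, log_prob: LogProb) -> float:
    """Average log-probability per bigram; higher means closer to the reference sample."""
    text = _normalise(candidate)
    pairs = list(zip(text, text[1:]))
    return sum(log_prob(a, b) for a, b in pairs) / max(len(pairs), 1)


def best_language(candidate: str, models: dict[str, LogProb]) -> str:
    """Name of the reference language whose model gives the candidate the highest score."""
    return max(models, key=lambda name: score(candidate, models[name]))
```

In use, one would train one model per language in the chosen set and call `best_language` on each decoding attempt. The caveat from the post applies directly: if the true plaintext language is not among the models, every score is low and the correct decoding does not stand out.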

oaken > 17-09-2025, 11:14 AM

(17-09-2025, 11:03 AM)oshfdk Wrote: You can collect samples of plaintext languages and use statistical and dictionary based scoring to estimate how close the result is to some plaintext language, but for this to work you need to have the right ballpark estimation of the set of languages you are trying to identify. I'm running some decoding attempts and if the plaintext language is far from what I expect it to be (say, it turns out to be Vietnamese or Armenian), then you are absolutely right, even if I get the correct cipher scheme, I won't be able to decode anything. In practice I think one needs some balance between the number of languages to try and the number of false positives one is willing to tolerate.

Thank you. That sounds like a good idea, but how would the Latin sample score against a program's output if it, for instance, produced Latin that was as much of a mess as that on f116v? Do you think it would still stand out as a promising result?

ReneZ > 17-09-2025, 11:45 AM

(17-09-2025, 11:03 AM)oshfdk Wrote: I'm running some decoding attempts and if the plaintext language is far from what I expect it to be (say, it turns out to be Vietnamese or Armenian), then you are absolutely right, even if I get the correct cipher scheme, I won't be able to decode anything. In practice I think one needs some balance between the number of languages to try and the number of false positives one is willing to tolerate.

So do you have a specific approach to decide what my 'cipher text' above would be?

oshfdk > 17-09-2025, 11:46 AM

(17-09-2025, 11:14 AM)oaken Wrote: Thank you. That sounds like a good idea, but how would the Latin sample score against a program's output if it, for instance, produced Latin that was as much of a mess as that on f116v? Do you think it would still stand out as a promising result?

One of the problems of [link] is understanding which glyph corresponds to which Latin letter (if any at all). So in a certain sense, if we assume [link] is a coherent text in Latin, it's not the final plaintext, but maybe something closer to the cipher discussed here: [link] (which has about half of the alphabet written without any changes, while the rest is written with special symbols, s#m^wh%t s-m-l%r t# th-s ^x%mpl^).

If we assume the popular reading of [link] with some plaintext Romance words is correct ("portas", "Maria", "multos"), then it would be one of thousands of partial match results and probably wouldn't stand out much.

But personally, I don't think [link] is a coherent (but mangled) text. Or if it is, the language is not Latin and quite far from Latin. The simplest explanation I have for [link] (other than doodles/pen tests) is that it's some mnemonic related to the key of the cipher. Most encoding schemes use some mapping of symbols; without this mapping, even knowing how the cipher works, decoding a ciphertext can be hard. Conversely, knowing the mapping but not knowing how the cipher works may offer reasonable protection by medieval crypto standards. So, [link] could be a list of letters hidden as every third letter in this sequence, or something similar.

oshfdk > 17-09-2025, 11:49 AM

(17-09-2025, 11:45 AM)ReneZ Wrote: So do you have a specific approach to decide what my 'cipher text' above would be?

I'm sorry, I'm not sure I understand the question. Do I know how to identify the plaintext language given it's some simple substitution cipher? Or generally, how would I approach deciphering an unknown substitution scheme?