The Voynich Ninja

Roman numerals to Voynicheesy.

1. Convert each plaintext word to a number by assigning number values to its letters[1] and summing them (some words will have the same number).

2. Express the sum of the letters in a word in Roman numerals, e.g. DCCLVII, using the non-subtractive form, i.e. VIIII not IX.

3. Using multiple rules, some of which include spaces, substitute groups of EVA letters for groups of Roman numeral letters, e.g. ('IIII' --> 'in').

Example.
Some of the substitution rules ('_' marks a space):
'_CC' --> '_qo'
'CC'  --> 'ch'
'C'   --> 'k'
'XX'  --> 'e'
'I_'  --> 'm_'

Some letter values:
t = 116
h = 104
e = 101
c = 99
a = 97

PLAINTEXT = "the cat sat on the mat"

the :: 321 :: [100, 100, 100, 10, 10, 1] 321 ['C', 'C', 'C', 'X', 'X', 'I']
cat :: 312 :: [100, 100, 100, 10, 1, 1] 312 ['C', 'C', 'C', 'X', 'I', 'I']
... etc ...

INTERMEDIATE = "CCCXXI CCCXII CCCXXVIII CCXXI CCCXXI CCCXXII"

CIPHERTEXT = "chkem qoshol qokeriin qoem qokem qokeol"
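The three steps and the example above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's program: letter values are Unicode code points (as the footnote says), '_' in a rule is treated as a space that must be present but is not consumed, and rules are tried in the listed order at each position, scanning left to right. Only the five rules quoted above are included, so any Roman symbol they do not cover is copied through unchanged, and only words fully covered by those five rules come out as in the post.

```python
# Five of the substitution rules quoted in the post; '_' is taken to mean a space.
# The rest of the rule set is not listed in the post, so this table is partial.
RULES = [("_CC", "_qo"), ("CC", "ch"), ("C", "k"), ("XX", "e"), ("I_", "m_")]

def letter_sum(word):
    """Step 1: sum the Unicode code points of the characters in a word."""
    return sum(ord(ch) for ch in word)

def additive_roman(n):
    """Step 2: Roman numeral without subtractive pairs (9 -> VIIII, not IX)."""
    symbols = [(1000, "M"), (500, "D"), (100, "C"), (50, "L"),
               (10, "X"), (5, "V"), (1, "I")]
    out = []
    for value, symbol in symbols:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

def to_intermediate(plaintext):
    """Steps 1 and 2 applied to every space-separated word."""
    return " ".join(additive_roman(letter_sum(w)) for w in plaintext.split())

def substitute(intermediate, rules=RULES):
    """Step 3 (assumed rule order): first matching rule wins at each position."""
    text = intermediate.replace(" ", "_")
    out, i = [], 0
    while i < len(text):
        for src, dst in rules:
            core = src.strip("_")
            if not text.startswith(core, i):
                continue
            # A leading '_' in the rule requires a space just before the match.
            if src.startswith("_") and (i == 0 or text[i - 1] != "_"):
                continue
            j = i + len(core)
            # A trailing '_' in the rule requires a space just after the match.
            if src.endswith("_") and (j >= len(text) or text[j] != "_"):
                continue
            out.append(dst.strip("_"))
            i = j
            break
        else:
            out.append(text[i])  # no rule applies: copy the symbol unchanged
            i += 1
    return "".join(out).replace("_", " ")

intermediate = to_intermediate("the cat sat on the mat")
print(intermediate)
print(substitute(intermediate))
```

With these five rules the first, fourth and fifth words come out as 'chkem', 'qoem' and 'qokem', matching the ciphertext above; the other words still contain raw Roman symbols because their rules are not listed.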

[1] Alphabetic numeral system.
     Numbers can also be assigned arbitrarily or methodically, e.g. using a Polybius square.
     For this post, I used the Unicode code point of each character.

==============================================================================================
This cipher is not uniquely decodable: encryption is deterministic, but some words have the same number, so a number can decode to more than one word.

Also, I don't know (idk) if you can reverse the substitution rules for the letter groups.

There is no way to derive the EVA letters.
If this method was used, why, for example, replace 'XX' with 'e'? Is it a stylistic choice?
Is there some underlying relation between VMS glyphs, which can be seen as Latin abbreviations, and the Roman numerals they substitute for?

With more work the rules could be changed or expanded to render the ciphertext more like Voynichese, but idk how close you could get.

Is it possible that with some small rule changes the Voynichese dialects could be recreated?

Is it possible that the substitution rules could be performed by a volvelle?

I have not pursued the idea further than this because, well, it's kinda interesting but idk.

Voynichese checklist:
Entropy - check
Pseudo-repetition - check
Binomially distributed word lengths - hmm, with English plaintext it's close, but using the same rules with Italian plaintext, not so much.
Other - ?

I'm always interested when Roman numerals are involved. What are your thoughts on reversibility?

The substitution rules are reversible (phew, that took a while).
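As a rough illustration of what reversing the rules involves, here is a sketch that inverts only the five rules quoted earlier in the thread. It is an assumption-laden toy, not the author's reversal: it works word by word, matches the longest EVA token first, and ignores the position constraints ('qo' only at the start of a word, 'm' only at the end). The names REVERSE, eva_to_roman and roman_to_int are made up for this example.

```python
# Inverse of the five quoted rules, with the space markers dropped.
# Both 'qo' and 'ch' map back to 'CC'; they differ only in word position.
REVERSE = {"qo": "CC", "ch": "CC", "k": "C", "e": "XX", "m": "I"}

ROMAN_VALUES = {"M": 1000, "D": 500, "C": 100, "L": 50, "X": 10, "V": 5, "I": 1}

def eva_to_roman(word):
    """Rewrite one EVA word as an additive Roman numeral, longest token first."""
    tokens = sorted(REVERSE, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for token in tokens:
            if word.startswith(token, i):
                out.append(REVERSE[token])
                i += len(token)
                break
        else:
            raise ValueError(f"no reverse rule for {word[i:]!r}")
    return "".join(out)

def roman_to_int(numeral):
    """Additive (non-subtractive) numerals are just the sum of their symbols."""
    return sum(ROMAN_VALUES[symbol] for symbol in numeral)

for eva in ["chkem", "qokem", "qoem"]:
    roman = eva_to_roman(eva)
    print(eva, roman, roman_to_int(roman))
```

This recovers 321 from both 'chkem' and 'qokem', and 221 from 'qoem'. Getting from the number back to a plaintext word is the separate, harder step.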

As for getting from a summed number to a word ...
    Ahh, I see, it depends greatly on the numbers allocated to the letters.
Doing it with a numerical increment of 1 between letters, the way I've done it, is really bad.

For the number 213 there are 82 letter sequences.
For the number 757 there are 54889 letter sequences.
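How quickly the ambiguity grows with consecutive letter values can be checked with a small dynamic-programming count. The sketch below assumes the alphabet is lowercase a-z with Unicode code points as values and counts ordered letter sequences of any length. The post does not state its alphabet or counting convention, so this is not expected to reproduce the 82 and 54889 quoted above; it only shows the method.

```python
def count_sequences(target, values=range(ord("a"), ord("z") + 1)):
    """Count ordered letter sequences whose values sum to `target`.

    ways[s] is the number of sequences summing to s; the last letter of a
    sequence can be any value v, leaving ways[s - v] choices for the rest.
    """
    ways = [0] * (target + 1)
    ways[0] = 1  # the empty sequence
    for total in range(1, target + 1):
        ways[total] = sum(ways[total - v] for v in values if v <= total)
    return ways[target]

print(count_sequences(213))
print(count_sequences(757))
```

Passing a different `values` sequence (for example, widely spaced numbers) is a quick way to compare candidate letter-value schemes.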

Originally I used Greek letter values, then switched to Unicode values (like an idiot) because I thought it would be easier to compute.

The obvious fix is to assign primes to the letters and multiply instead of summing.
I didn't do that because I wanted some numbers to decode into multiple words (that's my personal preference for how Voynichese works)
and I thought the resulting EVA words would be too long.
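The prime idea can be sketched as follows, assuming the 26 lowercase letters get the first 26 primes in alphabetical order (an assignment chosen here for illustration, not one given in the thread). Because every integer has a unique prime factorisation, the product identifies exactly which letters a word contains, though not their order.

```python
import math
import string

def first_primes(count):
    """Return the first `count` primes by trial division (fine for 26)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# Hypothetical assignment: a=2, b=3, c=5, ... z=101.
PRIMES = dict(zip(string.ascii_lowercase, first_primes(26)))

def word_product(word):
    """Multiply the primes of the letters instead of summing values."""
    return math.prod(PRIMES[ch] for ch in word)

def letters_from_product(n):
    """Factor the product back into letters (alphabetical order, not word order)."""
    letters = []
    for letter, p in PRIMES.items():
        while n % p == 0:
            letters.append(letter)
            n //= p
    if n != 1:
        raise ValueError("number has a prime factor outside the alphabet")
    return "".join(letters)

print(word_product("cat"), letters_from_product(word_product("cat")))
print(word_product("the"))
```

Under this assignment 'cat' becomes 710 and factors back to the letters 'act', so anagrams still share a number. 'the' becomes 14839 instead of 321, which illustrates the concern about much longer Roman numerals and EVA words.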

In summary, with a good choice of numerical letter values then yes, I believe this system to be reversible.
    Additionally, though it is not necessary, creating a codebook with entries like { summed_number, word1, word2, etc } would make decrypting much easier.
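A codebook of that shape is easy to build from a word list. The sketch below assumes Unicode code points as letter values and uses a made-up function name, build_codebook; the word list is a placeholder.

```python
from collections import defaultdict

def build_codebook(words):
    """Map each letter sum to the sorted list of distinct words that produce it."""
    codebook = defaultdict(list)
    for word in sorted(set(words)):
        codebook[sum(ord(ch) for ch in word)].append(word)
    return dict(codebook)

codebook = build_codebook(["the", "cat", "act", "sat", "on", "mat", "cat"])
for total, candidates in sorted(codebook.items()):
    print(total, candidates)
```

Decrypting then becomes a lookup of each recovered number, followed by choosing among the candidate words from context.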

(14-09-2024, 04:56 PM)RobGea Wrote: I wanted some numbers to decode into multiple words (that's my personal preference for how Voynichese works)

Type token ratio (TTR / MATTR) for Voynichese is compatible with a 1 to 1 mapping. If anything, it's a little higher than for the average European language (that is, more Voynichese word types; in this context, that would mean a plaintext word type mapped to more than one cipher word type). See Lindemann, "Crux of the MATTR".
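For readers unfamiliar with the measure: TTR is the number of distinct word types divided by the number of tokens, and MATTR averages that ratio over a sliding window of fixed length so that texts of different sizes can be compared. A minimal sketch follows; the window size is an arbitrary parameter here, not the value used in the cited paper.

```python
def ttr(tokens):
    """Type-token ratio: distinct word types divided by total tokens."""
    return len(set(tokens)) / len(tokens)

def mattr(tokens, window):
    """Moving-average TTR: mean TTR over every window of `window` tokens."""
    if window < 1 or len(tokens) < window:
        raise ValueError("need at least `window` tokens")
    ratios = [ttr(tokens[i:i + window])
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

sample = "the cat sat on the mat the cat sat".split()
print(ttr(sample), mattr(sample, window=4))
```

A many-to-one word cipher lowers the number of cipher word types relative to the plaintext, so it would be expected to push this value down rather than up.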

Of course, there are reasons that make a 1 to 1 word mapping (nomenclator) unlikely, e.g. LAAFU (line as a functional unit), the frequent consecutive repetition of the same word, or the absence of words that are frequent throughout the manuscript (function words).

BTW I don't think this system explains "pseudo-repetition". Of course, it can occur, as in natural languages ("red rod", "hard card" etc.). The point is that in Voynichese the pattern is more frequent than random: something "pushes" similar words next to each other. Years ago, Rene proposed a number-based system that indeed generated significant pseudo-repetition (but not perfect repetition).
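One way to put a number on this is to count adjacent word pairs that are similar but not identical. The sketch below uses edit distance 1 as the similarity threshold; that threshold is an assumption made for this example, since the thread does not give a formal definition of pseudo-repetition.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def near_repeat_rate(words, max_distance=1):
    """Fraction of adjacent pairs that differ, but by at most `max_distance` edits."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs
               if a != b and edit_distance(a, b) <= max_distance)
    return hits / len(pairs)

print(near_repeat_rate("red rod hard card".split()))
```

Comparing this rate on a text with the rate on the same words in shuffled order is one way to test whether similar words sit next to each other more often than chance.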

I switched to Italian plaintext and changed the letter values; reversing the letter_sum to letters now gives far fewer possibilities.

Creating a codebook makes it much easier and shows the multiple word possibilities per letter_sum,
ranging from 1 to 16 words with an average of 6 and a mode of 2.

Again, with better-chosen letter values this could be improved. The same goes for pseudo-repetition:
there are a few cases like this, but yes, as MarcoP observed, not enough and not quite in the same manner as Voynichese.

e.g "oteeedy kool oteein ool oteol qoorin keetol otiin"

This Roman numeral method works and it's mostly reversible,
but the Voynichese is quite a bit off. Also, this method does not really explain glyph positions within a word.

Perhaps a newline char could be added, so that '\nD' encodes to separate EVA char(s); you could then have \nD, _D, D, D_, D\n.
But there are currently 22 rules, and already 'daiin' will never occur alone, which is wrong.

Sculpting those rules and adding others will take some time; a lot of that comes down to finding out the best glyph sequences.
idk if I can even do it with a genetic algorithm (GA) or hill climbing algorithm (HC).

Overall, this would be an extremely laborious method: you need to calculate the letter sum of every distinct word in the plaintext,
then apply the set of substitution rules to each word in a line as you write it, because the rules include spaces.

A full decode of the start of Orlando Furioso was linked here.

For completeness, here is a dozen-line sample of encoded text:

chshrin otchkoeriin eriin otchkoeetin otoedy qokein qooeetm kotm qosh tdaiin
dakshhy qooeear odin qodin kotm otchohy qokotol otoeerin oeear qokein
keetol otkoeerol ee otiin keetol qocheetm tdaiin qodiin qokoiin ool
chotm keeol oeedy qochm otcheetm otem otchkrol qooeem keeol tdaiin
dacheetin otchkrol oteeriin koeedaiin qoerin keetol kein oteem otchem eriin
dakshin ee oterol eriin otriin qokeerin qoerin koeedol otchem qoshedy
kodiin qokein oeedy qokoiin qochoeedin otchothy koeol kotdaiin oteeriin oeedy
chkeerin ohy ototol oeedy kool keetol qokol koeetedy otchoe oeedy
dackhtedy keeol qochoeedin qochiin keetol otkeol qoeedy qoee otshdaiin qoehy
chedaiin edy qooin keetol qoee ottin eerin otkoeetm keetol erin
ksh qokotol edy kotin edy kotin oin oeein kodaiin koeehy
dakoeedin qokeethy otem oterol keetol oin qokerol m qokshrin otksh

RobGea

Koen G

RobGea

MarcoP

RobGea

RobGea