Voynichese's letters

RE: Voynichese's letters

Bluetoes101 > 02-06-2026, 10:44 PM

This path probably leads to CLS. If you haven't read about it, here is a link: [link]
RE: Voynichese's letters

ReneZ > 02-06-2026, 11:55 PM

(02-06-2026, 03:26 PM)RobGea Wrote: [link] post by ReneZ about his mod2 cipher system but the links in that post are dead.

I revived the linked files as follows:
[link]
[link]
[link]

The first is an approximately 10,000-word section of Pliny's Natural History.
In the second file, each word has been replaced by a Roman numeral according to a rule explained below.
Since the text has more than 3999 word types, I added (invented) the Roman numeral Q to represent 5000.
The third file introduces an alternative representation of the numbers (not Voynich-like).

The rule for substituting words by numbers (or codes) was as follows:

Make a list of plain text words and corresponding code words. In this case, the code word is the Roman numeral for the index of the word.

Now start off with a list of common words. I used the 400 most common words, sorted alphabetically.
These are translated as I to CD (1 to 400).
Then, as translation progresses, whenever a 'new word' not yet on the list appears, assign this word the next free number.
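
The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script used to produce the files: the names `to_roman` and `encode` and the `seed_size` parameter are hypothetical, and the way the invented numeral Q combines with M (here MQ = 4000, by analogy with CD = 400) is a guess.

```python
from collections import Counter

# Value/symbol pairs, largest first. "Q" = 5000 is the invented numeral from
# the post; "MQ" = 4000 is an assumption made here by analogy with "CD" = 400.
ROMAN = [
    (5000, "Q"), (4000, "MQ"), (1000, "M"), (900, "CM"), (500, "D"),
    (400, "CD"), (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert 1..8999 to a Roman numeral extended with Q = 5000."""
    if not 1 <= n <= 8999:
        raise ValueError("number out of range for this numeral scheme")
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def encode(words, seed_size=400):
    """Replace each word by the Roman numeral of its index in a growing list.

    The list starts with the `seed_size` most common words, sorted
    alphabetically; every new word gets the next free number.
    """
    counts = Counter(words)
    seed = sorted(word for word, _ in counts.most_common(seed_size))
    codebook = {word: i for i, word in enumerate(seed, start=1)}
    encoded = []
    for word in words:
        if word not in codebook:
            codebook[word] = len(codebook) + 1  # next free number
        encoded.append(to_roman(codebook[word]))
    return encoded, codebook
```

For example, with `seed_size=2` the sentence "in the beginning the word was in the world" is encoded as I II III II IV V I II VI: the two seed words get I and II, and each new word takes the next number in order of first appearance.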

This results in a situation where similar words appear near each other.

This feature is demonstrated by graphs created by @nablator in the original thread.
RE: Voynichese's letters

Jorge_Stolfi > 03-06-2026, 01:45 AM

(02-06-2026, 11:55 PM)ReneZ Wrote: Now start off with a list of common words. I used the 400 most common words, sorted alphabetically. These are translated I to CD. Then, as translation progresses, whenever a 'new word' not yet on the list appears, assign this word the next free number.

You could as well start with an empty dictionary. (To the Author with no computers, finding the 400 most common words would be a lot of extra work, with no obvious necessity).

If you have to start with a non-empty dictionary, it should be sorted and numbered in order of decreasing frequency, not alphabetically. That will result in the most common words having the shortest codes, which will increase the efficiency (bits of information per character) of the encoding.
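
The efficiency argument can be checked with a small sketch. It assumes, purely for simplicity, that each code is written as a decimal number and that cost is the number of digits written; the function names are hypothetical. The ranking logic is the same for Roman numerals, although their length does not grow as regularly.

```python
from collections import Counter


def total_code_length(words, ordering):
    """Total digits written when each word is replaced by its 1-based rank."""
    code = {word: str(i) for i, word in enumerate(ordering, start=1)}
    return sum(len(code[word]) for word in words)


def compare_orderings(words):
    """Return (alphabetical cost, frequency-ordered cost) for a text."""
    counts = Counter(words)
    alphabetical = sorted(counts)
    # Most frequent first; ties broken alphabetically to keep it deterministic.
    by_frequency = sorted(counts, key=lambda word: (-counts[word], word))
    return (
        total_code_length(words, alphabetical),
        total_code_length(words, by_frequency),
    )
```

On a toy text where one word occurs 20 times and ten others occur once each, the alphabetical numbering can give the frequent word a two-digit code, while the frequency ordering gives it code 1, so the second total is smaller.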

Starting with an empty dictionary and assigning consecutive numbers as new words appear will tend to produce the same result: the most common words will tend to appear sooner, and thus get shorter numbers. But you had better "warm up" the method by encoding a "page 0" to be discarded. Otherwise the encoded text will start like 1 2 3 4 5 2 6 7 4 1 8 ...
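
A minimal sketch of the empty-dictionary variant with a discarded warm-up page, assuming plain integers as codes; `encode_incremental` and the sample pages are hypothetical names and data used only for illustration.

```python
def encode_incremental(words, codebook=None):
    """Encode words with a dictionary that grows as new words appear.

    Pass the codebook returned by an earlier call to continue numbering
    where that call left off.
    """
    if codebook is None:
        codebook = {}
    codes = []
    for word in words:
        if word not in codebook:
            codebook[word] = len(codebook) + 1  # next free number
        codes.append(codebook[word])
    return codes, codebook


# Without a warm-up, the output starts with an obvious counting pattern.
cold_start, _ = encode_incremental("a b c d e b f g d a h".split())

# With a warm-up, "page 0" is encoded only to fill the codebook and its
# output is thrown away; the real text then reuses the existing codes.
page0 = "the cat and the dog".split()
page1 = "the dog and the bird".split()
_, book = encode_incremental(page0)
warm_start, book = encode_incremental(page1, book)
```

Here `cold_start` is `[1, 2, 3, 4, 5, 2, 6, 7, 4, 1, 8]`, the pattern mentioned above, while `warm_start` is `[1, 4, 3, 1, 5]` and no longer begins by counting upward.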

I suppose that the most practical way to implement this method would be to write each word and its code on two index cards, and arrange them upright in two boxes, one sorted in alphabetical order and the other in code order. Adding a new word would be only a bit more work than looking it up in this dictionary. Then one box would be used to encode, one to decode.
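
In software terms the two boxes correspond to two lookup tables kept in step. The sketch below is an analogy only; the class name `CardBoxes` and its methods are hypothetical.

```python
class CardBoxes:
    """Two 'boxes' of cards: one looked up by word, one looked up by code."""

    def __init__(self):
        self.by_word = {}  # box sorted alphabetically, used to encode
        self.by_code = {}  # box sorted by code, used to decode

    def encode(self, word):
        """Return the code for a word, filing two new cards if it is new."""
        if word not in self.by_word:
            code = len(self.by_word) + 1
            self.by_word[word] = code
            self.by_code[code] = word
        return self.by_word[word]

    def decode(self, code):
        """Return the word written on the card filed under this code."""
        return self.by_code[code]
```

Encoding "in principio erat verbum et verbum erat" word by word with a fresh `CardBoxes` gives 1 2 3 4 5 4 3, and decoding those codes with the same object returns the original words.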

After enough words have been assigned, one could make a single-page dictionary with 50-100 common words, to speed up the search. But at that point the Author would have memorized the codes of most of those words.

The A/B switch could be a point when the Author decided to change the encoding of the numbers, much as at some point the Romans started using the subtractive system and writing IV instead of IIII, IX instead of VIIII, etc. In that case, maybe he kept the codes already assigned, maybe he updated them as he encountered them again.
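
To make the additive/subtractive distinction concrete, here is a small sketch that writes the same number in both notations. The table and function names are hypothetical, and nothing here is a claim about how the two Voynich "languages" actually differ.

```python
# Subtractive notation: 4 = IV, 9 = IX, 40 = XL, and so on.
SUBTRACTIVE = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
    (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
    (5, "V"), (4, "IV"), (1, "I"),
]

# Additive notation: 4 = IIII, 9 = VIIII, 40 = XXXX, and so on.
ADDITIVE = [
    (1000, "M"), (500, "D"), (100, "C"), (50, "L"),
    (10, "X"), (5, "V"), (1, "I"),
]


def roman(n, table):
    """Write a positive integer using the given value/symbol table."""
    if n < 1:
        raise ValueError("Roman numerals start at 1")
    out = []
    for value, symbol in table:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)
```

With these tables, `roman(9, ADDITIVE)` gives "VIIII" and `roman(9, SUBTRACTIVE)` gives "IX", so the same codebook index would produce visibly different words before and after such a switch.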

But, anyway, that still would be stretching the word "practical" beyond the limit of any spandex suit...

Quote: This results in a situation where similar words appear near each other.

Starting with an empty dictionary would probably enhance this side effect.

All the best, --stolfi
RE: Voynichese's letters

ReneZ > 03-06-2026, 01:52 AM

(03-06-2026, 01:45 AM)Jorge_Stolfi Wrote: You could as well start with an empty dictionary. (To the Author with no computers, finding the 400 most common words would be a lot of extra work, with no obvious necessity).

Well, they would not have to be THE 400 most common words. Just common words would do it.
And I agree that the smaller the number, the larger the effect.
Doing a dummy start page and throwing it away is another way of achieving largely the same.

In any case, the purpose of this exercise at the time was to show that the effect noted by Torsten Timm, and used as evidence for the autocopy theory, can also arise with this straightforward method of encoding a known plain text. If I remember correctly, the effect in my sample text still exceeded the magnitude found in the Voynich MS.

But all this is not the purpose of the present thread...
RE: Voynichese's letters

ololololo > 03-06-2026, 10:23 AM

(02-06-2026, 10:44 PM)Bluetoes101 Wrote: This path probably leads to CLS, if you haven't read about it, here is a link - [link]
The similarity of the symbols is indeed there, but it doesn't really help us, because it doesn't tell us anything. I do find the "kchsy" example interesting, though, because it suggests that the author might have used similar techniques to disguise letters in other words, such as replacing e with s and a with y (sometimes y looks more like a than e).
RE: Voynichese's letters

ololololo > 03-06-2026, 10:34 AM

(03-06-2026, 01:45 AM)Jorge_Stolfi Wrote: You could as well start with an empty dictionary. [...]
My knowledge of cryptology is not as deep as yours, so all I can say is that I believe we should focus on simpler algorithms (or at least those of moderate complexity). The author was limited by the tools available to him (a pen, a hand, and perhaps a few assistants) and by his own knowledge; it is too unlikely that the book was created by a brilliant cryptographer who devised an extremely complex and secure method.
RE: Voynichese's letters

ololololo > 03-06-2026, 11:02 AM

(03-06-2026, 01:52 AM)ReneZ Wrote: [...] But all this is not the purpose of the present thread...
The purpose of this topic is to study how closely the VMS text matches the properties of a substitution cipher (not all words are constructed according to the crust-mantle-core model; some can be obtained by combining the bigrams described in the post, which I find quite interesting).
Well, that really should have been said at the very beginning...
RE: Voynichese's letters

ololololo > 03-06-2026, 02:48 PM

(03-06-2026, 01:45 AM)Jorge_Stolfi Wrote: You could as well start with an empty dictionary. [...]
By the way, maybe the author encrypted abbreviated versions of words (such as sec. for secundum, ds for deus, c' for cti), resulting in a text consisting of words of 3-4 letters. This could also explain some of Voynichese's features.