Generated word tokens from chars ( pairs )

The Voynich Ninja (https://www.voynich.ninja) > Voynich Research > Analysis of the text

Generated word tokens from chars ( pairs ) - bi3mw - 10-03-2018

I've started an experiment to see if it's possible to generate word tokens from word types under certain conditions. The following conditions must be met:

The ratio of word types to word tokens should be as close as possible to that of the VMS (as extrapolated).
The method should be simple to use, i.e. conversion to ciphertext and back to plaintext should be possible without a codebook or other tools.
Only the characters A-Z and 0-9 are used.
All text must be generated with a single method. Subsequent adjustments are excluded.

The wordlist was generated from the "Regimen sanitatis Magnini Mediolanensis (vol. 1)". It contains 5391 word types. The plaintext has 25014 words; the generated text has 23264 word tokens.

Rules:

The first letter always stands alone (it is read as a single value), so "D" is 4.
If a character is A-Y, it is replaced with a number (A=1, B=2, C=3, ...).
"Z" stands for a space.
If a character is a digit, it is left unchanged.
If the second and third characters are "AZ" (1), no division or swap is needed (see "CAZC").
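As an illustration, the substitution rules above can be sketched in Python. This is a minimal sketch under assumptions: A=1 to Y=25 are written out as decimal strings, "Z" becomes a space, digits pass through, and the first symbol of a group is read on its own (rule 1). The "AZ" exception in the last rule is not modelled, and the helper names `expand` and `read_group` are hypothetical. Applied to the "zodiacus" ciphertext groups quoted later in the thread, it performs only the letter-to-number step:

```python
import string

# Assumed reading of the rules: A=1 ... Y=25, "Z" stands for a space,
# digits are copied unchanged. The "AZ" special case is NOT modelled here.
LETTER_VALUES = {ch: str(n) for n, ch in enumerate(string.ascii_uppercase[:25], start=1)}


def expand(symbols: str) -> str:
    """Replace each symbol by its value according to rules 2-4."""
    out = []
    for ch in symbols.upper():
        if ch == "Z":
            out.append(" ")                 # rule 3: Z is a space
        elif ch in string.digits:
            out.append(ch)                  # rule 4: digits unchanged
        elif ch in LETTER_VALUES:
            out.append(LETTER_VALUES[ch])   # rule 2: A=1 ... Y=25
        else:
            raise ValueError(f"symbol outside A-Z/0-9: {ch!r}")
    return "".join(out)


def read_group(group: str) -> tuple[int, str]:
    """Rule 1: the first symbol stands alone; the rest is expanded normally."""
    return int(expand(group[0])), expand(group[1:])


for group in "D390ZX0 B36Z36 CAZC D39IZ30D".split():
    print(group, "->", read_group(group))
```

With these rules "D390ZX0" reads as 4 followed by "390 240". What those numbers mean depends on the multiplication step, which this sketch does not attempt.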

The diagram looks like this:

[Image: wordlist.png]

This is the procedure, illustrated with the word "zodiacus":


To be clear, I don't know whether this procedure gives similar results for other comparable texts. The table is attached.


RE: Generated word tokens from chars ( pairs ) - Davidsch - 10-03-2018

Hmm, could you post the beginning of the text, for example one line, before and after encryption?

Your method seems to generate more characters than the original text, yet the plaintext has 25014 words and the generated text has 23264 word tokens, which is fewer. How is that?

I do not understand your example.

How do you encipher your example word "zodiacus" as D390ZX0 B36Z36 CAZC D39IZ30D?

z=> D 26=space
o => 15
??

RE: Generated word tokens from chars ( pairs ) - bi3mw - 10-03-2018

(10-03-2018, 12:26 PM) Davidsch wrote:
...
How do you encipher your example word "zodiacus" as D390ZX0 B36Z36 CAZC D39IZ30D?
...

Hi Davidsch,

The encryption is based on "Russian peasant multiplication", also called "Ethiopian multiplication". The only additions are the replacement of numbers with letters and the special roles of "Z" (space) and "W" (filler for words of odd length). It works like this:

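Russian peasant multiplication halves the left factor (dropping remainders) and doubles the right one until the left reaches 1, then sums the right-hand values on the rows where the left value is odd. The sketch below implements that, together with one possible reading of the "zodiacus" groups. That reading is an assumption, not something confirmed in the thread: each letter pair (a=1 ... z=26) is written as (number of halvings, product, last doubled value). Under it, "zo" = (26, 15) gives (4, 390, 240), which matches D390ZX0 after the letter substitution D=4, X=24; "di" and "us" match B36Z36 and D39IZ30D in the same way, while "ac" ("CAZC") falls under the separate "AZ" rule and is not reproduced. The names `encode_pair`, `decode_pair`, `word_pairs` and the "w" filler are hypothetical.

```python
def peasant_rows(a: int, b: int) -> list[tuple[int, int]]:
    """Halving/doubling table of Russian peasant multiplication for a x b."""
    rows = [(a, b)]
    while a > 1:
        a, b = a // 2, b * 2          # halve left (drop remainder), double right
        rows.append((a, b))
    return rows


def peasant_multiply(a: int, b: int) -> int:
    """Sum the right column on rows whose left value is odd."""
    return sum(right for left, right in peasant_rows(a, b) if left % 2 == 1)


def letter_value(ch: str) -> int:
    return ord(ch.lower()) - ord("a") + 1          # a=1 ... z=26


def word_pairs(word: str, filler: str = "w") -> list[tuple[int, int]]:
    """Split a word into letter pairs; odd lengths get an assumed 'w' filler."""
    if len(word) % 2:
        word += filler
    return [(letter_value(word[i]), letter_value(word[i + 1]))
            for i in range(0, len(word), 2)]


# Hypothetical reading: (number of halvings, product, last doubled value).
def encode_pair(a: int, b: int) -> tuple[int, int, int]:
    rows = peasant_rows(a, b)
    return len(rows) - 1, peasant_multiply(a, b), rows[-1][1]


def decode_pair(halvings: int, product: int, last: int) -> tuple[int, int]:
    b = last // 2 ** halvings      # undo the doublings
    return product // b, b         # a = product / b


for a, b in word_pairs("zodiacus"):
    print((a, b), encode_pair(a, b))
```

Decoding works because the last doubled value is always b × 2^n, so b follows from it and a from the product.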

RE: Generated word tokens from chars ( pairs ) - -JKP- - 10-03-2018

(10-03-2018, 02:05 PM) bi3mw wrote:
...
The encryption is based on "Russian peasant multiplication", also called "Ethiopian multiplication". [...] It works like this:
...

So it's a verbose code that would be applied the same way to each word each time that specific word is encrypted? Would ZO always be "D390ZX0"?

If that's the case, one would decrypt the patterns in much the same manner as letters. It doesn't make any difference whether B is encrypted as F or as a strange symbol or as a group of numbers or as a group of letters and numbers, if it is the same every time. One could check common short words, letter frequency, vowels at the ends of words for some languages, vowel/consonant balance, etc.
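The point above can be checked with a toy example: under any fixed substitution, the same unit always yields the same group, so the frequency profile of the groups is identical to that of the plaintext units, only relabelled. The plaintext pairs and the substitution table below are invented for illustration (the groups are borrowed from the "zodiacus" example), and the function names are hypothetical:

```python
from collections import Counter


def frequency_profile(units):
    """Counts sorted from most to least frequent; ignores how the units are spelled."""
    return sorted(Counter(units).values(), reverse=True)


def encipher(units, table):
    """Fixed (deterministic) substitution: the same unit always maps to the same group."""
    return [table[u] for u in units]


# Toy data, invented for illustration only.
plain = ["zo", "di", "ac", "us", "di", "us", "di"]
table = {"zo": "D390ZX0", "di": "B36Z36", "ac": "CAZC", "us": "D39IZ30D"}
cipher = encipher(plain, table)

print(frequency_profile(plain), frequency_profile(cipher))
```

Both profiles come out the same, which is why ordinary frequency analysis would still work on such a code if it is applied identically every time.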

RE: Generated word tokens from chars ( pairs ) - Davidsch - 10-03-2018

@bi3mw: The basic idea is good: make a new text from another text and compare the patterns, not the other way around (as most "impatients" do).
However, it would be better to choose a cipher system much closer to the period and the assumed language of the text. Such an exercise could be quite useful, because you can take simple primary patterns and tweak them toward the VMS language in order to "understand" or "meet" the VMS methodology. There are specific things you should notice, but you'll figure it out. I just wanted to understand your method.

RE: Generated word tokens from chars ( pairs ) - bi3mw - 10-03-2018

@Davidsch: The period is not a problem; this kind of calculation was widespread until the end of the Middle Ages (in some regions even beyond). You're right, the charset is a big problem. When I write the tool, I'll put the charset in a configuration file ;)

The biggest problem is that the first "words" in the table (after the initial plaintext words) should be word types. Only then would one have a structure comparable to the VMS. The word types newly created while generating the word tokens are extremely few (286) and would therefore not be significant.

RE: Generated word tokens from chars ( pairs ) - Davidsch - 11-03-2018

Perhaps you need to add nulls rather than trying to compact the text as much as possible.

RE: Generated word tokens from chars ( pairs ) - bi3mw - 17-03-2018

@Davidsch: I've been thinking about using nulls, but that would change the average word length significantly. One could leave the first two letters in column B (word1) as unencrypted text and fill the rest with a random string of length at least 4. That way one gets the word types back and still has enough repetitive "words" (columns word2 to word8). In the end, you get a strange hybrid of encrypted text :(

See table (word types are marked in red).

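The hybrid idea described above might look like this in Python: keep the first two letters in clear and append random filler. The padding alphabet (A-Y and 0-9, leaving out "Z" because it marks a space in the scheme), the upper bound `max_pad` and the function name `hybrid_word` are assumptions; "at least length 4" is read here as applying to the random part only.

```python
import random
import string

# Assumed padding alphabet: A-Y and 0-9 ("Z" is excluded because it marks a space).
PAD_ALPHABET = string.ascii_uppercase[:25] + string.digits


def hybrid_word(word: str, rng: random.Random, min_pad: int = 4, max_pad: int = 6) -> str:
    """Keep the first two letters in clear and append min_pad..max_pad random symbols."""
    pad_len = rng.randint(min_pad, max_pad)          # inclusive on both ends
    padding = "".join(rng.choice(PAD_ALPHABET) for _ in range(pad_len))
    return word[:2].upper() + padding


rng = random.Random(2018)   # fixed seed so the example is reproducible
words = ["regimen", "sanitatis", "magnini"]
print([hybrid_word(w, rng) for w in words])
```

Because the filler is random, each word type keeps a readable prefix while its full spelling varies, which is what turns the first column back into distinct word types.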

RE: Generated word tokens from chars ( pairs ) - Davidsch - 17-03-2018

Nulls are added only for specific "VMS words": for example, when a "VMS word" is too short, a null could be added, or when "letters" were merged into one "VMS word", an extra null could hide what they were in encrypted form.

(I am sorry for the cryptic text, but at this point in your example I am also a bit lost following the reasoning behind your method.)

RE: Generated word tokens from chars ( pairs ) - bi3mw - 19-03-2018

(17-03-2018, 09:00 PM) Davidsch wrote:
...
(I am sorry for the cryptic text, but at this point in your example, I am also a bit lost finding your direction of reasoning in the method)

This is probably because I have not sufficiently described the idea behind the method, sorry for that. The initial question was:

Is it possible to generate, from the word types of a plaintext, a text that shows characteristics similar to the VMS and from which these words can be decrypted?

Basic conditions that must be met are:
The total number of words in the generated text should not deviate too much from the source text, and the ratio of word types to word tokens should not change significantly. The plaintext consists of 25014 words, the generated text of 23264. In the VMS, the ratio of word types to word tokens is 8114 to 37919, i.e. 1 : 4.6732807493221. In the encrypted comparison text, the ratio is 5653 to 23002. Starting from the word types, one would expect a total text of 26418 words at best (5653 × 4.6732807493221). In my opinion, the deviation from the VMS is acceptable.

The generated text must be highly repetitive. In the encrypted text, 5391 word types stand against 17873 word tokens, and all 17873 of these are generated from the 262 new word types. This condition is certainly fulfilled.

The average word length must be comparable. It is probably 5.6 in the VMS; the encrypted comparison text has an average word length of 5.8.
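The three checks can be reproduced from the figures quoted above. This sketch only does the arithmetic on those figures (it does not recompute them from a corpus); the helper names are hypothetical, and the four "zodiacus" groups serve as a tiny sample for the counting functions. The quoted counts are consistent with 5391 + 17873 = 23264 tokens and 5391 + 262 = 5653 types.

```python
# Figures quoted in the thread (not recomputed from any corpus here).
VMS_TYPES, VMS_TOKENS = 8114, 37919
ENC_TYPES, ENC_TOKENS = 5653, 23002


def tokens_per_type(types: int, tokens: int) -> float:
    return tokens / types


def type_token_counts(tokens: list[str]) -> tuple[int, int]:
    """Number of distinct word types and total word tokens."""
    return len(set(tokens)), len(tokens)


def mean_word_length(tokens: list[str]) -> float:
    """Average token length, counting every occurrence."""
    return sum(len(t) for t in tokens) / len(tokens)


vms_ratio = tokens_per_type(VMS_TYPES, VMS_TOKENS)   # about 4.673 tokens per type
expected_tokens = round(ENC_TYPES * vms_ratio)        # tokens expected if the VMS ratio held
enc_ratio = tokens_per_type(ENC_TYPES, ENC_TOKENS)    # about 4.07 tokens per type

sample = "D390ZX0 B36Z36 CAZC D39IZ30D".split()
print(vms_ratio, expected_tokens, enc_ratio)
print(type_token_counts(sample), mean_word_length(sample))
```

The expected 26418 is simply 5653 × 37919 / 8114, rounded; the comparison text's 23002 tokens correspond to about 4.07 tokens per type, against about 4.67 in the VMS.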

To avoid any misunderstanding: I do not claim that the manuscript was made this way, only that it seems to be possible.

As for filling with nulls, I had used that in another attempt, but it produced far too long words.

Finally, just for fun, this is what the text looks like (without any thought given to the charset) ;)