The Voynich Ninja


Looking for the longest repeated strings, I found quite a few for such a short text, so homophones for many tokens are unlikely.

ooiea Tiaeean bar nmaei Teib
T fiiinuy Timyun Taeaei
ir Tian Fooyun Fiiiyun
iaq meaei Tiavinn Ti
TiiiT fiiinuy Ti
aiiea Tiiiig Ki
qeK niinuy
...
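
A search like this can be reproduced with a brute-force pass over the text. The following is a minimal sketch, not necessarily how the strings above were found; the function names and the toy sample (stitched together from two of the fragments listed above) are assumptions for illustration.

```python
from collections import Counter

def repeats_of_length(text, n):
    """Return {substring: count} for substrings of length n seen more than once."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {s: c for s, c in counts.items() if c > 1}

def longest_repeats(text):
    """Return (length, {substring: count}) for the longest repeated substrings."""
    best = (0, {})
    for n in range(1, len(text)):
        found = repeats_of_length(text, n)
        if not found:
            # No repeat of length n means no repeat of length n + 1 either.
            break
        best = (n, found)
    return best

# Toy sample only; the real input would be the full ciphertext.
sample = "T fiiinuy Timyun Taeaei TiiiT fiiinuy Ti"
length, found = longest_repeats(sample)
print(length, found)
```

Overlapping occurrences count as repeats here, and spaces are treated as ordinary characters, so a repeat can span several words.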

Is that what deterministic means: only one cipher text is possible? This would exclude homophones and nulls.

So far, just looking at some stats:

234 words
22 glyphs (though not convinced that the capital letters are just variants of their lower case counterparts)
Most words have relatively few tokens. Four is the highest I can find, but I will double-check.
Many words have a structure which suggests they're made from two segments (note the last word 'Ka' appears to be a single segment). The lists of first and second segments don't appear to overlap.
This would suggest that plaintext letters have two possible encodings: first or second. The list used depends on whether it is the first or second segment in a cipher word.
The ciphertext is bound to the simple pattern: 12_12_12_12_12_12 and so on. The spaces between words don't seem to count (although some might).
There may be a segment which indicates a space or the start of a new sentence.
'bar' is the most interesting word, as it's the only one which begins with 'b'. This could suggest a particular digraph.
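
Counts like these are quick to check by script. A minimal sketch, assuming words are separated by whitespace and every non-space character is one glyph; `basic_stats` is a hypothetical helper name, and the sample is only a handful of words from the fragments quoted earlier in the thread, not the full ciphertext.

```python
def basic_stats(text):
    """Count words and distinct glyphs, with and without case folding."""
    words = text.split()
    glyphs = set("".join(words))
    return {
        "words": len(words),
        "glyphs": len(glyphs),
        # If capitals are just variants, this is the number that matters.
        "glyphs_caseless": len({g.lower() for g in glyphs}),
    }

sample = "ir Tian Fooyun Fiiiyun fiiinuy"
print(basic_stats(sample))
```

Comparing the cased and caseless glyph counts is one way to see how much the "capitals are variants" question changes the size of the alphabet.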

byatan, please don't give the solution yet, even if you wish to say I'm on the right track. I will keep working on the text until I get it all (or somebody else does).

I may not finish this tonight, but the way I would move forward is:

Find some consistent way to break words into two segments. "Ka" is the first clue to doing this.
Count how many unique first and last segments there are. Many more than 26 and the cipher could be encoding phonemes. Fewer and it is encoding letters.
Do a frequency count on each segment. This should give you a clue as to which letters each segment might stand for.
Try to find adjacent segments which could represent digraphs or repeated letters. Work out from there.
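
The counting steps above could be sketched as follows. The splitting rule is the open question, so `split_at(2)` (cut after the first two glyphs) is purely a placeholder assumption, as are all the names here; any better rule can be passed in as the `split` argument.

```python
from collections import Counter

def split_at(k):
    """Placeholder rule: cut every word after its first k glyphs."""
    def split(word):
        return word[:k], word[k:]
    return split

def segment_report(words, split):
    """Tally first and second segments and report any overlap between the lists."""
    firsts, seconds = Counter(), Counter()
    for word in words:
        first, second = split(word)
        if first:
            firsts[first] += 1
        if second:
            seconds[second] += 1
    return {
        "first": firsts,      # frequency count of first segments
        "second": seconds,    # frequency count of second segments
        "overlap": set(firsts) & set(seconds),
    }

report = segment_report("Tian Timyun Ka".split(), split_at(2))
print(len(report["first"]), len(report["second"]), report["overlap"])
```

With a good rule, the overlap set should stay empty and the number of unique segments in each list can be compared against 26.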

Of course, I could be totally wrong, but this is my guess and it seems reasonable, even if wrong.

Entropy stats are not bad. (h0 h1 h2)

caps.txt
4.81 3.83 2.46
nocaps.txt
4.46 3.71 2.40
TTfull.txt
4.59 3.87 2.15

This means that if the source text was a modern language (or any real language), something was done to decrease entropy, possibly the addition of nulls or digraphs.
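
For anyone wanting to reproduce figures like these, here is a minimal sketch. It assumes the usual convention that h0 is log2 of the alphabet size, h1 is the single-character Shannon entropy, and h2 is the conditional entropy of a character given the one before it; the post does not spell these out, and whether spaces were counted as characters is also unknown. The nocaps.txt value is at least consistent with that reading of h0: 2^4.46 is about 22, the glyph count mentioned earlier.

```python
import math
from collections import Counter

def shannon(counts):
    """Shannon entropy in bits of a Counter of observations."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def entropy_stats(text):
    """Return (h0, h1, h2) for a text of at least two characters."""
    h0 = math.log2(len(set(text)))          # assumed: log2 of alphabet size
    h1 = shannon(Counter(text))             # assumed: unigram entropy
    pairs = Counter(zip(text, text[1:]))
    # Conditional entropy H(next | previous) = H(pair) - H(previous).
    h2 = shannon(pairs) - shannon(Counter(text[:-1]))
    return h0, h1, h2

print(entropy_stats("abab"))
```

A perfectly predictable sequence such as "abab" gives an h2 of zero, which is the direction the low h2 values above point in.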

I'm beginning to think that not only are capital letters just variants, but so are repeated glyphs. So 'Kal' and 'Kaal' would be the same word.
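
One way to test this idea is to fold both kinds of variation away and see how many distinct word types remain. A minimal sketch; `normalize` is a hypothetical helper, and treating every run of a repeated glyph as a single glyph is exactly the assumption being tested, not an established fact about the cipher.

```python
import re

def normalize(word):
    """Lowercase a word and collapse each run of a repeated glyph to one glyph."""
    return re.sub(r"(.)\1+", r"\1", word.lower())

words = ["Kal", "Kaal", "fiiinuy", "Fiiiyun"]
print(sorted({normalize(w) for w in words}))
```

If the number of word types drops sharply after normalization, that would be some support for the variant hypothesis.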

(21-07-2021, 10:29 PM) Emma May Smith Wrote: I'm beginning to think that not only are capital letters just variants so are repeated glyphs. So 'Kal' and 'Kaal' would be the same word.

The encryption process wouldn't be deterministic then.

It would be if a preceding glyph or word determines whether the next word becomes "Kal" or "Kaal". For example, if the preceding word ends in a vowel use one, if it ends in a consonant use the other. This would be pretty complex though.

Word ends and beginnings are suspiciously low entropy. Most frequent bigrams:

Code:
ii 88
n· 67
·T 53
·F 46
ae 42
·K 39
ea 37
Ti 36
ei 34
un 31
yu 31
g· 31
·m 29
ai 29
ia 27
ie 25
Ka 23
·f 23
in 21
·n 21
uy 20
aa 20
an 20
y· 20

Trigrams:

Code:
·Ti 36
iii 34
un· 31
yun 31
n·F 23
·Ka 23
aei 20
uy· 20
an· 18
ei· 17
ig· 17
nuy 17
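
Counts in this style can be produced with a few lines of script. This is a minimal sketch, not the tool used for the lists above; it assumes "·" marks a word boundary, that runs of whitespace collapse to a single boundary, and that the start and end of the text also count as boundaries.

```python
from collections import Counter

def ngram_counts(text, n, boundary="·"):
    """Count character n-grams, with word boundaries shown as a visible mark."""
    marked = boundary + boundary.join(text.split()) + boundary
    return Counter(marked[i:i + n] for i in range(len(marked) - n + 1))

sample = "Ka Ti Ka"  # toy input, not the real ciphertext
for gram, count in ngram_counts(sample, 2).most_common(3):
    print(gram, count)
```

Comparing the two lists is informative: in the counts above, 'Ti' and '·Ti' both occur 36 times, so every 'Ti' is word-initial, and the same holds for 'Ka' and '·Ka' (23 each).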

(21-07-2021, 10:30 PM) nablator Wrote:
(21-07-2021, 10:29 PM) Emma May Smith Wrote: I'm beginning to think that not only are capital letters just variants so are repeated glyphs. So 'Kal' and 'Kaal' would be the same word.

The encryption process wouldn't be deterministic then.

Ah, I seem to have missed that part of his description of the problem. I haven't got to that point of decoding, so we'll see if it works without.

This is what I think so far:
- Source text is first split into either syllables or bigrams. The latter is more likely.
- Each letter becomes one or more letters in the ciphertext. Different tables are used depending on the position.

For example: "I am a good codebreaker"
-> ia ma go od co de br ea ke r

If "a" is the first character of a pair, replace it with "Fai" (for example); if "a" is the second character, replace it with "yun". This will result in many different word types with recognizable patterns. So "aa" would become "Faiyun".
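
The scheme guessed at above can be sketched in a few lines. Only the two entries for "a" come from the post, and even those are given there as examples; the function names, the bracketed placeholders for unmapped letters, and the choice to write one ciphertext word per bigram are all assumptions for illustration, not the actual cipher.

```python
FIRST = {"a": "Fai"}    # table for the first letter of a pair (example from the post)
SECOND = {"a": "yun"}   # table for the second letter of a pair (example from the post)

def split_bigrams(plaintext):
    """Drop spaces and punctuation, then cut the letters into pairs."""
    letters = "".join(c for c in plaintext.lower() if c.isalpha())
    return [letters[i:i + 2] for i in range(0, len(letters), 2)]

def encode(plaintext, first=FIRST, second=SECOND):
    """Encode each pair with position-dependent tables; unmapped letters become placeholders."""
    words = []
    for pair in split_bigrams(plaintext):
        word = first.get(pair[0], f"[{pair[0]}1]")
        if len(pair) == 2:
            word += second.get(pair[1], f"[{pair[1]}2]")
        words.append(word)
    return " ".join(words)

print(split_bigrams("I am a good codebreaker"))
print(encode("aa"))
```

Because the two tables never share an output, every ciphertext word would show the first-segment, second-segment pattern noted earlier in the thread.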


nablator

Emma May Smith

Emma May Smith

Koen G

Emma May Smith

nablator

Koen G

Koen G

Emma May Smith

Koen G