The Voynich Ninja
Engineering your own voynich - Printable Version


RE: Engineering your own voynich - nablator - 21-07-2021

Looking now at the longest repeated strings: there are quite a few for such a short text, so homophones for many tokens seem unlikely.

ooiea Tiaeean bar nmaei Teib
T fiiinuy Timyun Taeaei
ir Tian Fooyun Fiiiyun
iaq meaei Tiavinn Ti
TiiiT fiiinuy Ti
aiiea Tiiiig Ki
qeK niinuy
...

Is that what deterministic means: only one ciphertext is possible? This would exclude homophones and nulls.
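A minimal sketch of how a longest-repeated-string search like the one above could be run. The function name `longest_repeats` is hypothetical, and collapsing line breaks into single spaces is an assumption; nablator did not say how the list was produced. It grows the substring length until nothing repeats (if no string of length L repeats, no longer one can), then keeps the longest repeats and drops shorter ones already contained in a kept string.

```python
from collections import Counter


def longest_repeats(text, top=10):
    """Return up to `top` repeated substrings (with counts), longest first."""
    s = " ".join(text.split())  # assumption: line breaks count as single spaces
    found = []  # (substring, count) pairs for every length that still repeats
    length = 1
    while True:
        counts = Counter(s[i:i + length] for i in range(len(s) - length + 1))
        repeats = [(sub, c) for sub, c in counts.items() if c > 1]
        if not repeats:
            break  # no repeat of this length, so none of any greater length
        found.extend(repeats)
        length += 1
    found.sort(key=lambda item: len(item[0]), reverse=True)
    kept = []
    for sub, c in found:
        # skip strings that are only pieces of a longer repeat already kept
        if not any(sub in longer for longer, _ in kept):
            kept.append((sub, c))
        if len(kept) == top:
            break
    return kept
```

Counts include overlapping occurrences, and a short string that also occurs independently is still hidden if it sits inside a longer kept repeat; both are simplifications.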


RE: Engineering your own voynich - Emma May Smith - 21-07-2021

So far, just looking at some stats:
  • 234 words
  • 22 glyphs (though I'm not convinced that the capital letters are just variants of their lowercase counterparts)
  • Most words have relatively few tokens. Four is the highest I can find, but I will double-check.
  • Many words have a structure which suggests they're made from two segments (note that the last word, 'Ka', appears to be a single segment). The lists of first and second segments don't appear to overlap.
  • This would suggest that plaintext letters have two possible encodings: first or second. The list used depends on whether it is the first or second segment in a cipher word.
  • The ciphertext follows the simple pattern 12_12_12_12_12_12 and so on. The spaces between words don't seem to count (although some might).
  • There may be a segment which indicates a space or the start of a new sentence.
  • 'bar' is the most interesting word, as it's the only one which begins with 'b'. This could suggest a particular digraph.
byatan, please don't give the solution yet, even if you wish to say I'm on the right track. I will keep working on the text until I get it all (or somebody else does).
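Basic counts like the word and glyph totals above can be reproduced with a few lines. This is a sketch with a hypothetical `text_stats` helper; it reports the glyph inventory both case-sensitive and case-folded, since whether capitals are separate glyphs is still an open question here. It does not attempt to count "tokens" per word, because the segmentation is not settled.

```python
def text_stats(text):
    """Word count, glyph inventory (with and without case), longest word."""
    words = text.split()
    glyphs = [g for w in words for g in w]
    return {
        "words": len(words),
        "glyphs_case_sensitive": len(set(glyphs)),
        "glyphs_case_folded": len({g.lower() for g in glyphs}),
        "longest_word": max(words, key=len) if words else "",
    }
```

Comparing the two glyph counts on the full text shows directly how many glyph types disappear if capitals are treated as variants.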


RE: Engineering your own voynich - Emma May Smith - 21-07-2021

I may not finish this tonight, but the way I would move forward is:
  • Find some consistent way to break words into two segments. "Ka" is the first clue to doing this.
  • Count how many unique first and last segments there are. Many more than 26 and the cipher could be encoding phonemes. Fewer and it is encoding letters.
  • Do a frequency count on each segment. This should give you a clue as to which letters each segment might stand for.
  • Try to find adjacent segments which could represent digraphs or repeated letters. Work out from there.
Of course, I could be totally wrong, but this is my guess and it seems reasonable, even if wrong.
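The first three steps of this plan could be sketched as follows. The split rule is entirely hypothetical: it assumes a word is a first segment plus an optional second segment drawn from a small list of endings (`yun`, `nuy`, `aei`, guessed from frequent word-final strings elsewhere in the thread), and any word without a known ending, like 'Ka', is treated as a single first segment. Swap in a better segmentation once one is found.

```python
from collections import Counter

# Hypothetical second-segment inventory; replace with your own segmentation.
SECOND_SEGMENTS = ["yun", "nuy", "aei"]


def split_word(word, seconds=SECOND_SEGMENTS):
    """Split into (first, second); second is '' when no known ending matches."""
    for suffix in sorted(seconds, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, ""


def segment_stats(text, seconds=SECOND_SEGMENTS):
    """Frequency tables of first and second segments across the text."""
    firsts, lasts = Counter(), Counter()
    for word in text.split():
        first, second = split_word(word, seconds)
        firsts[first] += 1
        if second:
            lasts[second] += 1
    return firsts, lasts
```

`len(firsts)` and `len(lasts)` give the number of unique segments to compare against 26, and `most_common()` on each table gives the frequency ranking for the third step.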


RE: Engineering your own voynich - Koen G - 21-07-2021

Entropy stats are not bad (h0, h1, h2):

Code:
file        h0    h1    h2
caps.txt    4.81  3.83  2.46
nocaps.txt  4.46  3.71  2.40
TTfull.txt  4.59  3.87  2.15

This means that if the source text was a modern language (or any real language), something was done to decrease entropy, possibly the addition of nulls or digraphs.
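For anyone wanting to reproduce figures like these, here is a sketch using the usual definitions: h0 is log2 of the alphabet size, h1 the single-glyph Shannon entropy, and h2 the conditional entropy of a glyph given the previous one, H(X1, X2) − H(X1). How Koen's files were prepared is not stated, so treating each word break as one space glyph is an assumption, as is using `text.lower()` to approximate the no-caps variant.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def h0_h1_h2(text):
    """h0 = log2(alphabet size), h1 = unigram entropy,
    h2 = conditional entropy of a glyph given the preceding glyph."""
    s = " ".join(text.split())  # assumption: one space per word break
    unigrams = Counter(s)
    pairs = list(zip(s, s[1:]))
    h0 = math.log2(len(unigrams))
    h1 = entropy(unigrams)
    # H(X2 | X1) = H(X1, X2) - H(X1), with X1 taken from the pair list
    h2 = entropy(Counter(pairs)) - entropy(Counter(a for a, _ in pairs))
    return h0, h1, h2
```

A perfectly predictable alternation such as "abab" gives h2 = 0, which is the direction "something was done to decrease entropy" points in.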


RE: Engineering your own voynich - Emma May Smith - 21-07-2021

I'm beginning to think that not only are capital letters just variants, but so are repeated glyphs. So 'Kal' and 'Kaal' would be the same word.
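One way to test this idea is to normalize every word (fold case, collapse runs of the same glyph) and see which surface forms merge into one type. The helper names below are hypothetical; the rule itself is just the hypothesis stated in the post.

```python
from itertools import groupby


def normalize(word):
    """Fold case and collapse runs of one glyph: 'Kaal' -> 'kal'."""
    return "".join(glyph for glyph, _ in groupby(word.lower()))


def merged_types(text):
    """Normalized forms that absorb more than one distinct surface form."""
    groups = {}
    for word in text.split():
        groups.setdefault(normalize(word), set()).add(word)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}
```

Given how common `ii` and `iii` are in the n-gram counts, this normalization would merge many word types, so the size of `merged_types` on the full text is a quick gauge of how much the hypothesis changes the picture.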


RE: Engineering your own voynich - nablator - 21-07-2021

(21-07-2021, 10:29 PM) Emma May Smith Wrote: I'm beginning to think that not only are capital letters just variants, but so are repeated glyphs. So 'Kal' and 'Kaal' would be the same word.

The encryption process wouldn't be deterministic then.


RE: Engineering your own voynich - Koen G - 21-07-2021

It would still be deterministic if a preceding glyph or word determined whether the next word becomes "Kal" or "Kaal". For example, if the preceding word ends in a vowel, use one; if it ends in a consonant, use the other. This would be pretty complex, though.
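A toy illustration of why such a rule keeps the cipher deterministic: the same plaintext always yields the same ciphertext, yet one plaintext word surfaces in two forms. The codebook is invented purely for illustration (only "Kal"/"Kaal" come from the thread), and the choice is driven by the last letter of the previous plaintext word, as in the example above.

```python
VOWELS = set("aeiou")

# Invented codebook: "the" has two surface forms, chosen by context.
CODEBOOK = {
    "the": ("Kal", "Kaal"),  # (after consonant or text start, after vowel)
    "sea": "Qo",
    "cat": "Qe",
}


def encrypt_words(words):
    """Encrypt word by word; variant choice depends on the previous word."""
    out, prev = [], ""
    for word in words:
        entry = CODEBOOK[word]
        if isinstance(entry, tuple):
            after_consonant, after_vowel = entry
            entry = after_vowel if prev and prev[-1] in VOWELS else after_consonant
        out.append(entry)
        prev = word
    return " ".join(out)
```

For example, `encrypt_words(["sea", "the", "cat", "the"])` gives "Qo Kaal Qe Kal", and rerunning it always gives the same string.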


RE: Engineering your own voynich - Koen G - 21-07-2021

Word endings and beginnings have suspiciously low entropy. Most frequent bigrams (· marks a word boundary):


Code:
ii 88
n· 67
·T 53
·F 46
ae 42
·K 39
ea 37
Ti 36
ei 34
un 31
yu 31
g· 31
·m 29
ai 29
ia 27
ie 25
Ka 23
·f 23
in 21
·n 21
uy 20
aa 20
an 20
y· 20

Trigrams:
Code:
·Ti 36
iii 34
un· 31
yun 31
n·F 23
·Ka 23
aei 20
uy· 20
an· 18
ei· 17
ig· 17
nuy 17
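Counts in this format can be produced by joining the words with a boundary marker and sliding a window over the result. This is a sketch under assumptions: words are joined with a single `·`, line breaks are treated as ordinary word breaks, and no marker is added at the very start or end of the text; the tables above may have been built slightly differently.

```python
from collections import Counter


def ngram_counts(text, n, boundary="·"):
    """Count glyph n-grams, with each word break written as `boundary`."""
    joined = boundary.join(text.split())
    return Counter(joined[i:i + n] for i in range(len(joined) - n + 1))


def top_ngrams(text, n, k=25):
    """The k most frequent n-grams, highest count first."""
    return ngram_counts(text, n).most_common(k)
```

`top_ngrams(text, 2)` and `top_ngrams(text, 3)` give lists comparable to the bigram and trigram tables.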



RE: Engineering your own voynich - Emma May Smith - 21-07-2021

(21-07-2021, 10:30 PM) nablator Wrote:
(21-07-2021, 10:29 PM) Emma May Smith Wrote: I'm beginning to think that not only are capital letters just variants, but so are repeated glyphs. So 'Kal' and 'Kaal' would be the same word.

The encryption process wouldn't be deterministic then.

Ah, I seem to have missed that part of his description of the problem. I haven't got to that point in the decoding yet, so we'll see if it works without it.


RE: Engineering your own voynich - Koen G - 21-07-2021

This is what I think so far:
- The source text is first split into either syllables or bigrams. The latter is more likely.
- Each letter becomes one or more letters in the ciphertext. Different tables are used depending on the position.

For example: "I am a good codebreaker"
-> ia ma go od co de br ea ke r

If "a" is the first of the two characters, replace it with "Fai" (for example); if "a" is the second character, replace it with "yun". This will result in many different word types with recognizable patterns. So "aa" would be "Faiyun".
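The scheme described above can be sketched as: strip everything but letters, cut the result into pairs, encode the first letter of each pair from one table and the second from another. Only the "a" entries come from the example; the tables are otherwise left for you to fill (a missing letter raises `KeyError`), and encoding a leftover single letter with the first table alone is an assumption, chosen because it would produce one-segment words like 'Ka'.

```python
# Only the "a" entries come from the example; other letters are up to you.
FIRST = {"a": "Fai"}
SECOND = {"a": "yun"}


def split_pairs(text):
    """Letters only, lowercased, cut into consecutive pairs."""
    letters = [c for c in text.lower() if c.isalpha()]
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]


def encrypt_pairs(text, first, second):
    """Each pair becomes one cipher word: first[letter1] + second[letter2]."""
    words = []
    for pair in split_pairs(text):
        word = first[pair[0]]
        if len(pair) == 2:
            word += second[pair[1]]
        words.append(word)  # assumption: a lone final letter uses FIRST only
    return " ".join(words)
```

`split_pairs("I am a good codebreaker")` reproduces the split shown above (ending in the lone "r"), and `encrypt_pairs("aa", FIRST, SECOND)` gives "Faiyun".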