The Voynich Ninja

Full Version: The Voynich Cipher Disk
Hi Everyone,

Thanks for the question, Nablator.

Only paragraphs are encoded with the cipher disks. The disks are set by 4 tokens at the beginning of the first line, mostly marked by single-stem gallows in the same line. The paragraphs are up to 80 tokens (200 characters) long.

Stand-alone words like the 4 words on [link] are not encoded like this, because each stand-alone word would need 4 setting tokens in front, which would be too much overhead. For stand-alone words a different kind of coding is used: each Voynich character represents a sound, so the word is pronounced as if spoken in the original language, say a [link]. The two coding methods blend into each other to avoid detection, because the tokens on the disks are the frequent syllables of that same original language.

E.g. if we put the most frequent French digrams and trigrams on the disks as tokens (ENT / QUE / LES / PAR ...), the encoded text would randomly form many French words, and Nablator, as a French speaker, would recognize them (“Que les parent”). When you try to make sense of the words in a paragraph you fail, because the tokens are encoded. For stand-alone words, however, once you have learned the pronunciation of each Voynich character you will understand the word, if it still exists in your language today.

A Turkish speaker (Ahmet Ardıç) noticed that the Voynich text had many Turkish words built out of Latin letters. He made a transcription alphabet for the non-Latin characters that were missing but fit inside these words.

For [link] you only need to know that “t” is Y, “d” is S and “m” is K; all other characters are pronounced as in Latin (they even look like Y, S, K).
So "otaim dam alam" becomes "oya ik sakal ak", English: “two lace of white beard”.
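
The substitution can be checked mechanically. A minimal Python sketch, assuming only the three replacements named here (t→y, d→s, m→k) and treating the word breaks as free to move, as the example does; the function name is made up for illustration:

```python
# Only the three replacements named in the post; every other letter is kept.
SOUND_MAP = str.maketrans({"t": "y", "d": "s", "m": "k"})

def pronounce(label: str) -> str:
    """Replace each transliterated character by its assumed sound."""
    return label.translate(SOUND_MAP)

raw = pronounce("otaim dam alam")
print(raw)  # oyaik sak alak

# The reading "oya ik sakal ak" uses the same letters with different word breaks.
assert raw.replace(" ", "") == "oya ik sakal ak".replace(" ", "")
```

The sketch only confirms that the letters match; where the word breaks go is a separate decision made by the reader.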

Find the whole alphabet at the end of the Voynich MS-Word file that will be published soon.

Everyone, please keep asking about anything that is unclear.

Thanks
(06-05-2022, 08:13 PM)Vfind Wrote: Only paragraphs are encoded with the Cipher disks ...

Stand alone words ... are not encoded like this...

Hello Vfind!

How do you determine which words are coded and which are not?
Hi Everyone,

Thanks for the question, Ruby.

All words in paragraphs are coded. Please look at page [link]. There are 15 paragraphs marked by the big stars used as bullets. Each paragraph starts with a gallows character, as needed to set the outer disk. In the first lines of the paragraphs, double-stem gallows (DSG) are written as single-stem gallows (SSG) as a reminder that the code has to be changed. Note that the SSG cannot be different letters from the DSG, because they occur multiple times only in the first line of a chapter. There is only one exception, in line 23, but since there is no gallows character at the front of that line, clearly there is no code change. No language uses characters only in the first lines of paragraphs, so the SSG are the same as the DSG, just marking something (the code change).

Now look at page [link]. The stand-alone words are not clearly associated with a paragraph. You might say they belong to the last one, but on many pages words in the margins sit between paragraphs. Therefore these words cannot be encoded with tokens like the paragraphs. Their characters are [link]. On this page, next to the stand-alone words, there is a drawing of a [link] with input, output, and selection. At the inflow, women of various stature and weight arrive; at the output, short and overweight women are skimmed out, leaving only tall and skinny ladies at the bottom left, the same process that model agencies operate.

The two words to the left, otain olkal, are pronounced oyan olt al (t=y, k=t, ain=an), which Google translates to: “take a fishing rod”. I leave that to your imagination. You might think this could be a coincidence, but there are dozens of similar examples; look at the one mentioned in the article.
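
These three rules need some care when applied mechanically: one of them (ain=an) covers several characters, and k=t followed by t=y would turn k into y if the rules were applied one after the other. A small Python sketch, assuming the rules are meant to be applied simultaneously to the transliteration; the names are illustrative only:

```python
import re

# Rules quoted in the post: t=y, k=t, ain=an.
RULES = {"ain": "an", "t": "y", "k": "t"}

# Longest pattern first so "ain" wins over single letters; a single pass over
# the original text keeps k->t from being re-read as t->y.
PATTERN = re.compile("|".join(sorted(map(re.escape, RULES), key=len, reverse=True)))

def apply_rules(text: str) -> str:
    """Apply all rules in one left-to-right pass over the input."""
    return PATTERN.sub(lambda m: RULES[m.group(0)], text)

print(apply_rules("otain olkal"))  # oyan oltal
```

The result is "oyan oltal"; the extra word break in "olt al" is not produced by the rules and has to be chosen by the reader.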

If someone can read these words and understands how they coincide with the picture, he thinks he can read the coded text as well, because the tokens made of frequent syllables randomly form understandable words, as explained in the last post. He would never suspect a cipher and would focus all his efforts on deciphering the gibberish, as intended by the designers.

Everyone, please keep asking about anything that is unclear.

Thanks
[attachment=6495]
Vfind, can you give references from a classical dictionary, please? 
I don't trust an automatic translator for a language I don't understand.
(06-05-2022, 08:13 PM)Vfind Wrote: Only paragraphs are encoded with the Cipher disks set to 4 tokens at the beginning of the first line, mostly marked by one stem gallows in the same line. The paragraphs are up to 80 tokens (200 characters) long.
Hi Vfind,

Thanks for your answers.

I'll have questions about labels later (entropy may be a problem) but let's talk about the paragraphs first.

(Vfind kindly sent me his Word file "that will be published soon" with a complete tokenization (with colors) of all paragraphs, so I have an unfair advantage. :) )

There is too much that I don't understand to be sure of anything, so we'll see later about some issues that I have with token counts at the end of the Word file.

1) First, settings
They begin at the first word of "chapters" (paragraphs) with a disguised outer disk token: "disguised" because the single-stem gallows usually replace the double-stem gallows. All right, but since the list of paragraph-initial tokens does not exactly match the list of outer disk tokens, is there an equivalence between the 3 tokens cTho, cThd, cThy (paragraph-initial) and kain, tain, tar (outer disk)?

BTW the outer disk tokens in your article (pdf) do not match the list in your Word file: Shouldn't the first one under Û/% and the last one under Î/$ be td and tShe?

A more difficult question: where do the settings stop? Are they all in the first word? Apparently not, because different paragraphs sometimes start with the same word, e.g. f79.31 and f79.35 start with pol, and you write that they are never repeated.

2) Tokenization
It would be helpful if the rules for tokenization were written down. Already the first word of f1r, usually transliterated as fachys, has a number of problems. It is meant to be fochys: because you don't have ka in the list of "chapter/outer" tokens, it must be ko, with the single stem - double stem equivalence. Then, because you don't have y in the list of medial tokens, it must be initial, and ch must then be medial, right? When is the switch of initial and medial allowed?

3) Spaces and initial/final
Should we assume that initial, medial, final tokens match their use in Ottoman Turkish? The above switch, not only among settings, would be an exception and I haven't looked for other exceptions yet, maybe there are others.

Once this issue is sorted out, and since rotating the disks does not mix initial, medial, final tokens, wouldn't it be very easy to find the 3 tokens for space, before initial tokens and after final tokens? Once tokens for spaces are identified the disk settings follow. Putting both spaces and initial/final information in the ciphertext is redundant and negates the security ensured by changing the settings often enough to hamper frequency analysis.

Assuming that initial/medial/final disks can be selected freely, as any other way would not make sense, (also for word length, too short) what is the link (if any) between the strictly positional letters of the Ottoman Turkish alphabet and Voynichese? What is the reason for naming these 3 disks initial/medial/final if the choice of disk does not match the original (cleartext)?
Hi Everyone,

Thanks for the question, Ruby.

To establish your trust in Google Translate, just find the Universal Declaration of Human Rights published by the UN in [link]. Select any Turkish word or sentence, feed it to Google Translate, and compare the result to the published [link] version.

[link] is a Turkish online thesaurus. Right-click in Chrome and select “Translate to English”, then evaluate the synonyms.

Download any Turkish PDF dictionary and look up the words; just remember that Turkish is an agglutinative language, so the words have to be split into their stems before you can look them up.

At the beginning I had the same doubts as you, but I found out that Google Translate is the ideal tool for this task, because even slightly misspelled words are recognized, and after 500 years many words sound different today, like the word ik, which today is iki. Once you have found a candidate, you verify it with different sources, as I did.

I would be interested to hear your opinion about the flow chart.

Regards
(08-05-2022, 04:13 PM)Vfind Wrote: ...I found out that Google translate is the ideal tool...
If you are satisfied with Google translate, so much the better, I wish you rapid progress.
Hi Everyone,

Thanks for the questions Nablator.

1) First, settings: cKh, cFh, cTh, cPh are taken as kch and tch to set the 34 tokens of the outer disk. 17 tokens start with k and 17 with t, and they continue with aiin, ain, air, ar, al, ol, or, os, ch, che, sh, she, ee, e, o, d, y. (I adapted the example to look like the text on the disk on page [link], using it as tokens.)
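
The token list can be written out explicitly. A sketch in Python, assuming each outer-disk token is simply the prefix k or t followed by one of the 17 continuations listed; the variable names are illustrative:

```python
# The 17 continuations listed in the post.
CONTINUATIONS = ["aiin", "ain", "air", "ar", "al", "ol", "or", "os",
                 "ch", "che", "sh", "she", "ee", "e", "o", "d", "y"]

# Assumption: every token is prefix + continuation, with prefixes k and t.
OUTER_TOKENS = [prefix + tail for prefix in "kt" for tail in CONTINUATIONS]

print(len(CONTINUATIONS), len(OUTER_TOKENS))  # 17 34
print(OUTER_TOKENS[:3])  # ['kaiin', 'kain', 'kair']
```

Under this assumption the list contains kain, tain and tar, as well as td and tshe.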

Where do the settings stop? After 4 tokens (not one word). The spaces are generated by the final tokens and by the outer tokens that end “looking” final (aiin, ain, air, ar, al, or, os, ol, o, d, y). Check the repetition of the first 4 tokens on all pages first. Some paragraphs might not be clearly identified.
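
The repetition check could be automated once a tokenization exists. A rough Python sketch, assuming each paragraph is available as a list of tokens; the function name, paragraph ids and tokens below are made-up placeholders, not readings from the manuscript:

```python
from collections import defaultdict

def repeated_settings(paragraphs: dict[str, list[str]], n: int = 4) -> dict[tuple[str, ...], list[str]]:
    """Group paragraph ids by their first n tokens and keep groups of 2 or more."""
    groups = defaultdict(list)
    for para_id, tokens in paragraphs.items():
        groups[tuple(tokens[:n])].append(para_id)
    return {setting: ids for setting, ids in groups.items() if len(ids) > 1}

# Placeholder data for illustration only.
sample = {
    "p1": ["A", "B", "C", "D", "E"],
    "p2": ["A", "B", "C", "X", "E"],
    "p3": ["A", "B", "C", "D", "Z"],
}
print(repeated_settings(sample))  # {('A', 'B', 'C', 'D'): ['p1', 'p3']}
```

An empty result would mean that no two paragraphs share the same 4 setting tokens under the tokenization used.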

2) Tokenization: You are right, the first page has some irregularities. It seems they pulled all the strings there. a might be replaceable by o if the token is not found, like m by r. The principal rule, as you already found out, is to find the token on the disk without ambiguity. If the space is not needed for that, sometimes it is not written or is shorter than normal.

3) Spaces and initial/final: Spaces come after tokens that are frequently used finals of Ottoman words, to make the encrypted text pronounceable and make it look like real language. I do not think there is a direct connection to their use in Ottoman.

4) Wouldn't it be very easy to find the 3 tokens for space? Please note that the spaces are only on the “created” Latin disk that serves as an example for the English text. The real disk has no spaces; the finals and initials of Ottoman make them unnecessary.

5) What is the reason for naming these 3 disks initial/medial/final? 99% of the time you find these tokens at the beginning/middle/end of a VM word, which greatly helps to find them on the corresponding disks. This is where the names come from, but I used the words initial/medial/final because they popped up in Ottoman script as well. I noticed that the language is sensitive to this concept (idea). The disks might have developed out of the 4 tables that transcribed Ottoman phonetically to Latin, just by rotating them against each other. Then further improvements were made to hide the encoding.

Any more questions are welcome

Thanks
Hi vfind, your trust in google translate for this kind of work seems a bit optimistic:

"is google translate accurate":
[link]

Google's Neural Machine Translation System:
[link]
Quote:
"[google translates'] search technique ...which encourages generation of an output sentence that is most likely to cover all the words in the source sentence."

Your previous link: { Google translates to: "take a fishing rod" }
When I try it, it results in "take Oyan Olt".

As long as you are aware of the issues and, as you say, you verify your text with different sources, things might work out.
(08-05-2022, 06:57 PM)RobGea Wrote: Your previous link: { Google translates to: "take a fishing rod" }

When I try it, it results in "take Oyan Olt".
I get "take a fishing rod" but Google translate insists that it is Arabic: "Source language: Arabic". Also with autodetect.

"oya ik" is not recognized. If input alone it finds a totally unconvincing best guess...