(06-05-2022, 08:13 PM) Vfind Wrote: Only paragraphs are encoded, with the cipher disks set to 4 tokens at the beginning of the first line, mostly marked by one-stem gallows in the same line. The paragraphs are up to 80 tokens (200 characters) long.
Hi Vfind,
Thanks for your answers.
I'll have questions about labels later (entropy may be a problem), but let's talk about the paragraphs first.
(Vfind kindly sent me his Word file, "that will be published soon", with a complete tokenization (with colors) of all paragraphs, so I have an unfair advantage.)
There is too much I don't understand to be sure of anything, so I'll come back later to some issues I have with the token counts at the end of the Word file.
1) First, settings
They begin at the first word of "chapters" (paragraphs) with a disguised outer disk token: "disguised" because the single-stem gallows usually replace the double-stem gallows. All right, but since the list of paragraph-initial tokens does not exactly match the list of outer disk tokens, is there an equivalence between the 3 paragraph-initial tokens cTho, cThd, cThy and the 3 outer disk tokens kain, tain, tar?
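To make the comparison mechanical, here is a minimal sketch of the single-stem/double-stem equivalence as a normalization step, followed by a check of which paragraph-initial tokens still have no outer disk counterpart. It assumes the usual EVA pairing (double-stem f and p correspond to single-stem k and t, with the same pairing for the capitalized forms used in bench ligatures such as cTh). The function names are hypothetical, and the token lists are only the few mentioned in this post, not the full lists from the Word file.

```python
# Assumed EVA pairing: each double-stem gallows maps to its single-stem counterpart.
# Capitals cover bench-gallows ligatures (e.g. cPh -> cTh, cFh -> cKh).
DOUBLE_TO_SINGLE = str.maketrans({"f": "k", "p": "t", "F": "K", "P": "T"})


def normalize_gallows(token: str) -> str:
    """Replace double-stem gallows (f, p) with single-stem ones (k, t)."""
    return token.translate(DOUBLE_TO_SINGLE)


def unmatched_initials(initials, outer_tokens):
    """Return the paragraph-initial tokens that, after gallows normalization,
    do not appear in the (also normalized) list of outer disk tokens."""
    outer = {normalize_gallows(t) for t in outer_tokens}
    return sorted(t for t in initials if normalize_gallows(t) not in outer)


if __name__ == "__main__":
    # Only the tokens quoted above; the real lists are longer.
    print(unmatched_initials(["cTho", "cThd", "cThy"], ["kain", "tain", "tar"]))
```

With these inputs all three bench-gallows tokens come back unmatched, which is exactly why an explicit equivalence rule (or a longer outer disk list) is needed.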
BTW, the outer disk tokens in your article (PDF) do not match the list in your Word file: shouldn't the first one under Û/% and the last one under Î/$ be td and tShe, respectively?
A more difficult question: where do the settings stop? Are they all contained in the first word? Apparently not, because different paragraphs sometimes start with the same word (e.g. f79.31 and f79.35 both start with pol), and you write that the settings are never repeated.
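Checking this across the whole text is a simple grouping job. The sketch below assumes a hypothetical mapping from paragraph locus to first word (the locus labels follow the f79.31 style used here); it reports every first word that opens more than one paragraph, i.e. every case where "settings are in the first word" and "settings are never repeated" cannot both hold.

```python
from collections import defaultdict


def repeated_first_words(first_words):
    """Given {locus: first word}, return {word: [loci]} for every word
    that opens more than one paragraph."""
    groups = defaultdict(list)
    for locus, word in first_words.items():
        groups[word].append(locus)
    return {w: sorted(loci) for w, loci in groups.items() if len(loci) > 1}


if __name__ == "__main__":
    # Two real cases from this post plus the first word of f1r.
    print(repeated_first_words({"f79.31": "pol", "f79.35": "pol", "f1r": "fachys"}))
```

Run over the full paragraph list from the Word file, an empty result would support the "first word only" reading; any non-empty result means the settings must extend beyond the first word.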
2) Tokenization
It would be helpful if the rules for tokenization were written down. Even the first word of f1r, usually transliterated as fachys, raises a number of problems. It is meant to be fochys: because ka is not in your list of "chapter/outer" tokens, it must be ko, via the single-stem/double-stem equivalence. Then, because y is not in your list of medial tokens, it must be initial, and ch must then be medial, right? When is this switch of initial and medial allowed?
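One way to make the tokenization rules explicit is to enumerate every possible segmentation of a word into known tokens and flag the ones that break the initial → medial → final order. This is only a sketch: the token classes below are a hypothetical toy set built from the tokens discussed above (y not medial, ch medial), and treating s as a final token is purely an assumption for illustration.

```python
from typing import Dict, List, Set, Tuple

Segmentation = List[Tuple[str, str]]  # (token, class) pairs

ORDER = {"initial": 0, "medial": 1, "final": 2}


def segmentations(word: str, classes: Dict[str, Set[str]]) -> List[Segmentation]:
    """Enumerate every way to split `word` into known tokens, labeling each token with its class."""
    if not word:
        return [[]]
    results: List[Segmentation] = []
    for cls, tokens in classes.items():
        for tok in tokens:
            if word.startswith(tok):
                for rest in segmentations(word[len(tok):], classes):
                    results.append([(tok, cls)] + rest)
    return results


def is_positional(seg: Segmentation) -> bool:
    """True if the class labels never go backwards (initial -> medial -> final)."""
    ranks = [ORDER[cls] for _, cls in seg]
    return ranks == sorted(ranks)


if __name__ == "__main__":
    # Hypothetical toy classes; "s" as final is an assumption.
    toy = {"initial": {"y"}, "medial": {"ch"}, "final": {"s"}}
    for seg in segmentations("chys", toy):  # "fochys" minus the outer token "ko"
        print(seg, is_positional(seg))
```

With the toy classes, the only segmentation of chys is ch (medial), y (initial), s (final), which fails the positional check: that is the initial/medial switch in question.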
3) Spaces and initial/final
Should we assume that the initial, medial and final tokens match the positional letter forms of Ottoman Turkish? The switch above would be an exception, and not only among the settings. I haven't looked for other exceptions yet; there may be others.
Once this issue is sorted out, and since rotating the disks does not mix initial, medial and final tokens, wouldn't it be very easy to find the 3 tokens for space by looking just before initial tokens and just after final tokens? Once the space tokens are identified, the disk settings follow. Putting both spaces and initial/final information in the ciphertext is redundant, and it negates the security gained by changing the settings often enough to hamper frequency analysis.
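The attack described here amounts to counting neighbours. The sketch below assumes that each token's disk class (initial/medial/final) is already known from the tokenization and that the space tokens are still unlabeled; it counts which tokens occur immediately before an initial token or immediately after a final token. The token names in the example are made-up placeholders, not real Voynichese tokens.

```python
from collections import Counter


def space_candidates(tokens, token_class):
    """Count tokens seen immediately before an initial token or
    immediately after a final token; spaces should dominate."""
    counts = Counter()
    for prev, nxt in zip(tokens, tokens[1:]):
        if token_class.get(nxt) == "initial":
            counts[prev] += 1
        if token_class.get(prev) == "final":
            counts[nxt] += 1
    return counts


if __name__ == "__main__":
    # Placeholder tokens: i* initial, m* medial, f* final, S an unlabeled space token.
    classes = {"i1": "initial", "i2": "initial", "m1": "medial", "f1": "final", "f2": "final"}
    seq = ["i1", "m1", "f1", "S", "i2", "f2", "S", "i1", "f1"]
    print(space_candidates(seq, classes).most_common(3))
```

On real ciphertext the top candidates would be the space tokens for the current disk setting; since the disks rotate, a change in the top candidate would also mark a change of setting.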
Assuming that the initial/medial/final disks can be selected freely (any other way would not make sense, also because of word length: the words would be too short), what is the link, if any, between the strictly positional letters of the Ottoman Turkish alphabet and Voynichese? Why name these 3 disks initial/medial/final if the choice of disk does not match the original (cleartext)?