The Voynich Ninja

Full Version: Any chance for nomenclator?
Two arguments against the nomenclator cipher hypothesis that I have encountered in discussions are:

1) It would be unusual to encipher a whole book in such a manner
2) It would require a huge nomenclator to encipher a volume of this size

Considering argument #1, I think that the VMS is cryptologically unusual (if not unique) anyway, so an appeal to prospective unusualness does not look like a strong argument to me.

Considering argument #2, I tried to estimate the size of the nomenclator, assuming what is supposedly the worst-case scenario (OK, upon further consideration it is of course not the worst case, but let's say a "worse case") - namely, that each unique plaintext word is matched to a unique vord.

According to voynich.nu, the vocabulary of the VMS comprises ~8100 unique vords. That would be the size of the nomenclator in question. However, the nomenclator must contain not only the vords but also their translations into plaintext words. This doubles the number of tokens in the nomenclator: 8100 x 2 = 16200.

What folio space would be required to hold that many tokens? We can deduce this from Q20, which presents a convenient example of folio space filled almost exclusively with text, without drawings. The 23 pages of Q20 (excluding the mostly empty f116v) contain circa 10700 vords, which gives an average density of 10700/23 ≈ 470 vords per page. Let's even assume a 30% allowance for breaking the text into columns, leaving more free space, increasing the letter size for better legibility, etc. This yields a density of 470 x 0.7 ≈ 330 vords per page. At this density, 16200 tokens would occupy 16200/330 ≈ 49 pages, or (regarding a "folio" here as a two-sided sheet) 49/2 ≈ 25 folios. This is roughly two quires of the size of Q20 in its present state (without the missing folios). In relative figures, this would be a (234 + 49)/234 ≈ 21% increase in the total "thickness" of the MS.
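For anyone who wants to vary the assumptions, here is the same arithmetic as a small Python snippet; all the figures are the ones quoted above, and the intermediate rounding follows the post.

Code:
# Back-of-the-envelope estimate of the codebook size (figures from the post above)
unique_vords = 8100                 # VMS vocabulary per voynich.nu
tokens = unique_vords * 2           # each entry = vord + plaintext word -> 16200
density = round(10700 / 23, -1)     # Q20: ~470 vords per page
density = density * 0.7             # 30% allowance for columns, spacing etc. -> ~330
pages = tokens / density            # ~49 pages
folios = pages / 2                  # ~25 two-sided folios
growth = pages / 234                # ~21% increase over the 234 existing pages
print(density, round(pages), round(folios), f"{growth:.0%}")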

The figures look significant, but not impossible.

This calculation implicitly assumes roughly equal average token lengths in the plain text and the cipher text, but judging by the average vord length, that would not be unexpected.
Could you link or copy text pertaining to the said theory?
Regards,
Alex
(25-02-2023, 02:08 AM)Addsamuels Wrote: Could you link or copy text pertaining to the said theory?
Regards,
Alex

Here are a couple of links explaining the concept:

[link]

[link]

The scenario which I discussed would be known as a codebook.
I thought that when you referred to a nomenclator you were referring to the rare characters. In the context of ciphers of the time, the rare characters would constitute the nomenclator. I don't know to what extent these are typically ignored in Voynich analysis.
(25-02-2023, 12:17 PM)Mark Knowles Wrote: In the context of ciphers of the time, the rare characters would constitute the nomenclator.

Indeed, but that typical approach would not involve a nomenclator of large size. I wanted to estimate the size of a nomenclator under a one-to-one codebook approach. That is what I meant by the "worse case" (in terms of size).

The idea behind all this reasoning is that if we do not consider the VMS text in the light of some [linked] technique and assume a one-to-one mapping of vords to words, then the vords look very much like artificial constructs, completely decoupled from the plaintext word morphology - which immediately suggests a codebook nomenclator.
I would suspect - although I can't be bothered to prove this at the moment :D - that vords would be subdivided into code blocks, so one vord could actually be a code group.

I spent some time a while ago wondering whether vords were actually similar to hieroglyphic cartouches, i.e., syllabic groupings. This would go some way towards explaining away both the repetition and the strong positional element of glyphs within vords.
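Just to make the idea concrete, here is a toy sketch of what such a segmentation could look like. The split pattern below is purely hypothetical, chosen only for illustration; it is not a claim about the actual structure of Voynichese.

Code:
import re

# Purely illustrative: split an EVA vord into hypothetical "code groups"
# (prefix / core / suffix). The pattern is an assumption for the sketch,
# not an established segmentation of Voynichese.
PATTERN = re.compile(r"^(qo|o|d|ch|sh)?(.*?)(dy|y|in|iin)?$")

def split_vord(vord: str):
    prefix, core, suffix = PATTERN.match(vord).groups()
    return [g for g in (prefix, core, suffix) if g]

for v in ["qokeedy", "chedy", "daiin"]:
    print(v, "->", split_vord(v))   # e.g. qokeedy -> ['qo', 'kee', 'dy']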
One step further would be to look at the vocabularies of Currier A vs Currier B and at the intersection of the two. If two groups of people worked on two documents, they might well have had two different nomenclators. Or they might have started with a common "base" nomenclator and then filled it up, as appropriate, with new words absent from the base nomenclator.

Quote from voynich.nu:

Quote: In general, words tend to either occur in both languages or they occur in 'B' only. There is a very short list of moderate-frequency words that occur only in language A.
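A minimal sketch of such a comparison, assuming one has already extracted the vord lists for the A and B pages from a transliteration (the loading step is left out, and the short lists below are stand-in data):

Code:
# Compare Currier A and Currier B vocabularies: shared vs exclusive vords.
# vords_A and vords_B would come from a transliteration (e.g. ZL), split
# by Currier language; the lists here are placeholders.
vords_A = ["daiin", "chol", "chor", "shol"]
vords_B = ["daiin", "chedy", "shedy", "qokeedy"]

vocab_A, vocab_B = set(vords_A), set(vords_B)
shared = vocab_A & vocab_B
only_A = vocab_A - vocab_B
only_B = vocab_B - vocab_A
print(f"shared: {len(shared)}, A-only: {len(only_A)}, B-only: {len(only_B)}")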
This is my second favorite idea; it's just that I get the feeling that there are not enough vords. If you go by the $I imagery marker in the ZL transliteration, then each section indeed has its own vocabulary, but those vocabularies are a bit small. In the Pharmaceutical section, for example, I would expect more specialist words for the ingredients, processing methods and that kind of thing.

Also, the apparent intra-vord structure would need some kind of explanation.
(25-02-2023, 11:46 PM)RobGea Wrote: Also, the apparent intra-vord structure would need some kind of explanation.

The intra-vord structure may serve the purpose of locating vords in a large nomenclator. If one produces the plaintext-to-Voynich dictionary (which serves the encoding purpose) as one encodes the plain text, sorting the plaintext words in that dictionary, say, alphabetically, then one must also provide some method that allows convenient decoding of the ciphertext afterwards - that is, a method of quickly looking up vords in that (huge) dictionary.

When we translate, e.g., between English and German, we use two dictionaries: English-to-German and German-to-English. In both, the words of the source language are sorted alphabetically and thus can be found quickly. In the hypothetical Voynich codebook, only one of the two sets could have been sorted alphabetically (or in any other convenient way): either the underlay vocabulary or the Voynichese one. I suspect it would rather be the underlay vocabulary, because unless you encode every possible word in advance, you don't know which words you will need, and so you are bound to compose your underlay-to-overlay dictionary as you write the text, not prior to that. Then, for the reverse task of decoding (when you decide to read back the ciphertext you've written), how would you approach it without some means of quickly finding each Voynichese word in your codebook?
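To make the two-dictionary problem concrete, here is a toy sketch of such a codebook. The forward (plaintext-to-Voynich) dictionary grows as one encodes, as described above; the reverse index needed for decoding is trivial for a computer but is precisely what would be hard to maintain on parchment.

Code:
# Toy one-to-one codebook: the forward dictionary grows as the text is
# encoded; decoding needs the reverse index, built here in one line.
# On paper, building and searching that reverse index is the hard part.
from itertools import count

forward = {}                          # plaintext word -> vord
_ids = count(1)
fresh = lambda: f"vord{next(_ids)}"   # stand-in vord generator

def encode(word):
    if word not in forward:
        forward[word] = fresh()       # assign a fresh vord on first use
    return forward[word]

cipher = [encode(w) for w in "the rain in spain the rain".split()]
reverse = {v: w for w, v in forward.items()}   # vord -> plaintext word
plain = [reverse[v] for v in cipher]
print(cipher, plain)   # repeated plaintext words yield repeated vords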
(25-02-2023, 12:36 AM)Anton Wrote: The figures look significant, but not impossible.
What's impossible is a one-to-one word mapping by codebook, because such a mapping preserves the word statistics of the plaintext, so the statistical weirdness of Voynichese (the repetitions, the low correlation of word pairs) would have to be present in the underlying text as well; and a one-to-many mapping would increase the size of the codebook unnecessarily.
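A quick way to see the first point: a one-to-one codebook is just a bijective relabelling of word types, so any statistic that depends only on token identities survives encoding unchanged. A tiny sketch, with stand-in text:

Code:
import random
from collections import Counter

# A one-to-one codebook is a bijection on word types, so frequency
# profiles, exact repetitions and word-pair counts all carry over
# unchanged from plaintext to ciphertext.
text = "the rain in spain stays mainly in the plain the rain".split()
types = sorted(set(text))
codes = types[:]
random.shuffle(codes)                  # random bijection type -> code
book = dict(zip(types, codes))
cipher = [book[w] for w in text]

# the multiset of word frequencies is identical -> prints True
print(sorted(Counter(text).values()) == sorted(Counter(cipher).values()))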