The Voynich Ninja - Language A vs B crib?

Here's the idea: First we count the frequency of words that appear in a set of folios written in "Language A", and then we count the frequency of words in a set of folios written in "Language B". We then compare the distributions: a reasonable speculation is that the most common words in A are the same plaintext as the most common words in B.
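As a rough illustration of the counting step described above, here is a minimal Python sketch. It assumes the transcription is available as plain lines of whitespace-separated words. The helper names (`word_frequencies`, `top_words`) and the toy input are invented for illustration and are not taken from any real transcription file.

```python
from collections import Counter

def word_frequencies(lines):
    """Count whitespace-separated words across an iterable of transcribed lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def top_words(counts, n=10):
    """Return the n most common words paired with their relative frequencies."""
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(n)]

# Toy input, invented purely for illustration (NOT real transcription data).
language_a = ["8am 1oe 8am 2oe", "8am 1oe s9"]
language_b = ["am 1c89 am 4ofc89", "am 1c89 zc89"]

freq_a = word_frequencies(language_a)
freq_b = word_frequencies(language_b)

# Pair words by rank: the k-th most common word in A against the k-th in B.
for (word_a, share_a), (word_b, share_b) in zip(top_words(freq_a, 3), top_words(freq_b, 3)):
    print(f"{word_a:>6} {share_a:.2f}   {word_b:>6} {share_b:.2f}")
```

Pairing by rank is the simplest reading of the hypothesis. It is fragile when two words have nearly equal counts, so the relative frequencies are worth inspecting alongside the ranks.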

Here's what you get if you compare the Herbal folios in Language A (1 through 56) with the Recipe folios in Language B (103 through 116):

[Image: wordfrequenices.jpg]

(The table shows the top ten words in the combined folios, then broken out into the Language A and Language B folios.)

This shows that in Language A the most common word is "8am", whereas in Language B it is "am". (Having said that, the frequencies are quite different.) If you buy the hypothesis that they are ciphers for the same plaintext word, then the glyph "8" is likely a null in Language A. Comparing other frequent words in the same way may reveal insight into how the text is enciphered ... For example, the next most frequent word in A is "1oe" whereas in B it is "1c89": does that mean that "oe" in A converts to the same letters as "c89" in B?

I took this idea and ran with it for a while (in the linked posts from my blog below), but made little headway. I'd be most interested to hear critiques of this approach.

[Links to three blog posts; not available in this copy]

Julian

(07-09-2016, 06:52 PM)julian Wrote: ...
If you buy the hypothesis that they are ciphers for the same plaintext word, then the glyph "8" is likely a null in Language A. ...

That's one possible explanation.

Another possibility is that 8ain is a functional unit and they got tired of writing out the whole thing and realized ain would suffice, in which case the 8 glyph in other words might not be a null.

I have a rather similar table [link not preserved] (about two thirds down), with statistics for a few additional 'types' of pages.
It takes a while to digest, perhaps. It uses the Currier transcription alphabet (it was written in pre-Eva times).

I split the recipes folios (f103-116) into two parts, which I called stars-B and stars-Bio based on some statistical considerations which are explained on that page.
What I think it shows is that the relationship between daiin and aiin is probably more complicated.

(07-09-2016, 07:12 PM)-JKP- Wrote:
(07-09-2016, 06:52 PM)julian Wrote: ...
If you buy the hypothesis that they are ciphers for the same plaintext word, then the glyph "8" is likely a null in Language A. ...

That's one possible explanation.

Another possibility is that 8ain is a functional unit and they got tired of writing out the whole thing and realized ain would suffice, in which case the 8 glyph in other words might not be a null.

But then would you not expect the B words to be generally shorter than the A words, if the scribe was tired?!

Isn't it just a possibility that the source text for both sections was in a different language, dialect, register or just vocabulary set? Even if it wasn't a standard prose text but rather a set of data of some kind, this is still possible.

(07-09-2016, 07:19 PM)ReneZ Wrote: I have a rather similar table [link not preserved] (about two thirds down), with statistics for a few additional 'types' of pages.
It takes a while to digest, perhaps. It uses the Currier transcription alphabet (it was written in pre-Eva times).

I split the recipes folios (f103-116) into two parts, which I called stars-B and stars-Bio based on some statistical considerations which are explained on that page.
What I think it shows is that the relationship between daiin and aiin is probably more complicated.

Thanks, Rene - I had looked at this page in the past - very useful.

What further motivated me with this was to try to find a mapping between ngrams of the B glyphs and ngrams of the A glyphs to see if I could convert A words to B words while preserving the frequency distributions. The best fit showed that the non-gallows glyphs were interchangeable, but the gallows weren't, which was odd.
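To make the mapping idea concrete, here is a minimal sketch of one way such a test could be set up. It is not the procedure I actually used. It applies a candidate n-gram substitution table to the A words, then measures how far the resulting distribution is from the B distribution using total variation distance, a metric chosen here only for simplicity. The function names, the rule table, and the toy counts are all invented for illustration; the two rules simply echo the "8" and "oe"/"c89" examples from earlier in the thread.

```python
from collections import Counter

def apply_mapping(word, mapping):
    """Rewrite a word left to right, replacing the longest matching n-gram at each position."""
    # Ignore empty keys: they would match everywhere and never advance.
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            out.append(word[i])  # no rule applies: keep the glyph unchanged
            i += 1
    return "".join(out)

def total_variation(counts_p, counts_q):
    """Total variation distance between two word-count distributions (0 = identical, 1 = disjoint)."""
    total_p, total_q = sum(counts_p.values()), sum(counts_q.values())
    words = set(counts_p) | set(counts_q)
    return 0.5 * sum(abs(counts_p[w] / total_p - counts_q[w] / total_q) for w in words)

# Toy counts, invented for illustration (NOT real Voynich statistics).
counts_a = Counter({"8am": 3, "1oe": 2, "2oe": 1})
counts_b = Counter({"am": 3, "1c89": 2, "2c89": 1})

# Hypothetical rules: treat "8" as a null and rewrite "oe" as "c89".
mapping = {"8": "", "oe": "c89"}

mapped_a = Counter()
for word, count in counts_a.items():
    mapped_a[apply_mapping(word, mapping)] += count

print("distance before mapping:", total_variation(counts_a, counts_b))
print("distance after mapping: ", total_variation(mapped_a, counts_b))
```

Note that the rule `"8": ""` deletes every "8" in every A word, which is a much stronger assumption than "8 is a null in 8am". A search over candidate tables would score each one by the distance it leaves and keep the best.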

(07-09-2016, 08:10 PM)Koen Gh. Wrote: Isn't it just a possibility that the source text for both sections was in a different language, dialect, register or just vocabulary set? Even if it wasn't a standard prose text but rather a set of data of some kind, this is still possible.

Yes, absolutely it assumes prose. The likelihood of the plaintext being in two or more different languages seems small to me, but who knows. If Language A is plaintext German, and Language B is plaintext Latin, then the enciphered most frequent words would be expected to look very different. But, in the case of the most frequent "8am" and "am", they don't.

(07-09-2016, 08:08 PM)julian Wrote:
(07-09-2016, 07:12 PM)-JKP- Wrote:
(07-09-2016, 06:52 PM)julian Wrote: ...
If you buy the hypothesis that they are ciphers for the same plaintext word, then the glyph "8" is likely a null in Language A. ...

That's one possible explanation.

Another possibility is that 8ain is a functional unit and they got tired of writing out the whole thing and realized ain would suffice, in which case the 8 glyph in other words might not be a null.

But then would you not expect the B words to be generally shorter than the A words, if the scribe was tired?!

No, not necessarily. They are two different things. Noticing that something will suffice, will do the job just as well, is not the same as shortening it because the scribe was tired. I probably shouldn't have combined "suffice" and "tired" in the same post but I guess I did so because both are possible explanations.

When you first set up a writing system (whether it's a cipher system or a phonetic system), you may think you have done it effectively or that you have covered all the contingencies, but in actual use situations are likely to come up that may not have been covered. Similarly, situations might come up where you realize you overdid it and can get as much bang out of a smaller buck.

I'm not asserting that's the reason it was shortened, I am simply noting that there are other possibilities than the one you mentioned.

I'm not at all surprised there are anomalies (uncommon sequences and letters) in the VMS, and a general shift in the "code" (vord frequencies) in other parts of the document. Not only does the content appear to change in different parts of the document (which would affect vord relationships and frequency), but to get a unique writing system perfect the first time is highly unlikely.

Quote:We then compare the distributions: a reasonable speculation is that the most common words in A are the same plaintext as the most common words in B.

That's not that reasonable if you compare text blocks the subject of which is different - like Herbal A with Recipe B. A pure test would be to compare herbal A with herbal B.

In a highly abbreviated "telegraph-style" text (which the VMS probably is), function words may be absent altogether.

(07-09-2016, 09:36 PM)Anton Wrote:
Quote:We then compare the distributions: a reasonable speculation is that the most common words in A are the same plaintext as the most common words in B.

That's not that reasonable if you compare text blocks the subject of which is different - like Herbal A with Recipe B. A pure test would be to compare herbal A with herbal B.

In a highly abbreviated "telegraph-style" text (which the VMS probably is), function words may be absent altogether.

I don't think there is a Herbal B, otherwise I'd have used that. My notes say the Herbal folios are 1-56, 87, 90, and 93 to 96, which don't overlap with Language B. Do I have that wrong?

I agree that the inquiry is only meaningful if the plaintext is not odd (like highly abbreviated). For normal plaintext one would expect the most common words to be the same even if the subject matter was different, right?

I like to think that the similarity between the most common words in Herbal A and Recipes B is more than coincidence.

Quote:Do I have that wrong?

Seems so. There definitely are herbal B folios there. A quick look at Rene's site shows that e.g. 26r and 26v are in B. Rene provides a note for each folio whether it is A or B.
