The Voynich Ninja

Pages: 1 2 3 4 5 6

(12-04-2021, 05:41 PM) Mark Knowles Wrote: I don't believe that there is any fundamental law that an encryption technique must increase entropy. It is perfectly possible for a cipher to decrease entropy.

Entropy was not a factor in designing ciphers before 1500. All that cipher makers had was a bunch of tricks to hide (or distract from) various linguistic 'tells'.

(12-04-2021, 05:41 PM) Mark Knowles Wrote: It is perfectly possible for a cipher to decrease entropy.

It would be of great interest to have a historical example of this.
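The entropy question can be made concrete with a small Python sketch. It is not a historical example, only an illustration using first-order (single-character) Shannon entropy, H = -Σ p log2 p. A many-to-one substitution that merges letters can only lower or preserve this entropy, because the output is a function of the input; a homophonic substitution that spreads one letter over several symbols raises it. The sample sentence, the merge table and the homophone table are hypothetical choices made for the illustration.

```python
from collections import Counter
from itertools import cycle
from math import log2

def entropy(symbols):
    """First-order (single-character) Shannon entropy in bits/symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def merge_cipher(text, merges):
    """Many-to-one substitution: letters listed in `merges` are replaced."""
    return [merges.get(ch, ch) for ch in text]

def homophonic_cipher(text, homophones):
    """One-to-many substitution: cycle through the homophones of each listed letter."""
    cycles = {ch: cycle(options) for ch, options in homophones.items()}
    return [next(cycles[ch]) if ch in cycles else ch for ch in text]

# Hypothetical sample text and tables, chosen only for illustration.
plain = "it is perfectly possible for a cipher to decrease entropy"
merged = merge_cipher(plain, {"e": "a", "o": "i"})           # e/a and o/i collapse
split = homophonic_cipher(plain, {"e": ["1", "2"], "t": ["3", "4"]})

print(f"plain:      {entropy(plain):.3f} bits/char")
print(f"merged:     {entropy(merged):.3f} bits/char")  # lower: merged pairs both occur
print(f"homophonic: {entropy(split):.3f} bits/char")   # higher: split letters occur twice or more
```

Only the single-character measure is shown here; conditional (second-order) entropy, which is often discussed for Voynichese, would need pair counts.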

(12-04-2021, 01:06 PM) Koen G Wrote: All "solutions" I've seen so far are either unextendable (only keywords, no paragraphs) or unrepeatable (one-way cipher with too much freedom). Therefore I believe it is impossible to translate a paragraph in a viable way. At least not with the many methods that have been revealed to us so far.

I translated the very first paragraph of the book, once.

Quote: It read "copyright under the Berne convention, V. Voynich, London, 1920".

Never got anyone to repeat the translation, so I gave it up as a bad job.

The answer to the title is yes.
You just don't notice it any more because everything is wrong anyway.
You only have to listen.

(12-04-2021, 08:14 PM) ReneZ Wrote:
(12-04-2021, 05:41 PM) Mark Knowles Wrote: It is perfectly possible for a cipher to decrease entropy.

It would be of great interest to have a historical example of this.

Morse code is a fine example. The alphabetic representation of English is highly redundant at 4 bits/character; Shannon himself demonstrated by experiment that the information content of written English is in the range 0.6 to 1.3 bits/character. By exploiting the relative frequencies of English letters (the most common letters, such as E and T, get the shortest codes), Morse code compresses text to approximately 2.5 bits/character, counting each dot or dash as one bit and ignoring the gaps between letters.
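The Morse figure can be checked roughly with a short Python sketch. It assumes that each dot or dash counts as one binary symbol and ignores the letter and word gaps a real transmission needs; letter frequencies are estimated from whatever sample text is supplied rather than from a standard frequency table. The sample sentence is a placeholder, and a short sample gives a noisy (typically low) entropy estimate, so a long English text would be more representative.

```python
from collections import Counter
from math import log2

# International Morse code for the 26 Latin letters.
MORSE = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}

def letter_counts(text):
    """Count the letters a-z, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch in MORSE)

def letter_entropy(counts):
    """First-order entropy of the letter distribution, in bits/letter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def mean_morse_length(counts):
    """Frequency-weighted average number of dots and dashes per letter."""
    total = sum(counts.values())
    return sum(len(MORSE[ch]) * c for ch, c in counts.items()) / total

sample = "It was just human ingenuity, responding to the pressure of limited communication resources"
counts = letter_counts(sample)
print(f"uniform 26-letter code: {log2(26):.2f} bits/letter")
print(f"letter entropy:         {letter_entropy(counts):.2f} bits/letter")
print(f"Morse dots and dashes:  {mean_morse_length(counts):.2f} per letter")
```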

While this example may lie outside the historical period of interest, it occurred before the advent of analytical information theory. It was just human ingenuity, responding to the pressure of limited communication resources (telegraph time), that drove the development of a more information-efficient scheme. Perhaps earlier instances of economized ciphers could be found in situations where the time, space, or materials available to transmit or store messages were similarly scarce.

(12-04-2021, 01:06 PM) Koen G Wrote: I was just thinking that it would require a great amount of creativity to convert any paragraph of Voynichese into something (anything) that makes sense using any repeatable method. This alone would be quite the feat, and I don't think it is possible.
...

The best I managed was about 5 or 6 tokens in a row with a repeatable, non-subjective method (it was years ago, so I can't remember exactly how many). The same method would work on individual word tokens (a few per folio), but I'm pretty sure this was inevitable, given that the VMS contains more than 30,000 tokens.

It's not valid in terms of a solution because it produces gibberish for the great majority of tokens. With so many tokens, you can find individual bits in many languages.
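The point about chance matches in a long text can be put in rough numbers. The sketch below assumes, purely for illustration, that under some mapping each token independently "reads" as a valid word with probability p; the expected number of length-k windows in which every token is valid is then (n - k + 1) p^k. Both the independence assumption and the value p = 0.1 are hypothetical, not measured.

```python
def expected_valid_windows(n_tokens, p_valid, k):
    """Expected number of length-k windows in which every token is 'valid'.

    Assumes each token is independently valid with probability p_valid
    (a simplifying, hypothetical assumption). Overlapping windows are
    counted separately, so one run of 6 valid tokens contains 2 windows of 5.
    """
    if k > n_tokens:
        return 0.0
    return (n_tokens - k + 1) * p_valid ** k

# Hypothetical: about 30,000 tokens, a mapping under which 1 token in 10 looks valid.
for k in range(1, 7):
    print(k, round(expected_valid_windows(30_000, 0.1, k), 3))
```

Under these made-up numbers, isolated valid-looking words are plentiful while runs of 5 or 6 become rare, which is the pattern described above.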

(12-04-2021, 04:36 PM) Koen G Wrote: Marco: I even forgot to mention the grammaticality of the output. But I wonder whether, as a challenge of sorts, it would be possible to produce a long word salad without using an interpretative step.

I fear that even this might be very difficult.

Since the VMS only contains about 8,000 different word types, while languages typically have at least 100,000, defining a function that maps from the VMS to the dictionary of any language should be possible. Bowern and Lindemann (The Linguistics of the Voynich Manuscript) came to this conclusion:

Bowern and Lindemann Wrote: the script is not structure-preserving in that the graphemes are not one to one, but they do encode words in a regular orthography
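That such a mapping "should be possible" can be shown with a deliberately naive, hypothetical construction: rank both vocabularies by frequency and pair them off. As long as the target vocabulary has at least as many types as the source, the pairing always succeeds, which is why the mere existence of a mapping proves nothing. This is not Bowern and Lindemann's method, and the two token lists below are toy placeholders, not real frequency data.

```python
from collections import Counter

def rank_mapping(source_tokens, target_tokens):
    """Pair the i-th most frequent source type with the i-th most frequent target type.

    A deliberately naive, hypothetical mapping; ties keep first-seen order.
    """
    source_ranked = [w for w, _ in Counter(source_tokens).most_common()]
    target_ranked = [w for w, _ in Counter(target_tokens).most_common()]
    if len(target_ranked) < len(source_ranked):
        raise ValueError("target vocabulary is smaller than source vocabulary")
    return dict(zip(source_ranked, target_ranked))

# Toy token lists (placeholders only).
voynich_like = "daiin ol daiin chedy ol daiin qokeedy".split()
english_like = "the and the of the and to in".split()

mapping = rank_mapping(voynich_like, english_like)
print(" ".join(mapping[w] for w in voynich_like))
```

The output is a word salad built from real English words, which is exactly the kind of result that a repeatable method can produce without saying anything about meaning.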

In their opinion, the mapping cannot be based on the structure shown by Voynichese glyphs. A drastic example of a system that does not preserve such word-structure is Rene's "mod2" nomenclator. A less drastic case is the "anagrammed Hebrew abjad" proposed by Hauer and Kondrak.
I am not sure that a word-structure-preserving mapping is impossible for something like Chinese or Vietnamese: I don't think that Bowern and Lindemann deeply explored these options.
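Why an anagrammed abjad is not structure-preserving can be sketched in a few lines: drop the vowels (as an abjad would) and sort the remaining letters, so that every anagram of a word receives the same key. This assumes that vowels are simply deleted and uses English words only for readability; it illustrates the idea and is not Hauer and Kondrak's actual decipherment pipeline.

```python
VOWELS = set("aeiou")

def abjad_alphagram(word):
    """Drop vowels, then sort the remaining letters (hypothetical normalization)."""
    consonants = [ch for ch in word.lower() if ch not in VOWELS]
    return "".join(sorted(consonants))

def group_by_key(words):
    """Group words that collapse to the same vowel-less alphagram."""
    groups = {}
    for w in words:
        groups.setdefault(abjad_alphagram(w), []).append(w)
    return groups

print(group_by_key(["stone", "notes", "tones", "onset", "nest", "sent"]))
```

All six words collapse to the single key "nst", so the internal order of letters, the kind of word structure Voynichese glyphs visibly have, is discarded by such a scheme.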

Anyway, it is important to acknowledge that ancient manuscripts did not encode word salads, but languages with well-defined properties.
For instance, one can consider the English manuscript Harley MS 7334 (a copy of The Canterbury Tales made ca. 1410). An excellent transcription is available online.

One can compare the 20 most frequent words in the manuscript with the 20 most frequent words in a modern English frequency list. Fifteen of the 20 most frequent manuscript words appear in the modern top 20, either identically (green) or with minimal variations (upper-case initial, or þ for 'th').

[Image: the 20 most frequent words in the manuscript and in modern English, with matches highlighted in green]
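The top-20 comparison can be sketched in Python. The two normalization rules (lower-casing and mapping þ to 'th') are the variations mentioned above; the letter-only tokenizer and the input texts are assumptions, and other spellings (for example the barred ħ in 'witħ') would need further rules. The old text is ranked as written and normalized only when checked against the modern list.

```python
import re
from collections import Counter

def tokenize(text):
    """Split into runs of letters (Unicode-aware, so þ, ȝ and ħ count as letters)."""
    return re.findall(r"[^\W\d_]+", text)

def normalize(word):
    """Apply the two variations mentioned in the thread: case and thorn."""
    return word.lower().replace("þ", "th")

def top_n(tokens, n):
    """The n most frequent tokens, most frequent first."""
    return [w for w, _ in Counter(tokens).most_common(n)]

def shared_top_words(old_text, modern_text, n=20):
    """Top-n words of old_text whose normalized form is in modern_text's top n."""
    old_top = top_n(tokenize(old_text), n)
    modern_top = set(top_n([t.lower() for t in tokenize(modern_text)], n))
    return [w for w in old_top if normalize(w) in modern_top]

# Placeholder inputs; a real comparison would use the full transcription
# and a large modern English corpus.
old = "Whan that aprille witħ his schowres swoote The drougħt of Marche haþ perced to þe roote"
modern = "When April with its sweet showers has pierced the drought of March to the root"
print(shared_top_words(old, modern))
```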

Though my knowledge of English is limited, I find this manuscript very accessible. Of course, getting used to the script requires a little initial effort, but after that things work quite well. For example:

[Image: excerpt from the manuscript]

Whan that aprille witħ his schowres swoote
The drougħt of Marche haþ perced to þe roote
And bathud euery veyne in swich licour
Of which vertue engendred is þe flour
Whan zephirus eek with his swete breeth
Enspirud hatħ in euery holte and heetħ
The tendre croppes and þe ȝonge sonne
Hath in þe Ram his halfe cours I ronne

The fact that 'þe' is equivalent to 'the' is quite obvious, since the word is so frequent. Once you understand this, it's easy to see that (like in modern English) the article appears at the start of noun phrases and is typically followed by either a noun or an adjective (þe roote, þe flour, þe ȝonge sonne, þe Ram). Like in modern English, 'his' behaves similarly to 'the' (his schowres swoote, his swete breeth, his halfe cours). Like in modern English, 'and' connects two grammatical structures of the same kind (e.g. sentences or noun phrases). You can basically start from function words, which are almost identical to modern English, and work from there.
I can have trouble with the meaning of words like 'schowres', 'holte' or 'heetħ', but identifying part-of-speech categories is almost always straightforward, so it's easy to keep track of grammar, even when some of the meaning is lost.
The true Voynich translation will allow us to do just that: start from function words, identify basic grammatical structures and finally get to word meanings and translation.
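Starting from function words can be partly mechanized by listing what immediately follows a chosen word. The sketch below does this for the eight lines quoted above (transcribed as given, including "I ronne"); splitting on whitespace within each line is an assumption, and deciding that the followers are nouns or adjectives remains a human judgment.

```python
from collections import defaultdict

LINES = """Whan that aprille witħ his schowres swoote
The drougħt of Marche haþ perced to þe roote
And bathud euery veyne in swich licour
Of which vertue engendred is þe flour
Whan zephirus eek with his swete breeth
Enspirud hatħ in euery holte and heetħ
The tendre croppes and þe ȝonge sonne
Hath in þe Ram his halfe cours I ronne"""

def followers(text, function_words):
    """Map each function word to the tokens that immediately follow it (within a line)."""
    result = defaultdict(list)
    for line in text.splitlines():
        tokens = line.split()
        for word, nxt in zip(tokens, tokens[1:]):
            if word in function_words:  # case-sensitive: "The" and "And" are not matched
                result[word].append(nxt)
    return dict(result)

for word, nxt in followers(LINES, {"þe", "his", "and"}).items():
    print(word, nxt)
```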

The fantasy that language structure is a modern invention cannot possibly lead to anything interesting.

(12-04-2021, 08:14 PM) ReneZ Wrote:
(12-04-2021, 05:41 PM) Mark Knowles Wrote: It is perfectly possible for a cipher to decrease entropy.

It would be of great interest to have a historical example of this.

I have emailed you.

(13-04-2021, 09:40 AM) MarcoP Wrote: Anyway, it is important to acknowledge that ancient manuscripts did not encode word salads, but languages with well-defined properties.
For instance, one can consider the English manuscript Harley MS 7334 (a copy of The Canterbury Tales made ca. 1410). An excellent transcription is available online.

One can compare the 20 most frequent words in the manuscript with the 20 most frequent words in a modern English frequency list. Fifteen of the 20 most frequent manuscript words appear in the modern top 20, either identically (green) or with minimal variations (upper-case initial, or þ for 'th').

Though my knowledge of English is limited, I find this manuscript very accessible. Of course, getting used to the script requires a little initial effort, but then things work quite well. E.g.

Whan that aprille witħ his schowres swoote
The drougħt of Marche haþ perced to þe roote
And bathud euery veyne in swich licour
Of which vertue engendred is þe flour
Whan zephirus eek with his swete breeth
Enspirud hatħ in euery holte and heetħ
The tendre croppes and þe ȝonge sonne
Hath in þe Ram his halfe cours I ronne

The end of the last line you quote is "ironne" or "yronne", rather than "I ronne". The (one) word is the past participle of the verb "irennen", which means "to run" or, of celestial bodies, "to move through the sky". This then makes sense of the whole clause, which actually begins in the middle of the previous line: the subject is "the yonge sonne" and the verb phrase is "Hath...ironne". The object, "his halfe cours", is placed in between the two words of the verb phrase, after the prepositional phrase "in the Ram". In modern English the syntax and word order would rather be "The young sun has run his half-course in the Ram." Middle English, especially poetic Middle English, had rather more flexible word order than we now have in modern English.

Your list of the 20 most frequent words in the Harley MS 7334 includes "I" as 6th on the list, but if it includes such tokens as this "I", which is actually just the first letter of the word "ironne", I imagine that the list may be misleading. After all, if the transcription makes a basic parsing mistake in the 8th line of one of the most famous passages in all of English literature, it does not inspire confidence in the statistical accuracy of the rest of the transcription and a word frequency analysis based on it.

Geoffrey
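The effect of a single segmentation error on a frequency table can be sketched briefly. This assumes a hand-made list of known mis-splits, containing only the "I ronne" case identified above; the splits are re-joined before counting, and the count of "I" changes accordingly. A real correction list for the whole transcription would be longer and would need checking against the manuscript.

```python
import re
from collections import Counter

# Known mis-segmentations: (wrongly split form, corrected single word).
# Only the case discussed in this thread is listed.
MISSPLITS = [("I ronne", "ironne")]

def rejoin(text, missplits):
    """Replace each wrongly split form (as whole words) with its corrected word."""
    for wrong, right in missplits:
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)
    return text

line = "Hath in þe Ram his halfe cours I ronne"
before = Counter(line.split())
after = Counter(rejoin(line, MISSPLITS).split())
print(before["I"], after["I"], after["ironne"])
```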

MarcoP Wrote: The fact that 'þe' is equivalent to 'the' is quite obvious, since the word is so frequent. Once you understand this, it's easy to see that (like in modern English) the article appears at the start of noun phrases and is typically followed by either a noun or an adjective (þe roote, þe flour, þe ȝonge sonne, þe Ram). Like in modern English, 'his' behaves similarly to 'the' (his schowres swoote, his swete breeth, his halfe cours). Like in modern English, 'and' connects two grammatical structures of the same kind (e.g. sentences or noun phrases). You can basically start from function words, which are almost identical to modern English, and work from there.
I can have trouble with the meaning of words like 'schowres', 'holte' or 'heetħ', but identifying part-of-speech categories is almost always straightforward, so it's easy to keep track of grammar, even when some of the meaning is lost.
The true Voynich translation will allow us to do just that: start from function words, identify basic grammatical structures and finally get to word meanings and translation.

The fantasy that language structure is a modern invention cannot possibly lead to anything interesting.

(12-04-2021, 08:14 PM) ReneZ Wrote: It would be of great interest to have a historical example of this.

I saw two answers to this.

Morse code is, of course, not a cipher, and it is modern (not historical in the sense of our problem).

I had a quick look at what Mark sent me by E-mail, but what I saw is not at all evidence of reducing entropy - quite the contrary.


Posts on this page by: nickpelling, ReneZ, davidjackson, Aga Tentakulus, obelus, -JKP-, MarcoP, Mark Knowles, geoffreycaveney, ReneZ