18-03-2019, 07:23 AM
18-03-2019, 02:22 PM
Is there a measure of similarity which can be used to compare generated texts to the real text? If so, can we establish a minimum set of generative rules that creates maximum similarity?
To me, that would help to demonstrate whether a procedurally-generated text has any chance of being the correct solution. If it's not possible to approach 90%+ similarity without a large and complex ruleset then we must discard it as a theory.
I suppose this idea is complicated by the fact that we can start from an arbitrary base, such as parts of words created by a human, and then only devise a set of rules to combine them. Even so, we could incorporate some measure of "starting position" to balance out the size of the ruleset.
18-03-2019, 03:20 PM
That's something that might take some thought.
Don's solution is going to get high scores because he is using full VMS tokens to represent characters (or character pairs). They match before he has even generated a sample of text. But do you think that each "word" in the VMS represents only one or two letters?
René's sample is character-based, which is really a very good result considering how difficult it is to render natural language as VMS text. I give him credit for that. It's not a perfect match, but if you think the VMS characters each have a value, rather than a whole word representing only a tiny amount of text, then it's possible his lower-scoring text is actually closer to the real solution.
I did something different from both Don and René. My sample is partly character-based, partly based on some patterns in the VMS that I think might be generated in a different way. My score, if I tweaked my sample a little, might be in between René's and Don's, but if you have any suspicions that some of the VMS glyphs might be ligatures, abbreviations, biglyphs or anything that might not be 100% character-based, then which is more important... a high score or fidelity to the original VMS method (if that can be determined)?
18-03-2019, 03:42 PM
Can you tell me which page this text comes from? It is very interesting.
(15-03-2019, 07:49 PM)ReneZ Wrote: Mostly for fun, but also to improve my understanding of the text, I occasionally play around with methods to generate text that looks like the Voynich MS text.
Just to show an example, the following is a very straightforward 'encoding' of a short piece of Italian.
This can still be tuned a lot, and I'll refrain for the moment from explaining how it was done.
However, it can be inverted exactly, i.e. a very simple process will turn this back into legible Italian (though spaces are lost).
First in Eva, then in Voynichese:
oty chey shaiin cheaiin dokaiin ar shy qotsheshaiin dsol chcheol dsar dy
ol chsy dol cthaiin daiin char shchey dy otokchar aiin sain ckhaiin
ckheeaiin sheaiin chcheain okar ototaiin ar qokaiin chesheol shy seaiin dckhear dy
sy cthaiin dokaiin y dcthaiin dokaiin aiin sain dol y dar shokchol qotar dcthain aiin
ol shokchaiin shar sy ctheaiin doty ol dsy qotaiin ddy ain chear dy
oty chey Shaiin cheaiin dokaiin ar Shy qotSheShaiin dsol chcheol dsar dy
ol chsy dol cThaiin daiin char Shchey dy otokchar aiin sain cKhaiin
cKheeaiin Sheaiin chcheain okar ototaiin ar qokaiin cheSheol Shy seaiin dcKhear dy
sy cThaiin dokaiin y dcThaiin dokaiin aiin sain dol y dar Shokchol qotar dcThain aiin
ol Shokchaiin Shar sy cTheaiin doty ol dsy qotaiin ddy ain chear dy
Remarkably, the entropy of this text (skipping spaces) is:
1st order: 3.7705
conditional: 1.2110
I have clearly 'overdone' it with the conditional entropy.
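For anyone who wants to compute figures of this kind on their own samples, here is a minimal Python sketch. It assumes "1st order" means the Shannon entropy of single characters and "conditional" means the entropy of a character given the one before it, both in bits and with whitespace removed. The exact convention behind the numbers above is not stated, so this sketch is not guaranteed to reproduce them. The function names are made up for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a Counter of frequencies."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def char_entropies(text):
    """Return (first-order, conditional) character entropy, ignoring whitespace."""
    chars = "".join(text.split())            # skip spaces and line breaks
    h1 = entropy(Counter(chars))             # single-character entropy
    pairs = Counter(zip(chars, chars[1:]))   # adjacent character pairs
    firsts = Counter(chars[:-1])             # left-hand character of each pair
    h_cond = entropy(pairs) - entropy(firsts)  # H(next | current)
    return h1, h_cond

# First Eva line of the sample above, used only as a short demo input.
sample = "oty chey shaiin cheaiin dokaiin ar shy qotsheshaiin dsol chcheol dsar dy"
h1, h_cond = char_entropies(sample)
print(f"1st order: {h1:.4f}  conditional: {h_cond:.4f}")
```

Note that this treats every Eva letter as one character. Counting groups such as "ch" or "sh" as single glyphs would give different values.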
I am aware that this is not exactly like Voynichese. Maybe I can do better in the next weeks.
Of course, everyone is invited to present similar experiments,
Text said: "You got a "free ticket" to enjoy my knowledge... you doubt... thinking about your lord... let it go... I will pour in..."
Thanks!
18-03-2019, 03:54 PM
Aldis, it's not Voynichese, it's a classical text enciphered in the style of the VMS.
You can tell right away that it's not Voynichese by looking at the c-shape patterns. It is, however, quite close, considering it's a translation of natural language into Voynichese. René did a good job.
18-03-2019, 04:51 PM
Basically it is a verbose substitution, with spaces re-arranged. This makes it possible to decrease the entropy and generate the word patterns. The resulting text is longer than the Italian source by a factor between 1.5 and 2 - I did not check.
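To make the idea concrete, here is a rough Python sketch of a verbose substitution with re-arranged spaces. The table is entirely hypothetical and is not the one used for the sample in this thread, which has not been published. The one design assumption is that no glyph group is a prefix of another, so the text can be decoded unambiguously once the spaces are dropped.

```python
import random

# Hypothetical table: each plaintext letter maps to a glyph group.
# No group is a prefix of another, so decoding needs no spaces.
TABLE = {"a": "ol", "e": "dy", "i": "ar", "o": "aiin",
         "l": "ch", "r": "sh", "s": "qo", "t": "ke"}

def encode(plaintext, seed=0):
    """Substitute each letter, then regroup the output into words of 1-3 groups."""
    rng = random.Random(seed)
    # Plaintext spaces (and any letter missing from the table) are dropped here.
    groups = [TABLE[c] for c in plaintext.lower() if c in TABLE]
    words, i = [], 0
    while i < len(groups):
        n = rng.randint(1, 3)                # arbitrary new word boundaries
        words.append("".join(groups[i:i + n]))
        i += n
    return " ".join(words)

def decode(ciphertext):
    """Remove the spaces and match glyph groups from left to right."""
    inverse = {v: k for k, v in TABLE.items()}
    stream = ciphertext.replace(" ", "")
    out, i = [], 0
    while i < len(stream):
        code = next(c for c in inverse if stream.startswith(c, i))
        out.append(inverse[code])
        i += len(code)
    return "".join(out)

cipher = encode("la rosa e lei")
print(cipher)
print(decode(cipher))   # plaintext letters come back, but without spaces
```

With a table like this the output is at least twice as long as the input, which is in the same range as the expansion factor mentioned above.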
W.r.t. Emma's question, one could think of an array of indicators, based on similarity of:
- entropy
- word length distribution
- word frequency distribution (a.k.a. Zipf's law).
- others
Donald's text would have a problem only with the third one.
These bullets, however, do not give any clue whether the generated text would be 'meaningful'.
To do that, more thought is needed.
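As a starting point for such an array of indicators, here is a small Python sketch for the word-length and word-frequency items. The function names and the choice of total variation distance as the comparison measure are my own assumptions; other distance measures would work equally well.

```python
from collections import Counter

def word_length_distribution(text):
    """Relative frequency of each word length in a space-separated text."""
    words = text.split()
    counts = Counter(len(w) for w in words)
    return {length: n / len(words) for length, n in sorted(counts.items())}

def rank_frequency(text):
    """Word counts from most to least frequent (the input to a Zipf plot)."""
    return [n for _, n in Counter(text.split()).most_common()]

def distribution_distance(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Short excerpts from the two generated samples in this thread, as demo input.
rene = "oty chey shaiin cheaiin dokaiin ar shy qotsheshaiin dsol chcheol dsar dy"
don = "qodom qodeey otaiin olk otaiin shodeey okeey otaiin qodom qodeey otaiin otaiin"

print(word_length_distribution(rene))
print(rank_frequency(don))
print(distribution_distance(word_length_distribution(rene),
                            word_length_distribution(don)))
```

To score a generated text, the second argument would be a transliteration of the real manuscript rather than another generated sample. None of this says anything about whether the text is meaningful.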
18-03-2019, 05:33 PM
(17-03-2019, 12:35 AM)DonaldFisk Wrote: In a few minutes, I was able to generate this: "qodom qodeey otaiin olk otaiin shodeey okeey otaiin qodom qodeey otaiin otaiin ykain shotol qotol otaiin ydaim aiin ldor qotar qokaiin okos chokeey qokaiin aiin ldor qotar qokaiin otary ty odol qokaiin shodain tal okaiin olk otaiin chokain shotol qotol dam qokar qokaiin shodeey okeey otaiin aiin shodain qotain dain dal aiir chokair qold otey ldy dain daiir odol qotey otaiir okeey otaiin otaiin tal otar ydol tor qodol dam qokar qokaiin qokaiin al oiin oteey okal otaiin yky das chokor chodol dol chodaiin shodor qotaly choky doiin d qokal qokaiin shodol olkar qokal aiin qoty dam otor dam qodor qodain d chokar olkain deey ary tain daiir odol qotey otaiir aiin qokaiin aiin shodain qotain dar chokain ytol".
I'll tell you how I did it. Each pair of plaintext letters was encoded into a single Voynichese word. The first letter was the prefix, the next letter was the suffix. Spaces between words in the plaintext are significant, and diacritics were removed before encryption. Plaintext letters, and Voynichese prefixes and suffixes, were ordered roughly in descending order of frequency. So, it's a verbose cipher. It might be too short to be breakable, but it's the beginning of a well-known text, roughly contemporary with the Voynich Manuscript, in a well-known European language.
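A rough Python sketch of a pair cipher along these lines follows. The letter order and the prefix and suffix tables are hypothetical and cover only six letters; they are not the tables actually used for the ciphertext above. The sketch also assumes that pairs are formed within each plaintext word and that a leftover final letter becomes a bare prefix, which is one possible reading of "spaces are significant".

```python
# Hypothetical tables, NOT the real ones: plaintext letters in a guessed
# frequency order, matched index by index with prefixes and suffixes.
LETTERS = "eaiotn"
PREFIXES = ["qok", "ot", "ok", "ch", "sh", "d"]
SUFFIXES = ["aiin", "eey", "ol", "ar", "y", "ain"]

def encode_word(word):
    """Encode one plaintext word, two letters per ciphertext token."""
    tokens = []
    for i in range(0, len(word), 2):
        token = PREFIXES[LETTERS.index(word[i])]          # first letter of the pair
        if i + 1 < len(word):
            token += SUFFIXES[LETTERS.index(word[i + 1])]  # second letter of the pair
        tokens.append(token)                               # odd letter: bare prefix
    return tokens

def encode(plaintext):
    """Encode a lowercase text that uses only the letters in LETTERS."""
    tokens = []
    for word in plaintext.split():
        tokens.extend(encode_word(word))
    return " ".join(tokens)

print(encode("tea in"))   # -> shaiin ot okain
```

A letter outside the six in `LETTERS` raises a `ValueError`; a fuller table would be needed for real text.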
Ok, I'm just going to throw this out there: I think Donald's text has got to be susceptible to decryption with the right cryptographic methods -- especially because he has generously given us a *lot* of clues here.
For example, the ciphertext begins "qod-om qod-eey ot-aiin" and then the 9th to 12th words are "qod-om qod-eey ot-aiin ot-aiin". Furthermore, in between these repeated words, we have "ot-aiin" again as the 5th word *and* the 8th word (!), plus we have the "-eey" suffix letter occurring two more times, in the 6th word "shod-eey" and the 7th word "ok-eey". Clearly we have here a *lot* of repetition of letters in just the first 12 ciphertext words, which only represent the first 23 letters of plaintext (since the 4th word "olk-" appears to be a prefix without a suffix).
Expressing this verbose cipher as a more normal cipher, the pattern is AB-AC-DE-F DE-GC-HC-DE AB-AC-DE-DE .
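The relabelling can be checked mechanically. This small Python sketch takes the first 12 words with the prefix/suffix split proposed above (the hyphen positions are that proposal, not something known for certain) and assigns a new letter to each prefix or suffix in order of first appearance. The function name is made up for illustration.

```python
import string

def pattern(tokens):
    """Label each distinct prefix or suffix with the next unused capital letter."""
    labels = {}
    parts = []
    for token in tokens:
        word = ""
        # A token without a hyphen is treated as a bare prefix.
        for role, piece in zip(("prefix", "suffix"), token.split("-")):
            key = (role, piece)
            if key not in labels:
                labels[key] = string.ascii_uppercase[len(labels)]
            word += labels[key]
        parts.append(word)
    return "-".join(parts)

first_twelve = ("qod-om qod-eey ot-aiin olk ot-aiin shod-eey ok-eey ot-aiin "
                "qod-om qod-eey ot-aiin ot-aiin").split()
print(pattern(first_twelve))
# -> AB-AC-DE-F-DE-GC-HC-DE-AB-AC-DE-DE
```

The labels add up to 23 letters, matching the count given above.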
Plus Donald has told us it is the beginning of a well-known text in a well-known European language in the time period of the Voynich ms.
Surely this is more than enough information to decrypt the text. I'm a bit frustrated that I haven't cracked it yet, but I'm certain that a more experienced cryptographer could do so.
AB-A at the beginning is a huge clue, eliminating most candidate texts. AB-AC-DE-DE as the 16th to 23rd letters is an even bigger clue, because it repeats the first 6 letters of the text and then doubles the DE pair.
Another clue: the 2nd letter "-om" ("B" in my version) does not occur anywhere else in Donald's ciphertext except in these two places. So it is probably a rather infrequent letter.
Candidate words/phrases for the beginning include "nun...", "sus(ter?)", "ex e...", "ab a...", each of which has a rather infrequent 2nd letter. "non ..." would be a good candidate, except that "o" is unlikely to occur as a "suffix" in only these two places in the text. But it is possible.
Surely someone can break this cipher with all of these patterns and clues...
18-03-2019, 06:18 PM
I'm playing with it, so I hope that Don doesn't publish a solution until somebody gets it right.
18-03-2019, 09:55 PM
(18-03-2019, 06:18 PM)Emma May Smith Wrote: I'm playing with it, so I hope that Don doesn't publish a solution until somebody gets it right.
I will be very impressed by whoever manages to decrypt it and identify the source, because I have looked up every single significant text of the 14th and 15th centuries that I can think of or identify, and I haven't found a single one that begins with a letter pattern that would match Donald's ciphertext. I'm sure I'm missing something, obviously, but this is a pretty interesting challenge.
18-03-2019, 10:49 PM
(18-03-2019, 09:55 PM)geoffreycaveney Wrote: (18-03-2019, 06:18 PM)Emma May Smith Wrote: I'm playing with it, so I hope that Don doesn't publish a solution until somebody gets it right.
I will be very impressed by whoever manages to decrypt it and identify the source, because I have looked up every single significant text of the 14th and 15th centuries that I can think of or identify, and I haven't found a single one that begins with a letter pattern that would match Donald's ciphertext. I'm sure I'm missing something, obviously, but this is a pretty interesting challenge.
There's an error in your reasoning, and when I wrote "roughly contemporary" I meant plus or minus a century.
This is an interesting thread, as we're looking at the problem from the other end: instead of trying to decrypt the text, we're looking at ways of encrypting texts, i.e. putting ourselves in the shoes of whoever created the Voynich Manuscript.