The Voynich Ninja
[Article] "The Strange Quest to Crack the Voynich Code" - Printable Version



RE: "The Strange Quest to Crack the Voynich Code" - DONJCH - 14-02-2020

(13-02-2020, 09:39 PM)Alin_J Wrote: And I propose that we call an end to this little sidetrack discussion for now.

And I agree.

Who would have thought the Ninjas could be so easily distracted by a little throwaway comment? :D


RE: "The Strange Quest to Crack the Voynich Code" - Anton - 14-02-2020

Just kidding. Never mind!


RE: "The Strange Quest to Crack the Voynich Code" - MarcoP - 15-02-2020

I think that any summary of current Voynich research should mention Rene's work, but only his website is linked. Apart from this, the article looks like a good introduction to the hobby.
I approve of the focus on peer-reviewed publications: as the article says, the Cheshire incident shows that one cannot blindly trust peer review, but the formal process of publishing a paper helps make it more informative.
It is great to see Lisa Fagin Davis mentioned as a paladin of good research, and to see Timm and Schinner's work receiving the attention it deserves.

I haven't read Alin's paper yet, and I don't think I ever read Amancio et al. 2013. Though I doubt I have the skills to understand all the details, I hope to find the time to look into both works.


RE: "The Strange Quest to Crack the Voynich Code" - MarcoP - 16-02-2020

There is [link] about Amancio et al. 2013, but it appears to be closed.

Since I have a growing interest in the problem of telling meaningful from meaningless texts, I devoted some time to their paper "Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript". A first disappointment is that the authors used an EVA transliteration of the VMS but mention neither EVA nor the author of the transliteration.

As I expected, I don't understand much of the technical content, which appears to be quite complex. The problem may be my lack of competence, but I am not sure that their approach is sound. I found these passages particularly perplexing:

Quote:The various hypotheses about the VMS can be summarized into three categories: (i) A sequence of words without a meaningful message; (ii) a meaningful text written originally in an existing language which was coded (and possibly encrypted with a mono-alphabetic cipher) in the Voynich alphabet; and (iii) a meaningful text written in an unknown (possibly constructed) language. While it is impossible to investigate systematically all these hypotheses, here we perform a number of statistical analyses which aim at clarifying the feasibility of each of these scenarios. To address point (i) we analyze shuffled texts.

...

The values of X [any of several statistical measures] for the VMS ... in Table 3 indicate that the VMS is not compatible with shuffled texts

...

Table 3 shows the largest distances [between the VMS and its shuffled versions] for intermittency (I and I*) and network measurements (k and L*). Because intermittency is strongly affected by stylistic/semantic aspects and network measurements are mainly influenced by syntactic factors, we take these results to mean that the VMS is not compatible with shuffled, meaningless texts.

If I understand correctly, they say that you can discriminate meaningful texts by checking if they differ from texts produced by randomly shuffling the same words.

Currier pointed out at least three features of the VMS that make it different from re-shuffled versions of the same words:
  • "Languages" A and B are Statistically Distinct: some words that are frequent in B never occur in A
  • The Line Is a Functional Entity: some words tend to appear at the beginning or end of lines
  • Effect of "Word"-Final Symbols on the Initial Symbol of the Following "Word": for instance, about 70% of words starting with q- appear after a word ending with -y

A shuffled text will obviously be very different, with no trace of those three properties:
  • A and B can no longer be distinguished, since B words will be scattered through the whole text.
  • Words that only appear line-initially/finally will be moved to different positions, sometimes appearing next to each other.
  • q-words will show no preference for following -y words but will appear uniformly after any suffix. The same will of course hold for all end-start preferences.
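
The contrast between the original word order and a shuffled one can be made concrete with a small script. This is only a sketch: the word list is a made-up, EVA-like toy sequence (not a real transliteration), and `q_after_y_share` and `shuffled_baseline` are hypothetical helper names, not anything taken from Currier or from Amancio et al. It measures the share of q-initial words that follow a word ending in -y, first in the given order and then averaged over random shuffles.

```python
import random


def q_after_y_share(words):
    """Share of q-initial words whose preceding word ends in -y."""
    pairs = [(prev, cur) for prev, cur in zip(words, words[1:])
             if cur.startswith("q")]
    if not pairs:
        return 0.0
    return sum(prev.endswith("y") for prev, _ in pairs) / len(pairs)


def shuffled_baseline(words, trials=200, seed=0):
    """Average of the same share over randomly shuffled copies of the text."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(trials):
        perm = list(words)
        rng.shuffle(perm)
        total += q_after_y_share(perm)
    return total / trials


# Made-up toy sequence in which every q-word happens to follow a -y word.
toy = ("daiin chedy qokeedy shedy qokain chol dar chey qokal okeey "
       "qokeey dal chor shol").split() * 20

print("ordered :", q_after_y_share(toy))
print("shuffled:", shuffled_baseline(toy))
```

On the toy data the ordered share is 1.0 by construction, while the shuffled average falls to roughly the overall share of -y words. A gap like this shows only that the word order is not random; by itself it says nothing about meaning.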

While the properties pointed out by Currier prove that the word order of Voynichese is not random, I am not sure they necessarily suggest meaningfulness. Word-boundary effects could be explained phonetically (as proposed by Emma May Smith), but this could still be some kind of glossolalia (i.e. phonetic but not meaningful). One can also imagine different explanations, like the "visual harmony" proposed by Schwerdtfeger and accepted by Timm and Schinner. As far as I know, the other two properties are unparalleled in written texts (at least in prose; poetry may show some kind of line effects).

As an example of an ancient non-random but meaningless document, one can consider the tables in the Book of Soyga (successfully analysed by [link] in 1996). The algorithm according to which the tables were generated is totally deterministic; the only information that the tables add to the (largely meaningless?) heading words on which they are based is due to errors by the scribes who wrote and copied the text.


RE: "The Strange Quest to Crack the Voynich Code" - Torsten - 16-02-2020

The article at undark.org constructs an antithesis to Schinner's results by arguing: "A team of scientists in Brazil and Germany in 2013 ran their own statistical analyses and drew the opposite conclusion: The text was likely written in a language, and not randomly generated." But this doesn't mean that Schinner and Amancio et al. got different statistical results, or that Amancio et al. are arguing against Schinner's paper.

Schinner concluded in 2007 that the text "has been created using 'algorithmic' methods, implicitly or explicitly involving some degree of randomness." (Schinner 2007, p. 106). Schinner also wrote: "However, the VMS text obviously is not composed of simple random strings, and it shows rich linguistic-like structure" (Schinner 2007, p. 96). The paper by Amancio et al., on the other hand, says nothing about a text generation method. It only concludes that the text "differs from a random sequence of words" and is "compatible with natural languages" (Amancio et al. 2013). In a way, both papers argue that the text is not composed of random strings and shows some linguistic-like structure. The only difference is that Schinner interpreted the statistical results as evidence for a stochastic process, whereas Amancio et al. interpreted them as evidence for the language hypothesis.

A paper by Montemurro and Zanette was also published in 2013: "[link]". It uses a similar approach and comes to a similar result as the Amancio paper, but it had more impact and is therefore better known. It was also discussed by linguists like [link] and [link]. For instance, Chrisomalis objects: "simply because the VM has some structure, even one that resembles language in some ways, does not entail that it is likely to have a genuine linguistic structure". Chrisomalis concludes: "So in essence, Montemurro and Zanette seem to be suggesting that the VM has properties similar to no writing system ever known to have been used on earth, because they do not seem to know what sorts of writing systems they are comparing things to" (Chrisomalis 2013).

The paper by Montemurro and Zanette would fit better into the undark.org article, since it at least argues against a simple text generation method: "In summary, simple methods to generate random texts with some sort of local statistical structure may seem, under superficial scrutiny, rather convincing solutions to the problem presented by the Voynich manuscript. However, the statistical structure of the text at its various levels still requires an explanation that needs to go beyond reproducing local features like word forms or local word sequences. Here, we have contributed evidence of non-trivial statistical structure in the long-range use of words in the Voynich text." (Montemurro & Zanette 2013).

The reason the undark.org article references the Amancio paper might be an article on VICE.com from 2014: "[link]". That article is about the 2013 Amancio paper and, like the undark.org article, it is very enthusiastic about statistical tests and the use of computers.


RE: "The Strange Quest to Crack the Voynich Code" - Alin_J - 17-02-2020

(16-02-2020, 05:16 PM)MarcoP Wrote: If I understand correctly, they say that you can discriminate meaningful texts by checking if they differ from texts produced by randomly shuffling the same words.

This is basically what they have done: they show that the Voynich manuscript differs from randomly word-shuffled texts and is compatible with natural texts, in that it matches almost all of the properties they measured in natural texts and found to be reliable indicators. I have some questions about their methodology, though. As quoted from the paper:

Quote:The compatibility with natural texts was computed using Eq. (1), where P was computed adding Gaussian distributions centered around each X observed in the New Testament over different languages L. The standard deviation on each Gaussian representing a book in the test dataset should be proportional to the variation of X across different texts and therefore we used the least sigma between English and Portuguese.

Did they assume that the variances (sigmas) in the other languages were the same as for the two languages in which they had measured many texts, English and Portuguese? And did they then choose the smaller of the two (English compared to Portuguese)? Am I understanding this correctly?

Then, as I understand it, the compatibility was calculated by integrating the upper/lower tails, beyond the measured value, of the interpolated distribution built with these sigmas? If these integrals were < 0.05, the text is considered not compatible, since the value lies too far out to have a reasonable probability.

I don't know about you, but to me this seems a bit bold: assuming the same sigma, taken as the lower of only two measurements (languages). Not that I think this would much affect the overall conclusions of the study, though.
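
To make this reading concrete, here is a minimal sketch of such a tail test. It is an assumption-laden illustration of my interpretation above, not the paper's actual Eq. (1): the distribution is an equal-weight mixture of Gaussians with one shared sigma, the centre values are made up, and `tail_compatibility` is a hypothetical name.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, centers, sigma):
    """CDF of an equal-weight mixture of Gaussians sharing one sigma."""
    return sum(normal_cdf(x, c, sigma) for c in centers) / len(centers)


def tail_compatibility(x, centers, sigma):
    """Smaller of the two tail masses of the mixture beyond the value x."""
    lower = mixture_cdf(x, centers, sigma)
    return min(lower, 1.0 - lower)


# Made-up values of some measure X, one per reference language.
centers = [0.9, 1.0, 1.1, 1.2, 1.3]
measured = 1.6  # made-up value for the text under test

for sigma in (0.1, 0.3):
    p = tail_compatibility(measured, centers, sigma)
    verdict = "compatible" if p >= 0.05 else "not compatible"
    print(f"sigma={sigma}: tail={p:.4f} -> {verdict}")
```

With these toy numbers the same measured value fails the 0.05 threshold for the smaller sigma and passes it for the larger one, which is why the choice of sigma matters for borderline cases.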


RE: "The Strange Quest to Crack the Voynich Code" - Torsten - 17-02-2020

Hi Jonas!

I wonder if the summary given with the reference to your paper is correct in your eyes?

The undark.org article states that a "statistical paper published in November described how visual analysis of the letters identified patterns in the script itself that seem similar to other written alphabets." As far as I can see, this doesn't fit your results. You explicitly wrote: "This study neither intends to support that the Voynich manuscript is a hoax or that it is a meaningful text" (Alin 2019, p. 2). In fact, you also argue that the PCA analysis was "not completely successful in classifying the characters into vowel- and consonant groups, or the script either does not contain vowels (abjad-script), or has vowels and consonants arranged in some other fashion due to a transposition encryption scheme" (Alin 2019, p. 13).

Further, you describe the following observation: "the more similar the characters look, the more similar would be their pattern of transition probabilities to other characters" (Alin 2019, p. 16). In this way your paper does in fact confirm our description of the VMS (see Timm & Schinner 2019, p. 2f or Timm 2014, p. 4f) and also confirms one of the three modification rules we describe in our paper: select a source word and modify it by replacing "one or more glyphs by similar ones" (Timm & Schinner 2019, p. 9).
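
The kind of comparison behind that observation can be sketched as follows. This is only an illustration under stated assumptions, not the method of Alin 2019 or of Timm & Schinner 2019: each character of a transliterated word is treated as one glyph, the word list is made up, the helper names are hypothetical, and cosine similarity is simply one convenient way to compare two transition profiles.

```python
import math
from collections import Counter, defaultdict


def transition_profiles(words):
    """For each glyph, the distribution of the glyph that follows it in a word."""
    counts = defaultdict(Counter)
    for word in words:
        for cur, nxt in zip(word, word[1:]):
            counts[cur][nxt] += 1
    profiles = {}
    for glyph, followers in counts.items():
        total = sum(followers.values())
        profiles[glyph] = {g: c / total for g, c in followers.items()}
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse probability profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Made-up toy words; one character is assumed to equal one glyph.
toy_words = ["okar", "otar", "okal", "otal", "dar", "dal"]
profiles = transition_profiles(toy_words)

print("k vs t:", cosine(profiles["k"], profiles["t"]))
print("o vs k:", cosine(profiles["o"], profiles["k"]))
```

In the toy list, k and t are always followed by the same glyph, so their profiles coincide, while o and k share no followers at all. Whether similar-looking glyphs really behave like this depends entirely on the transliteration used.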


RE: "The Strange Quest to Crack the Voynich Code" - Alin_J - 17-02-2020

Yes, and I draw no conclusions there myself in favour of either the meaningful-text or the meaningless-text hypothesis. I leave the question open to various interpretations of the results, so to speak. It seems that at least one person has chosen to interpret the results this way. There may be reasons for this, and who am I to argue without knowing the full reasoning? It might come from observing, for example, that the division between "vowels" and "consonants" resembles that in natural languages, with characters in one of the sets actually resembling a/o/i/e in our alphabets, etc. All of this is very subjective, however.

Personally I have no other interest than the truth, whatever that might be.


RE: "The Strange Quest to Crack the Voynich Code" - nickpelling - 17-02-2020

Any paper claiming to draw conclusions about Voynichese but without really understanding the intention / nature / limitations of EVA can only sensibly be... toilet paper. :-(

And there's a whole lot of Voynich toilet paper out there. :-(


RE: "The Strange Quest to Crack the Voynich Code" - Torsten - 17-02-2020

"The analysis was made on a widely available transliteration of the Voynich manuscript text, the '101' format by Glen Claston" (Alin 2019, p. 5).

"I strongly believe that the biggest problem we face precedes cryptanalysis – in short, we can’t yet parse what we’re seeing well enough to run genuinely useful statistical tests" ([link]).