Schinner's random-walk argument (from Schinner 2007) is also prominent in the more recent paper by Timm & Schinner (p.11-p.14, Fig.8).
Timm & Schinner Wrote:An enigmatic property of the VMS reported in [12] is the presence of long-range correlations visible on the bit level of the text. They let the glyph sequences appear as a stochastic process with underlying Pólya-like distribution, rather than natural language.
I decided to try and look into it, but the results were mixed. I must say I find this measure quite complex and difficult to interpret and I am still unsure of what it means.
The process basically is (a Python sketch follows this list):
- the text is mapped into a long binary sequence: spaces are ignored and each remaining character is mapped to a different binary pattern (e.g. 00000, 00001, 00010, etc.)
- the binary sequence is converted into a path in a two-dimensional space: starting from (0,0), X increases by 1 at each step, each 0 moves Y down (-1) and each 1 moves Y up (+1)
- for each integer L, the differences between the Y values of points on the path whose X coordinates are L apart are computed; for each L this gives a long list of numbers, from which the root mean square fluctuation F(L) is computed
- L and F(L) are plotted on a log-log chart
- a value "alpha", the slope of F(L) on that chart, is computed
Schinner 2007 quotes Kokol et al. (1999).
Figure 2 gives a good summary of the process.
Schinner finds an alpha value of 0.846 for a Voynich EVA transliteration. Apparently, from this he infers that Voynichese cannot be a written natural language, but must be the result of a "stochastic process". He also observes that the slope appears to change for L~360, linking this to the length of a line of text in the VMS.
Schinner Wrote:Previous investigations by Kokol et al. [8] of various human writings have demonstrated that for natural language texts (almost independent of the language used) the asymptotic exponent alpha of F(l) does not notably differ from 0.5... Most interestingly, the VMS text shows completely different behavior: a crossover point exists where the "random process" alpha~0.5 turns into an asymptotic exponent alpha~0.85, indicating the presence of "memory effects" in the underlying stochastic process. ... the crossover point L~360 (= 72 characters × 5 bits) of the whole text fits well to the average line length
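To make the crossover claim concrete, here is a small sketch (reusing fluctuation() from the code above) that fits the exponent separately below and above a candidate crossover lag. The default of 360 is simply the value from the quote (72 characters x 5 bits); I have not verified it independently.
[code]
import numpy as np

def piecewise_alpha(walk, lags, crossover=360):
    """Fit log F(L) vs log L separately below and above `crossover`.
    Per Schinner, the lower branch should give alpha ~ 0.5 and the upper
    branch alpha ~ 0.85 for the VMS; lags must span both sides."""
    lags = np.asarray(lags)
    F = np.array([fluctuation(walk, L) for L in lags])
    low, high = lags < crossover, lags >= crossover
    alpha_low = np.polyfit(np.log(lags[low]), np.log(F[low]), 1)[0]
    alpha_high = np.polyfit(np.log(lags[high]), np.log(F[high]), 1)[0]
    return alpha_low, alpha_high
[/code]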
But the statement attributed to Kokol that
"for natural language texts ... the asymptotic exponent alpha of F(l) does not notably differ from 0.5" is very different from what Kokol wrote.
See Kokol's Table 3:
[attachment=4798]
Kokol et al. considered only three languages, with 20 texts for each. One of these 60 samples resulted in alpha=0.72, which differs more from 0.5 (by 0.22) than it does from 0.85 (by 0.13), the alpha value for VMS EVA.
Kokol et al Wrote:We see that the mean α for natural language texts is very near 0.5, but single texts differ from this critical value significantly.
Kokol et al Wrote:The difference in α between different writings can be attributed to various factors like personal preferences, used standards, language, type of the text or the problem being solved, type of the organisation in which the writer (or programmer) works, different syntactic, semantic, pragmatic rules etc.
Schinner 2007 also quotes Schenkel, A., J. Zhang, and Y. Zhang (1993), which appears to be the first paper where the application of this method to language analysis was discussed.
Schenkel et al. also examined several different texts and pointed out a bible (which I assume to be in English) that results in alpha=0.87, i.e. higher than the value for VMS EVA. Their plot (Fig.1 i) also shows a change in alpha at 100<L<1000, the same behaviour that Schinner observed for the VMS.
It is rather unfortunate that, of the four texts examined by Schinner, the three bibles were in Latin, German and Chinese, while for English he chose Alice in Wonderland!
[attachment=4797]
I wrote some Python code to try to replicate Schinner's experiments (a usage sketch follows the plots below). I am far from sure that it is correct. Anyway, these are the plots I get for:
- VMS EVA Takahashi transliteration
- Vulgate Latin Bible (first 25,000 words, to match Schinner's experiment)
- Alice in Wonderland (whole text, to match Schinner's experiment)
- King James English bible (first 191,000 non-space characters, to match the VMS)
[attachment=4796]
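For completeness, this is roughly how I drive the functions sketched earlier to produce one of these plots. The file name is a placeholder and the lag grid is an arbitrary choice on my part.
[code]
import numpy as np
import matplotlib.pyplot as plt

# Placeholder file name: substitute whichever transliteration or text you want to test.
with open("voynich_eva_takahashi.txt", encoding="utf-8") as fh:
    text = fh.read()

walk = text_to_walk(text, bits=5)                      # see the earlier sketch
lags = np.unique(np.logspace(0.5, 4, 60).astype(int))  # roughly log-spaced lags
lags = lags[lags < len(walk) // 2]                     # keep lags well below the walk length
alpha, F = scaling_exponent(walk, lags)

plt.loglog(lags, F, "o-", label="F(L), alpha ~ {:.3f}".format(alpha))
plt.xlabel("L")
plt.ylabel("F(L)")
plt.legend()
plt.show()
[/code]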