Lindemann and Bowern have updated their paper about "Character Entropy":
Quote:For this update we developed an improved method for extracting language text from Wikipedia, removing metadata and wikicode, and we have rebuilt our corpus based on current wikipedia dumps.
Their results for the Wikipedia corpus are now more plausible. As expected, Hawaiian has the lowest h2 value (see p. 28), and there is also less overlap between different script types (see figure 11 on p. 20). Instead, languages using the same type of script now tend to form clusters.
Some of the results differ dramatically. See for instance the results for languages like Wu and Zhuang. But the results have also changed for languages like English. The character set size for English is now 27 instead of 28 and the h2 value is 3.525 instead of 3.448. For some reason the general trend is that the h2 values are now lower for the Wikipedia corpus. As of today, the published corpus material has not been updated. Therefore it is not possible to check the new results.
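For anyone who wants to run a rough check on their own text samples, here is a minimal sketch of one common way to compute such a value. It assumes that h2 means the conditional entropy of a character given the preceding character, estimated from raw bigram counts. The function names are my own, and the preprocessing used in the paper (character set, case folding, handling of spaces) may differ, so the numbers will not necessarily match theirs.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def h2(text):
    """Conditional entropy H(char | previous char) in bits.

    Computed as H(bigrams) - H(first characters of the bigrams).
    Assumption: this is the quantity the paper calls h2.
    """
    if len(text) < 2:
        raise ValueError("need at least two characters")
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(bigrams) - entropy(firsts)

# A strictly alternating text is fully predictable: h2 is 0 bits.
print(h2("abababab"))
# Here each character is followed by 'a' or 'b' equally often: h2 is 1 bit.
print(h2("aabba"))
```

A lower value means the next character is more predictable from the previous one.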
Also the values for the Voynich manuscript have changed. The authors explain this with a "minor alteration to the Maximal Voynich transcription system".
There are still some mistakes. For instance, one would expect the h2 value for labels to be higher than for paragraphs. However, the table on p. 38 shows that this is not the case for Hand 5. For Hand 5, Lindemann counts 2111 labels, whereas I count only 15 on folio f66r. The reason for this result is probably the way the authors interpret the interlinear transcription file. The file uses markers like 'P' for paragraph and 'L' for labels. It seems that the authors also interpreted markers like 'R' (right column) as markers for labels.
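To illustrate how such a miscount could come about, here is a small sketch. The records are toy data invented for this example, not lines from the transcription file, and the (marker, text) layout is a simplification I am assuming rather than the real file format. It only shows how the label count grows once 'R' is treated as a label marker.

```python
# Toy records made up for illustration: (unit marker, text).
# These are NOT taken from the interlinear transcription file.
records = [
    ("P", "paragraph line 1"),
    ("P", "paragraph line 2"),
    ("L", "label 1"),
    ("R", "right column line 1"),
    ("R", "right column line 2"),
    ("R", "right column line 3"),
]

def count_labels(records, label_markers):
    """Count the records whose marker is in the given set of label markers."""
    return sum(1 for marker, _text in records if marker in label_markers)

strict = count_labels(records, {"L"})       # only 'L' counts as a label
loose = count_labels(records, {"L", "R"})   # 'R' is wrongly lumped in

print(strict)  # 1
print(loose)   # 4
```

If something like this happened, the "labels" figure for Hand 5 would mostly consist of running text, which would also explain why its h2 value does not behave like that of real labels.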
They also added a short response to the review that Andreas Schinner and I published in Cryptologia (see Timm & Schinner 2021). However, they only reiterate their conclusion: "Voynichese appears unnatural only below the word level. At the level of page and paragraph, Voynichese is comparable to natural language and structured text" (Lindemann & Bowern 2021).
It is easy to point to non-language-like features at the word level and above. In our review we point to some of them (see Timm & Schinner 2021). The most obvious feature is the existence of Currier A and B and the permanent shift from Currier A to Currier B (see Timm & Schinner 2020, p. 6).
Moreover, our argument is literally that the Voynich text is more structured than natural language: "the level of context dependency is on a higher level than expected for a linguistic system" (Timm 2016, p. 7).
The Voynich text is clearly structured. But the mere fact that the text has some structure does not mean that it is likely to have a genuine linguistic structure. There are for instance repetitive text fragments like "shol chol shoky okol sho chol chol chal shol chol chol shol" on folio 42r or "qokeedy qokeedy qokedy qokedy qokeedy" on folio 75r. But this doesn't mean that artificial text fragments of this type must represent some type of linguistic structure. See for instance the decorative pseudo texts as described in "Writing that isn’t. Pseudo-scripts in comparative view" (Houston, S. 2018, p. 21-48).