The Voynich Ninja

There's an article [link] published by someone called Raj Ponnaluri. I'm guessing he's not on the forum, or he would have posted this himself! He's done a YouTube video, but I wasn't really able to follow how he comes to his conclusions just from the video.

Wonder if there's a preprint somewhere. The article has obviously passed peer review but some of the categorical statements in the youtube video (e.g. that the article proves it is only one language, and there is no evidence for multiple languages) make me feel a little cautious. I don't happen to think there are multiple languages (assuming there is any language) but the differences in Currier A and B (not to mention Lisa's scribes) would stop me saying that categorically. I feel it's hard for us to say anything so categorical unless/until it is ever deciphered. Will be interesting to see the argument!

The original point for me is how the multiple scribes theory and Currier A + B interact.
The part I'm most interested in is the part I'm most lost with, though.

Paraphrasing
"5 Scribes created a text which contains 2 differing texts, Currier A and B. This is due to them each using their own words. (and not due to different languages)"

Why do we have 2 "languages" and not 5? A way to answer that is to group scribes up somehow. If anything that would be supporting the idea of there being two distinct ways of writing "Voynichese" rather than supporting individual word choices creating differences.

I'd like to read his work so I can see what he has to say properly rather than in a quick video, but £45 is a bit steep for me.

I'm also sceptical about that part. Given the nature of the difference between A and B, I can't see how to demonstrate that simple vocabulary variation lies at its basis.

It doesn't seem there is a preprint but I've invited Raj to the forum so hopefully we will hear more details about how he reached his conclusions.

(14-11-2024, 03:03 PM)tavie Wrote: It doesn't seem there is a preprint

Attached is a rough transcript of Dr Ponnaluri's video on YouTube.

(14-11-2024, 06:48 PM)dfs346 Wrote: Attached is a rough transcript of Dr Ponnaluri's video on YouTube.

From the transcript:

Quote:Secondly with the aid of Zipf's law, the Brevity and Heaps laws, my work shows that the voynich manuscript contains a natural language.

It might be true that texts in natural languages generally obey Zipf's Law, but other things do too [link], so the fact that something obeys Zipf's Law doesn't prove it's a natural language. My [link] obeys Zipf's Law, and it isn't natural language.
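To illustrate the point that a good Zipf fit does not single out natural language, here is a minimal Python sketch. It fits a straight line to log(frequency) against log(rank) and applies it to "monkey typing" text, i.e. random letters with random spaces, which is known to give an approximately power-law rank-frequency curve. The function names, the five-letter alphabet, and the space probability are assumptions made for illustration; this is not the method used in the paper.

```python
import math
import random
from collections import Counter

def zipf_fit(tokens):
    """Least-squares fit of log(frequency) against log(rank).

    Returns (slope, r_squared). Needs at least two word types
    with different frequencies.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def monkey_text(n_chars, alphabet="abcde", space_prob=0.2, seed=0):
    """Random 'typing': each keystroke is a space or a random letter (hypothetical setup)."""
    rng = random.Random(seed)
    chars = [" " if rng.random() < space_prob else rng.choice(alphabet)
             for _ in range(n_chars)]
    return "".join(chars).split()

# Meaningless random text still gets a slope and an R^2 from the same fit.
slope, r2 = zipf_fit(monkey_text(200_000))
print(f"slope={slope:.2f}  R^2={r2:.3f}")
```

A high R² here would only show that the fit is not a discriminating test; it says nothing about whether the Voynich text is meaningful.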

It's good for him that he doesn't start from scratch like many others (claiming that everything before him was rubbish) but uses existing data as a start.

My feeling, however, is that he quotes others but doesn't make many new, interesting observations himself.
All this stuff like Zipf's law has been done to death before, and while it is a valid observation, it doesn't give you the right to claim that the text is meaningful or that it is a natural language (and not a constructed language, for example).

In my opinion, the paper doesn't make much sense. This already starts with the fact that the paper is classified as a review paper, but it remains unclear what exactly the paper is reviewing.

The paper is obviously inspired by the publications of Lisa Fagin Davis and Claire Bowern. It states multiple times that the author agrees with arguments given by Lisa Fagin Davis and Claire Bowern: "Lisa Davis, a paleographer, has built on Currier’s work and concluded that five scribes may have penned the VM (Davis 2020). While it is possible for multiple scribes to commit a hoax, one may question its probability." "Based on their statistical arguments about phonology, morphology, and document structure, Bowern and Lindemann make a case for natural language (Bowern and Lindemann 2020)."

The core of the paper is a statistical analysis based on "the v101 transliteration file, ‘IVTFF v101 2.0M 6’" (see [link]). As the reason for choosing the transliteration by Glen Claston, the paper states that "v101 preserve the stroke combinations as single characters in case they appear to be single signs". The transliteration file uses 71 different characters (see Table 6: the v101 basic character set, [link]) and an extended character set (see Table 7: the v101 extended character set, [link]). The paper states that "A few manipulations of specific characters helped perform computer-based statistical analyses." and that "a comma, or an uncertain word space, was removed from the file before any analyses were performed."
There is no discussion of the fact that most of the characters are rarely used. There is also no discussion of the fact that more than 71 different characters is far too many for an alphabet.

[attachment=9429]
The paper states Table 2 "shows the results; it may be noted that the top-ranking word type is ‘am’, and it was ranked 6th, 3rd, 1st, 3rd, and 7th by Scribes 1 through 5, respectively."
However, the word 'am' (EVA-aiin) is not the most common word in the v101 transliteration file. The most common word is '8am' (EVA-daiin). The given ranking for the most common words (EVA aiin, daiin, ol, ar, or, s, chol, chey, al, chedy) doesn't fit the expected ranking (daiin, ol, chedy, aiin, ...) and suggests that uncertain word spaces may have been read as spaces. Reading uncertain spaces as spaces would also explain the unusually high word counts given in Table 1. However, the paper states that uncertain spaces were removed from the transliteration file.
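The effect suspected above can be sketched in a few lines of Python. The sketch assumes the IVTFF convention that "." marks a word space and "," an uncertain space, and it uses made-up toy lines rather than real transliteration data; the function name is hypothetical. It only shows that the two readings of "," can swap the top-ranked word and inflate the word count, not that this is what happened in the paper.

```python
from collections import Counter

def word_counts(lines, uncertain_as_space):
    """Count word types, treating ',' either as a space or as nothing."""
    counts = Counter()
    for line in lines:
        text = line.replace(",", "." if uncertain_as_space else "")
        counts.update(w for w in text.split(".") if w)
    return counts

# Toy lines (invented): '8,am' is '8am' with an uncertain space inside.
toy = ["8,am.8am.am", "8,am.oe.8,am"]

removed = word_counts(toy, uncertain_as_space=False)
as_space = word_counts(toy, uncertain_as_space=True)

print(removed.most_common(1), sum(removed.values()))    # [('8am', 4)] 6
print(as_space.most_common(1), sum(as_space.values()))  # [('am', 4)] 9
```

With uncertain spaces removed, '8am' leads; with them read as spaces, 'am' leads and the total number of words goes up, which is the pattern described for Tables 1 and 2.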

[attachment=9428]
About scribe 5 the paper states: "It is possible that the missing folios may demonstrate the even work contribution theory among the first 4 scribes, in terms of their time and effort. It is also possible that Scribe 5 may be a last-minute add-on or may have completed the remaining work of Scribe 1 or 2, possibly due to one of them passing away." [Ponnaluri, p. 4].

It seems that the author misunderstood Currier's use of the word "language". Currier used the term "language" to distinguish between "two different series of agglomerations of symbols or letters" [Currier 1976]. Currier wrote "Now, I'm stretching a point a bit, I'm aware; my use of the word language is convenient, but it does not have the same connotations as it would have in normal use. Still, it is a convenient word, and I see no reason not to continue using it." [Currier 1976] (see [link]).

About Currier A and B the paper states:
- "It may be noted that all five scribes used six or more word types that featured as the top ten in the entire VM, thus indicating that the Scribes were using similar words and therefore by implication, drawing from the same language."
- "Scribes 1, 2, and 3 may have composed the text in their own style and diction, yet they drew from the same vernacular or “language.” Could this be the justification that VM does not contain two languages or dialects but merely varying word types, providing an illusion that there may be visible distinctions."
- "tests on Scribes 1–3, who composed almost 90% of the entire VM, showed the absence of statistical differences in relative frequencies which suggests that the language must be common among the Scribes, though the word choice, at a micro-level, may have been different. The latter is to be expected among readers and writers of the same language, who all draw from the same knowledge source, yet make very different word choices when they communicate. In other words, the Scribes may have composed the text in their own style and diction, yet they drew from the same vernacular or “language.”
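For readers who want to reproduce the kind of count quoted in the first bullet (how many of a scribe's top word types also appear in the overall top ten), a rough Python sketch follows. The function names and the toy data are assumptions for illustration; real input would be per-scribe token lists taken from a transliteration file.

```python
from collections import Counter

def top_types(tokens, n):
    """The n most frequent word types (ties at the cut-off follow Counter order)."""
    return {w for w, _ in Counter(tokens).most_common(n)}

def top_n_overlap(tokens_by_scribe, n=10):
    """For each scribe, count how many of their top-n types are in the global top-n."""
    all_tokens = [w for toks in tokens_by_scribe.values() for w in toks]
    global_top = top_types(all_tokens, n)
    return {scribe: len(top_types(toks, n) & global_top)
            for scribe, toks in tokens_by_scribe.items()}

# Toy data (invented), with n=3 instead of 10 to keep it readable.
toy = {
    "1": ["a", "a", "a", "b", "b", "c"],
    "2": ["a", "a", "d", "d", "d", "b"],
}
print(top_n_overlap(toy, n=3))  # {'1': 2, '2': 3}
```

Note that the global top list is built from the scribes' own text, so some overlap is expected by construction, especially for scribes who wrote a large share of the manuscript.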

In my eyes the paper only confirms Currier's findings about statistically significant differences in the Voynich text. These differences are important since in natural languages "there will be frequent words distributed equally over the entire text, the so-called function words (like conjunctions, articles etc.). They do not appear contextual, but rather serve to implement grammatical structures, and they normally do not have co-occurring similar words of comparable frequency." "However in the VMS frequently used tokens differ from page to page." [Timm & Schinner 2019, p. 6].

In 4.2.2 the paper compares the VMS with an English text and states: "The trend lines closely match each other and provide a reason to examine if the ‘o’ of v101 is the ‘E’ of English and if other VM characters may be in the vicinity of the other letters of the English alphabet, when reviewed in the rank order." (see Figure 4)
[attachment=9427]

About Zipf's law the paper states: "The linear fit of the resulting curve yields a near-perfect R² of 0.99, thus showing the significant Zipf of the VM text. These results agree with the natural language hypothesis for the reasons provided by others (Landini 2010; Reddy and Knight 2011; Bowern and Lindemann 2020)."

However, as shown in Timm & Schinner 2019, a "facsimile" text generated by self-citation also fulfills both of Zipf's laws.

[attachment=9430]
About the "Brevity Law" the paper states: "‘Brevity Law,’ also called Zipf’s law of abbreviation, as applied in linguistics, states that the more frequently a word is used, the shorter that word tends to be (Zipf 1949)." The paper interprets Figure 5 as "It can be inferred that the length of the frequent words is on the lower end of the scale."

However, Figure 5 does not demonstrate that the most frequent words are shorter than other words. On the contrary, it shows that nearly all word types are short.
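A check of the brevity law has to relate frequency to length, not just show that lengths are short overall. One simple way to do that, sketched here in Python under my own assumptions (the function name, the cut-off k, and the toy tokens are all hypothetical), is to compare the mean length of the k most frequent word types with the mean length of the remaining types.

```python
from collections import Counter

def brevity_split(tokens, k):
    """Mean length of the k most frequent types vs. mean length of all other types."""
    ranked = [w for w, _ in Counter(tokens).most_common()]
    if not 0 < k < len(ranked):
        raise ValueError("k must leave at least one word type on each side")
    top, rest = ranked[:k], ranked[k:]

    def mean_len(words):
        return sum(len(w) for w in words) / len(words)

    return mean_len(top), mean_len(rest)

# Toy tokens (invented): frequent types are short, rare types are long.
toy = ["ab"] * 5 + ["cd"] * 4 + ["efgh"] * 2 + ["ijklmn"]
print(brevity_split(toy, k=2))  # (2.0, 5.0)
```

If nearly all word types are short, as Figure 5 suggests, the two means would be close together, and a histogram of type lengths alone could not tell that case apart from a genuine brevity effect.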

[attachment=9431]
Figure 1 presents a binomial word length distribution. However, the paper provides no explanation for the differences between Figures 1 and 5.

Also, the fit he got between the character distribution of English and Voynichese (0.98 correlation?? the dream of any frequency analyst, that would almost imply Voynichese is English... but at least he did not openly claim that) depends on the specific transcription he used. Using a different one, say EVA, would change things a lot. A dubious result.
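A small Python sketch of why a rank-ordered comparison is weak evidence and why it depends on the transcription. A rank-ordered frequency profile throws away which character is which, so relabelling every character leaves it unchanged, and the same word written in two transliteration alphabets (v101 '8am' versus EVA 'daiin', as mentioned earlier in the thread) gives different profiles. The function names are hypothetical, and the Pearson correlation over rank-aligned profiles is my assumption about the kind of comparison involved, not the paper's stated procedure.

```python
import math
from collections import Counter

def rank_profile(text):
    """Relative character frequencies, sorted from most to least frequent."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)

def pearson(xs, ys):
    """Pearson correlation of two sequences, truncated to the shorter length."""
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Relabelling characters does not change the profile at all.
print(rank_profile("aab") == rank_profile("xxy"))    # True
# The same word in two transliterations gives different profiles.
print(rank_profile("8am") == rank_profile("daiin"))  # False
```

Since both profiles are sorted in descending order before being compared, a high correlation between them is not surprising in itself and does not identify any letter-to-letter mapping.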

Posters, in thread order: tavie, Bluetoes101, Koen G, tavie, dfs346, DonaldFisk, Rafal, Torsten, Mauro