The Voynich Ninja

Many years ago (2014? God I feel old) Brian Cham proposed the "Curve-Line System" (CLS), which I helped with on the statistical analysis.

We suggested that

Quote:the Curve-Line System is an intentional feature of the text design, and the text of the Voynich Manuscript is a highly artificial system.

Now, my personal opinion remains much the same - the design of the glyphs is highly intentional, and the text is highly artificial (albeit the underlying paradigm is based upon recognisable 15th century tropes).

But the basics of the CLS remain the same. That is, that "like attracts like". A few basic rules allow us to assume, with great confidence, which glyph will follow which.

Some time ago I zipf'ed the percentages and then graphed the number of exceptions to this rule on Currier A vs Currier B. Despite being very different in number, they graphed perfectly:

[attachment=5043]

[attachment=5042]
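For anyone who wants to redo this kind of plot, here is a minimal sketch of a Zipf-style rank-frequency comparison. It is not the script behind the attached graphs; the function names are my own, and the input is assumed to be a plain list of tokens (for example the exception words from one Currier language). Two samples of very different size can still give a similar slope, which is what "graphing the same" amounts to here.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token frequencies sorted from most to least common (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two ranks. A slope near -1 is the classic Zipf shape.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Placeholder tokens only; substitute the real exception lists for A and B.
    sample = "a a a a b b c d".split()
    print(rank_frequency(sample), loglog_slope(rank_frequency(sample)))
```

A similar slope for A and B would only say the two curves have the same shape on log-log axes. It does not by itself show that a power law is the best description of either.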

So what's going on? I asked myself again tonight.

Quote:4.2.3 Results

Currier language A

There are 4,040 invalid words in a text of 10,645. That is 37.95% of the total.

The body of 4,040 invalid words is comprised of 1,328 distinct words, a ratio of 32.87 (to 2 dp).

Of these 1,328 distinct words, 793 (59.71% to 2dp) are ungrammatical by one inconsistency, 404 (30.42% to 2dp) are ungrammatical by two inconsistencies, 102 (7.68% to 2dp) by three inconsistencies, 26 by four inconsistencies and 3 by five inconsistencies.

Currier language B

There are 5,870 invalid words in a text of 20,969. That is 27.99% of the total.

The body of 5,870 invalid words is comprised of 1,877 distinct words, a ratio of 31.98 (to 2 dp).

Of these 1,877 distinct words, 1,116 (59.46% to 2dp) are ungrammatical by one inconsistency, 606 (32.29% to 2dp) are ungrammatical by two inconsistencies, 128 (6.82% to 2dp) are ungrammatical by three inconsistencies, 24 by four inconsistencies and 3 by five inconsistencies.

Total

Out of a total of 31,614 words tested, 9,910 are invalid. That is 31.35% of the total. The total of unique aberrant words across the whole corpus has not been tested.

What we are seeing here is that the idea of the CLS works, and that when it breaks down, it breaks down in a Zipfian way. That is, there is a power law underlying the inconsistencies in the idea. This could very well be attributable to the use of an artificial list of words that cannot be turned into Voynich glyphs, i.e. proper nouns.

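The figures in the quoted results can be rechecked in a few lines. This is only a sketch: the counts are copied from the quote, while the dictionary and function names are mine. It turns the distinct-word counts into percentage shares per number of inconsistencies, so A and B can be compared on the same scale even though B has far more words.

```python
# Distinct invalid words by number of inconsistencies (1 to 5),
# copied from the quoted 4.2.3 Results.
COUNTS = {
    "A": [793, 404, 102, 26, 3],
    "B": [1116, 606, 128, 24, 3],
}


def shares(counts):
    """Return each count as a percentage of the total, rounded to 2 dp."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


def max_gap(a, b):
    """Largest absolute difference between two share lists, in percentage points."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    share_a = shares(COUNTS["A"])
    share_b = shares(COUNTS["B"])
    for k, (pa, pb) in enumerate(zip(share_a, share_b), start=1):
        print(f"{k} inconsistencies: A {pa}%  B {pb}%")
    print("largest gap (percentage points):", round(max_gap(share_a, share_b), 2))
```

The per-language counts sum to the 1,328 and 1,877 distinct words given in the quote, so the shares are taken over the same totals the original percentages used.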

A Supposition:

Non-conforming words have to break the CLS because they are nouns, and forcing them into the CLS would destroy information.

Quote:Manually skimming through the list of non-conforming words, David noted that almost half had “l” as the first letter. Looked like a good place to start, so I tested word-conformance rate across different beginnings of all words in the manuscript text. Most were above 90%, but there were exceptions: words starting with “l” were about 14.7% conforming and those starting with “r” were 40.8% conforming.

Could this be explained by the idea that “l” and “r” can be prefixed to a word arbitrarily? Turns out that words with these prefixes are otherwise conforming to the CLS without them, confirming my suspicion.
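The test described in that quote can be sketched roughly as follows. This is not Brian's actual code. The CLS rules themselves are not spelled out in this thread, so `is_conforming` is a stand-in you would have to supply, and the function names and the toy word list are hypothetical.

```python
from collections import defaultdict


def conformance_by_initial(words, is_conforming):
    """Map each initial glyph to the share of words starting with it that conform."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for w in words:
        if not w:
            continue
        totals[w[0]] += 1
        if is_conforming(w):
            passed[w[0]] += 1
    return {glyph: passed[glyph] / totals[glyph] for glyph in totals}


def rescued_by_stripping(words, is_conforming, prefixes=("l", "r")):
    """Non-conforming words that start with a prefix and conform once it is removed."""
    return [
        w
        for w in words
        if len(w) > 1
        and w[0] in prefixes
        and not is_conforming(w)
        and is_conforming(w[1:])
    ]


if __name__ == "__main__":
    # Placeholder predicate and words, purely to show the call pattern.
    toy_ok = lambda w: w in {"chedy", "kaiin", "daiin"}
    toy_words = ["chedy", "lchedy", "rkaiin", "daiin"]
    print(conformance_by_initial(toy_words, toy_ok))
    print(rescued_by_stripping(toy_words, toy_ok))
```

With a real CLS predicate, a high share of "rescued" words among the "l" and "r" initials would be what supports the arbitrary-prefix idea.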

Furthermore...

Quote:Three aberrant glyphs which only have medium or high conformity to the proposed CLS system.

However, these three aberrant glyphs conform to very specific rules, and seem to be part of specific ngrams that occur due to some as-yet-unidentified, but very specific, reason.

“o” is aberrant 44.51% of the time, when it appears in the following bigrams: “ol”, “or” and (rarely) “lo”, “ro” (where “ro” could be a confusion for “lo”).

“l” is aberrant 26.83% of the time, when it appears in the following bigrams: “lo” (see rule 1), “ly”, “ld”. Furthermore, these last two bigrams always appear in the following trigrams: “oly”, “aly”, “old”, “ald”.

“r” is aberrant 15.76% of the time, when it appears in the following bigrams: “ro” (see rule 1), “ry”, “ra”. These last two bigrams are almost always part of the following trigrams: “ara”, “ora”, “ary”, “ory”.
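The trigram claims for “l” and “r” are easy to check against any transliteration. A minimal sketch, assuming words are plain strings with one character per glyph; the function name and the "^" word-start marker are my own choices. It counts which glyph comes immediately before each occurrence of a bigram, so “ly” should show almost nothing but “o” and “a” if the claim holds.

```python
from collections import Counter


def preceding_glyphs(words, bigram):
    """Count the glyph just before each occurrence of `bigram`.

    '^' is used when the bigram sits at the very start of a word.
    """
    counts = Counter()
    for w in words:
        start = w.find(bigram)
        while start != -1:
            counts[w[start - 1] if start > 0 else "^"] += 1
            start = w.find(bigram, start + 1)
    return counts


if __name__ == "__main__":
    # Placeholder words only; run this over the full word list instead.
    toy_words = ["oly", "qokaly", "lychey"]
    print(preceding_glyphs(toy_words, "ly"))
```

Running it for “ly”, “ld”, “ry” and “ra” in turn covers all eight trigrams listed above.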

What can account for this?

Let's get into a 15th century mindset. It is perfectly logical that if you create a writing system, you try to reduce work, for instance by creating a system that always lets you draw a line followed by a curve. That is faster and easier. You only break the system to introduce necessary information.

Could it be as simple as a syllabic reduction of words, with added expansion of contractions, where proper nouns break the reduction because too much information would be lost, so extra consonants are included?

(And yes, I'm fully aware of the work carried out in this area in recent years, hello Emma and Marco!)

So the question here is: can we speak the Voynich? We just need some experts in spoken medieval Romance languages to give it a try. Or is there a different angle I'm not seeing?

(offtopic) As always, the statistics are interesting, but we end up looking at them slack-jawed with no real new insight....

I was always intrigued by the CLS.
I am curious about the "exception" words, which if I remember correctly make up about 30%, most of the time differing by only one token.
I wonder how these exceptions might relate to effects described by Emma and Marco about positional transformations of words at the beginning or end of lines.
Is there a correlation between the exception words and their position?

In fact that was the first article "peer-reviewed" in our forum. I'm frightened to recognize that that was five!!! years ago.

I think the proposal is interesting and there is certainly something present in either the design or use of the script which causes this phenomenon. My main observation would be that "line" glyphs are too non-conforming and there might be a more elegant way to explain the patterns rather than patching the initial hypothesis.

david

VViews

Anton

Emma May Smith