The Voynich Ninja
Extension to the Currier languages - Printable Version


Extension to the Currier languages - ReneZ - 23-04-2024

As should be well known to everyone, in the 1970s Prescott Currier identified two languages ("A" and "B") in the Voynich MS text.

I have been looking at consolidating and extending that work off and on in the past, and just recently completed a first iteration of such a consolidation and extension.

Most of the pages that Currier did not classify appear to be in some intermediate form, which I have decided to call "C" language. Furthermore, using quantitative criteria, all pages have now been classified into these three languages, and a number of sub-categories or dialects.

I do not consider this a closed activity. There are still important properties of the text that have not been taken into account.

The details of this first stage are described briefly [link].


RE: Extension to the Currier languages - Juan_Sali - 23-04-2024

(23-04-2024, 03:51 AM)ReneZ Wrote: The details of this first stage are described briefly [link].
Shortcomings in Currier's language identification
A first shortcoming in Currier's identification is that he did not provide a classification for all pages in the MS (4). "Ref-2" (mentioned above) already indicates that the pages not classified by him may possibly represent some intermediate language (or range of languages).

A second, more subjective shortcoming is that some of his criteria are of a more qualitative nature. A more precise criterion to split between his two languages was identified over the years, namely the presence of the bigram ed (Eva: "ed"). This is essentially non-existent in all A-language pages, and very frequent on all B-language pages.
The bigram ed has 5004 matches and the trigram ed9 has 4151 [link].
The trigram ed9 mostly occurs at the end of words, and because of its high frequency an analysis of the larger n-grams that include it is needed.
The bigram ed not followed by 9 has 853 matches (5004 - 4151). In terms of n-gram analysis, this bigram can mostly be split, with its two characters assigned to different n-grams; the d is the start of common trigrams.
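
The subtraction above (5004 - 4151 = 853) can be reproduced on any transliteration with a small counting helper. The following is a minimal shell sketch, not the search tool that produced the figures quoted here; the function name `count_ngram` and the sample words are made up for illustration.

```shell
# Count how often a literal n-gram ($1) occurs in the text read from stdin.
# grep -o prints every match on its own line, wc -l counts those lines.
# "|| true" keeps a zero-match grep from being treated as a failure.
count_ngram() {
    { grep -o -F -- "$1" || true; } | wc -l | tr -d ' '
}

# Made-up sample words, not real Voynich text.
sample="qoked9 oked9 ched9 oedam sheds"

total=$(printf '%s\n' "$sample" | count_ngram "ed")
with9=$(printf '%s\n' "$sample" | count_ngram "ed9")

# "ed not followed by 9" is the difference of the two counts.
echo "ed: $total, ed9: $with9, ed not followed by 9: $((total - with9))"
```

On the sample this prints `ed: 5, ed9: 3, ed not followed by 9: 2`, the same kind of difference as 5004 - 4151 = 853.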



RE: Extension to the Currier languages - nablator - 23-04-2024

Quote:Text on herbal pages classified by Currier as A-language: 6926

Why exclude the 10 pages from quires 15 and 17?

I don't understand the word count for the "pages with pharmaceutical illustrations" (16 Pharma-A pages?)


RE: Extension to the Currier languages - ReneZ - 23-04-2024

I will see if I can double-check in the next days. We're travelling and I don't have everything with me.


RE: Extension to the Currier languages - dashstofsk - 24-04-2024

There is a possible scenario that can easily explain the difference in the languages.

It is this: the author of the manuscript invented a private alphabet, together with a mechanism for forming the text, with the intention of creating an invented work that others could marvel at but which nobody would be able to read. Possibly the manuscript was also not written in one go, but each section was written at a different time. If there was a significant time gap between the writing of each section, then it is possible that the author lost some fluency in the use of his own language. Thus the new section would be in a slightly different language and syntax. The big difference between languages A and B might be an indication of a long time gap between writing. The smaller difference between the two language clusters in quire 20 (being one example) might indicate a shorter time gap.

Because the language is an invented one, the author possibly did not write any formal guide or reference for it, did not feel obliged to follow any particular standard, and may have been happy to have the language 'develop'.


RE: Extension to the Currier languages - HermesRevived - 24-04-2024

Some discussion of these matters, with attention to Rene's recent work, here:

[link]

(Due to some computer problems, now posting under a new account - formerly Hermes777.)


RE: Extension to the Currier languages - ReneZ - 25-04-2024

(24-04-2024, 07:46 PM)dashstofsk Wrote: There is a possible scenario that can easily explain the difference in the languages.

It is this: the author of the manuscript invented a private alphabet, together with a mechanism for forming the text, with the intention of creating an invented work that others could marvel at but which nobody would be able to read. Possibly the manuscript was also not written in one go, but each section was written at a different time. If there was a significant time gap between the writing of each section, then it is possible that the author lost some fluency in the use of his own language. Thus the new section would be in a slightly different language and syntax. The big difference between languages A and B might be an indication of a long time gap between writing. The smaller difference between the two language clusters in quire 20 (being one example) might indicate a shorter time gap.

Because the language is an invented one, the author possibly did not write any formal guide or reference for it, did not feel obliged to follow any particular standard, and may have been happy to have the language 'develop'.

This is a possibility. One of the main reasons for doing this is to find the specific differences and see what they tell us about the 'rules' governing the text, and how these may have evolved.
This could work whether the text is generated from a plain text or generated from nothing.


RE: Extension to the Currier languages - ReneZ - 25-04-2024

(23-04-2024, 12:28 PM)nablator Wrote:
Quote:Text on herbal pages classified by Currier as A-language: 6926

Why exclude the 10 pages from quires 15 and 17?

I don't understand the word count for the "pages with pharmaceutical illustrations" (16 Pharma-A pages?)

I may of course have made a mistake somewhere, but here is what I intended to do.

The input text was the RF transliteration, in the 'basic Eva' version.
I cannot check now, but I strongly suspect that I treated uncertain spaces as word spaces.
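
How uncertain spaces are treated changes the word count. The following is a minimal shell sketch, assuming the separators that nablator's commands later in this thread replace (`.` taken as a word space, `,` as an uncertain space, `<->` as a gap in the line). The function name `count_words` and the sample line are hypothetical, not taken from the RF transliteration.

```shell
# Count words in transliteration text read from stdin.
# $1 = "split" (default): an uncertain space (,) separates two words.
# $1 = "join": an uncertain space is dropped, so both halves form one word.
count_words() {
    if [ "${1:-split}" = "join" ]; then
        sed 's/<->/ /g; s/\./ /g; s/,//g'
    else
        sed 's/<->/ /g; s/\./ /g; s/,/ /g'
    fi | wc -w | tr -d ' '
}

# Made-up sample line for illustration only.
line='daiin.chol,dy<->shey.qokeey'

printf '%s\n' "$line" | count_words split   # prints 5
printf '%s\n' "$line" | count_words join    # prints 4
```

The difference between the two modes equals the number of uncertain spaces in the text, so it affects absolute word counts but not which pages end up in which group.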

The herbal-A text (part "A") should include all herbal-A pages in quires 1 to 8, plus folios 87 and 90 in quire 15 and folios 93 and 96 in quire 17. The latter are indeed 10 pages, and if they are not included then this was an error. The only easy way to check that is when I have access to my Linux laptop again, in about 3 weeks. However, I can try to do something if you confirm that the numbers are wrong. It should not have any impact on the result, but of course I want the numbers to be correct.

The pharma part (part "P") should include all non-label words on folios 88, 89, 99, 100, 101 and 102.
I count these as 16 pages indeed. 88, 99, 100 and 101 as one page per side and 89 and 102 as two pages per side.
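
The count of 16 follows from the six folios listed (88, 89, 99, 100, 101 and 102): four of them contribute one page per side, and two of them (89 and 102) contribute two pages per side. As a quick arithmetic check in shell, with variable names that are only illustrative:

```shell
# Pages = folios x sides per folio x pages per side.
sides=2      # recto and verso
single=4     # folios counted as one page per side
double=2     # folios 89 and 102, counted as two pages per side

pages=$(( single * sides * 1 + double * sides * 2 ))
echo "$pages"   # prints 16
```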


RE: Extension to the Currier languages - nablator - 25-04-2024

(25-04-2024, 02:58 PM)ReneZ Wrote: The herbal-A text (part "A") should include all herbal-A pages in quires 1 to 8, plus folios 87 and 90 in quire 15 and folios 93 and 96 in quire 17. The latter are indeed 10 pages, and if they are not included then this was an error.

Yes, you missed the 10 pages from quires 15 and 17:

cat RF1a-n-herbal-a.txt | grep "<!" | wc -l
95

cat RF1a-n-herbal-a.txt | grep -F '$I=H $L=A' | wc -l
95

cat RF1a-n-herbal-a.txt | grep -v "<!" | cut -c19- | sed "s/<->/ /g;s/\./ /g;s/,/ /g" | wc -w
7795

Quote:The pharma part (part "P") should include all non-label words on folios 88, 89, 99, 100, 101 and 102.

I count these as 16 pages indeed. 88, 99, 100 and 101 as one page per side and 89 and 102 as two pages per side.

Here I don't know what happened:

cat RF1a-n-pharma.txt | grep "<!" | wc -l
16

cat RF1a-n-pharma.txt | grep -F '$I=P $L=A' | wc -l
16

cat RF1a-n-pharma.txt | grep -v "<!" | grep ,.P | cut -c19- | sed "s/<->/ /g;s/\./ /g;s/,/ /g" | wc -w
2367

cat RF1a-n-pharma.txt | grep -v "<!" | grep -v ,.L | cut -c19- | sed "s/<->/ /g;s/\./ /g;s/,/ /g" | wc -w
2367
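
The two pharma pipelines can be read as "keep only paragraph loci" (`grep ,.P`) and "drop label loci" (`grep -v ,.L`). That both give 2367 suggests every non-label line in the file is a paragraph line. Below is a minimal sketch of the second variant on made-up lines that are only loosely modelled on the transliteration format. It strips the leading locus tag with sed rather than the fixed `cut -c19-` offset; that is a substitution for illustration, not what was run above, and `count_text_words` is a hypothetical name.

```shell
# Count words on non-label text lines read from stdin:
#   1. drop page-header lines (they contain "<!")
#   2. drop label loci (comma, any character, then L inside the tag)
#   3. strip the leading <...> locus tag
#   4. turn the separators "<->", "." and "," into spaces and count words
count_text_words() {
    grep -v '<!' |
    grep -v ',.L' |
    sed 's/^<[^>]*>[[:space:]]*//' |
    sed 's/<->/ /g; s/\./ /g; s/,/ /g' |
    wc -w | tr -d ' '
}

# Made-up sample: one header, one label line, two paragraph lines.
sample='<f88r>      <! $I=P $L=A>
<f88r.1,@Lf>    otorchety
<f88r.2,+P0>    dorsheoy.ctheol,qockhey<->dory
<f88r.3,+P0>    sheey.qokal'

printf '%s\n' "$sample" | count_text_words   # prints 6
```

The header and the label line are skipped, so only the 4 + 2 words of the two paragraph lines are counted.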


RE: Extension to the Currier languages - ReneZ - 25-04-2024

Thanks! This type of double-check is really helpful.

So, your count of 7795 words in herbal A is 869 higher than my 6926.
Then, your count of 2367 words in Pharma is 869 lower than my 3236.

Somehow, 869 words ended up in the wrong place somewhere.

Quire 15, outer bifolio has 481 words and quire 17, outer bifolio has 388 words and the sum is 869.
These are herbal, and apparently they ended up in my pharma word count.

This observation matches what you suspected, and explains both anomalies perfectly.
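
The three figures can be checked against each other directly. The numbers below are the ones quoted in this thread; the variable names are only illustrative.

```shell
# Word counts quoted in the thread.
herbal_nablator=7795; herbal_renez=6926
pharma_nablator=2367; pharma_renez=3236
q15_outer=481;        q17_outer=388

echo "herbal difference: $((herbal_nablator - herbal_renez))"   # 869
echo "pharma difference: $((pharma_renez - pharma_nablator))"   # 869
echo "outer bifolios:    $((q15_outer + q17_outer))"            # 869

# The combined total is the same in both counts, as expected when
# words are only moved from one group to the other.
echo "total (ReneZ):     $((herbal_renez + pharma_renez))"        # 10162
echo "total (nablator):  $((herbal_nablator + pharma_nablator))"  # 10162
```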

However, what I will need to check, and cannot do now, is whether this only affects the word counts, or also the numbers in the other columns. This is because these come from different scripts.

In any case, all the statistics per bifolio, per folio and per page are not affected by this.