The Voynich Ninja
Usefulness of categorizing words based on occurrences in different parts of the MS?



Usefulness of categorizing words based on occurrences in different parts of the MS? - joben - 30-05-2019

I was wondering if this method is feasible:

The MS contains 6 different "parts": Herbal, Astronomical, Biological, Cosmological, Pharmaceutical and the Recipe part.

If I went through the Herbal part and wrote down the total number of occurrences of each word, and also its number of occurrences per "part", I could hopefully draw some conclusions about what type of word each one is.
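A minimal sketch of that counting step, assuming a transcription has already been split into pages and words. The `PAGES` data and the `counts_by_part` helper are invented for illustration and are not real counts from the MS.

```python
from collections import Counter, defaultdict

# Toy data, invented for illustration: page id -> (part name, words on the page).
PAGES = {
    "p1": ("Herbal", ["daiin", "chol", "daiin", "shol"]),
    "p2": ("Herbal", ["chol", "daiin"]),
    "p3": ("Recipe", ["qokain", "daiin", "qokain"]),
}

def counts_by_part(pages):
    """Return (total count per word, count per word within each part)."""
    total = Counter()
    per_part = defaultdict(Counter)
    for part, words in pages.values():
        total.update(words)
        per_part[part].update(words)
    return total, per_part

total, per_part = counts_by_part(PAGES)
for word in sorted(total):
    # Only list the parts in which the word actually occurs.
    breakdown = {part: c[word] for part, c in per_part.items() if c[word]}
    print(word, total[word], breakdown)
```

The part labels are just strings attached to pages, so any other way of dividing the MS could be swapped in without changing the counting.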

For instance, let's say I find a word in the Herbal part that appears on about 50% of the herbal pages, 15% of the pharmaceutical pages, 10% of the recipe pages and 5% of the biological pages. That word could mean something like "water, nutrition, stem, seed". This is not awfully specific, but what if this scenario happened:

A word is found in the Herbal part that occurs only 3 times in the entire MS. The other two occurrences are in the Astronomical and the Recipe parts.
If this word is near a zodiac illustration, the plant could theoretically share its name with the zodiac sign. If we want to identify the plant name, this could be a good approach. If a word is so uncommon that it occurs only 3 times in this massive MS, and in different parts, it could be an indication that it is a name rather than an ordinary word.
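Both scenarios can be sketched in a few lines, assuming each page is stored as a (part, words) pair. Everything here is hypothetical: the toy `PAGES` list, the helper names, and the thresholds `max_total=3` and `min_parts=2`, which simply mirror the "3 times, in different parts" example.

```python
from collections import Counter, defaultdict

# Toy data, invented for illustration: one (part, words) pair per page.
PAGES = [
    ("Herbal", ["otol", "daiin", "chol", "daiin"]),
    ("Herbal", ["daiin", "shol"]),
    ("Astronomical", ["otol", "daiin"]),
    ("Recipe", ["otol", "qokain", "qokain"]),
]

def page_share(pages, word):
    """Fraction of pages in each part that contain `word` at least once."""
    pages_in_part = Counter()
    pages_with_word = Counter()
    for part, words in pages:
        pages_in_part[part] += 1
        if word in words:
            pages_with_word[part] += 1
    return {part: pages_with_word[part] / n for part, n in pages_in_part.items()}

def rare_cross_part_words(pages, max_total=3, min_parts=2):
    """Words occurring at most `max_total` times but in at least `min_parts` parts."""
    total = Counter()
    parts_seen = defaultdict(set)
    for part, words in pages:
        for w in words:
            total[w] += 1
            parts_seen[w].add(part)
    return sorted(w for w, n in total.items()
                  if n <= max_total and len(parts_seen[w]) >= min_parts)

print(page_share(PAGES, "daiin"))
print(rare_cross_part_words(PAGES))
```

A filter like this only produces candidates. Whether a rare word really sits next to a matching illustration still has to be checked by eye.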

Has this been done already or is this approach bad? Thanks for taking your time.


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - RobGea - 30-05-2019

That's a good idea.
qokain, qol, [link], qotedy seem to link B+R together


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - joben - 30-05-2019

(30-05-2019, 05:17 PM)RobGea Wrote: qokain, qol, [link], qotedy seem to link B+R together
Yes indeed!

The words you suggested are really good for illustrating the point. See this link for instance:
[link]


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - Common_Man - 30-05-2019

Rather than comparing x occurrences of an example word in one section with y occurrences in another, I'd like to see the number of uses of a particular word per unit word (its relative frequency) in one section compared with that in another.
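A small sketch of that normalisation, assuming each section is available as a flat list of words. The `SECTIONS` data is invented for illustration, and `relative_frequency` is a hypothetical helper name.

```python
from collections import Counter

# Toy token lists per section, invented for illustration.
SECTIONS = {
    "Herbal": ["daiin", "chol", "daiin", "shol"],
    "Recipe": ["qokain", "daiin", "qokain", "qokain", "chedy", "daiin", "ol", "ar"],
}

def relative_frequency(sections, word):
    """Occurrences of `word` divided by the total word count of each section."""
    return {name: Counter(tokens)[word] / len(tokens)
            for name, tokens in sections.items() if tokens}

# The raw count is 2 in both toy sections, but the rates differ.
print(relative_frequency(SECTIONS, "daiin"))
```

In the toy data the word occurs twice in each section, yet its rate in the shorter section is double that in the longer one, which is the difference raw counts would hide.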

Maybe we can differentiate between common words (nouns, verbs, etc., which will appear almost uniformly per word in every section) and content-providing words (those related to the subject under discussion) in the manuscript.
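One possible way to put a number on "almost uniformly" is to compare a word's highest per-section rate with its mean rate. This is only a sketch under assumptions: the `spread` measure is a hypothetical choice, not an established statistic for the MS, and the rates below are invented.

```python
def spread(rates):
    """Highest per-section rate divided by the mean rate.

    1.0 means perfectly uniform use; larger values mean the word is
    concentrated in few sections. Returns 0.0 if there is nothing to measure.
    """
    values = list(rates.values())
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return max(values) / mean if mean else 0.0

# Hypothetical rates per 1000 words, invented for illustration.
uniform_word = {"Herbal": 20, "Biological": 20, "Recipe": 20}
topical_word = {"Herbal": 0, "Biological": 60, "Recipe": 0}

print(spread(uniform_word))
print(spread(topical_word))
```

Where to draw the line between "uniform" and "topical" would still be a judgment call, and very rare words will look concentrated by chance alone.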

I guess some such study is already available. Can anyone give a link if there is one?


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - Anton - 30-05-2019

The approach is good; however, it is not clear how exactly to apply it.

Something in that direction was done in Montemurro & Zanette 2013.

Wladimir's "Dulov's ratio" (as I personally call it) is a related concept.

Perhaps it would be useful to prepare a "key-value" dictionary of Voynich where each vord ("key") would have its affinity to a topic described ("value").
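A rough sketch of such a dictionary, assuming "affinity" is taken to mean a vord's share of relative frequency in each part. That definition, the `affinity_dictionary` name and the toy `PARTS` data are all assumptions made for illustration.

```python
from collections import Counter

# Toy token lists per part, invented for illustration.
PARTS = {
    "Herbal": ["chol", "daiin", "chol", "shol"],
    "Biological": ["qokain", "daiin", "qokain", "shedy"],
}

def affinity_dictionary(parts):
    """Map each vord (key) to its share of relative frequency per part (value).

    The shares of one vord sum to 1, so parts of different lengths
    remain comparable.
    """
    rates = {name: {w: c / len(tokens) for w, c in Counter(tokens).items()}
             for name, tokens in parts.items() if tokens}
    vocabulary = {w for r in rates.values() for w in r}
    result = {}
    for w in sorted(vocabulary):
        per_part = {name: r.get(w, 0.0) for name, r in rates.items()}
        norm = sum(per_part.values())  # never zero: w occurs in at least one part
        result[w] = {name: v / norm for name, v in per_part.items()}
    return result

affinity = affinity_dictionary(PARTS)
print(affinity["chol"])    # found only in the toy Herbal part
print(affinity["daiin"])   # split evenly between both toy parts
```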

In particular, it would be of interest to learn whether a vord and its respective prefixed vords (e.g. "tol" on one hand and "otol" and "qotol" on the other hand) generally exhibit the same thematic affinity. That would shed light on whether prefixes are some kind of operators (like case modifiers or prepositions) or not.
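One way to test this could be to compare the per-part profile of a vord with the profiles of its prefixed forms, for example with cosine similarity. The counts below are invented for illustration (they are not real figures for "tol", "otol" or "qotol"), and the choice of cosine similarity is an assumption, not a method taken from the literature on the MS.

```python
import math

# Hypothetical counts per part, invented for illustration (not real data).
PART_ORDER = ["Herbal", "Biological", "Recipe"]
COUNTS = {
    "tol":   [6, 1, 3],
    "otol":  [12, 2, 6],   # same profile as "tol", just twice as frequent
    "qotol": [0, 9, 1],    # a different profile
}

def cosine(a, b):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def prefix_similarity(counts, base, prefixes=("o", "qo")):
    """Similarity between `base` and each prefixed form present in `counts`."""
    return {p + base: cosine(counts[base], counts[p + base])
            for p in prefixes if p + base in counts}

print(prefix_similarity(COUNTS, "tol"))
```

A value near 1 would suggest the prefixed form follows the same thematic distribution as the bare vord, and a low value would suggest it does not. With real data the vectors should probably hold relative frequencies rather than raw counts, since the parts differ in length.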


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - joben - 30-05-2019

(30-05-2019, 07:14 PM)Anton Wrote: The approach is good, however it is not clear how exactly to apply it.

Thanks for the informative reply!

I was not aware of the Montemurro & Zanette paper. I started reading it and some of the stuff they are describing is what I am looking for.

And yes, I agree with your idea about the words and prefixes.

It seems to me that a good programmer could be really useful for this kind of task.


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - -JKP- - 30-05-2019

(30-05-2019, 03:42 PM)joben Wrote: I was wondering if this method is feasible:

The MS contains 6 different "parts", Herbal, Astronomical, Biological, Cosmological, Pharmaceutical and the Recipe part.

...

I know these are the historical designations, but I don't like them. If something is bad, I think it's a good idea to change it.

We don't know if the plants are all herbs (some might be trees). I prefer to call them the Big Plants and Small Plants sections (which I suppose is also a bit ambiguous since it doesn't refer to the size of the plant, it refers to the size of the plant drawing, but it's better than assuming they are herbs).

We don't know if there is a Pharmaceutical or Recipe section. The section at the end might be 1) proverbs or 2) good and bad days or 3) historically important events or 4) something else... I prefer to call the two sections of unillustrated text the Dense Text pages and the Starred Text pages, but there might be better designations.

The biological section might be biological or it might be mythological. I can't think of a better name than biological, but perhaps someone else can.


The name "cosmological section" is not as objectionable, since medieval cosmology was a broad grab-bag of beliefs about the universe (and there are a lot of stars and suns in the images), so I have fewer objections to this name than to some of the others.


Mainly I dislike herbal, pharmaceutical, and recipes. Even if these names turn out to be correct, it's really not good to call them that in advance. If the dense text consists of good and bad days, historical events, a genealogy, a textual calendar, or a list of stones and their properties... something along those lines, then they really don't fit the definition of recipes.


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - Emma May Smith - 30-05-2019

So this is an interesting idea, but I think you would struggle to usefully assign semantic values to the words. There's no way to know what any given word might mean and little internal validation for the guesses you'll be forced to make.

A better idea would be not to assign semantic categories but rather grammatical categories. There are far fewer grammatical categories in a language, so much less precise guesses are needed, and there will be some internal validation of your results because certain grammatical patterns would be unlikely or prohibited.

I recommend reading some of Marco Ponzi's work here: [link]


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - ReneZ - 30-05-2019

There is one very big problem with the Montemurro and Zanette analysis, namely that the B-language part of the Herbal pages and the A-language part of the Herbal pages are not related to each other. This undermines the tentative conclusion that the selected key words are related to the topic of the pages based on the illustrations.

This is from memory, so a closer look at this question is warranted.


RE: Usefulness of categorizing words based on occurrences in different parts of the MS? - joben - 30-05-2019

(30-05-2019, 08:15 PM)-JKP- Wrote: I know these are the historical designations, but I don't like them. If something is bad, I think it's a good idea to change it.

This is a good point, and I assume that if I dig deeper into this MS I will start noticing things like this.

I always assumed the part names were an oversimplification for an easier overview of the MS.

I think that storing every word in a database together with its number of occurrences on each page is a good start. If we then made it simple for anyone to create their own part names and connect each page to the part they feel it belongs to, we would get a statistical overview that doesn't depend on the historical "made up" part names. If this hasn't been done already, I think it would be a simple task for a programmer.
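A minimal sketch of such a database, using Python's built-in sqlite3 module. The table layout, the scheme names `scheme_a` and `scheme_b`, the page ids and all counts are hypothetical and invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word_count (page TEXT, word TEXT, n INTEGER,
                             PRIMARY KEY (page, word));
    CREATE TABLE page_part (scheme TEXT, page TEXT, part TEXT,
                            PRIMARY KEY (scheme, page));
""")

# Toy rows, invented for illustration: occurrences of each word on each page.
conn.executemany("INSERT INTO word_count VALUES (?, ?, ?)", [
    ("p1", "daiin", 3), ("p1", "chol", 2),
    ("p2", "daiin", 1), ("p2", "qokain", 4),
    ("p3", "qokain", 2),
])

# Two hypothetical labelling schemes that assign the same pages to different parts.
conn.executemany("INSERT INTO page_part VALUES (?, ?, ?)", [
    ("scheme_a", "p1", "Herbal"), ("scheme_a", "p2", "Recipe"),
    ("scheme_a", "p3", "Recipe"),
    ("scheme_b", "p1", "Big Plants"), ("scheme_b", "p2", "Big Plants"),
    ("scheme_b", "p3", "Starred Text"),
])

def counts_for(scheme, word):
    """Occurrences of `word` per part, under the chosen labelling scheme."""
    rows = conn.execute("""
        SELECT pp.part, SUM(wc.n)
        FROM word_count AS wc
        JOIN page_part AS pp ON pp.page = wc.page
        WHERE pp.scheme = ? AND wc.word = ?
        GROUP BY pp.part
    """, (scheme, word))
    return dict(rows.fetchall())

print(counts_for("scheme_a", "qokain"))
print(counts_for("scheme_b", "qokain"))
```

Because the word counts are stored per page and the part labels live in a separate table, anyone can add a scheme of their own without touching the counts.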