The Voynich Ninja

(25-01-2020, 05:06 PM)Anton Wrote: The length of the paragraphs is a good metric. Besides the average length, I would immediately suggest to compare the RMS deviation. Are all recipes roughly of the same length? Or there are some extremely short, others extremely long?...

Hi Anton,
I agree that deviation looks very significant for this task. A problem with it is that it is difficult to estimate without a complete and well-structured transcription that also makes paragraph boundaries clear. For instance, the Bologna transcription provides a total number of words and a total number of recipes, but (without significant preprocessing) it does not tell us how long each recipe is. For the two parts of the BNF ms, we don't have transcriptions and things are even harder. But yes, when a well-structured transcription is available, deviation from the mean will certainly be informative.
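
As a minimal sketch of the metric discussed here, assuming a transcription has already been split into one string per recipe (the sample strings below are hypothetical toy data, not taken from any manuscript), the mean length and the RMS deviation from the mean could be computed like this:

```python
import statistics

def length_stats(paragraphs):
    """Return (mean, RMS deviation from the mean) of paragraph lengths in words."""
    lengths = [len(p.split()) for p in paragraphs]
    mean = statistics.fmean(lengths)
    # pstdev is the population standard deviation, i.e. the RMS deviation from the mean.
    rms_dev = statistics.pstdev(lengths)
    return mean, rms_dev

# Hypothetical toy input: one string per recipe.
sample = [
    "take the root and boil it in wine",
    "grind the seeds",
    "mix with honey and give it to drink for nine days in the morning",
]
mean, rms_dev = length_stats(sample)
print(f"mean={mean:.2f} rms_dev={rms_dev:.2f} ratio={rms_dev / mean:.2f}")
```

The ratio of deviation to mean is printed as well because it allows comparison between collections whose average recipe lengths differ.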

(25-01-2020, 05:06 PM)Anton Wrote: About the structure. This is the domain which needs careful consideration. The text might have been written backwards, or each line backwards.

I believe that TTR is not significantly affected by the direction in which the text was written. MATTR 100, with a window which covers several lines, will not be much affected even by a boustrophedon.
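
A small sketch can illustrate the point about direction. It assumes whitespace-separated word tokens and uses synthetic pseudo-words (not Voynichese); the function names and the 10-word line length are illustrative choices. Reversing the whole text leaves TTR and MATTR exactly unchanged, because the same set of windows is averaged. A boustrophedon can only change the few tokens of the partial lines at each window edge.

```python
def ttr(tokens):
    """Type-token ratio: distinct word types divided by word tokens."""
    return len(set(tokens)) / len(tokens)

def mattr(tokens, window=100):
    """Moving-average TTR: mean TTR over every window of `window` tokens."""
    if len(tokens) <= window:
        return ttr(tokens)
    starts = range(len(tokens) - window + 1)
    return sum(len(set(tokens[i:i + window])) for i in starts) / (len(starts) * window)

def boustrophedon(lines):
    """Reverse the word order of every second line (hypothetical layout)."""
    return [line[::-1] if i % 2 else line for i, line in enumerate(lines)]

# Synthetic toy text: 60 lines of 10 pseudo-words each.
words = [f"w{(i * i) % 37}" for i in range(600)]
lines = [words[i:i + 10] for i in range(0, 600, 10)]

forward = [w for line in lines for w in line]
backward = forward[::-1]
zigzag = [w for line in boustrophedon(lines) for w in line]

for name, toks in [("forward", forward), ("backward", backward), ("boustrophedon", zigzag)]:
    print(f"{name:14s} TTR={ttr(toks):.4f} MATTR100={mattr(toks):.4f}")
```
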
Also, if the same words appear in the same order in several paragraphs, this will also happen if the text is reversed. If the text is a boustrophedon, half of the matches will be preserved and a rigid structure like that in the Alfonsine Astromagia will still be apparent. A similar technique is discussed in [link], Béchet et al., 2012 (though they work with POS-tagged data): even if efficiently searching for such patterns appears to require complex algorithms, in our small-data domain a simple brute-force search could still be practical.
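
A brute-force search of this kind can be sketched in a few lines. This is a simplification, not the method of the cited paper: it only finds contiguous word sequences of a fixed length `n` and counts how many paragraphs contain each one. The function names and the toy paragraphs are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Set of distinct contiguous n-word sequences in one paragraph."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_sequences(paragraphs, n=3, min_paragraphs=2):
    """Map each n-word sequence to the number of paragraphs containing it."""
    counts = Counter()
    for tokens in paragraphs:
        counts.update(ngrams(tokens, n))  # each paragraph counts at most once
    return {g: c for g, c in counts.items() if c >= min_paragraphs}

# Hypothetical toy paragraphs with a rigid repeated frame.
paras = [
    "the first sign is aries and its image is a ram".split(),
    "the second sign is taurus and its image is a bull".split(),
    "the third sign is gemini and its image is a pair".split(),
]
shared = shared_sequences(paras)
shared_rev = shared_sequences([p[::-1] for p in paras])

for gram, count in sorted(shared.items()):
    print(" ".join(gram), count)
```

Running the same search on the reversed paragraphs finds the same sequences in mirrored form, which is the sense in which writing direction does not hide this kind of structure.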

As you recently pointed out, we have the problem of inconsistency in spelling and abbreviation: in a manuscript, the same word can take different forms. The impact of this varies a lot between manuscripts. I posted an analysis of a Latin script [link]. In that case, about 20% of word tokens appear to be affected: a rigid structure like that of the Astromagia would still be detected, but weaker patterns would be significantly harder to recover.
Maybe I am being optimistic, but my impression is that the Latin ms I analyzed is less consistent than average. For instance, Alfonso's Astromagia [link] is much less abbreviated and the script looks more accurate, as in most of the VMS.

While the polaiin patterns I pointed out in my previous comment might result from auto-copying (as by Timm and Schinner's algorithm or some variant of it), deeper structure could support the idea that the text is meaningful. Which other methods can be used to search for patterns and to compare Q20 and the Pharma section to other texts?

(26-01-2020, 11:05 AM)MarcoP Wrote: Which other methods can be used to search for patterns and to compare Q20 and the Pharma section to other texts?

Well, in fact this thread invites discussion of what answers we can give to this question.

The criterion that you proposed is already a great start: we can use it to make a better guess at the exact type of "instructions" that Pharma and Q20 contain (instead of the narrow and sometimes inadequate term "recipes", I suggest calling these repetitive homogeneous blocks "instructions").

What I think of for now (and what I have in mind to do with MsMurQ12 when I'm ready with my transcription), is to look at top-N (say, top-5, or, at most, top-10) words and see if that significantly differs between instructions of different kinds (would be great to compare between different languages as well). If it does not, then we could take the respective top-N of Pharma and/or Q20 and see if we can build a mapping.
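
A rough sketch of the top-N comparison, assuming each kind of instruction is available as a list of word tokens. The Jaccard overlap is one possible choice of similarity measure (an assumption, not something proposed in the thread), and the toy texts are hypothetical.

```python
from collections import Counter

def top_n(tokens, n=5):
    """The n most frequent words, most frequent first."""
    return [w for w, _ in Counter(tokens).most_common(n)]

def overlap(a, b):
    """Jaccard overlap of two word lists: shared words / all words."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical toy texts standing in for two kinds of instructions.
medical = "take the root and boil the root in wine and drink the wine".split()
culinary = "take the eggs and beat the eggs and fry the eggs in oil".split()

top_med = top_n(medical, 3)
top_cul = top_n(culinary, 3)
print(top_med, top_cul, round(overlap(top_med, top_cul), 2))
```

A high overlap between kinds of instructions would suggest that the top-N list is dominated by function words and is stable enough to attempt a mapping; a low overlap would suggest it depends on subject matter.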

Follow-ups would be: to build a table of most common nouns and to see if any patterns are exhibited in the words which open the paragraphs. E.g. for instructions of type A, they usually begin with nouns, for instructions of type B, they usually begin with prepositions, etc.
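
Counting paragraph-opening words is simple to sketch, assuming one string per paragraph (the sample paragraphs are hypothetical). Classifying the openers as nouns or prepositions would need a part-of-speech tagger or a hand-made word list, which this sketch does not attempt.

```python
from collections import Counter

def opening_words(paragraphs, k=1):
    """Count the first k words (lower-cased) of each non-empty paragraph."""
    counts = Counter()
    for p in paragraphs:
        tokens = p.lower().split()
        if tokens:
            counts[" ".join(tokens[:k])] += 1
    return counts

# Hypothetical toy paragraphs.
sample_paragraphs = [
    "Take the root and boil it",
    "Take the leaves",
    "For the headache take the seeds",
    "",
]
openers = opening_words(sample_paragraphs)
print(openers.most_common())
```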

Another thing is to find the most common phrases (two- or, at most, three-word phrases), like perhaps "so nim" or the like, and then look for similar patterns in the VMS.
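
Finding the most common two- or three-word phrases could look like the following sketch. It assumes one string per paragraph and does not let phrases run across paragraph boundaries; the German-like fragments are made-up toy data built around the "so nim" example.

```python
from collections import Counter

def phrase_counts(paragraphs, n=2):
    """Count every contiguous n-word phrase, paragraph by paragraph."""
    counts = Counter()
    for p in paragraphs:
        tokens = p.lower().split()
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

# Made-up toy fragments, not quotations from a manuscript.
fragments = ["so nim ein lot", "so nim wasser", "und nim ein lot"]
bigrams = phrase_counts(fragments, n=2)
trigrams = phrase_counts(fragments, n=3)
print(bigrams.most_common(3))
print(trigrams.most_common(1))
```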

I believe if we dig deeply, we can greatly advance here.

The botanical section offers the same opportunities (that's why I initially started with that a few years ago), but the patterns there may be weaker. Each botanical folio is of the same "rank" as others, but paragraphs within each folio may be completely irregular.

[link] by [link] is a great collection of medieval recipes: they are mostly culinary recipes, but a few medical texts are also mentioned. Most links are now dead, but the page provides enough information to recover many bits.

For instance, the transcription of [link] can be recovered [link]. The ms is a 13th-century parchment roll from France; it contains about 150 cooking recipes, and the average length appears to be very close to 30 words, comparable with Q20.

I am sure that much more can be extracted from Carlin's web page.

They are not always recipes of this length.
These are descriptions of different veins in different places.
Reading the medical books gives a completely different understanding of the style of presentation at that time.
So caution in interpretation is advised.
The texts may be cooking recipes, medical recipes, descriptions of diseases, courses of treatment, pharmaceutical production methods, or alchemy.

[link]

If Q20 contains recipes and if daiiin and other such are numbers then would we expect a higher frequency of these here?

(29-01-2020, 01:31 AM)DONJCH Wrote: If Q20 contains recipes and if daiiin and other such are numbers then would we expect a higher frequency of these here?

Not necessarily, quantities and times in the Middle Ages were often not expressed in numbers. Instead, descriptions such as "a handful of them" or "running around the field twice" were used. Recipes were therefore often very imprecise.

(29-01-2020, 06:38 AM)bi3mw Wrote: Not necessarily, quantities and times in the Middle Ages were often not expressed in numbers. Instead, descriptions such as "a handful of them" or "running around the field twice" were used. Recipes were therefore often very imprecise.

This remained true until the early 20th century. I own a cookbook for sweets written by my great-grandmother, started sometime shortly before 1900. There is no use of weights in it; it only sometimes refers to some quality like "good" or "good amount". There are also no temperatures or times for the oven, only "bake hot" or "slow", something like this.

Well, it wasn't as easy back then. How did you control the temperature of a wood fire or coal oven? How could you measure something out reliably at a time before the mass manufacture of cups, when reliable scales didn't exist for the domestic user?

That said, it's not to be understood that numeric measures were not used at all. On the contrary, not only were they used, but sometimes they were expressed by digits, not by words. I even remember seeing something like 4te for "vierte" (not sure about the exact numeral, but just to give the general idea), I think it was in the same MsMurQ 12.

I just had a quick look at the recipes from the alchemical herbals, and found that in most cases where there is a number, it is written out: "triginta".

In other cases it uses a Roman numeral: "XV".

Also:

Quote: Accipe radicem istius herbe et unam branchatam salis et combure in uno testo et pulverem pone super vulneribus et subito sanabuntur vulera et carabuntur.

MarcoP

Anton

MarcoP

Aga Tentakulus

DONJCH

bi3mw

voynichbombe

davidjackson

Anton

ReneZ