The Voynich Ninja
Usefulness of categorizing words based on occurances in different parts of the MS?


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - joben - 30-05-2019

(30-05-2019, 09:01 PM)Emma May Smith Wrote: So this is an interesting idea, but I think you would struggle to usefully assign semantic values to the words. There's no way to know what any given word might mean and little internal validation for the guesses you'll be forced to make.

A better idea would be not to assign semantic categories but rather grammatical categories. There are far fewer grammatical categories in a language, so much less precise guesses are needed, and there will be some internal validation of your results because certain grammatical patterns would be unlikely or prohibited.

I recommend reading some of Marco Ponzi's work.

Thanks, I will read up more on this guy.

Basically what I'm thinking about now is finding some low-hanging fruit that has most likely already been tried. For instance, if a plant has berries on the illustration, the word for "berry" might exist on that page. If the same word would be present only on pages with illustrations of plants with visible berries, then it could be something worth looking into.
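
A minimal sketch of that check, assuming a transliteration has already been split into a list of words per page and that someone has hand-tagged which pages show berries. The page ids, the tagging, and the word placement below are invented toy data, not taken from the manuscript, and `words_exclusive_to` is a hypothetical helper name.

```python
from collections import defaultdict

def words_exclusive_to(pages, tagged_pages, min_pages=2):
    """Return words found on at least `min_pages` tagged pages and on no untagged page."""
    seen_on = defaultdict(set)  # word -> set of page ids where it occurs
    for page_id, words in pages.items():
        for word in words:
            seen_on[word].add(page_id)
    tagged = set(tagged_pages)
    return sorted(
        word for word, ids in seen_on.items()
        if ids <= tagged and len(ids) >= min_pages
    )

# Toy data (assumption): three pages, two of them tagged as showing berries.
pages = {
    "p1": ["daiin", "chol", "okal"],
    "p2": ["daiin", "okal", "shey"],
    "p3": ["daiin", "chol", "shey"],
}
berry_pages = ["p1", "p2"]

print(words_exclusive_to(pages, berry_pages))  # ['okal']
```

The `min_pages` threshold is there because a word seen on a single page is trivially "exclusive" to it; with so many rare words in the text, a candidate would need to recur on several tagged pages before it is worth a closer look.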

(30-05-2019, 09:09 PM)ReneZ Wrote: There is one very big problem with the Montemurro and Zanette analysis, namely that the B-language part of the Herbal pages and the A-language part of the Herbal pages are not related to each other. This undermines the tentative conclusion that the selected key words are related to the topic of the pages based on the illustrations.

This is from memory, so a closer look at this question is warranted.

I have read about the A and B languages before, but I haven't understood them completely. I will try to learn more about this because it seems like something that could mess up my plan.


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - Koen G - 30-05-2019

(30-05-2019, 09:32 PM)joben Wrote: I have read about the A and B languages before, but I haven't understood them completely.

Nobody quite understands them completely... We know that something is going on but there is plenty of room for additional research on Currier languages.


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - Anton - 30-05-2019

Quote:So this is an interesting idea, but I think you would struggle to usefully assign semantic values to the words. There's no way to know what any given word might mean and little internal validation for the guesses you'll be forced to make.

With the discussed approach, one does not need to assign semantic values, it would be enough to classify each vord into one of several broad categories based on pre-defined quantitative criteria.


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - Emma May Smith - 30-05-2019

(30-05-2019, 09:54 PM)Anton Wrote:
Quote:So this is an interesting idea, but I think you would struggle to usefully assign semantic values to the words. There's no way to know what any given word might mean and little internal validation for the guesses you'll be forced to make.

With the discussed approach, one does not need to assign semantic values, it would be enough to classify each vord into one of several broad categories based on pre-defined quantitative criteria.

The proposed approach is very much aimed at a semantic goal. It's based on the assumption that different sections of the manuscript deal with certain topics, and that some of those topics may be shared between sections. Hence any word falling across multiple sections must fall into one of the shared topics.

The categorisation is based on potential meaning, however broad the categories would be, and by placing a word in one of those categories you are assigning meaning, however broad. Yet there's unlikely to be a good way of narrowing those categories down sufficiently to make them useful.

A functional approach would be much more likely to result in usable information.


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - -JKP- - 30-05-2019

Joben, several years ago, I spent more than two years (exhaustive and exhausting work) creating a VMS concordance.

In other words, I mapped every single recurring word to its shape-mates throughout the manuscript. The document describing these relationships is more than 1100 pages long and I also have the majority of the common tokens in a database that lets me pull out occurrences and relationships, and even common "prefixes" and "suffixes" associated with tokens that occur both alone and within other tokens.

It was a huge commitment of time and effort.

In terms of "meaning" (define that as broadly as possible) none of the things I hoped or expected to see emerged from this deep study of token relationships. It simply does not come out like viable content on specific subjects or like normal linguistic patterns.

The only really obvious patterns were:
  • "n" tokens are more likely to relate to one another in the plant sections and "ot-" words in the astrological/cosmological sections, and
  • "rare" tokens were almost always combinations of common tokens (a bit like compound words in English), and
  • certain tokens were more prevalent on certain folios, as though frequency priority had shifted slightly. This MIGHT be semantic, or it might be a slight adjustment in how the text is generated if it's contrived or synthetic, and
  • some tokens look suspiciously like Latin loanwords but, in a manuscript as long as the VMS, it's rather easy to find real words (and sometimes phrases) in many languages if you cherry-pick certain patterns, so this may be an artifact.
This is a very slim reward for a huge amount of work.

So I tried again with token-pairs. Same result.


Looking at this data, some people would conclude that Voynichese is nonsense text.

I don't think nonsense text is the only possible explanation.

It could be symbolic or synthetic, OR perhaps the spaces and adjacent glyphs need to be parsed in a different way. This might still result in meaningful content, but it means going back to the drawing board and approaching it with fresh eyes.

It's my personal opinion (and not everyone agrees) that we should be suspicious of spaces—VMS tokens are not necessarily words. I also think there may be multiglyphs (I've blogged about this), but peeling them out (if they exist) is not as easy as simply looking at frequencies. There are places in the VMS where pairs appear to behave as units, but elements from these pairs sometimes seem to behave as parts of triglyphs, or even as elements in their own right.


I can recognize if Voynichese is faked. Voynichese is consistent enough that algorithmically defined look-alikes are almost always easy to spot (providing they haven't cheated by simply replicating whole tokens). I can also quickly spot the common glyph-combination patterns and can explain exactly why Rugg's method doesn't work. But I cannot tell you what Voynichese means (if anything) even after creating four transcripts and a massive relational concordance.



RE: Usefulness of categorizing words based on occurances in different parts of the MS? - Anton - 31-05-2019

(30-05-2019, 10:39 PM)Emma May Smith Wrote:
(30-05-2019, 09:54 PM)Anton Wrote:
Quote:So this is an interesting idea, but I think you would struggle to usefully assign semantic values to the words. There's no way to know what any given word might mean and little internal validation for the guesses you'll be forced to make.

With the discussed approach, one does not need to assign semantic values, it would be enough to classify each vord into one of several broad categories based on pre-defined quantitative criteria.

The proposed approach is very much aimed at a semantic goal. It's based on the assumption that different sections of the manuscript deal with certain topics, and that some of those topics may be shared between sections. Hence any word falling across multiple sections must fall into one of the shared topics.

The categorisation is based on potential meaning, however broad the categories would be, and by placing a word in one of those categories you are assigning meaning, however broad. Yet there's unlikely to be a good way of narrowing those categories down sufficiently to make them useful.

There may be such an underlying assumption, but it is not required at all. The categorisation would not be based on meaning; it would be based on a meaning-invariant spatial criterion. The entire MS is divided into several parts (groups of folios), and then a vord either falls into a specific part or it does not.
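
As a rough sketch of what such a spatial classification could look like: each vord is mapped to the set of parts it occurs in, and vords are then grouped by that set. The folio ids, part labels, and placeholder vords are invented for illustration, and `section_profiles` and `group_by_profile` are hypothetical helper names. How the folios are grouped into parts is an input to the code, not something it decides.

```python
from collections import defaultdict

def section_profiles(folio_words, folio_section):
    """Map each vord to the frozenset of parts (sections) it occurs in."""
    profile = defaultdict(set)
    for folio, words in folio_words.items():
        section = folio_section[folio]
        for word in words:
            profile[word].add(section)
    return {word: frozenset(sections) for word, sections in profile.items()}

def group_by_profile(profiles):
    """Group vords that share exactly the same set of parts."""
    groups = defaultdict(list)
    for word, sections in profiles.items():
        groups[sections].append(word)
    return {sections: sorted(words) for sections, words in groups.items()}

# Toy data (assumption): three folios split into two parts, "X" and "Y".
folio_words = {
    "x1": ["w1", "w2"],
    "x2": ["w2", "w3"],
    "y1": ["w3", "w4"],
}
folio_section = {"x1": "X", "x2": "X", "y1": "Y"}

groups = group_by_profile(section_profiles(folio_words, folio_section))
for sections, words in groups.items():
    print(sorted(sections), words)
```

With n parts there are at most 2^n - 1 possible profiles, so the categories stay broad and purely quantitative; any interpretation of what a profile means would be a separate step.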


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - Linda - 31-05-2019

(30-05-2019, 05:17 PM)RobGea Wrote: That's a good idea.
qokain, qol, [link not shown], qotedy seem to link B+R together

I had noticed that before too. I am pretty sure there is nothing to do with biology or recipes, though.
I think they are likely geographical and historical, respectively.

Quire 13 is related to the zodiac section through its use of nymphs and certain other features. I think they are not zodiac months, but ages, which makes them historical too, plus there are the tailed stars there as well as in Quire 20.

Quire 13 is also related to Quire 14 by certain motifs and, I think, overall topic as well, and I think at least part of the cosmological section is actually hydrological.

But there are examples of labels that are the same but do not seem to have common ground. I don't know that common vords mean anything; I think they have to be unravelled first, somehow.


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - Emma May Smith - 31-05-2019

(31-05-2019, 12:01 AM)Anton Wrote:
(30-05-2019, 10:39 PM)Emma May Smith Wrote:
(30-05-2019, 09:54 PM)Anton Wrote:
Quote:So this is an interesting idea, but I think you would struggle to usefully assign semantic values to the words. There's no way to know what any given word might mean and little internal validation for the guesses you'll be forced to make.

With the discussed approach, one does not need to assign semantic values, it would be enough to classify each vord into one of several broad categories based on pre-defined quantitative criteria.

The proposed approach is very much aimed at a semantic goal. It's based on the assumption that different sections of the manuscript deal with certain topics, and that some of those topics may be shared between sections. Hence any word falling across multiple sections must fall into one of the shared topics.

The categorisation is based on potential meaning, however broad the categories would be, and by placing a word in one of those categories you are assigning meaning, however broad. Yet there's unlikely to be a good way of narrowing those categories down sufficiently to make them useful.

There may be such an underlying assumption, but it is not required at all. The categorisation would not be based on meaning; it would be based on a meaning-invariant spatial criterion. The entire MS is divided into several parts (groups of folios), and then a vord either falls into a specific part or it does not.

These "several parts" are defined by presumed similarity of content. The spatial criteria proposed are semantic criteria.


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - -JKP- - 31-05-2019

Joben, there are certain patterns in how information is presented in medieval herbal manuscripts. Most of them follow four or five "formulas" for presentation.

By creating a concordance of all the token-relationships in the VMS, I hoped to flush out whether there were patterns and what they were.

The result was not what one would expect if the VMS were a substitution cipher (which is how most people interpret it).


So... 1) assuming the text is meaningful, and 2) assuming the text is related to the images, there have to be other dynamics going on and VMS "words" are probably not words in the linguistic sense. If they were, these patterns would have been discernible.

The VMS text is positional. Certain things are always at the beginning, some always in the middle, some usually at the end. The letter frequencies in specific parts of tokens are not normal for natural languages.

The text has been manipulated in some way, so... taking it at face value doesn't work (it would have been long since decoded if this were a simple substitution cipher). I think it is unwise to assume that VMS tokens are words (in the linguistic sense).
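
For anyone who wants to look at the positional behaviour themselves, here is a small sketch that counts which characters occur first, in the middle, and last within tokens. It assumes a transliteration in which each character stands for one glyph, which is itself a simplification, and it treats space-separated tokens as units. `positional_counts` is a hypothetical helper name; the three sample tokens are the ones RobGea mentioned earlier in the thread.

```python
from collections import Counter

def positional_counts(tokens):
    """Count characters by position within each token: first, middle, last."""
    first, middle, last = Counter(), Counter(), Counter()
    for token in tokens:
        if not token:
            continue
        first[token[0]] += 1
        if len(token) > 1:
            # Single-character tokens are counted as "first" only.
            last[token[-1]] += 1
        for ch in token[1:-1]:
            middle[ch] += 1
    return first, middle, last

tokens = ["qokain", "qol", "qotedy"]
first, middle, last = positional_counts(tokens)

print("first: ", dict(first))
print("middle:", dict(middle))
print("last:  ", dict(last))
```

Run over a full transliteration, comparing the three counters would show how strongly each character is tied to one position, which could then be set against the same counts for a natural-language text of similar length.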


RE: Usefulness of categorizing words based on occurances in different parts of the MS? - joben - 31-05-2019

(31-05-2019, 07:33 PM)-JKP- Wrote: Joben, there are certain patterns in how information is presented in medieval herbal manuscripts. Most of them follow four or five "formulas" for presentation.

By creating a concordance of all the token-relationships in the VMS, I hoped to flush out whether there were patterns and what they were.

The result was not what one would expect if the VMS were a substitution cipher (which is how most people interpret it).


So... 1) assuming the text is meaningful, and 2) assuming the text is related to the images, there have to be other dynamics going on and VMS "words" are probably not words in the linguistic sense. If they were, these patterns would have been discernible.

The VMS text is positional. Certain things are always at the beginning, some always in the middle, some usually at the end. The letter frequencies in specific parts of tokens are not normal for natural languages.

The text has been manipulated in some way, so... taking it at face value doesn't work (it would have been long since decoded if this were a simple substitution cipher). I think it is unwise to assume that VMS tokens are words (in the linguistic sense).

Thanks for taking the time to explain. I understand that you have put in a lot of effort, and I will dig deeper into the observations you have described here. I need to get a better understanding of what Voynichese is and isn't before I try to draw conclusions of my own. I feel like there is a lot to learn.