The Voynich Ninja

Quote:The more surprising find is that there is also a propensity for certain word tokens to occur immediately before, or immediately after, the hand drawn plant illustrations

Interesting, because it may confirm the idea that illustrations serve as a border of the same kind as the normal folio border. It may actually turn out that the text should be parsed not line by line (as it is now) but in a multi-column fashion, or even in some more complicated fashion, with pre-defined text blocks scattered across the folio space according to a certain procedure, as the perceived baseline jumps discussed in the other thread may suggest.

(28-04-2024, 04:58 PM)asteckley Wrote:
(28-04-2024, 06:04 AM)kckluge Wrote: I want to push back slightly on the use of the term "intent" as implying that the scribes are choosing shorter words to make them fit before the drawing element or for some reason achieve a given line length. If one thinks (as I do) that spaces are inserted in some algorithmic way and that the algorithm operates at the level of lines (or interrupted lines), then last words will tend to be shorter for no other reason than because they're the left-over bit of text at the end.

I understand what you are saying there. And that particular interpretation of the word intent did occur to me.

I'd have to think about how to measure it, but one way to test this might be to ask how often "words" to the left of drawing elements are prefixes of other words compared to...the average word (but hapax will bias that)? other words of that length?
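One rough way to set up that comparison is sketched below. It is only an illustration under stated assumptions: the function names `prefix_rate` and `length_matched_baseline` are hypothetical, the word lists are made-up EVA-like strings rather than real counts, and the baseline is the crude "other words of that length" option, computed over word types so that it does not depend on token frequency.

```python
def prefix_rate(candidates, vocabulary):
    """Fraction of candidate word types that are a proper prefix of another type."""
    vocab = set(vocabulary)
    cands = set(candidates)
    if not cands:
        return 0.0
    hits = sum(
        1 for w in cands
        if any(v != w and v.startswith(w) for v in vocab)
    )
    return hits / len(cands)


def length_matched_baseline(candidates, vocabulary):
    """Prefix rate over all vocabulary types sharing a length with some candidate."""
    lengths = {len(w) for w in candidates}
    same_length = [v for v in set(vocabulary) if len(v) in lengths]
    return prefix_rate(same_length, vocabulary)


# Toy data only (hypothetical, not taken from any transliteration).
vocabulary = ["dai", "daiin", "dar", "chol", "chor",
              "qo", "qokeedy", "or", "ol", "shol"]
before_drawing = ["dai", "qo", "or"]

observed = prefix_rate(before_drawing, vocabulary)
baseline = length_matched_baseline(before_drawing, vocabulary)
print(f"observed={observed:.2f} baseline={baseline:.2f}")
```

A real test would also need a significance estimate, for instance by repeatedly drawing random length-matched samples from the vocabulary.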

Thoughts?

Karl

(01-05-2024, 01:09 AM)kckluge Wrote: I'd have to think about how to measure it, but one way to test this might be to ask how often "words" to the left of drawing elements are prefixes of other words compared to...the average word (but hapax will bias that)? other words of that length?

Thoughts?

Karl

I think that's an excellent idea.
I think there are a number of interesting analyses that could be done on morphemes (besides the many things people have tried to date). I had started down that road but have about three other related projects on the go at the moment. One thing I wanted to do first, though, was to establish a clean (i.e. objectively analyzed) lexicon of the morphemes and identify the main set (the top 80% or so, to remove the least common). Then I would parse the corpus from one of the transliterations (such as the ZL one) into morphemes. My current software system maintains an entire corpus as separate detailed tables of folios, lines, tokens, and glyphs. This would add a fifth table of morphemes. Once at that stage, performing statistical analyses involving attributes such as position relative to drawings and margins becomes very convenient.
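As a minimal sketch of the "main set" step, assuming that "top 80%" means the most frequent morphemes that together cover 80% of all morpheme occurrences (one possible reading, not necessarily the intended one), the selection could look like this. The function name `main_morphemes` and the sample data are hypothetical.

```python
from collections import Counter


def main_morphemes(morpheme_occurrences, coverage=0.8):
    """Return the most frequent morphemes that together reach the given coverage."""
    counts = Counter(morpheme_occurrences)
    total = sum(counts.values())
    kept, running = [], 0
    for morpheme, count in counts.most_common():
        if running >= coverage * total:
            break  # the requested share of occurrences is already covered
        kept.append(morpheme)
        running += count
    return kept


# Toy occurrence list (hypothetical strings, not real corpus counts).
sample = ["qo"] * 5 + ["dy"] * 3 + ["aiin"] + ["x"]
print(main_morphemes(sample))
```

The kept morphemes could then populate the proposed fifth table, with each row pointing back to its token so that position relative to drawings and margins is inherited from the token table.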

(26-04-2024, 11:44 PM)asteckley Wrote: I am in the process of compiling a database of the physical widths of words (and glyphs) and of the spaces between words, between words and drawing, between words and margins, etc. This may provide data for other related analyses of these ideas.

I have done a nearly complete manual transliteration with much more detailed spacing symbols than in publicly available transliterations, ranging from very small spaces to very large spaces (6 "flavors" + baseline jumps), hoping to understand the "subtle intent" behind them. An OCR would be more precise in placing glyphs, but I wanted the spacing values to be relative to the (variable) size of the context: if the previous glyph is abnormally narrow, for example, the space looks larger by contrast.

Quote:Why should the drawings have any such effects on the token choices?

I suppose that the choice of shorter words before illustrations and at the end of the line is understandable in a system that does not allow split words (but they could have used hyphens). The longer words in these situations could perhaps be nulls (like qotaiin and chotaiin)... Pure speculation, I don't know. :)

Since there was interest in the possibility of "word" tokens being split up across drawing intrusions, I thought I'd provide the following table for anyone that wants to study the implications.
This table shows all instances (from our study corpus of Herbal-Scribe 1) where the joining of tokens occurring just before and just after a drawing forms a token that is found at least once somewhere else in the corpus.
(And there was one case where the joining of three tokens across two drawing intrusions formed a token that was found elsewhere.)
Each row lists the split token first (followed by its EVA form) and the number of times it was found. It then shows the joined version of the token and how many times it was found in the corpus, followed by the component parts again, along with how many times each part was found in the corpus.

The fact that, in almost all cases, the individual parts (i.e. the tokens as found before and after the drawing intrusions) are found much more often throughout the whole corpus than the combined token suggests that the tokens spanning the drawing intrusions (and the propensity of certain tokens for those positions) are probably not the result of splitting a token into parts.
Caveat: I have not performed the rigorous calculations for the previous statement; I make it only on the basis of a qualitative inspection of the table.
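For anyone wanting to go beyond visual inspection, a table of this kind could be rebuilt and tallied along the following lines. This is a hedged sketch, not the procedure used for the table above: the names `split_token_rows` and `parts_more_common` are hypothetical, the corpus is a flat list of made-up EVA-like tokens, and part counts include the occurrences next to the drawings themselves.

```python
from collections import Counter


def split_token_rows(corpus_tokens, split_groups):
    """Keep groups of tokens around a drawing whose joined form occurs in the corpus."""
    counts = Counter(corpus_tokens)
    rows = []
    for parts in split_groups:  # two parts, or three across two drawings
        joined = "".join(parts)
        if counts[joined] > 0:
            rows.append({
                "parts": tuple(parts),
                "part_counts": tuple(counts[p] for p in parts),
                "joined": joined,
                "joined_count": counts[joined],
            })
    return rows


def parts_more_common(rows):
    """Count rows in which every part is more frequent than the joined token."""
    return sum(1 for r in rows if min(r["part_counts"]) > r["joined_count"])


# Toy data only (hypothetical, not the Herbal-Scribe 1 corpus).
corpus = ["qo", "qo", "qo", "kaiin", "kaiin", "qokaiin", "dar", "ol"]
groups = [("qo", "kaiin"), ("dar", "ol"), ("ol", "dar")]

rows = split_token_rows(corpus, groups)
print(rows)
print(parts_more_common(rows), "of", len(rows), "rows have parts more common than the join")
```

A rigorous version would compare these counts against what is expected when frequent short tokens are concatenated at random.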

There are some other medieval manuscripts that show tokens written around and spanning illustrations, by the way. But they do generally exhibit the splitting of words to fit the spaces. It would be great to find a suitable transliteration of some of those to demonstrate (as I'm pretty confident they would) that any statistical anomalies in word lengths are only associated with those words that have been split and not the ones kept intact, but I was never able to locate suitable transliterations for the purpose.

[link]

(Unfortunately I have not been able to figure out how to link to an image and have it render directly here.)

(03-05-2024, 12:49 AM)asteckley Wrote: [link]

(Unfortunately I have not been able to figure out how to link to an image and have it render directly here.)

(26-04-2024, 09:48 PM)asteckley Wrote: The whole catalog of tables is included in the Supplemental Online Material at:
[link]

Here is the same image from the catalog:

[Image: T_Split_Tokens_Table.png]

Anton

kckluge

asteckley

nablator

asteckley

merrimacga