The Voynich Ninja
Lines interrupted by drawings - Printable Version


Lines interrupted by drawings - MarcoP - 25-09-2019

For a while, I have been wondering how mid-line breaks caused by drawings compare with "true" line breaks. In other words, I wanted to check whether the same LAAFU (line as a functional unit) effects that happen at line boundaries also appear when lines are interrupted by a drawing.
This has likely been studied before, but I am not aware of any specific research in this area.
As always, I cannot exclude that I have made errors somewhere in the process.

This analysis is based on the corpus of text from pages that include at least one drawing interruption. I used the Zandbergen-Landini transcription (ignoring uncertain spaces), where mid-line breaks are marked by the three-character sequence '<->'. The text from pages that do not include mid-line breaks was ignored. The corpus includes:
13966 words
11050 regular word breaks (word couples separated by a space)
751 image word breaks (word couples separated by an illustration)
2168 lines
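
As a rough illustration of how counts like these could be obtained, here is a minimal Python sketch. It is not the script used for the figures above. It assumes that each input string is one transcription line already stripped of locus tags and comments, that '.' is the word separator and that ',' marks an uncertain space (both are assumptions about the transcription format; only '<->' is taken from the post). The function name and the sample lines are hypothetical.

```python
def count_breaks(lines):
    """Count words, regular word breaks, image word breaks and lines.

    Assumptions (not from the original post): '.' separates words and
    ',' marks an uncertain space, which is ignored by deleting it.
    '<->' marks a break caused by a drawing.
    """
    words = regular = image = n_lines = 0
    for line in lines:
        # Ignore uncertain spaces: the glyphs around them form one word.
        line = line.replace(",", "")
        # Split the line into the text segments separated by drawings.
        segments = [[w for w in seg.split(".") if w]
                    for seg in line.split("<->")]
        segments = [seg for seg in segments if seg]
        if not segments:
            continue  # skip empty lines
        n_lines += 1
        for seg in segments:
            words += len(seg)
            regular += len(seg) - 1  # spaces inside one segment
        image += len(segments) - 1   # drawings between segments
    return {"words": words, "regular": regular,
            "image": image, "lines": n_lines}


# Hypothetical sample, not taken from the manuscript transcription.
sample = ["daiin.chol<->s.aiin", "qokeedy.dal,dy"]
print(count_breaks(sample))
```

A quick sanity check on any such count is that words minus lines should equal regular breaks plus image breaks, since every word except the last one in a line is followed by exactly one break.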

The following histogram illustrates word length, considering:
* all words in the corpus
* first words in lines
* words that appear immediately before an image break
* words that appear immediately after an image break
* last words in lines
   

The histogram shows that the first word of each line is slightly longer than average. This has been discussed before.
In contrast, words that appear immediately before a mid-line break are shorter than average. Line-final words and words following the image break have a normal length.


The following is the histogram for specific word lengths. Words of length 1 and 2 are more frequent among line-final words, maybe because they can be more easily squeezed in at the end of a line. This tendency is much stronger for words before an image break. It is possible that words are sometimes split around the image, but words after the image break do not show any particular word-length pattern.
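
For readers who want to reproduce this kind of comparison, the sketch below collects word-length histograms for the five positions discussed. It makes the same format assumptions as any simple reading of the transcription ('.' as word separator, ',' as an ignored uncertain space, '<->' as image break) and measures length in transcription characters, which is also an assumption. The function names and the sample lines are hypothetical.

```python
from collections import Counter, defaultdict


def length_histograms(lines):
    """Word-length histograms for five word positions.

    Assumptions: '.' separates words, ',' (uncertain space) is ignored,
    '<->' marks an image break, and length is counted in transcription
    characters.
    """
    hist = defaultdict(Counter)
    for line in lines:
        segments = [[w for w in seg.split(".") if w]
                    for seg in line.replace(",", "").split("<->")]
        segments = [seg for seg in segments if seg]
        if not segments:
            continue
        for seg in segments:
            for word in seg:
                hist["all"][len(word)] += 1
        hist["line_first"][len(segments[0][0])] += 1
        hist["line_last"][len(segments[-1][-1])] += 1
        # Each pair of neighbouring segments is separated by one drawing.
        for before, after in zip(segments, segments[1:]):
            hist["before_image"][len(before[-1])] += 1
            hist["after_image"][len(after[0])] += 1
    return hist


def mean_length(counter):
    """Average word length of one histogram (0.0 if it is empty)."""
    total = sum(counter.values())
    return sum(k * v for k, v in counter.items()) / total if total else 0.0


# Hypothetical sample lines.
hist = length_histograms(["daiin.chol.s<->okaiin.dy", "qokeedy.dal"])
print(mean_length(hist["all"]), dict(hist["before_image"]))
```

Note that a word can fall into more than one position (for example, the only word of a short segment is both "after image" and "before image"), so the position histograms are not mutually exclusive.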
   

This graph shows frequencies for the most common word-initial characters in the different positions. s[^h] stands for s-not(h), i.e. it excludes the "bench" Sh which is considered separately.
The fact that p-, t-, y-, s- are more frequent at line start is another known fact, discussed for instance by Emma.
   

The first word after an image break shares the preference for y- and s- with line-initial words, but initial gallows are almost totally absent. In addition to the gallows, l- and q- are also rare after the image break, while o- is more frequent than usual. The drop in the frequency of q- after an image break and the (symmetrical?) increase in y-/o- are particularly noticeable and puzzling.
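
The s[^h] class mentioned above can be sketched as follows. This is only an illustration under assumptions: the benches "sh" and "ch" are treated as two-character initials, and a negative lookahead `s(?!h)` is used instead of the literal pattern `s[^h]` so that the one-character word 's' is also counted (the literal pattern would need a second character). The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: the benches 'sh' and 'ch' count as two-character initials.
BENCH = re.compile(r"^(sh|ch)")
# 's' not followed by 'h'; unlike 's[^h]' this also matches the lone word 's'.
S_NOT_H = re.compile(r"^s(?!h)")


def initial_class(word):
    """Return the word-initial class used for the frequency count."""
    match = BENCH.match(word)
    if match:
        return match.group(1)
    if S_NOT_H.match(word):
        return "s[^h]"
    return word[0]


def initial_frequencies(words):
    """Relative frequency of each word-initial class in a word list."""
    counts = Counter(initial_class(w) for w in words if w)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical sample of words taken from one position.
print(initial_frequencies(["saiin", "shedy", "s", "ykal"]))
```

Running the same function over the words of each position (line first, before image, after image, line last) gives the values that a graph like the one above compares.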


The graph for the word-final character shows what could be the best-known LAAFU effect: the high frequency of -g and -m at line end. The two characters are not particularly frequent before a mid-line break. On the other hand, -d and -s are twice as frequent before a mid-line break as in the other positions; -y is also more frequent than expected.
   


I looked into the specific case of -s before an image break. It turns out that almost half of the occurrences are due to the word 's' itself: the word occurs before 0.6% of regular (i.e. space) word breaks and before 3.2% of image breaks, more than 5 times as frequently. Nothing similar happens for 'd' and 'y', the other two characters that are more common immediately before an image break: they only rarely occur in that position as stand-alone words.
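
The "more than 5 times" figure follows directly from the two percentages, as this small check shows. The absolute counts it derives are only approximations implied by the rounded percentages and the corpus totals given earlier in this post; they are not counts reported above.

```python
# Corpus totals and rates as stated in the post.
regular_breaks = 11050
image_breaks = 751
rate_regular = 0.006  # 's' before 0.6% of regular word breaks
rate_image = 0.032    # 's' before 3.2% of image breaks

# How much more often 's' occurs before an image break.
ratio = rate_image / rate_regular

# Approximate counts implied by the rounded percentages.
approx_s_before_regular = rate_regular * regular_breaks
approx_s_before_image = rate_image * image_breaks

print(round(ratio, 1))
print(round(approx_s_before_regular), round(approx_s_before_image))
```

So the stand-alone 's' would occur roughly 66 times before a space and roughly 24 times before a drawing, even though drawings account for far fewer breaks.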

In a few cases, there are multiple occurrences of 's' immediately before an image on a single page.

's' is the only character to appear twice isolated by two image breaks.

It seems possible that this could be described as a preference for detaching an initial s- from the rest of the word.


RE: Lines interrupted by drawings - Davidsch - 25-09-2019

What are your line reading conclusions?

I am avoiding the term LAAFU. My personal conclusion, based on the text code, was that not all images and text overlaps can be handled the same. Then:

In the herbal section, all text flow can be regarded as intended to be read from left to right in a straight line, ignoring the interrupting image.

In the other sections, the labels as defined are not labels but part of a flowing text, from left to right, from top to bottom.


RE: Lines interrupted by drawings - MarcoP - 25-09-2019

Hi David,
I have not spent much time on this aspect of the text yet, so I am quite uncertain about the meaning of the above observations. My provisional opinion is that the phenomenon discussed here is quite distinct from labels, which is possibly in agreement with your statement that "not all images and text overlaps can be handled the same".

An obvious hypothesis is that what happens in readable manuscripts also happens here: an image break can coincide with a normal space, or it can split a word into two parts (this could explain the frequent stand-alone s). A (minor?) problem with the second option is that it does not seem to be confirmed by the average length of words after the image break: they appear to be whole words.

A high rate of -m and -g before the break would have supported the idea that these are abbreviations (a pet theory of mine), but I could not observe any such behaviour. In principle, it could also be that words before the image break are sometimes truncated.

I think that some of this speculation could be supported or rejected by further analysis.


RE: Lines interrupted by drawings - Koen G - 25-09-2019

Thank you Marco, this is the kind of research I like to read. I didn't react sooner because I wanted to be able to take my time with it.

When a word encounters an image and the text wants to continue on the other side, there are a few options:

  1. the word is split
  2. the final complete word before the image is squeezed in (or the scribe takes into account his spacing to make it fit well)
  3. the final word before an image is truncated

There are probably more options, but some of those are less likely. For example, an image could split the text into several columns, as in a newspaper.

You say that words before an image are shorter, while words after an image have a normal length. This rules out (1), because in that case words after the image would also be shorter.

(3) seems to be the most logical candidate, but maybe we cannot completely rule out (2): there may be a statistically higher chance that a short word can still be added rather than a long word fitting nicely before the image.

If (3) is the case, then this might mean that the more frequent glyphs before mid-line break indicate truncation.

Edit: I now see that you say similar things in your reply to David.


RE: Lines interrupted by drawings - bi3mw - 25-09-2019

It is noticeable that the words in the gaps between parts of the image always fit very well. The text never runs into the image. Even with a good layout, this would not always work out. One can speculate why this is so. "Filler words" that always fit could also have been inserted.


RE: Lines interrupted by drawings - -JKP- - 26-09-2019

I don't know about the plant folios, bi3, but I think it would be extremely difficult for someone to draw the complex interrelationships that exist in the "map" folio unless they had sketched it out (and planned it) carefully beforehand. It sometimes takes 2 or 3 tries for an image with so many textures and so much non-overlapping text-in-circles to work out right.

Whether this occurred with the plants, I don't know. It seems like it would be a lot of extra work and less likely, but the whole manuscript feels to me like it was carefully planned. So, perhaps the text was written out on a wax tablet first and then copied (or copied from another source, but it seems unlikely that the same plant-drawing shapes would have existed in an earlier manuscript).


Or, I guess a further possibility, if the text is generated from some prescribed method, is that the writer chose something shorter or longer to fit whatever the situation demanded.


RE: Lines interrupted by drawings - RenegadeHealer - 26-09-2019

Marco, was the raw data for this study one of the ingredients for the project you and Emma published earlier this year? I'm still in the brainstorming phase for my vord-generating dice model, but I'll definitely be citing this paper when I publish the final results, because it's been a helpful resource for understanding how vords seem to be formed.

I was immediately reminded of the earlier observation that different plant-delineated text columns on the same page can show different ink density, suggesting that they were written separately, and thus are true columns functionally and methodologically. Your data here supports the idea that text before and after a plant drawing constitutes two separate lines, not one interrupted line. As someone who has looked at a lot of old manuscripts, how much precedent do you see for this style of column composition in medieval manuscripts of the time?



A long hard look at your histograms (with Thievery Corporation's "Langue Symbolique" playing in the background) made me notice something: vords in all positions but line-first can start with [ch, sh, and a]. On the other hand, [p, t, y, and s] tend to only begin a vord in the line-first position. Emma May Smith makes a convincing case on her blog for [y] and [a] being equivalent, with the difference being positional. Among other things, a vord that would begin with [a] will instead begin with [y] if the first vord of a line. I wonder if a similar line-head transformation rule governs vords that begin with [ch] or [sh], such that (for example), a vord beginning with [ch] needs a [p] attached to the beginning, becoming either [cph] or [pch], while [sh] needs [t] attached to it, or something like that.

I also wonder if [p, t, y, and s] are, in many cases, not really part of the vord they seem to be attached to at all, but are really just index markers with no space separating them from the line they mark. I've read the discussion about the folio with the strange separated vertical column of glyphs. Since then, I've noticed that scanning down nearly any Currier B folio's left edge, I often see the same 2-3 glyphs many times over, almost in a repeating pattern. Acrostics are another thing that comes to mind; a lot of ancient and medieval Hebrew writing makes clever selection of line-start characters so as to serve as both a line-indexing system and a hidden acrostic.


RE: Lines interrupted by drawings - Davidsch - 26-09-2019

Thank you Marco.

There are several approaches, and what you did here covers a fraction of what I researched, and it's still fun to exchange thoughts.

Yes, I agree that the S is always a stand-alone character (free-floating symbol).
The -g symbol does not seem to be a real character to me (based on form and context), so that symbol does not exist in my book.
The -m is indeed also an abbreviation, which can be easily proven based on the context.

I suspect the [am] is the highest form of abbreviation. A lower form could stand for [ail] or [aiin], [ain], which are related to each other.
It seems to me that the word in a sentence sometimes morphs towards a longer form, and then suddenly collapses into a more abbreviated form.


RE: Lines interrupted by drawings - RenegadeHealer - 26-09-2019

(26-09-2019, 04:03 PM)Davidsch Wrote: Yes, I agree that the S is always a stand-alone character (free-floating symbol).
The -g symbol does not seem to be a real character to me (based on form and context), so that symbol does not exist in my book.
The -m is indeed also an abbreviation, which can be easily proven based on the context.

I suspect the [am] is the highest form of abbreviation. A lower form could stand for [ail] or [aiin], [ain], which are related to each other.
It seems to me that the word in a sentence sometimes morphs towards a longer form, and then suddenly collapses into a more abbreviated form.

I'm inclined to agree about [m] and [g]. I've resisted concluding these characters are abbreviations for other character strings, because I don't want to take their close resemblance to Latin scribal abbreviations at face value. But given their strong preference for the ends of lines, it's hard to avoid the conclusion that they're some kind of line-end forms of other characters. Given the observation of LAAFU, abbreviations for avoiding going onto a new line make a lot of sense.

I hadn't realized [s] nearly always stands alone as a complete vord, and want to explore this further. Do you have a link to your research where you talk more about the apparent properties of [s] in the VMS?

Many people have mentioned that [s] seems to be composed of [c] or [e] with a hook modifier, in much the same way that [r] can be seen as [i] with a hook modifier. I'm open to the possibility that [s] is basically [e] used as a one-character word, as some kind of grammatical particle, but required by Voynichese orthographic convention to carry a hook modifier whenever this character stands alone as a word. I'm reminded of the way modern Italian distinguishes the word a from ha, and the word e from è, to make reading easier and to avoid ambiguity. Or, for that matter, the way English always capitalizes the words I and O, for similar reasons.


RE: Lines interrupted by drawings - Emma May Smith - 26-09-2019

The occurrences of [s] alone before a mid-break are interesting in relation to line start patterns. At the line start we see an increase in words beginning with [s], which is often followed by [a] or [o]. It would be good to learn if these lone [s] match up with [o] or [a] on the other side.

If so (and I do not know if it is), it would make a stronger link to line start patterns.

Likewise, it would be interesting to know the second glyph in words immediately after mid-breaks which start with [y]. Both [yk] and [yt] are line start patterns.
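
Both checks suggested here could be run with something like the sketch below. It is only an illustration, assuming one string per transcription line with '.' as word separator, ',' as an ignored uncertain space and '<->' as the image break marker; the function name and the sample lines are hypothetical and no real counts are implied.

```python
from collections import Counter


def after_break_checks(lines):
    """Tally two things for every image break:

    1. the first glyph after the break when the word before it is a
       lone 's';
    2. the second glyph of words after the break that start with 'y'.

    Assumptions: '.' separates words, ',' (uncertain space) is ignored,
    '<->' marks an image break.
    """
    after_lone_s = Counter()
    second_after_y = Counter()
    for line in lines:
        segments = [[w for w in seg.split(".") if w]
                    for seg in line.replace(",", "").split("<->")]
        segments = [seg for seg in segments if seg]
        for before, after in zip(segments, segments[1:]):
            next_word = after[0]
            if before[-1] == "s":
                after_lone_s[next_word[0]] += 1
            if next_word.startswith("y") and len(next_word) > 1:
                second_after_y[next_word[1]] += 1
    return after_lone_s, second_after_y


# Hypothetical sample lines.
sample = ["daiin.s<->aiin.dy", "chol.s<->okal<->ykeedy", "dar<->ytedy"]
print(after_break_checks(sample))
```

If the lone [s] really is a detached word-initial, the first tally should be dominated by [a] and [o]; the second tally would show whether [yk] and [yt] also dominate after image breaks.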