I have added two more sets of words: the set of all single-word labels, and the same set excluding zodiac labels. The two turned out to be quite similar, so I will only discuss "labels" in general. For these sets, I have considered the whole manuscript (while the other sets only include pages with at least one image-break).
Like line-break statistics, label statistics turn out to be somewhat comparable to those for image-breaks.
[attachment=3587]
For suffixes, line-breaks were a good match, with the exception of -am, which is frequent at line-breaks but rare at image-breaks. In labels, -am is much less frequent than at line-breaks; all the other values are reasonably close to those for image-breaks (though those for line-breaks are even closer).
[attachment=3586]
For prefixes, labels feature high frequencies of o- in all oGallows- forms. However, in the comparison with image-breaks, the values for o- are offset by that for qo-. Image-breaks and labels are the only sets in which qo- is below 5%.
It should be noted that, even though both line-breaks and labels compare well with image-break values, line-breaks and labels behave quite differently from each other: their similarities to and differences from image-breaks are complementary.
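For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a prefix or suffix share could be computed for a set of words. It is not the script behind the charts above: the function name `affix_share` and the toy word lists are made up for illustration, and the numbers are not real manuscript counts.

```python
def affix_share(words, affix, kind="suffix"):
    """Return the percentage of words that end (or start) with the given affix."""
    if not words:
        return 0.0
    if kind == "suffix":
        hits = sum(1 for w in words if w.endswith(affix))
    elif kind == "prefix":
        hits = sum(1 for w in words if w.startswith(affix))
    else:
        raise ValueError("kind must be 'suffix' or 'prefix'")
    return 100.0 * hits / len(words)


# Toy word lists, invented for illustration only (not real counts).
labels = ["otedy", "okal", "otaly", "qokal", "daiin"]
line_breaks = ["dam", "otam", "daiin", "qokam", "chedy"]

print(affix_share(labels, "o", "prefix"))  # 3 of 5 words start with o-  -> 60.0
print(affix_share(line_breaks, "am"))      # 3 of 5 words end with -am   -> 60.0
```

Note that a plain prefix test for o- does not count qo- words, since those start with q; the two prefixes stay separate, as in the comparison above.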
I only looked for a few seconds, but it seems you took all the text and made no distinction between a line, a paragraph and a label?
Of course, then there's nothing special, because every piece of text hovers around the same averages.
We recently completed a rigorous analysis of this phenomenon and generated several tables cataloging the propensity of various word tokens to occur in relation to various positions, including adjacency to drawings.
There is a separate thread on this: [link]
(Unfortunately, in my search for related work, I did not find this current thread or Marco's work; otherwise I could have mentioned it in relation to previous research.)
(29-04-2024, 03:11 PM)asteckley Wrote: We recently completed a rigorous analysis of this phenomenon and generated several tables cataloging the propensity of various word tokens to occur in relation to various positions, including adjacency to drawings.
With thanks to asteckley and MarcoP: some thoughts on "words" adjacent to plant drawings in the "herbal" (Scribe 1) section:
[link]
(02-05-2024, 08:59 AM)dfs346 Wrote: (29-04-2024, 03:11 PM)asteckley Wrote: We recently completed a rigorous analysis of this phenomenon and generated several tables cataloging the propensity of various word tokens to occur in relation to various positions, including adjacency to drawings.
With thanks to asteckley and MarcoP: some thoughts on "words" adjacent to plant drawings in the "herbal" (Scribe 1) section:
[link]
In addition to the "daiin" vs "8am" problem, these "leaf words" could also cause problems in the data when counting things.
"In the "herbal" section as a whole, the incidence of hapax legomena is 71.4 percent of the vocabulary. Among the "leaf words", the incidence of hapax legomena is just 34.7 percent. In some way, the "leaf words" are more likely to be meaningful than the "herbal" section as a whole."
Could this be an artifact of the truncation and the creation of words with fewer characters?
Suppose a scribe needs to write the word "today" but only has room for "to", and then writes "day". This gives a count of two meaningful words, but the scribe only intended one.
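To make the question concrete, here is a small sketch of how the hapax share (hapax legomena as a percentage of the vocabulary, as in the quoted figures) could be computed, and how splitting a single word changes it. The helper name `hapax_share` and the English toy text are invented for illustration. It only shows that splitting can lower the hapax share in a contrived case; it says nothing about whether this actually happens in the manuscript.

```python
from collections import Counter


def hapax_share(tokens):
    """Percentage of distinct words (the vocabulary) that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return 100.0 * hapaxes / len(counts)


# Toy text, invented for illustration: the three long words occur once each.
intended = ["today", "tonight", "daylight", "to", "to", "day", "day"]
# Same text, but "today" was written as "to" + "day" for lack of room.
as_written = ["to", "day", "tonight", "daylight", "to", "to", "day", "day"]

print(hapax_share(intended))    # 3 hapaxes out of 5 word types -> 60.0
print(hapax_share(as_written))  # 2 hapaxes out of 4 word types -> 50.0
```

In this toy case the split removes one hapax ("today") and adds two more tokens of short words that were already common, so the hapax share drops.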
(02-05-2024, 05:02 PM)pjburkshire Wrote: Suppose a scribe needs to write the word "today" but only has room for "to", and then writes "day". This gives a count of two meaningful words, but the scribe only intended one.
I just posted a table of data regarding the splitting of words across drawings in a different thread, but I just realized it is relevant to this discussion (perhaps more so than to the other thread).
I'll refer to it there rather than repeat it all; the comments and table are here:
[link]
Further thoughts on "leaf words" in the Voynich manuscript; and on whether they are real words, or junk. This analysis refers to all "leaf words" in the manuscript, and is based on data kindly provided by Dr Andrew Steckley.
[link]
[attachment=8855]
The top ten left “leaf words”; the top ten right “leaf words”; and their frequencies in the Voynich manuscript as a whole. Highlighted frequencies denote cases where the “word” is much more common as a “leaf word” than in the manuscript as a whole. Author’s analysis, based on data kindly provided by Dr Andrew Steckley.
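One way to quantify "much more common as a leaf word than in the manuscript as a whole" is the ratio of the two relative frequencies. The sketch below is only an assumption about how such a comparison could be done, not the method behind the table: the function name `overrepresentation` and the toy word lists are invented, and no threshold for highlighting is implied.

```python
from collections import Counter


def overrepresentation(leaf_words, all_words):
    """Map each leaf word to (share among leaf words) / (share in the whole text)."""
    leaf_counts = Counter(leaf_words)
    all_counts = Counter(all_words)
    ratios = {}
    for word, n in leaf_counts.items():
        leaf_share = n / len(leaf_words)
        whole_share = all_counts[word] / len(all_words)
        # A word that is absent from the whole-text list has no defined ratio.
        ratios[word] = leaf_share / whole_share if whole_share else None
    return ratios


# Toy data, invented for illustration only (not real counts).
leaf = ["dar", "dar", "daiin", "sam"]
whole = ["daiin"] * 5 + ["dar"] * 2 + ["sam"] + ["chedy"] * 2  # 10 tokens

for word, ratio in overrepresentation(leaf, whole).items():
    print(word, round(ratio, 2))
```

A ratio above 1 means the word takes a larger share of the leaf words than of the text as a whole; a ratio below 1 means the opposite.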