08-08-2024, 12:55 PM
Following up on one facet of Voynich Manuscript Day:
(05-08-2024, 08:13 AM)Koen G Wrote: All research was well presented, for example Emma explained statistical concepts well.... But as the data was being presented in the text-focused talks, I felt myself losing the forest through the trees, wondering how things fit into the bigger picture.... I think it would be extremely valuable to the community if someone was able to write an "explain like I'm five" version of these talks, trying to focus on the bigger picture and how Emma's, tavie's and Patrick's findings relate to each other. This might be an assignment even the authors themselves struggle with, but it would be an invaluable exercise.
I'm sure I won't be able to do justice to this assignment on my own, but it's interesting enough that I didn't want to leave it unaddressed.
The main thing I think our three presentations shared in common was the goal of searching for structural patterns outside words as earnestly as people have long been searching for structural patterns inside them. We each investigated cases where the actual prevalence of a text element (glyph, word, etc.) turns out to be significantly greater or lesser in a particular context than it "should" be in a random distribution.
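One way to make "should be" concrete is a shuffle (permutation) baseline: count a feature in the real word order, then count it again in many random reorderings of the same words, and see how far the real count sits from the shuffled ones. This is a minimal sketch under stated assumptions, not necessarily the method any of the three presentations used; the function names, the toy word list, and the example feature (a word ending in [y] followed by a word starting with [q]) are hypothetical.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence

CountFn = Callable[[Sequence[str]], int]


def shuffle_baseline(words: Sequence[str], count_fn: CountFn,
                     trials: int = 1000, seed: int = 0) -> tuple[int, float, float]:
    """Return (observed count, mean shuffled count, z-score of observed vs. shuffles).

    Shuffling word order keeps each word intact but destroys word-to-word order,
    so the baseline asks: how often would this happen if word order were random?
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    observed = count_fn(words)
    pool = list(words)
    samples = []
    for _ in range(trials):
        rng.shuffle(pool)
        samples.append(count_fn(pool))
    mean = sum(samples) / trials
    sd = (sum((s - mean) ** 2 for s in samples) / trials) ** 0.5
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, z


def y_then_q(words: Sequence[str]) -> int:
    """Hypothetical feature: a word ending in 'y' directly followed by a word starting with 'q'."""
    return sum(1 for a, b in zip(words, words[1:])
               if a.endswith("y") and b.startswith("q"))


# Toy word list for illustration only (not real transcription statistics).
toy = ["daiin", "qokedy", "qokain", "chedy", "qol", "shey", "dar"]
observed, mean, z = shuffle_baseline(toy, y_then_q)
print(observed, round(mean, 2), round(z, 2))
```

Shuffling whole words (rather than glyphs) is a deliberate choice here: it tests patterns between words while holding word-internal structure fixed. Shuffling glyphs instead would test a different null hypothesis.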
- In tavie's presentation, we saw that there are many patterns specific to line starts, line ends, and top rows -- more patterns, and stronger ones, than past casual assessments have suggested. Her detection of [link] felt especially new and exciting.
- In Emma's presentation, we saw that there are many patterns in which features of words pair preferentially or dispreferentially with features of adjacent words -- not just word-break combinations (linking the end of one word to the start of the next word), but also pairs of successive beginnings and endings, or words that simply contain particular features anywhere within them. Her introduction of z-scores brings valuable statistical rigor to this type of study.
- In my presentation, I tried to show that treating glyph sequences within words and between words as parts of the same system, rather than analyzing these separately, lets us identify some interesting cyclical patterns that don't necessarily coincide with words as units but do fairly well at "predicting" longer repeating sequences. Here are the slides and script in case anybody wants them:[attachment=8981][attachment=8982]
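For the word-pair pairings described in Emma's bullet above, a z-score can be sketched by treating each adjacent word pair as one trial and asking how often feature A on the first word and feature B on the second co-occur, compared with what independence would predict. The binomial approximation below is an assumption chosen for illustration and may not match the exact formula Emma used; the helper names and toy words are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

Feature = Callable[[str], bool]


# Hypothetical feature tests for end, start, and anywhere pairings.
def ends_with(s: str) -> Feature:
    return lambda w: w.endswith(s)


def starts_with(s: str) -> Feature:
    return lambda w: w.startswith(s)


def contains(s: str) -> Feature:
    return lambda w: s in w


def pair_z_score(words: Sequence[str], first: Feature,
                 second: Feature) -> tuple[int, float, float]:
    """Return (observed, expected, z) for `first` on a word and `second` on the next word.

    Assumes independence across the pair: E = n * p1 * p2, with a binomial
    standard deviation sqrt(n * p * (1 - p)) where p = p1 * p2.
    """
    pairs = list(zip(words, words[1:]))
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least two words")
    p1 = sum(first(a) for a, _ in pairs) / n   # share of first words with feature A
    p2 = sum(second(b) for _, b in pairs) / n  # share of second words with feature B
    observed = sum(1 for a, b in pairs if first(a) and second(b))
    p = p1 * p2
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (observed - expected) / sd if sd > 0 else float("nan")
    return observed, expected, z


# Toy words, for illustration only.
toy = ["chedy", "kaiin", "shedy", "kar", "daiin", "okedy", "dal"]
# End-start pairing in the spirit of [dy.ka]: a word ending "dy" followed by one starting "k".
print(pair_z_score(toy, ends_with("dy"), starts_with("k")))
# Start-start, end-end, or anywhere pairings just swap the feature tests.
print(pair_z_score(toy, starts_with("d"), contains("ai")))
```

On a corpus this small the normal approximation is poor; an exact binomial test would be the safer choice for rare features.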
[attachment=8980]
As I mentioned in Q&A, I've also tried studying higher-order transitional probabilities, such as third-order [dyk>a], which is the same glyph sequence as Emma's [dy.ka], but again, approached from a different perspective and, in that case, broken up differently. Still higher-order transitional probabilities, such as [ydaii>n] or [qokeedy>d], would similarly overlap with some of Emma's start-start and end-end pairings, especially when populated by wild-card characters as in [y***>n] or [q*****>d], except that they'd be defined by an intervening glyph count (with all the attendant uncertainty over what counts as one glyph) rather than by positions within words (with all the attendant uncertainty over word breaks). I sense potential weaknesses in both approaches, but I'm not sure how to get around them.
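To make the glyph-count definition concrete, here is a minimal sketch that strips word breaks, splits the remaining text into glyphs, and counts how often a context (optionally containing `*` wild cards) is followed by a given glyph. The `units` parameter, which decides whether sequences such as [ii] count as one glyph, is an added assumption meant to expose exactly the uncertainty described above; the function names and the toy line are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def tokenize(text: str, units: Sequence[str] = ()) -> list[str]:
    """Drop word breaks (whitespace and '.') and split the text into glyphs.

    Any string in `units` is treated as a single glyph (longest match first);
    everything else is one character per glyph.
    """
    stream = "".join(text.replace(".", " ").split())
    ordered = sorted(units, key=len, reverse=True)
    glyphs: list[str] = []
    i = 0
    while i < len(stream):
        for u in ordered:
            if stream.startswith(u, i):
                glyphs.append(u)
                i += len(u)
                break
        else:  # no multi-character unit matched at this position
            glyphs.append(stream[i])
            i += 1
    return glyphs


def transition_counts(glyphs: Sequence[str], context: Sequence[str],
                      target: str) -> tuple[int, int]:
    """Return (times context is followed by target, times context is followed by anything).

    '*' in `context` matches any single glyph, so ["y", "*", "*", "*"] stands for [y***].
    """
    k = len(context)
    hits = seen = 0
    for i in range(len(glyphs) - k):
        window = glyphs[i:i + k]
        if all(c == "*" or c == g for c, g in zip(context, window)):
            seen += 1
            if glyphs[i + k] == target:
                hits += 1
    return hits, seen


line = "chedy kaiin shedy kar"  # toy line for illustration only
print(transition_counts(tokenize(line), list("dyk"), "a"))                   # (2, 2)
print(transition_counts(tokenize(line), ["y", "*", "*", "*"], "n"))          # (0, 1)
print(transition_counts(tokenize(line, ("ii",)), ["y", "*", "*", "*"], "n"))  # (1, 1)
```

The last two lines show the glyph-definition problem directly: the same text gives a different answer for [y***>n] depending on whether [ii] is one glyph or two.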
I assume there's got to be some connection between this type of glyph-by-glyph or word-by-word pattern and the other type of pattern described by tavie, centered on differences by line and paragraph position. After all, for those two types of pattern both to be valid, they must overlap and complement each other, and I even showed a few examples of transitional probabilities that vary strongly by location within lines and paragraphs. But how these two types of patterns interrelate with each other strikes me as still very much a mystery. Do they just coexist? Or do they both result from some other common factor? Or does one level of patterning somehow cause the other level of patterning?
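Transitional probabilities that vary by location within lines can be checked by computing the same transition separately for different positions. The sketch below splits one first-order glyph transition by whether it starts in the first word of a line or in a later word. The bucketing scheme and function name are hypothetical choices for illustration; paragraph position (for example, top rows) could be added as another bucket key in the same way.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def transitions_by_position(lines: Iterable[str], a: str,
                            b: str) -> dict[str, tuple[int, int]]:
    """For each position bucket, return (count of a->b, count of a->anything).

    Word breaks are ignored, so a transition can cross from one word into the
    next, but not from one line into the next. The bucket is set by the word
    that holds glyph `a`.
    """
    stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        # Tag every glyph with the index of the word it belongs to.
        tagged = [(g, wi) for wi, word in enumerate(line.split()) for g in word]
        for (g1, wi), (g2, _) in zip(tagged, tagged[1:]):
            if g1 != a:
                continue
            bucket = "first word" if wi == 0 else "later words"
            stats[bucket][1] += 1
            if g2 == b:
                stats[bucket][0] += 1
    return {k: (hits, total) for k, (hits, total) in stats.items()}


toy_lines = ["dy kar", "shedy dal", "ky dy kol"]  # toy data for illustration only
print(transitions_by_position(toy_lines, "y", "k"))
# {'first word': (1, 3), 'later words': (1, 1)}
```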
In Q&A, I briefly suggested that the cumulative effect of glyph-by-glyph or word-by-word patterns might be responsible for some kinds of line pattern. If individual glyph-by-glyph or word-by-word patterns of preference are asymmetrical, tending towards certain combinations and away from others, that could perhaps account for some features being unevenly distributed within lines. But I've never been able to demonstrate any such thing statistically, so right now it's no more than an idle guess on my part. Another apparent tendency of certain glyphs to recur preferentially after an interval might account for the greater probability of [p] further along in lines that begin with [p], but it wouldn't offer any insight into the reasons for paragraph-initial [p] itself. (In Emma's analysis, I believe that relationship would translate into a start-anywhere combination.)
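The [p] observation can at least be measured descriptively by comparing how often [p] turns up after the first word in lines that begin with [p] versus lines that don't. This minimal sketch assumes one string per line, words separated by spaces, and [p] as a single character. It only measures the effect; it says nothing about why it happens or about paragraph-initial [p] itself. The function name and toy lines are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def later_glyph_rate(lines: Iterable[str], glyph: str = "p") -> dict[bool, float | None]:
    """Rate of `glyph` among characters after the first word of each line.

    Keys: True for lines whose first word starts with `glyph`, False otherwise.
    None means that group had no later characters to count.
    Assumes `glyph` is a single character.
    """
    counts = {True: [0, 0], False: [0, 0]}  # group -> [glyph hits, total later characters]
    for line in lines:
        words = line.split()
        if not words:
            continue
        group = words[0].startswith(glyph)
        rest = "".join(words[1:])
        counts[group][0] += rest.count(glyph)
        counts[group][1] += len(rest)
    return {g: (hits / total if total else None) for g, (hits, total) in counts.items()}


toy_lines = ["pchedy opar dal", "tchedy daiin dal", "pol shey qopy"]  # illustration only
print(later_glyph_rate(toy_lines))
```

A difference between the two rates would still need a baseline (such as shuffling lines or words) before it could be called significant.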
Or it could be the other way around. Different paragraph and line positions might cause glyph-by-glyph or word-by-word patterns to vary.
It seems there must be some relationship, but for now, I really don't know what it is, and I'm unsure how to go about trying to find out.