Opinions on: line as a functional unit - Printable Version
The Voynich Ninja (https://www.voynich.ninja), Forum: Voynich Research, Analysis of the text, Thread: Opinions on: line as a functional unit (/thread-5021.html)

RE: Opinions on: line as a functional unit - pfeaster - 14-11-2025

(14-11-2025, 11:26 AM)Jorge_Stolfi Wrote: Going back to the topic of this thread: I am still confused about what exactly LAAFU means. I see that there has been a huge amount of discussion about it, and what I have read only left me more confused.

LAAFU ("Line As A Functional Unit") refers broadly to the observation that statistical properties of Voynichese text vary based on line position -- implying in turn that the line functions somehow as a "unit." Historically it has mostly been used to refer to distinctive patterns found at the beginnings and ends of lines, probably because the differences in those positions are most readily apparent. However, Emma May Smith and Marco Ponzi have also identified statistical anomalies among second words of lines, and my own studies of "rightwardness" metrics have (I think) shown that subtler forms of line patterning permeate the whole text, with many word features consistently "preferring" earlier or later positions within a line.

For example: choose any pair of Voynichese words that differ only in that one contains [a] where the other contains [o]. Considering only mid-line tokens of those two words -- excluding first and last words -- I believe you'll find that in nearly every case the word containing [a] appears further rightward on average than the word containing [o].

If the line weren't somehow fundamental to the process by which Voynichese text was composed, I can't imagine how such consistent and pervasive patterns would have arisen.

RE: Opinions on: line as a functional unit - bi3mw - 14-11-2025

(14-11-2025, 11:26 AM)Jorge_Stolfi Wrote: Going back to the topic of this thread

I wouldn't say that the question of possible syllables is off topic.
If you want to determine whether lines in the VMS are structured according to any meter (hexameter or not), then the definition of individual word segments is essential. Analyses such as those presented by @quimqu in post #71 are then a first step toward recognizing possible rhythms. Prefixes, for example, seem to be significantly longer than stems and suffixes. The goal may be to break down a few random lines (Quire 20?) in such a way that it becomes apparent whether or not there is a repeating pattern ( long / short ) after the (supposedly) recognized word segments.

RE: Opinions on: line as a functional unit - Jorge_Stolfi - 14-11-2025

(14-11-2025, 01:20 PM)Bluetoes101 Wrote: [link]

Thanks, but that is Currier's thinking from 1976. I imagine that more has been learned since then. I saw @tavie's presentation at the last Voynich day, but that was a lot of detail, and included speculation about the head lines...

All the best, --stolfi

RE: Opinions on: line as a functional unit - Jorge_Stolfi - 14-11-2025

(14-11-2025, 01:13 PM)Kaybo Wrote: But for me it means, that a paragraph is not a continuous text and that every line starts new.

I think this would be an extreme form of LAAFU, no? And anyway it is a conjecture about the cause, not a summary description of the anomalies without trying to explain them -- which is what I was looking for.

All the best, --stolfi

RE: Opinions on: line as a functional unit - Jorge_Stolfi - 14-11-2025

(14-11-2025, 01:08 PM)oshfdk Wrote: Is it important how the split is made? For example, randomly assigning lines to A or B vs assigning the first half of the lines to A and the second to B.

Either way will have the same chance of detecting spurious "anomalies" due to sampling error alone.
But if the lines are split as ( first half, second half ), any anomaly that is detected in only one half could still be a real anomaly that occurs only there. Which would be an interesting discovery...

All the best, --stolfi

RE: Opinions on: line as a functional unit - pfeaster - 14-11-2025

(14-11-2025, 02:10 PM)pfeaster Wrote: For example: choose any pair of Voynichese words that differ only in that one contains [a] where the other contains [o]. Considering only mid-line tokens of those two words -- excluding first and last words -- I believe you'll find that in nearly every case the word containing [a] appears further rightward on average than the word containing [o].

Revisiting my notes, I'd suggest that a more persuasive test case is [Sh] (earlier in line) versus [ch] (later in line). The [a] / [o] case *does* generally work when considering all line positions but is a bit weaker for mid-line only.

RE: Opinions on: line as a functional unit - Jorge_Stolfi - 14-11-2025

Let me also remind people of the recent finding that the line-breaking algorithm used by scribes (not just on the VMS, but in any language and any epoch, even today) has the side effect of making the first word of each line longer than average, and the last few words shorter than average. This phenomenon alone can have a significant effect on word frequencies at the start of the line, because the most frequent words in a text tend to be short.

For instance, if a running English text is broken into lines in the simplest possible way, we can expect that the words "the", "and", "is", "it" will occur more rarely at the start of lines than in the text as a whole. Conversely they will occur more frequently at the end of lines.

That is why it is important to run all statistical analyses on a control (non-VMS) sample whenever possible.
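
A control test of this kind could be sketched roughly as follows. This is only a minimal illustration in Python, not the procedure used by anyone in this thread: it assumes the control text is available as a list of paragraph strings, it uses the standard-library `textwrap.wrap` as a stand-in for a simple line breaker, and the function names `first_vs_rest_frequencies` and `relative_frequency` are invented for the example.

```python
import textwrap
from collections import Counter


def first_vs_rest_frequencies(paragraphs, width=72, min_words=10):
    """Re-wrap each paragraph, then count line-first words and all other words separately.

    Assumptions (not from the thread): `paragraphs` is a list of strings,
    one per paragraph, and character count stands in for physical line width.
    """
    first, rest = Counter(), Counter()
    for parag in paragraphs:
        wrapped = textwrap.wrap(
            parag, width=width, break_long_words=False, break_on_hyphens=False
        )
        for line in wrapped:
            words = line.lower().split()
            if len(words) < min_words:
                continue  # ignore short lines, e.g. the tail of a paragraph
            first[words[0]] += 1
            rest.update(words[1:])
    return first, rest


def relative_frequency(counter, word):
    """Share of all counted tokens that are `word` (0.0 for an empty counter)."""
    total = sum(counter.values())
    return counter[word] / total if total else 0.0
```

On an English control text one would then compare, say, `relative_frequency(first, "the")` with `relative_frequency(rest, "the")`. Note that this sketch does not exclude paragraph-initial lines, whose first word is not selected by a line break.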
The anomalies at line-start may be real, but may be due to this and other causes -- not to any semantic role of line breaks.

For example, here is a quick test using the Portuguese novel that I sent to @Quimqu a while ago:

Code:
# first nonfirst

These frequencies were obtained by feeding the running text, where each paragraph was formatted as a single line, through a trivial line breaking program (Linux "fmt --split-lines") that broke each parag into lines of 72 chars max. Then taking each line with 10 words or more (there were ~4800 of them) and separating the first word from the other words, and computing the frequencies of both.

Note that the frequencies of short words like "a", "o" are systematically higher in the second column than in the first. And I did not bother to exclude the parag head lines; this may explain why "que" (= "what", "which", "whose", "who", "that"...) is still about equally common in both sets, since it occurs often at the start of interrogative sentences, and hence at the start of parags.

This bias towards longer words can also affect the statistics of line-initial characters, since character frequencies are determined largely by their occurrence in high-frequency words.

All the best, --stolfi

RE: Opinions on: line as a functional unit - Jorge_Stolfi - 14-11-2025

(14-11-2025, 02:45 PM)pfeaster Wrote: For example: choose any pair of Voynichese words that differ only in that one contains [a] where the other contains [o]. Considering only mid-line tokens of those two words -- excluding first and last words -- I believe you'll find that in nearly every case the word containing [a] appears further rightward on average than the word containing [o].

That is interesting, because those scores should not be affected by the length bias due to line-breaking. (Unless the Scribe unconsciously felt that a word with Sh was longer than the version with Ch...)
But does that table include the parag head lines? Does it include labels and titles? Is the effect the same in all sections?

All the best, --stolfi

RE: Opinions on: line as a functional unit - Jorge_Stolfi - 14-11-2025

More about the size bias of line-breaking:

The simplest line-breaking algorithm for running text is: keep writing words on the current line, until you get to a word that, if written, would run past the right rail. In that case, break the line before that word, and continue as before on the next line.

It should be easy to see why that algorithm makes a line break much more likely before a long word than a short one. To estimate the effect precisely one could use a word Markov model of order 2 or 3, pipe the output through that line-breaking algorithm, and run the desired statistics. Or take the VMS text, join the lines of each parag into a single line, and pipe it through that line-breaking algorithm with a very different line width.

But that basic algorithm could be elaborated in a number of ways.

First, the scribe could choose to split words in order to save vellum. In that case, he may leave some indication of the split (m?) at the end of the first line, or at the beginning of the second one (y? q?) or both, or neither. Even if he leaves no mark, the statistics of the line-initial words will be different because they will include word suffixes in addition to whole words. And if he splits the word only between syllables, as we do today, both parts will be morphologically similar to whole words.

In another independent elaboration, when the scribe gets near the end of the line, he will look ahead 2-3 words, choose the line break, then stretch or shrink the writing so that the line will end on the right rail. (Modern word processors will stretch or shrink the spaces, but a Medieval scribe could also stretch or shrink the characters themselves.) As a result, words near the end of the line are more likely to be improperly split or joined in the transcription files.
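
The basic break rule and the re-wrapping experiment described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not anyone's actual tooling: words are plain strings, width is measured in characters plus one space between words (a crude proxy for space on the page), and the names `greedy_break` and `first_word_length_bias` are invented here.

```python
def greedy_break(words, width):
    """Break a list of words into lines, starting a new line before any
    word that would run past `width` (characters, single spaces assumed)."""
    lines, current, used = [], [], 0
    for word in words:
        needed = len(word) if not current else used + 1 + len(word)
        if current and needed > width:
            lines.append(current)  # close the line before the word that overflows
            current, used = [word], len(word)
        else:
            current.append(word)  # an over-long word alone on a line is kept whole
            used = needed
    if current:
        lines.append(current)
    return lines


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def first_word_length_bias(words, width):
    """Return (mean length of line-first words, mean length of all other words)
    for one paragraph. The paragraph's first line is skipped on the 'first'
    side, because its first word was not selected by a line break."""
    lines = greedy_break(words, width)
    firsts = [len(line[0]) for line in lines[1:]]
    others = [len(w) for line in lines for w in line[1:]]
    return _mean(firsts), _mean(others)
```

To follow the suggestion above, one would join the lines of each parag into a single word list and call `first_word_length_bias` with a width quite different from the original one. A gap between the two means in that re-wrapped text would then be attributable to the break rule alone.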
And a scribe could use abbreviations when necessary to squeeze one more word before the line break. The VMS Scribe probably could not read the text, but the Author may have told him that he could abbreviate aiin or aiiin as am if he needed to.

Also, as in many manuscripts of the time, the scribe may have placed some special mark (y? q?) at the start of a line whenever there was a sentence or sub-paragraph break anywhere within that line.

And here is another rather far-fetched idea. Suppose the language was tonal (not necessarily Asian; I gather that Swedish is tonal too, for instance). That is, the pitch pattern along a word would change its meaning. One way to record tones in such a language is to insert special symbols -- like digits, or a/o/y -- in the words to indicate the pitch level. Thus, for example, the Mandarin word "lǎo", with the "dipping" tone, could be written as "l2a1o3". (This system is still used by linguists to discuss tone systems in a language-independent way.) Then, within the same line, one could save some ink by inserting those pitch codes only when there was a change of pitch. That is, instead of "b2a1o3 b3a4o4 b4a1o1" one could write just "b2a1o3 ba4o ba1o". But after a line break the scribe may have felt it necessary to insert a pitch code, even if there was no change from the previous line, for the benefit of the reader...

There may be many more processes like those above that result in anomalous statistics at the start and end of lines. And several of them may be at work in the VMS. Untangling them will require more than just staring at tables of statistics. One should try instead the "scientific" method: formulate a hypothesis about a possible cause of the anomalies, then devise the simplest statistical test that could prove or disprove that hypothesis...

All the best, --stolfi

RE: Opinions on: line as a functional unit - ReneZ - 14-11-2025

(14-11-2025, 02:10 PM)pfeaster Wrote: If the line weren't somehow fundamental to the process by which Voynichese text was composed, I can't imagine how such consistent and pervasive patterns would have arisen.

I would agree, but if it were, I would still have a hard time seeing how they could have arisen.

Until a few years ago, the most unusual part of the Voynich MS text used to be the low bigram entropy and the word patterns, which are closely related to each other. I can think of ways how these could have arisen. However, by now, both the general rightward/downward issue of Patrick and the line-initial character alternations of Tavie have me flustered.

Currier had no idea of any of this. I would consider his Line As A Functional Unit an outdated concept.
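
For readers who want to try the word-pair comparison discussed earlier in the thread, here is a minimal Python sketch. It is not pfeaster's actual "rightwardness" metric, whose exact definition is not given here: it simply assumes that each line is a list of word tokens and scores a token by its relative position, 0 for the first word and 1 for the last. The function name `mean_rightwardness` is invented for the example.

```python
def mean_rightwardness(lines, word, midline_only=True):
    """Mean relative position of `word` over all its tokens.

    Assumed scoring (not taken from the thread): position i in a line of
    n words scores i / (n - 1), so 0 is line-first and 1 is line-last.
    With midline_only=True, first and last words of each line are ignored.
    Returns None if the word has no qualifying tokens.
    """
    positions = []
    for line in lines:
        n = len(line)
        if n < 2:
            continue  # relative position is undefined for one-word lines
        for i, token in enumerate(line):
            if token != word:
                continue
            if midline_only and i in (0, n - 1):
                continue
            positions.append(i / (n - 1))
    if not positions:
        return None
    return sum(positions) / len(positions)
```

One would call it twice, once per word of a pair that differs only in [a] versus [o] (or [Sh] versus [ch]), and compare the two means. Running the same comparison per section, and with and without parag head lines, labels and titles, would address the questions raised above.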