Summary of Voynich Day presentation on Line Patterns
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Summary of Voynich Day presentation on Line Patterns (/thread-4343.html)

RE: Summary of Voynich Day presentation on Line Patterns - nablator - 14-08-2024

(08-08-2024, 12:11 PM)Emma May Smith Wrote: The Vertical Impact Behaviour is definitively new. I've never seen nor heard anybody discuss this before.

I clearly remember reading a thread here on VN (2-3 years ago) about the preferred sequence of first glyphs of lines in paragraphs... Not always exactly the same order as in an acrostic, but clearly not random. I can't find the thread. Does anyone remember it? The author might have been Wladimir D. Searching thread names for words like sequence, order, first turned up nothing. I guess the thread was deleted.

RE: Summary of Voynich Day presentation on Line Patterns - tavie - 14-08-2024

[link], by Anton?

RE: Summary of Voynich Day presentation on Line Patterns - nablator - 14-08-2024

(14-08-2024, 08:30 PM)tavie Wrote: [link], by Anton?

Yes, thanks! I did not recognize the start of the thread and forgot the k-initial paragraphs requirement.

RE: Summary of Voynich Day presentation on Line Patterns - Jorge_Stolfi - 17-09-2025

(08-08-2024, 01:41 AM)tavie Wrote: So for example, ch is 24% of middle initials in Herbal A. Ceteris paribus, we'd expect to see it be 24% of LS initials, which would be about 300 ch. We see only 64, so there are over 200 "missing" instances. This makes initial ch "averse" to Line Start.

I have the impression (based only on recollection as a transcriber) that uncertain spaces are especially common before Ch. If those are counted as spaces, that could explain the abundance of Ch as word initial outside of line start.

Unfortunately all transcriptions handle ambiguous spaces inconsistently. They may be marked as "." or ",", or just omitted. Since inter-word and inter-character spaces are highly variable, even along the same line, the marking is inherently subjective.

Then there is also the bias of the line-breaking algorithm that makes longer words more common in line-initial position. If Ch is a more common start for short words, that could be part of the reason why Ch is less common in line-initial position than elsewhere.

This last theory could be tested by tabulating the frequency of word-initial Ch as a function of word length, preferably excluding parag head lines and line-initial words.

All the best, --jorge

RE: Summary of Voynich Day presentation on Line Patterns - MarcoP - 17-09-2025

We know from Patrick Feaster’s research (e.g. his 2022 Malta paper [link]) that ch- words are relatively rare both at line start and at line end. Their frequency peaks immediately after the first word in a line (positions 2 and 3) and then decreases as one moves rightward along the line. It’s very clear that ch- words are not just being squeezed at the end of lines: in that scenario, they would be particularly frequent at line end.
It’s also probably worth remembering that the frequency of words containing ch (as a prefix or not) is not lower at line start than in other positions. But words starting with ych- or dch- are 10 times more frequent at line start than elsewhere (see [link]); they account for almost half of the line-start words that contain ‘ch’.

RE: Summary of Voynich Day presentation on Line Patterns - Jorge_Stolfi - 17-09-2025

(17-09-2025, 08:14 AM)MarcoP Wrote: We know from Patrick Feaster’s research (e.g. his 2022 Malta paper [link]) that ch- words are relatively rare both at line start and at line end. Their frequency peaks immediately after the first word in a line (positions 2 and 3) and then decreases as one moves rightward along the line. It’s also probably worth remembering that the frequency of words containing ch (as a prefix or not) is not lower at line start than in other positions. But words starting with ych- or dch- are 10 times more frequent at line start than elsewhere (see [link]); they account for almost half of the line-start words that contain ‘ch’.

I believe that those discrepancies could be caused by, among other things,

To illustrate point 1, suppose that 80% of the word occurrences in a language are only 2-3 letters long and start with "u", while 20% are 20 letters long and start with "a". Then the LBA (line-breaking algorithm) effect would result in, say, only 50% of the lines starting with "u" (because, say, 50% of the line-initial words would be long and hence start with "a"), while, say, 85% of non-line-initial words would start with "u".

To illustrate point 2, imagine an English herbal where each parag starts with a dry list of diseases, without any of the common "th" words ("the", "this", "that", "then", "they", "them", "there", ...), and the list often extends into line 2. Then "th" would be under-represented in line-start position just because it would never occur at the start of line 2.

To illustrate point 3, imagine that, in an English text, a blank is inserted before every "t" letter, thus splitting any word with an embedded "t". Then the frequency of "t" as word-initial would be much increased, except among line-initial words, since those blanks would have no effect there. Or, conversely, suppose that every word that starts with "e" is joined to the preceding word on the same line. That would reduce the frequency of "e" in word-initial position to zero, except among the line-initial words.

One could test whether explanation 1 is viable by tabulating the length distribution of line-initial and non-line-initial words. If the two columns are significantly different, then one could tabulate the initial letter of n-letter words as a function of n, and see whether the two tables together explain the initial-letter discrepancies.

One could test explanation 2 by comparing the initial-word distribution of line 2 in parags with a full-length head line against that of line 2 in parags where the head line is much shorter because of intruding plants. Any formula effects should be more likely to spill into line 2 in the second case than in the first case.

One could perhaps test for explanation 3 by taking lines with one or more commas, and doing the statistics with commas deleted and with commas treated as word spaces. Or maybe by comparing the frequencies of "X.Y" and "XY" for the relevant glyphs X and Y.

Any better ideas?

All the best, --jorge

RE: Summary of Voynich Day presentation on Line Patterns - Jorge_Stolfi - 05-11-2025

Thanks Tavie! But that is a lot of statistics and questions to tackle all at once. Maybe we can consider just one topic -- like the distribution of Ch in Herbal-A -- and see whether we can understand that anomaly better?

For starters, we must take into account that the natural line-breaking algorithm causes the first word of each line to be longer than average, while the last 1-3 words tend to be shorter than average. The most common words of any language tend to be short, and the frequency of a character is dominated by its occurrence in the most common words. (For instance, "t" and "e" are very common in English partly because they occur in "the", "to", "it", "at", "be", "me", "he", etc.) Therefore, this side effect of natural line breaking can affect the frequencies of line-initial letters.

So we should start by asking how much of the line-initial anomalies, specifically the frequency of Ch as line initial, could be explained this way.

The first step would be to compute the relative frequency of Ch as token-initial in Herbal-A, but separately for the tokens of each length L = 1, 2, 3, ... (Here and elsewhere the length is the ideal approximate geometric width of the written word, counting o as width 1; so that Ch and t have width 2, and aiin has width 3, etc -- even if these combinations are considered single letters for statistics purposes.)

Obviously the frequency will be zero for L = 1, and close to zero for L = 2, since Ch is rare as a word by itself. I wonder what the plot will look like for other values of L.

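A minimal sketch of how that first tabulation might be coded, in case it helps anyone get started. It assumes the tokens are already extracted as lower-case EVA-style strings (so Ch is written "ch"), with parag head lines and line-initial words filtered out by the caller. The width table is largely a guess: the post only fixes o = 1, Ch = 2, t = 2 and aiin = 3, so treating Sh and the other gallows as width 2 and "i" as width 0.5 are assumptions made here, and the function names are made up for this example.

```python
import math
from collections import Counter

# Assumed width table. Only o = 1, Ch = 2, t = 2, aiin = 3 come from the post;
# Sh and the other gallows at 2, and i at 0.5, are guesses for illustration.
TWO_WIDE_PAIRS = ("ch", "sh")
TWO_WIDE_SINGLES = set("tkpf")
HALF_WIDE = set("i")


def geometric_width(word):
    """Approximate written width of a lower-case EVA-style word."""
    total, i = 0.0, 0
    while i < len(word):
        if word[i:i + 2] in TWO_WIDE_PAIRS:
            total += 2
            i += 2
        elif word[i] in TWO_WIDE_SINGLES:
            total += 2
            i += 1
        elif word[i] in HALF_WIDE:
            total += 0.5
            i += 1
        else:
            total += 1
            i += 1
    return total


def ch_initial_by_width(tokens):
    """Map each rounded width L to (ch-initial count, token count, fraction)."""
    totals, ch_counts = Counter(), Counter()
    for token in tokens:
        L = math.floor(geometric_width(token) + 0.5)  # round halves up
        totals[L] += 1
        if token.startswith("ch"):
            ch_counts[L] += 1
    return {L: (ch_counts[L], totals[L], ch_counts[L] / totals[L])
            for L in sorted(totals)}
```

Calling `ch_initial_by_width(tokens)` on the Herbal-A token list would give one row per width L, ready to plot. Swapping in a different width table only requires changing the three constants at the top.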
If the frequency of word-initial Ch is significantly different depending on L, then we can proceed with the next test: simulate the Scribe with different page widths, and check whether the anomalous frequency of line-initial Ch still occurs in this "re-broken" text.

Namely, take each parag of the section under study, discard its head line, and join the other lines into a single line. Then run the simple line-breaking algorithm on that string of words, with various values of the max line length parameter W.

Again, the simple line-breaking algorithm is: keep adding new words to the current line until the (geometric) width of the line, plus 1 (for the space), plus the width of the next word exceeds W. Then break the line before that word. (An experienced scribe would look ahead a few words, and adjust the spacing and width of letters so that the broken line ends precisely on the right rail. But this sophistication is not necessary for this test.)

With any W, the average length Li of the line-initial words should be greater than the global average word length L, while the average length Lf of the line-final words should be less than Li. And the latter effect may even be significant for the next-to-last word of each line.

I predict that the frequency of Ch as line-initial will be anomalous for any reasonable value of W. The question is whether this cause alone would account for the anomaly observed in the VMS.

All the best, --stolfi
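
The re-breaking test described above could be sketched roughly as follows. This is only an illustration under stated assumptions: each parag is given as a list of lower-case EVA-style words with the head line already discarded and the remaining lines joined, Ch is written "ch", and the word width defaults to the plain character count (`len`) rather than the geometric width, which can be passed in as the `width` argument. The function names are invented for this example.

```python
def rebreak(words, W, width=len):
    """Greedy line breaking: start a new line before any word that would
    push the line width (plus 1 for the space) past W."""
    lines, current, current_width = [], [], 0
    for word in words:
        w = width(word)
        if current and current_width + 1 + w > W:
            lines.append(current)
            current, current_width = [], 0
        # Add the separating space only when the line already has a word.
        current_width += w + (1 if current else 0)
        current.append(word)
    if current:
        lines.append(current)
    return lines


def line_initial_ch_rate(parags, W, width=len):
    """Fraction of re-broken lines whose first word starts with 'ch'."""
    firsts = [line[0]
              for words in parags
              for line in rebreak(words, W, width)]
    if not firsts:
        return float("nan")
    return sum(word.startswith("ch") for word in firsts) / len(firsts)


if __name__ == "__main__":
    # Toy input, not real VMS data.
    toy = [["chol", "daiin", "ol", "chor", "shedy", "qokeedy"]]
    for W in (10, 15, 20):
        print(W, line_initial_ch_rate(toy, W))
```

Running `line_initial_ch_rate` over a range of W values, and comparing the result with the overall word-initial Ch rate of the same text, would show how much of the line-start anomaly greedy breaking alone can produce. A word wider than W simply ends up alone on its own line in this sketch.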