
Summary of Voynich Day presentation on Line Patterns

RE: Summary of Voynich Day presentation on Line Patterns

nablator > 14-08-2024, 10:25 AM

(08-08-2024, 12:11 PM) Emma May Smith Wrote: The Vertical Impact Behaviour is definitively new. I've never seen nor heard anybody discuss this before.

I clearly remember reading a thread here on VN (2-3 years ago) about the preferred sequence of the first glyphs of lines within paragraphs: not always exactly the same order, as it would be in an acrostic, but clearly not random. I can't find the thread. Does anyone remember it? The author might have been Wladimir D. Searching thread titles for words like "sequence", "order" and "first" turned up nothing, so I guess the thread was deleted.
RE: Summary of Voynich Day presentation on Line Patterns

tavie > 14-08-2024, 08:30 PM

[thread link], by Anton?
RE: Summary of Voynich Day presentation on Line Patterns

nablator > 14-08-2024, 09:15 PM

(14-08-2024, 08:30 PM) tavie Wrote: [thread link], by Anton?

Yes, thanks! I did not recognize the start of the thread and forgot the k-initial paragraphs requirement.
RE: Summary of Voynich Day presentation on Line Patterns

Jorge_Stolfi > 17-09-2025, 02:13 AM

(08-08-2024, 01:41 AM) tavie Wrote: So for example, ch is 24% of middle initials in Herbal A. Ceteris paribus, we'd expect to see it be 24% of LS initials, which would be about 300 ch. We see only 64, so there are over 200 "missing" instances. This makes initial ch "averse" to Line Start.
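The expected-versus-observed reasoning in the quoted passage comes down to one multiplication and one ratio. A minimal Python sketch follows. The total of about 1,250 Herbal A line-start words is an assumption back-computed from "24% ... about 300", not a count given in the thread, and the function name is hypothetical.

```python
def line_start_aversion(middle_share: float, n_line_starts: int, observed: int) -> tuple[float, float]:
    """Return (expected count, observed/expected) for one word-initial glyph.

    expected = (share of the glyph among mid-line word initials)
               * (number of line-start words)
    A ratio well below 1 means the glyph is "averse" to line start.
    """
    expected = middle_share * n_line_starts
    return expected, observed / expected


# ASSUMPTION: ~1,250 line-start words in Herbal A, implied by 0.24 * N ≈ 300.
expected, ratio = line_start_aversion(0.24, 1250, 64)
print(f"expected ≈ {expected:.0f}, missing ≈ {expected - 64:.0f}, ratio ≈ {ratio:.2f}")
```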

I have the impression (based only on my recollection as a transcriber) that uncertain spaces are especially common before Ch. If those are counted as spaces, that could explain the abundance of Ch as a word initial outside of line start. Unfortunately, all transcriptions handle ambiguous spaces inconsistently: they may be marked as "." or ",", or simply omitted. Since inter-word and inter-character spacing is highly variable, even along the same line, the marking is inherently subjective.

There is also the bias of the line-breaking algorithm, which makes longer words more common in line-initial position. If Ch is a more common start for short words than for long ones, that could be part of the reason why Ch is less common in line-initial position than elsewhere.

This last theory could be tested by tabulating the frequency of word-initial Ch as a function of word length, preferably excluding parag head lines and line-initial words.
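The proposed tabulation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the input format (paragraphs as lists of EVA line strings with "." as the word separator) and the function name are hypothetical, word length is counted in EVA characters (so "ch" counts as 2, not as one glyph), commas are not handled, and the sample lines are invented, not taken from the manuscript.

```python
from collections import Counter


def ch_rate_by_length(paragraphs: list[list[str]], prefix: str = "ch") -> dict[int, tuple[int, int, float]]:
    """Map word length n -> (words of length n, how many start with `prefix`, share).

    ASSUMED input: each paragraph is a list of lines; each line is an EVA
    string with '.' as the word separator. Length is in EVA characters.
    The head (first) line of each paragraph and the first word of every
    other line are skipped, as suggested in the post.
    """
    total: Counter = Counter()
    hits: Counter = Counter()
    for para in paragraphs:
        for line in para[1:]:                      # skip the paragraph head line
            words = [w for w in line.split(".") if w]
            for w in words[1:]:                    # skip the line-initial word
                total[len(w)] += 1
                hits[len(w)] += int(w.startswith(prefix))
    return {n: (total[n], hits[n], hits[n] / total[n]) for n in sorted(total)}


# Invented EVA-like lines, for illustration only.
toy = [["fachys.ykal.ar.ataiin", "daiin.chol.chor.shey.qokeedy", "sho.chy.chedy.okaiin"]]
for n, (count, ch, share) in ch_rate_by_length(toy).items():
    print(n, count, ch, f"{share:.2f}")
```

If the share of Ch clearly falls as n grows, the line-breaking explanation gains some support; a flat profile would argue against it.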

All the best, --jorge
RE: Summary of Voynich Day presentation on Line Patterns

MarcoP > 17-09-2025, 08:14 AM

We know from Patrick Feaster’s research (e.g. his 2022 Malta paper) that ch- words are relatively rare both at line start and at line end. Their frequency peaks immediately after the first word in a line (positions 2 and 3) and then decreases as one moves rightward along the line. It’s very clear that ch- words are not just being squeezed at the end of lines: in that scenario, they would be particularly frequent at line end.

It’s also probably worth remembering that the frequency of words containing ch (as a prefix or not) is not lower at line start than in other positions. But words starting with ych- or dch- are 10 times more frequent at line start than elsewhere (see [link]); they account for almost half of the line-start words that contain ‘ch’.
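Positional profiles like these (the peak of ch- at positions 2-3, the excess of ych-/dch- at line start) can be tabulated from any transliteration split into lines of words. A minimal Python sketch, assuming lines are already tokenized into lists of EVA words (a hypothetical input format). The function names are made up, the sample lines are invented, and nothing here reproduces Feaster's or Marco's actual counts.

```python
from collections import Counter


def prefix_profile(lines: list[list[str]], prefixes: tuple[str, ...], max_pos: int = 6) -> dict[int, float]:
    """Share of words starting with any of `prefixes` at each position in the line.

    Position 1 is the line-initial word; positions >= max_pos are pooled
    into max_pos so that sparse far-right positions do not dominate.
    ASSUMED input: each line is a list of EVA words.
    """
    hits: Counter = Counter()
    totals: Counter = Counter()
    for words in lines:
        for i, w in enumerate(words, start=1):
            pos = min(i, max_pos)
            totals[pos] += 1
            hits[pos] += int(w.startswith(prefixes))
    return {pos: hits[pos] / totals[pos] for pos in sorted(totals)}


def start_vs_rest(lines: list[list[str]], prefixes: tuple[str, ...]) -> float:
    """Ratio of the prefix rate among line-initial words to its rate elsewhere."""
    def rate(ws: list[str]) -> float:
        return sum(w.startswith(prefixes) for w in ws) / len(ws)

    start_rate = rate([ws[0] for ws in lines if ws])
    rest_rate = rate([w for ws in lines for w in ws[1:]])
    if rest_rate == 0:
        return float("inf") if start_rate > 0 else float("nan")
    return start_rate / rest_rate


# Invented sample lines, for illustration only.
sample = [["ychedy", "chol", "daiin"], ["dchor", "shey", "chy"], ["qokaiin", "chedy", "ol"]]
print(prefix_profile(sample, ("ch",)))
print(start_vs_rest(sample, ("ych", "dch")))
```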

RE: Summary of Voynich Day presentation on Line Patterns

Jorge_Stolfi > 17-09-2025, 02:40 PM
(17-09-2025, 08:14 AM) MarcoP Wrote: We know from Patrick Feaster’s research (e.g. his 2022 Malta paper) that ch- words are relatively rare both at line start and at line end. Their frequency peaks immediately after the first word in a line (positions 2 and 3) and then decreases as one moves rightward along the line. It’s also probably worth remembering that the frequency of words containing ch (as a prefix or not) is not lower at line start than in other positions. But words starting with ych- or dch- are 10 times more frequent at line start than elsewhere (see [link]); they account for almost half of the line-start words that contain ‘ch’.

I believe that those discrepancies could be caused by, among other things,
1. The line-breaking algorithm (LBA) effect, together with different initial-glyph distribs for long and short words.
2. Formula effects which make the word distrib at the start of line 2 different from the overall distrib.
3. Misreading of ambiguous spaces leading to splitting or joining of words away from line start.

To illustrate point 1, suppose that 80% of the word occurrences in a language are only 2-3 letters long and start with "u", while 20% are 20 letters long and start with "a". Then the LBA effect would result in, say, only 50% of the lines starting with "u" (because, say, 50% of the line-initial words would be long and hence start with "a"), while, say, 85% of non-line-initial words would start with "u".
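The toy language of point 1 can be simulated directly. A minimal Python sketch, assuming greedy line filling (keep adding words until the next one does not fit) as a stand-in for the scribe's line-breaking, an arbitrary line width of 60 characters, and single spaces between words; the function name is hypothetical. The exact percentages depend on these assumptions and will not match the "say" figures above, but the direction of the effect should: "u" comes out far rarer among line-initial words than among the rest.

```python
import random


def simulate_lba(n_words: int = 50_000, width: int = 60, seed: int = 1) -> tuple[float, float]:
    """Toy line-breaking-algorithm (LBA) model for point 1.

    Returns (share of 'u' among line-initial words, share of 'u' among other words).
    ASSUMPTIONS: greedy line filling, fixed width, one space between words.
    """
    rng = random.Random(seed)
    # Token stream: 80% short 'u' words (2-3 letters), 20% long 'a' words (20 letters).
    words = ["u" * rng.choice((2, 3)) if rng.random() < 0.8 else "a" * 20
             for _ in range(n_words)]

    # Greedy line breaking: a word that does not fit starts the next line.
    lines: list[list[str]] = []
    current: list[str] = []
    used = 0
    for w in words:
        needed = len(w) + (1 if current else 0)   # +1 for the separating space
        if current and used + needed > width:
            lines.append(current)
            current, used, needed = [], 0, len(w)
        current.append(w)
        used += needed
    if current:
        lines.append(current)

    def u_share(ws: list[str]) -> float:
        return sum(w.startswith("u") for w in ws) / len(ws)

    initial = [ln[0] for ln in lines]
    others = [w for ln in lines for w in ln[1:]]
    return u_share(initial), u_share(others)


first, rest = simulate_lba()
print(f"'u' among line-initial words: {first:.2f}; among other words: {rest:.2f}")
```

Long words get pushed to the next line whenever the remaining space is under 21 characters, while short words almost always fit, so the word that opens a line is disproportionately long.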

To illustrate point 2, imagine an English herbal where each parag starts with a dry list of diseases, without any of the common "th" words ("the", "this", "that", "then", "they", "them", "there", ...), and where the list often extends into line 2. Then "th" would be under-represented in line-start position just because it would never occur at the start of line 2.

To illustrate point 3, imagine that, in an English text, a blank is inserted before every "t", thus splitting any word with an embedded "t". Then the frequency of "t" as a word-initial letter would increase greatly, except among line-initial words, since the inserted blanks cannot change the first letter of a line.

Or, conversely, suppose that every word that starts with "e" is joined to the preceding word on the same line. That would reduce the frequency of "e" in word-initial position to zero, except among the line-initial words.
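Both thought experiments for point 3 are easy to run on any English text. A minimal Python sketch follows; the function names and the three sample lines are made up for illustration. It compares the share of a letter as word initial among line-initial words and among all other words, before and after splitting or joining.

```python
def split_before(words: list[str], letter: str) -> list[str]:
    """Insert a blank before every occurrence of `letter`, splitting words."""
    out: list[str] = []
    for w in words:
        out.extend(w.replace(letter, " " + letter).split())
    return out


def join_before(words: list[str], letter: str) -> list[str]:
    """Glue every word that starts with `letter` onto the preceding word on
    the same line; the line-initial word has no predecessor, so it survives."""
    out: list[str] = []
    for w in words:
        if out and w.startswith(letter):
            out[-1] += w
        else:
            out.append(w)
    return out


def initial_share(lines: list[list[str]], letter: str) -> tuple[float, float]:
    """(share of line-initial words starting with `letter`, same share among other words)."""
    def share(ws: list[str]) -> float:
        return sum(w.startswith(letter) for w in ws) / len(ws) if ws else 0.0

    first = [ln[0] for ln in lines if ln]
    others = [w for ln in lines for w in ln[1:]]
    return share(first), share(others)


text = ["the cat sat on the mat", "then it ate every egg", "each tree grew tall"]
lines = [ln.split() for ln in text]
print(initial_share(lines, "t"), initial_share([split_before(ln, "t") for ln in lines], "t"))
print(initial_share(lines, "e"), initial_share([join_before(ln, "e") for ln in lines], "e"))
```

In both cases the line-initial figure is unchanged while the figure for the other words moves, which is the asymmetry described above.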

One could test whether explanation 1 is viable by tabulating the length distribution of line-initial and non-line-initial words. If the two columns are significantly different, then one could tabulate the initial letter of n-letter words as a function of n, and see whether the two tables together explain the initial-letter discrepancies.
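One way to "see whether the two tables together explain the discrepancies" is to reweight: take the initial-letter rates by length from non-line-initial words and apply them to the length distribution of line-initial words. A minimal Python sketch, assuming lines already tokenized into word lists (hypothetical input); the reweighting step is one possible reading of the proposal, not necessarily the one intended. If the prediction lands close to the observed line-initial share, the LBA effect plus length-dependent initials could account for it.

```python
from collections import Counter


def length_tables(lines: list[list[str]]) -> tuple[Counter, Counter]:
    """Word-length counts for line-initial words and for all other words."""
    initial = Counter(len(ln[0]) for ln in lines if ln)
    others = Counter(len(w) for ln in lines for w in ln[1:])
    return initial, others


def lba_prediction(lines: list[list[str]], letter: str) -> tuple[float, float]:
    """Predicted vs. observed share of `letter` as initial of line-initial words.

    prediction = sum over n of P_initial(length n) * P(starts with letter | n),
    with the conditional estimated from non-line-initial words only.
    Lengths never seen away from line start are skipped (their mass is lost).
    """
    init_len, other_len = length_tables(lines)
    other_hit = Counter(len(w) for ln in lines for w in ln[1:] if w.startswith(letter))
    n_init = sum(init_len.values())
    predicted = sum(
        (c / n_init) * (other_hit[n] / other_len[n])
        for n, c in init_len.items()
        if other_len[n]
    )
    observed = sum(ln[0].startswith(letter) for ln in lines if ln) / n_init
    return predicted, observed


# Invented toy data echoing the "u"/"a" example from point 1.
toy = [["uu", "uu", "aaaa"], ["aaaa", "uu", "uu"], ["aaaa", "aaaa", "uu"]]
print(length_tables(toy))
print(lba_prediction(toy, "u"))
```

For the "significantly different" step, a chi-square test on the two length columns (for instance `scipy.stats.chi2_contingency`) would be one option.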

One could test explanation 2 by comparing the initial-word distribution of line 2 in parags with a full-length head line with that of line 2 in parags where the head line is much shorter because of intruding plants. Any formula effects should be more likely to spill into line 2 in the second case than in the first.
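The comparison for explanation 2 can be sketched as follows. This is a minimal Python sketch under loud assumptions: each paragraph is reduced to a hypothetical (head-line length in characters, first word of line 2) pair, the `short_below` threshold separating "full" from "short" head lines is an arbitrary user choice, and total variation distance is simply one convenient way to summarize how different the two distributions are. With realistic sample sizes, a permutation test would be needed to judge whether a given distance means anything.

```python
from collections import Counter
from typing import Callable


def line2_comparison(
    paragraphs: list[tuple[int, str]],
    short_below: int,
    key: Callable[[str], str] = lambda w: w,
) -> tuple[dict[str, float], dict[str, float], float]:
    """Compare line-2 initial words for full vs. short head lines.

    ASSUMED input: (head-line length in characters, first word of line 2) pairs.
    Head lines shorter than `short_below` count as "short" (e.g. squeezed by
    an intruding plant drawing). `key` can reduce words to, say, their
    first glyph. Returns both normalized distributions and their total
    variation distance (0 = identical, 1 = disjoint).
    """
    full: Counter = Counter()
    short: Counter = Counter()
    for head_len, word in paragraphs:
        (short if head_len < short_below else full)[key(word)] += 1

    def normalize(c: Counter) -> dict[str, float]:
        total = sum(c.values())
        return {k: v / total for k, v in c.items()} if total else {}

    p, q = normalize(full), normalize(short)
    tvd = 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())
    return p, q, tvd


# Invented data, for illustration only.
paras = [(60, "daiin"), (58, "daiin"), (20, "qokaiin"), (25, "qokaiin")]
print(line2_comparison(paras, short_below=40))
```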

One could perhaps test for explanation 3 by taking lines with one or more commas, and doing the statistics with commas deleted and with commas treated as word spaces. Or maybe by comparing the frequencies of "X.Y" and "XY" for the relevant glyphs X and Y.
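Both versions of the comma test need a tokenizer that can treat "," either way, plus a junction counter. A minimal Python sketch, assuming an EVA line in which "." marks a definite word space and "," an uncertain one, with any other transcription markup already stripped; the function names are hypothetical. `str.count` counts non-overlapping occurrences, which is adequate for a first look.

```python
import re


def tokenize(line: str, comma_as_space: bool) -> list[str]:
    """Split an EVA line where '.' is a word space and ',' an uncertain space.

    comma_as_space=False deletes commas (joins the pieces);
    comma_as_space=True treats commas as ordinary word breaks.
    """
    if comma_as_space:
        parts = re.split(r"[.,]", line)
    else:
        parts = line.replace(",", "").split(".")
    return [p for p in parts if p]


def junctions(lines: list[str], x: str, y: str) -> dict[str, int]:
    """How often glyph x is followed by glyph y across a definite space ('.'),
    an uncertain space (','), or with no space at all."""
    counts = {"X.Y": 0, "X,Y": 0, "XY": 0}
    for line in lines:
        counts["X.Y"] += line.count(f"{x}.{y}")
        counts["X,Y"] += line.count(f"{x},{y}")
        counts["XY"] += line.count(f"{x}{y}")
    return counts


line = "daiin.chol,chor.ykeedy,dchey"          # invented EVA-like line
print(tokenize(line, comma_as_space=False))    # ['daiin', 'cholchor', 'ykeedydchey']
print(tokenize(line, comma_as_space=True))     # ['daiin', 'chol', 'chor', 'ykeedy', 'dchey']
print(junctions(["oky.chey", "ykeey,chol", "dychor"], "y", "ch"))
```

Running the same line-position statistics on both tokenizations would show how much of the line-start versus elsewhere contrast depends on how uncertain spaces are treated.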

Any better ideas?

All the best, --jorge