[Blog Post] The 490, and other starting character patterns

The Voynich Ninja (https://www.voynich.ninja) > Forum: Voynich Research (https://www.voynich.ninja/forum-27.html) > Forum: News (https://www.voynich.ninja/forum-25.html) > Thread (/thread-4939.html)
RE: The 490, and other starting character patterns - Jorge_Stolfi - 29-09-2025

(29-09-2025, 10:50 PM) SherriMM Wrote: Also my statistics only include the 18 line-initial characters, not any amount of random letters. Of the 18 characters, should I compute based on frequency?

If you assume that those 18 characters are equally likely, then P(q) = P(o) = P(y) = 1/18 = ~0.056. Then, if each line-initial letter is just chosen at random, independently, in 1000 lines you would expect 998*(1/18)^3 = ~0.17 occurrences of qoy; that is, none, or maybe one or two.

But suppose that (say) P(q) = P(o) = P(y) = 0.30, with all the other 15 letters together occurring on only 10% of the lines. Then in 1000 lines, with each initial chosen at random and independently, you should expect 998*(0.30)^3 = ~27 occurrences of qoy.

Thus, in order to tell whether the number of qoy occurrences is anomalous, you must consider the actual frequencies of the letters in line-initial positions.

And, again, note that some three-glyph sequences will be more common than what the formula says. For instance, in the second example above, the string qoy may occur only 25 times, but qyo may occur 31 times, oqy 26 times, oyq 33 times... If you pick the most common three-letter pattern, that pattern will be more common than expected.

All the best, --jorge

RE: The 490, and other starting character patterns - RobGea - 30-09-2025

Some numbers from RF1b-er.txt

RE: The 490, and other starting character patterns - Jorge_Stolfi - 06-10-2025

(30-09-2025, 12:25 AM) RobGea Wrote:

OK, so (n-2)P(q)P(o)P(y) = 4128 * 0.129 * 0.132 * 0.148 = ~10.4. That means that, if the line-initial letters had been chosen randomly and independently with those frequencies, in those 4130 lines you should expect to get 10 qoy or thereabouts. Also about a dozen dsy, a dozen sdy, a dozen yds, ten qyd, ...
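For anyone who wants to reproduce these estimates, here is a minimal Python sketch. It assumes the line initials are available as a sequence of glyphs (a string, or a list of strings if some initials are multi-glyph); the function names are hypothetical and not taken from any existing tool. A sequence of n lines has n-2 windows of three consecutive lines, which is where the n-2 factor comes from.

```python
from collections import Counter

def initial_frequencies(initials):
    """Relative frequency of each line-initial glyph in the sequence."""
    counts = Counter(initials)
    n = len(initials)
    return {glyph: c / n for glyph, c in counts.items()}

def expected_triple_count(n_lines, p1, p2, p3):
    """Expected number of three-line windows whose initials are a given
    triple, if every initial is drawn independently with the given
    probabilities. n_lines lines contain n_lines - 2 such windows."""
    return (n_lines - 2) * p1 * p2 * p3

def observed_triple_count(initials, triple):
    """Number of three-line windows whose initials spell `triple`."""
    target = tuple(triple)
    return sum(1 for k in range(len(initials) - 2)
               if tuple(initials[k:k + 3]) == target)

# The three scenarios discussed in the thread.
print(round(expected_triple_count(1000, 1/18, 1/18, 1/18), 2))    # 0.17
print(round(expected_triple_count(1000, 0.30, 0.30, 0.30), 1))    # 26.9
print(round(expected_triple_count(4130, 0.129, 0.132, 0.148), 1))  # 10.4
```

The equal-frequency case works out to 998/5832, about 0.17. Comparing `observed_triple_count` with `expected_triple_count` is only meaningful for a triple chosen in advance; scanning all triples and keeping the largest count will, as noted above, always look more common than expected.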
All the best, --jorge

RE: The 490, and other starting character patterns - RadioFM - 06-10-2025

Assuming the initials are intentionally interspersed, so that no character repeats the previous one, the expected number of qoy occurrences is ~13.6.

RE: The 490, and other starting character patterns - Jorge_Stolfi - 07-10-2025

(06-10-2025, 05:43 AM) Jorge_Stolfi Wrote: if the line-initial letters had been chosen randomly and independently with those frequencies

Two random variables X and Y are independent, by definition, if Prob(X=x and Y=y) = Prob(X=x)Prob(Y=y) for all possible values x, y.

If each letter S[k] in the sequence is independent from all the others, in particular it must be independent from the previous one, S[k-1]. You can test this condition by computing the frequency P(xy) of S[k-1]S[k] = xy and comparing it with the product of frequencies P(x)P(y). For example, since P(q) = 0.1286 and P(o) = 0.1317, the frequency P(qo) of qo in consecutive positions should be close to 0.1286*0.1317 = 0.0169. That is, P(qo)/(P(q)P(o)) should be close to 1.

However, the frequencies are only an approximation to the probabilities, and the error is large when the letters and letter pairs occur only a few times in the sample. So this test should be applied only to the most common letters, like the top five in your list.

Also, this test only checks whether each letter is independent from the previous one. If the line-initial letters pass this test, there may still be more complicated dependencies, like "S[k] is equal to S[k-3] 70% of the time, unless S[k-7] is q, in which case ..."

Also, if one computes a large number of statistics about something, some of the statistics will be "anomalous" just by chance. For example, if you ask 1024 financial gurus each morning whether a certain stock will go up or down during the day, for 10 consecutive days, there is a good chance that one of them will be correct all 10 times.
It does not follow that her predictions are better than the others'. Or better than flipping a coin...

All the best, --jorge

RE: The 490, and other starting character patterns - Magical Raven - 07-10-2025

Excuse the intrusion; I'm a programmer (a very bad one), and I want to share my perspective, if you don't mind. Is it a feasible theory that each paragraph is a block composed of lines that are in turn self-sufficient? Each line would represent a different but sequential process, meaning you could only complete the second line once the first line of the block was done and executed. Thanks to the guy who started this thread, I was able to see the patterns at both the beginning and the end of the lines. That would also perhaps explain the text's weak narrative structure, while it does have a certain component of logical processes... Like A -> B -> C.

Here are some prefixes and ending tags that are constantly repeated on the lines:

Prefixes: qok-, ot-, d-, y- (which usually start the supposed instructions)

Suffixes or tags ending sequences: -dy, -ain, -al, -ol

In short, each block or paragraph is composed of different processes that begin and end on a single line. But this only applies to the botany section; I haven't explored the rest of the manuscript yet.

RE: The 490, and other starting character patterns - dashstofsk - 08-10-2025

Thanks SherriMM, I think you have raised something of genuine significance. But perhaps the way to analyse this further is not to look at 3 or higher character string repeats but to look at the distribution of 2-character repeats.

For instance, in the starting characters for the linked page (using the GC transliteration), a q line is followed 7 times by one of the ch variants, and there appears to be a pattern. This is curious. In f83r, lines starting with s seem to appear too often. Also curious.
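Jorge's P(xy)/(P(x)P(y)) check and the 2-character ("vertical pair") repeats suggested here are essentially the same computation. Below is a hedged Python sketch of it. The input format (one list of line-initial glyphs per page, so that no pair spans a page break) and the function names are assumptions; the exact normalization behind the affinity values reported in this thread is not stated, so this sketch uses the P(xy)/(P(x)P(y)) ratio.

```python
from collections import Counter

def vertical_pairs(pages):
    """Count (initial of line k, initial of line k+1) pairs.

    `pages` holds one sequence of line-initial glyphs per page, so no
    pair is formed across a page break."""
    pairs = Counter()
    for initials in pages:
        pairs.update(zip(initials, initials[1:]))
    return pairs

def vertical_pair_affinities(pages, min_count=1):
    """P(xy) / (P(x) P(y)) for every ordered pair of glyphs that occur at
    least `min_count` times as line initials. About 1 means the pair is as
    frequent as independent initials would make it; 0 means never seen."""
    singles = Counter(g for initials in pages for g in initials)
    pairs = vertical_pairs(pages)
    n_lines = sum(singles.values())
    n_pairs = sum(pairs.values())
    if n_pairs == 0:
        return {}
    glyphs = sorted(g for g, c in singles.items() if c >= min_count)
    affinities = {}
    for x in glyphs:
        for y in glyphs:
            p_xy = pairs[(x, y)] / n_pairs   # observed pair frequency
            p_x = singles[x] / n_lines       # frequency of x as initial
            p_y = singles[y] / n_lines
            affinities[(x, y)] = p_xy / (p_x * p_y)
    return affinities

# Toy example: a single page whose initials alternate q, o, q, o.
aff = vertical_pair_affinities([["q", "o", "q", "o"]])
print(round(aff[("q", "o")], 2))  # 2.67
print(aff[("q", "q")])            # 0.0
```

One caveat on this choice: under a random shuffle of N lines, the expected number of x-x doubles is n_x(n_x-1)/N rather than roughly n_x^2/N, so for rare initials this ratio reads somewhat low for doubles. For common initials the difference is negligible.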
Taking only paragraph text from the GC transliteration, and not including line pairs spanning page breaks, the matrix of occurrences for the character pairs is given here (counts). For instance, there are 80 lines starting with d that are followed by a line starting with y.

But if you look at the matrix of affinities, the ratios of counts against expected values, then you get something rather unexpected. d followed by d occurs 50 times, but this is 0.52 of what would be expected if the lines were randomly shuffled. Also look at o-o (0.25), q-q (0.3), t-t (0.59). These doubles all occur less often than expected. Yet if you look at s-s (1.42), it is high: s seems to like itself more than other characters. (affinities)

Also look at some of the highs. q-Sh and q-ch occur 2.18 and 3.75 times what would be expected. o-q occurs 134 times, which is 2.01 times what would be expected. Also look at some of the lows: s-ch hardly ever occurs.

These are big swings from parity. Applying statistical hypothesis testing to this data, to obtain a confidence level for the hypothesis that the effects are not random, is not necessary; the swings from parity are just too big.

So, how to explain these anomalies? If the manuscript were in some natural language, then one would expect the narrative simply to wrap to the next line when there was no more space for a word at the end of the previous line. Line-starting characters would be independent of each other, and there would be reasonable parity between observed and expected repeats. Likewise with any cypher hypothesis that transforms words according to some algorithm.
So it just remains that these anomalies are evidence of human choice and selection: that the text of the VMS was artificially constructed and meaningless; that the writers had some private method to generate text, had their favourite character strings, and added sufficient variability to give the constructed text a semblance of genuineness; but that, with the exception of s, they did not want to repeat line-first characters too often.

RE: The 490, and other starting character patterns - Jorge_Stolfi - 09-10-2025

(08-10-2025, 11:49 AM) dashstofsk Wrote: I think you have raised something of genuine significance. But perhaps the way to analyse this further is not to look at 3 or higher character string repeats but to look at the distribution of 2-character repeats.

Even before that, one should compare the frequencies P(w) of line-initial words with the frequencies Q(w) of words in any position. They are very different. The statistics of line-initial glyphs are highly skewed because the statistics of line-initial words are skewed. Thus, one should try to understand and explain the latter rather than the former.

Suppose that, in some other manuscript, one finds that the letters "i", "x" and "v" seem to be very common at the beginning of the line, and that there is an anomalously high frequency of repeated letters in consecutive lines, much higher than expected if they were chosen independently at random. Those anomalies become much easier to understand once one notices that the most common words at the beginning of the line are "i", "ii", "iii", "iv", "v", "vi", ...

Let's look at page f83r, in particular. Say that a token (word occurrence) is "head" if it is the first of a line, and "body" otherwise. The following lists exclude words that occur only once, since for those we cannot tell whether they prefer to be head or body. The counts are nt = total occurrences on the page, nh = occurrences as head, nb = occurrences in body.
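The head/body counts behind these lists are simple to compute. The following is a minimal Python sketch, assuming each line of the page has already been split into a list of word tokens; the function names and the 0.8 cutoff for "practically only" are hypothetical choices, since the post does not give a threshold.

```python
from collections import Counter

def head_body_counts(lines):
    """Map each word on a page to (nt, nh, nb): total occurrences,
    occurrences as the first token of a line ("head"), and elsewhere ("body").
    `lines` is a list of lines, each a list of word tokens."""
    nh = Counter(line[0] for line in lines if line)
    nb = Counter(word for line in lines for word in line[1:])
    return {w: (nh[w] + nb[w], nh[w], nb[w]) for w in set(nh) | set(nb)}

def classify(counts, cutoff=0.8):
    """Split words seen at least twice into 'head', 'body' and 'both'.
    `cutoff` is a hypothetical threshold for "practically only": the share
    of a word's tokens that must be in one position."""
    groups = {"head": [], "both": [], "body": []}
    for word, (nt, nh, nb) in counts.items():
        if nt < 2:
            continue  # a single token cannot show a preference
        if nh / nt >= cutoff:
            groups["head"].append(word)
        elif nb / nt >= cutoff:
            groups["body"].append(word)
        else:
            groups["both"].append(word)
    return {k: sorted(v) for k, v in groups.items()}

# Toy page (not real f83r data): four lines of tokens.
page = [["sol", "daiin"], ["sol", "sar"], ["daiin", "sar"], ["tchedy"]]
print(classify(head_body_counts(page)))
# {'head': ['sol'], 'both': ['daiin'], 'body': ['sar']}
```

With this cutoff, sol (6 of 7 tokens as head) would count as head-only and saiin (3 of 5) as occurring in both positions, but where exactly to draw the line is a judgment call.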
(Sorry for the leading zeros, but it was the only way I found to prevent the MyBB editor from messing up the alignment of the tables.)

These words seem to occur as both head and body:

| nt | nh | nb | word |
|----|----|----|------|
| 05 | 03 | 02 | saiin |
| 04 | 02 | 02 | daiin |
| 03 | 01 | 02 | dain |
| 03 | 01 | 02 | qokchedy |
| 03 | 01 | 02 | sar |
| 02 | 01 | 01 | qokain |
| 02 | 01 | 01 | qokshedy |
| 02 | 01 | 01 | sy |

These words occur (practically) ONLY as head:

| nt | nh | nb | word |
|----|----|----|------|
| 07 | 06 | 01 | sol |
| 03 | 03 | 00 | tchedy |
| 02 | 02 | 00 | solkeedy |
| 02 | 02 | 00 | sor |

These words occur (practically) ONLY as body:

| nt | nh | nb | word |
|----|----|----|------|

And these are the words that occur only once on the page:

| nt | nh | nb | word |
|----|----|----|------|

RE: The 490, and other starting character patterns - dashstofsk - 09-10-2025

I have just recalled that user 'tavie' posted something similar about line starting characters. He used the term 'vertical pair' which sounds more suitable. In particular, he looked at the frequency of vertical pairs within different sections of the manuscript.

To this I would just like to add my matrix of affinities for vertical pairs in the language A pages (Language A affinities). In particular, you will see that o-o, q-q and t-t vertical pairs occur in language A even less often than would be expected.

RE: The 490, and other starting character patterns - ReneZ - 09-10-2025

(09-10-2025, 09:39 AM) dashstofsk Wrote: I have just recalled that user 'tavie' posted something similar about line starting characters. He used the term 'vertical pair' which sounds more suitable. In particular, he looked at the frequency of vertical pairs within different sections of the manuscript.

She has made presentations at the Voynich Conference in 2022 and at both Voynich MS days, in 2024 and 2025. The latter two especially are on this topic, and I recommend having a close look at them.