The oddities of the bigram "ed" - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: The oddities of the bigram "ed" (/thread-5368.html)

RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

And, since I had only tested Latin...

The Divine Comedy in Spanish

Faust in German

RE: The oddities of the bigram "ed" - Jorge_Stolfi - 16-02-2026

(16-02-2026, 03:55 AM)Dunsel Wrote: I typically strip all punctuation, lower case and then split the text into 100 word groups, something close to herbal page sizes.

A digram will stand out in your plots only if it occurs many times on a large fraction of the pages and zero times on another large fraction of the pages. Then the number of pages with the bigram will be anomalously low and it will stand out in the plot.

For example, suppose the bigram occurs 2000 times and you have 100 pages. If it were evenly distributed, you would expect it to occur on almost all of those 100 pages. But if instead there are 50 pages with 40 occurrences each, and 50 with no occurrences, the digram will stand out just like your ed.

But if your "pages" are too small, even in the places where the digram is concentrated, almost all pages will have zero or one occurrences. Then the number of pages with the bigram will seem to be normal.

Take that same text of the example above, but divide it into 8000 pages instead of 100 pages. Then those 50 pages that had all the 2000 occurrences will be split into 4000 pages, so there will be only about one occurrence for every two of those pages, and you will have somewhat fewer than 2000 pages with occurrences. That is close to the number expected for 2000 evenly distributed occurrences. Then the digram would seem to be normal in the plot.

Thus if a digram occurs N times overall, you should divide the text into, say, N/2 pages. Then, if the digram is evenly distributed, one expects that it will occur in (1-1/e^2) = ~86% of the pages. But if it is concentrated in some sections, the percentage of pages with the bigram will be much lower than that.
All the best, --stolfi

RE: The oddities of the bigram "ed" - davidma - 16-02-2026

Very interesting. It always struck me that "ed" is the only bigram that appears by itself in the central "star" of f69r, together with y, d, o, l, s.

RE: The oddities of the bigram "ed" - Skoove - 16-02-2026

(16-02-2026, 11:39 AM)davidma Wrote: Very interesting. It always struck me that "ed" is the only bigram that appears by itself in the central "star" of f69r, together with y, d, o, l, s.

I was just about to add this! I think that this might be part of the clue. Perhaps 'ed' is not actually a bigram in some of these quires but is thought of as a single glyph, similar to how 'qo' is (except 'qo' shows this behaviour over the entire manuscript). This would explain why it occurs so frequently compared to other bigrams.

RE: The oddities of the bigram "ed" - ReneZ - 16-02-2026

But then what about 'eed'?

RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(16-02-2026, 07:51 AM)Jorge_Stolfi Wrote: Thus if a digram occurs N times overall, you should divide the text into, say, N/2 pages. Then, if the digram is evenly distributed, one expects that it will occur in (1-1/e^2) = ~86% of the pages. But if it is concentrated in some sections, the percentage of pages with the bigram will be much lower than that.

Voynich total ed occurrences: 4660/2 = 2330 pages. That breaks the Voynich down to 14 words per page. Page coverage: 59%

Culpeper's total rk occurrences: 379/2 = 189 & 190 pages. And that's 1321 words per page. Page coverage: 76%

I think I got the math right.

RE: The oddities of the bigram "ed" - Dunsel - 16-02-2026

(16-02-2026, 12:46 PM)ReneZ Wrote: But then what about 'eed'?

It's influenced by ed. Same 50%ish page coverage but a lower total count.

Orange dots are the top trigrams with any ed in them.
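
For anyone who wants to reproduce the N/2-page test discussed above, here is a minimal Python sketch. It assumes the text is already a flat list of lowercase words with punctuation stripped. The function names (`count_ngram`, `split_pages`, `page_coverage`, `expected_coverage`) are made up for this illustration and are not anyone's actual script. The expected value treats occurrences as roughly Poisson-distributed over pages, which is the assumption behind the (1-1/e^2) figure.

```python
import math


def count_ngram(words, ngram):
    """Total occurrences of `ngram` inside the words.

    str.count counts non-overlapping matches, which is fine for an
    n-gram like "ed" that cannot overlap itself.
    """
    return sum(w.count(ngram) for w in words)


def split_pages(words, n_pages):
    """Split the word list into n_pages consecutive chunks of near-equal size."""
    n = len(words)
    return [words[i * n // n_pages:(i + 1) * n // n_pages] for i in range(n_pages)]


def page_coverage(words, ngram, n_pages):
    """Fraction of artificial pages that contain `ngram` at least once."""
    pages = split_pages(words, n_pages)
    hits = sum(1 for page in pages if any(ngram in w for w in page))
    return hits / n_pages


def expected_coverage(n_occurrences, n_pages):
    """Expected fraction of pages with at least one occurrence,
    assuming occurrences are spread evenly (Poisson approximation)."""
    return 1.0 - math.exp(-n_occurrences / n_pages)


if __name__ == "__main__":
    # Toy data (hypothetical): all "ed" words packed into the first half.
    clustered = ["qokedy"] * 50 + ["daiin"] * 50
    n = count_ngram(clustered, "ed")   # N occurrences
    n_pages = n // 2                   # the N/2 rule
    print(page_coverage(clustered, "ed", n_pages))
    print(expected_coverage(n, n_pages))
```

With N = 2000 occurrences and N/2 = 1000 pages, `expected_coverage(2000, 1000)` gives about 0.86. A measured `page_coverage` well below that suggests the n-gram is concentrated in part of the text rather than spread evenly.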

RE: The oddities of the bigram "ed" - Koen G - 17-02-2026

Nice graphs! I think I can see edy in that last one.

Makes one wonder what these "edy" clusters are. It would be so neat if they corresponded to "ain" clusters, but your very first graph shows nicely how that cannot be the case. There should be more outliers then.

So "edy" is new. Unless the thing it corresponds to in A-pages also occurs a bit on most B-pages. Then it wouldn't register as an outlier.

RE: The oddities of the bigram "ed" - Dunsel - 17-02-2026

(17-02-2026, 12:39 AM)Koen G Wrote: Nice graphs! I think I can see edy in that last one.

It's the orange dot on the far right in the trigram charts I posted earlier.

(17-02-2026, 12:39 AM)Koen G Wrote: Makes one wonder what these "edy" clusters are. It would be so neat if they corresponded to "ain" clusters, but your very first graph shows nicely how that cannot be the case. There should be more outliers then.

There aren't any I've found. The chart below has the top 1000 bigrams. Almost all are low count and on a limited number of pages (the blob in the bottom left corner).

(17-02-2026, 12:39 AM)Koen G Wrote: So "edy" is new. Unless the thing it corresponds to in A-pages also occurs a bit on most B-pages. Then it wouldn't register as an outlier.

If by "new" you mean it came into existence in a later production stage of the book, then I'll say no (evidence in the next post). edy existed once in my 0ed pages (Currier A), and in that one instance it was in a hapax token.

Here are the 6 times the bigram ed appeared in Currier A. (I didn't double check but they all should be Currier A.)

F8R : shesed
F11R : ded
F56R : esedy <---
F67V2 : taedaiin
F90V2 : ofchedol
F93R : sheedom

Here's the top 1000 bigrams across the entire Voynich. There are no other outliers like ed.

Edit: And you made me wonder, so I had to see. Here's the trigram ain. It's on about 65% of the pages but its count is less than 1/8 that of ed. So it blends in with the crowd.

And here's iin. Its count is close to ed's but it's on just about every page (thank you daiin).

RE: The oddities of the bigram "ed" - Dunsel - 17-02-2026

(16-02-2026, 12:46 PM)ReneZ Wrote: But then what about 'eed'?

Ok, it's an ugly chart but I think it'll explain the whole trigram question everyone has brought up.

There were only 6 occurrences of ed in the 0ed pages (Currier A), all in hapax tokens, and those produced only 3 distinct edX trigrams (edy, eda, edo). Therefore any edX trigram, with the exception of the 3 above, will only be on about 56% of Voynich pages at most (Currier B). If the edX trigram has a large enough count, then it follows ed's behavior and is an outlier.

ed is still the boss, the trigrams just went along for the ride.

0ed is on the left (blue) and ed+ is on the right (orange).

Cumulative count of ed by folio.
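
The two kinds of chart described in this thread (total count versus number of pages for every bigram, and the cumulative count of ed by folio) can be computed with something like the Python sketch below. It is only an illustration under stated assumptions: the input is taken to be a list of `(folio_name, words)` pairs in manuscript order, and the names `bigram_stats` and `cumulative_count` are hypothetical, not the script used for the charts above.

```python
from collections import Counter
from itertools import accumulate


def bigram_stats(pages):
    """Map each bigram to (total count, number of pages it appears on).

    `pages` is a list of (folio_name, list_of_words) pairs. Bigrams are
    taken inside words only, with a sliding window of two characters.
    """
    totals = Counter()
    page_hits = Counter()
    for _, words in pages:
        seen = Counter()
        for w in words:
            for i in range(len(w) - 1):
                seen[w[i:i + 2]] += 1
        totals.update(seen)            # add this page's counts
        page_hits.update(seen.keys())  # one hit per bigram per page
    return {bg: (totals[bg], page_hits[bg]) for bg in totals}


def cumulative_count(pages, ngram):
    """Running total of `ngram` occurrences, folio by folio.

    Uses str.count (non-overlapping matches), which is enough for "ed".
    """
    per_page = [sum(w.count(ngram) for w in words) for _, words in pages]
    names = [name for name, _ in pages]
    return list(zip(names, accumulate(per_page)))


if __name__ == "__main__":
    # Toy pages with made-up content, just to show the data shape.
    pages = [
        ("f1r", ["daiin", "chol"]),
        ("f75r", ["qokedy", "shedy"]),
        ("f76r", ["chedy"]),
    ]
    print(bigram_stats(pages)["ed"])
    print(cumulative_count(pages, "ed"))
```

Plotting the `(total count, pages)` pairs from `bigram_stats` as a scatter would give the kind of picture where a high-count bigram confined to a subset of pages sits apart from the rest. A long flat stretch followed by a steady climb in `cumulative_count` corresponds to the 0ed and ed+ split.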