The oddities of the bigram "ed" pt. 3 : It's not just "ed"
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: The oddities of the bigram "ed" pt. 3 : It's not just "ed" (/thread-5384.html)
The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Dunsel - 19-02-2026

Here are links to my previous posts in this series. In those you'll find what leads up to this post.

"ho"mogenized

In my first post, I led off with the striking chart that everyone seemed to like. I'll start this post the same way.

What you're looking at there is the bigram "ho" compared to "ed", normalized, with my 0ed pages on the left and ed+ pages on the right.

OK, that chart is a bit jagged. Let me bring that into focus.

That is the count of ho and ed normalized per 1000 tokens. The vertical bar marks where my ed0 ends and ed+ begins. These two charts essentially show that the bigram ho and the bigram ed swap midfix dominance. The pages in ed+ where they do swap are the ones where ed has its lowest counts. So even on pages where ed occurs only a few times, ho is still the dominant midfix.

In this chart, I've sorted by ho/ed count, with ho's highest count on the left and ed's highest count on the right. The counts are normalized by page word count. I've shaded the background to show my ed0 in grey and ed+ in blue.

Now this chart is striking. And it raises the question, "Are Currier's A and B languages, and my ed0 and ed+, correct???"

Here's ho compared to ok, which is also a midfix token and shares a similar word count.

What's different about ho compared with the other bigrams introduced in 0ed is that its count doesn't increase with the folio word count. Its density per folio changes.

0ED: 0.264
ED+: 0.056

Here's ed's density for comparison:

0ED: 0.00068
ED+: 0.182

ho isn't the only one that changes, but it's by far the largest, and that 0ed density is exactly what makes it seem completely opposed to ed. There are other changes in bigrams, but none nearly this drastic.

In this chart, blue is the 0ed pages, orange the ed pages.
What I did for this chart is chop off the unigram prefix and suffix of every word and look at what was left in the middle. I then took the top 100 cores by count and looked at letter counts. H and O dominate the 0ed side, and they're very much present on the ed+ side. But e and d see a large change in count. Also notice that the counts of h and e are almost perfectly interchangeable between 0ed and ed+.

Now, looking at the top 100 bigrams across both 0ed and ed+, you can easily see where ho got demoted and ed got promoted, along with a lot of other bigrams. None of those other bigrams work like ho. None have a doppelganger on the 0ed side (that I've found).

What Currier Saw & Theory

So, I hope this makes you ask a few questions. I can tell you that I have lots more charts that try to prove why this happened. I've looked into novelty, bigram usage, bigram depletion and a gob more aspects of these two regimes, and I can't come up with one solid answer. All of the tests that I've performed say that nothing stood out. Well, except for one thing: Currier's eyes.

This is going to delve into theory. I'm not saying I'm right; I'm saying this is what it suggests to me. Please don't be too brutal, my theories change on a daily basis.

I think I've already shown that Currier spotted these differences some 50 years ago and that my ed0/ed+ pretty much aligns with his language A and language B. He had the time, the desire and, I believe, some punch cards to make spotting this difference easier.

I think I've also shown that this is not section specific. There are plenty of Herbal pages with the bigram ed in prominence.

So, the question is, why did nobody before Currier spot this? And why has nobody been able to show this distinction to this degree until now?

Compare these two charts. The first one is split on my ed0 and ed+. The second is in original folio order.

Had the Voynich been in my 0ed/ed+ order, I "believe" that spotting this regime shift would have happened much earlier.
I "believe" it would have been blatantly obvious. For 100+ pages you have gobs of ho in the middle of words, and then it's mostly gone and ed is in the middle of lots of words.

In my first post, I demonstrated how ed pages are on specific sheets (with one low-count exception) in the herbal section. If you look at the 2 charts above, had they been in 0ed/ed+ order there would have been over 100 pages (2/5) of the book with no ed, and then it would suddenly appear as a midfix and become dominant. I also demonstrated how these sheets are wrapped around sheets with 0ed (first post). By interleaving those sheets, that 100-page gap was cut roughly in half (f26r is where ed begins in folio order).

Now, I'm going to have to freely admit, I did not have the mathematical skills to come up with this. I knew what I was looking for but the math was a bit beyond me. I asked ChatGPT to come up with a formula that would allow me to detect visual differences in pages by their text composition. Here's what it came up with and what I plugged into a Python script. So, if you have the mathematical skills, PLEASE confirm this is a valid method. I've looked it over and it makes sense to me, but I'm not a mathematician.

GPT: For each page, we compute five very simple surface features:
Then we build a vector like this for each page:

[1, HO_per_word, ED_per_word, Gallows_per_word, Mean_token_length, Top5_bigram_share]

That leading 1 is just the intercept term. Then we fit a simple linear regression:

score = w₀ + w₁·HO + w₂·ED + w₃·Gallows + w₄·Length + w₅·Top5

The weights (w’s) are solved with ordinary least squares to best separate 0ed pages from ed+ pages.
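A minimal sketch of how such a fit could be set up in Python, using only the standard library. It is not the script used for the charts in this post: the gallows set (EVA k, t, p, f), the exact feature definitions, the function names and the 0/1 labelling are all assumptions made for illustration.

```python
from collections import Counter

GALLOWS = set("ktpf")  # assumption: EVA gallows glyphs k, t, p, f


def page_features(words):
    """[1, HO/word, ED/word, gallows/word, mean token length, top-5 bigram share].

    `words` is the non-empty list of tokens on one page.
    """
    n = len(words)
    bigrams = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(bigrams.values()) or 1
    top5 = sum(count for _, count in bigrams.most_common(5)) / total
    gallows = sum(ch in GALLOWS for w in words for ch in w)
    return [1.0, bigrams["ho"] / n, bigrams["ed"] / n,
            gallows / n, sum(map(len, words)) / n, top5]


def solve(a, b):
    """Solve the square system a·w = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (XᵀX) w = Xᵀy."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def score(weights, features):
    """Linear score w₀ + w₁·HO + ... for one page's feature vector."""
    return sum(w * f for w, f in zip(weights, features))
```

In use, each page would be turned into a row with `page_features`, 0ed pages labelled 0 and ed+ pages labelled 1, and a page called ed+ when its score is at least 0.5 (an assumed cut-off). Where NumPy is available, `numpy.linalg.lstsq` solves the same least-squares problem more robustly than the normal equations.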
So, it is claiming that, just by measuring those surface textures with that formula, it can predict whether a page is 0ed or ed+ with an 89.8% accuracy rate.

Now, here's what the same chart looks like sorted in folio order.

So, here's my theory. The reason this regime shift wasn't detected until Currier is that it was intentionally obscured by shuffling pages around and by skipping over to other sections (pharma and zodiac) and then coming back to finish the herbal section. I believe that if they hadn't shuffled those pages, and if this were written in the 15th century, then anyone with linguistic skills, like an adept scribe or cryptographer, would have spotted it back then.

Now, this shuffling of pages in no way excludes an intentional production process, or that the folios are mostly in chronological order, or that it was the result of the book being rebound once or twice. Right now, my Occam's Razor detector is saying obfuscation.

Conclusion

This concludes my 3-part series on the oddities of "ed". I'm hoping I've given everyone a lot to think about. I have one more series planned and I hope it's going to be no less... erm... informative. It's going to turn the focus back on repair with an attempt to reverse engineer the Voynich.

Thanks for all the great replies and good hunting.

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Jorge_Stolfi - 19-02-2026

(19-02-2026, 03:41 AM)Dunsel Wrote: That is the count of ho and ed normalized by 1000 tokens. The vertical bar represents where my ed0 ends and ed+ begins. These two charts are essentially showing that the bigram ho and the bigram ed are swapping midfix dominance. All of those pages in the ed+ where they do swap is where ed has its lowest counts. So even pages where ed occurs only a few times, ho is still the dominant midfix.

Thanks for the work! Numbers are always good.
However, let me repeat this advice: statistics of characters (bigrams, trigrams, etc.) can be only puzzling, not illuminating. You can plot them in a thousand different ways, and will not conclude anything besides "something is different".

It would be like studying ecology by counting and tabulating animals according to color, or length of tail, or number of black spots -- lumping elephants with squirrels, macaws with lizards, polar bears with caterpillars. At most you would be able to tell that the Sahara and Antarctica are different from the Amazon, but will never really understand how or why.

If you want to understand the "language A / language B" riddle, you must look at words and try to understand how they map (or not) between the two languages. Are there words that have similar frequencies in both languages? Are there patterns involving those words in one language that seem to match patterns in the other language? And so on.

For instance, I suspect that there is a correspondence between the words of the two languages, such that corresponding words have the same number of gallows, and same number of "dealers" (the characters d l r s), and the same number of dealers on each side of the gallows -- even if the gallows may be different, and some dealers may be replaced by other dealers, possibly non-deterministically, and the other Voynichese glyphs (q, a, o, y, Ch, Sh, ee, and their e modifiers) may be added, deleted, or replaced more freely.

The variation of frequencies of ed, ho, and other digraphs would be just shadows of this hypothetical word-level mapping.

For instance, according to my seven-slot ("crust-mantle-core") word model, the digraph ed can occur only when one of the "e-bench" elements E = { {Che} {She} {ee} {eee} } or one of the "e-gallows" F = { {ke} {te} {CKhe} {CThe} } is followed by a {d} element. That already "explains" why ed is never word-initial.
That model also predicts that, in a word that has a gallows, ed cannot appear before the gallows.

So, the fact that ed does not occur in language A could be because, for every word wB in language B that has an E or F element followed by a {d} element, either the word wB has no correspondent in language A, or its "translation" wA in language A:
All the best, --stolfi

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - ReneZ - 19-02-2026

(19-02-2026, 11:53 AM)Jorge_Stolfi Wrote: For instance, according to my seven-slot ("crust-mantle-core") word model, the digraph ed can occur only when one of the "e-bench" elements E = { {Che} {She} {ee} {eee} } or one of the "e-gallows" F = { {ke} {te} {CKhe} {CThe} } is followed by a {d} element. That already "explains" why ed is never word-initial. That model also predicts that, in a word that has a gallows, ed cannot appear before the gallows.

While I don't disagree, I think that it works in the opposite direction. The observation that ed (in fact e) does not appear at the start of words suggests that e is something attached to the preceding character.

Clear example: in chol the ch can be detached and ol is also frequent. However, in chedy the ch cannot be detached. che can.

But we notice this from bigram (and trigram) statistics, so these are useful.

Word statistics are also useful, but we do not know that Voynich words represent plain text words (even though this is something that people tend to assume), and word statistics are much more affected by spelling and spacing uncertainties.

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - oshfdk - 19-02-2026

I'm not sure I understand the significance of sorting. Imagine that ho and ed are somewhat mutually exclusive (for example, they are a spelling variation: one scribe writes "color", the other "colour", but sometimes they are mindlessly copying from each other's drafts and use the opposite spelling). Imagine the book speaks about colors a lot. This would lead to an overabundance of the trigram "our" on some pages and "lor" on other pages. You can sort them by "our"/"lor" and of course you will get a smooth transition. This doesn't mean anything for the order of writing.
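This sorting effect can be checked with a short simulation. A minimal sketch, assuming 200 random pairs (A, B) with A + B = 100 and A drawn uniformly from 1 to 99; the function name, the seed and the uniform draw are illustrative assumptions, not oshfdk's actual script:

```python
import random


def sorted_ratio_curve(n_pairs=200, total=100, seed=0):
    """Draw random (A, B) pairs with A + B == total and sort them by A/B, largest first."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a = rng.randint(1, total - 1)  # assumption: uniform draw, so B is never zero
        pairs.append((a, total - a))
    pairs.sort(key=lambda p: p[0] / p[1], reverse=True)
    return pairs


pairs = sorted_ratio_curve()
a_values = [a for a, _ in pairs]
b_values = [b for _, b in pairs]
# After sorting, A only falls and B only rises from left to right,
# even though the pairs were generated in no particular order.
```

Plotting `a_values` and `b_values` against their index gives two smooth crossing curves, which is the point being made: the smoothness comes from the sort, not from the order in which the pairs were produced.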
Whenever you sort many random numbers taken from a range, you will get a smooth curve. The graph below was produced by creating 200 tuples (A, B) where A+B = 100, but otherwise the numbers are random, and then sorting by A/B.

Overall, I agree with Jorge_Stolfi that "statistics of characters (bigrams, trigrams, etc.) can be only puzzling, not illuminating." Without some underlying model explaining the data, the data is curious but mostly doesn't lead anywhere.

I remember seeing a past study or post (something with charts) about which character combinations are antagonistic (like ho and ed, statistically mutually exclusive in the MS) and I remember there were more pairs like this. Does anyone remember what post/article that could be?

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Koen G - 19-02-2026

So shedy eats shol?

Regarding the shuffling of pages, this is understood to have happened afterwards, possibly by someone who did not know everything about the MS. Accidentally bound out of order seems more likely than intentionally shuffled.

Edit to add: I cannot understand why the study of glyphs is being discouraged. The manuscript is made of glyphs. Study them and you may understand it better.

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - oshfdk - 19-02-2026

(19-02-2026, 12:57 PM)ReneZ Wrote: While I don't disagree, I think that it works in the opposite direction. The observation that ed (in fact e) does not appear at the start of words suggests that e is something attached to the preceding character.

But this is already a model of sorts, one that assumes that character combinations are blocks that can be "attached" and "detached". We can consider another model that works by transforming characters, and then the absence of ed would mean that e has a different word-starting form. For example, edy on its own becomes chol by splitting y to ol and merging ed to ch.
In the context of these different models the statistics can be very illuminating, but without them the statistics are just pretty charts.

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - ReneZ - 19-02-2026

(19-02-2026, 01:04 PM)oshfdk Wrote: Overall, I agree with Jorge_Stolfi that "statistics of characters (bigrams, trigrams, etc.) can be only puzzling, not illuminating."

When solving a cipher (and by extension any textual mystery) you need to attack the odd things that stand out. There is the weakness.

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - oshfdk - 19-02-2026

(19-02-2026, 01:12 PM)ReneZ Wrote: When solving a cipher (and by extension any textual mystery) you need to attack the odd things that stand out. There is the weakness.

I don't think I can name many things about the Voynich MS that don't stand out. Who knows, maybe it's not an elephant but a porcupine, and focusing on the longest quills may not get us closer to the truth.

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Dunsel - 19-02-2026

(19-02-2026, 01:04 PM)oshfdk Wrote: I'm not sure I understand the significance of sorting. Imagine that ho and ed are somewhat mutually exclusive (for example, they are a spelling variation: one scribe writes "color", the other "colour", but sometimes they are mindlessly copying from each other's drafts and use the opposite spelling). Imagine the book speaks about colors a lot. This would lead to an overabundance of the trigram "our" on some pages and "lor" on other pages. You can sort them by "our"/"lor" and of course you will get a smooth transition. This doesn't mean anything for the order of writing. Whenever you sort many random numbers taken from a range, you will get a smooth curve.
The graph below was produced by creating 200 tuples (A, B) where A+B = 100, but otherwise the numbers are random, and then sorting by A/B.

You are correct. Two random tuples would produce exactly that. But you forgot the boolean. That chart is not measuring 2 things, it's measuring 3. So, taking your same tuple math with a random boolean attached, I get this.

That is not what my chart shows.

RE: The oddities of the bigram "ed" pt. 3 : It's not just "ed" - Dunsel - 19-02-2026

(19-02-2026, 11:53 AM)Jorge_Stolfi Wrote: Thanks for the work! Numbers are always good.

You are correct. Words are like the concrete that creates the structure of the language. But characters are like the sand that holds that concrete together.

There was this old saying about how good a salesman was: "He could sell sand to an Arab." As it turns out, Europeans are selling sand to Arabs. Because Arabian sand is blown across dunes, it becomes rounded and smooth, whereas European sand is blocky, jagged, rough. European sand is better for making concrete because it has more surface texture. Concrete made with Arabian sand is more prone to collapse.

So, if you don't look at those little grains of sand, you'll never know if your structure will collapse.

Thanks again!