Here are links to my previous posts in this series.
In those you'll find what leads up to this post.
"ho"mogonized
In my first post, I led off with the striking chart that everyone seemed to like. I'll start this post the same way.
What you're looking at there is the bigram "ho" compared to "ed", normalized, with my 0ed pages on the left and ed+ pages on the right.
Ok, that chart is a bit jagged. Let me bring that into focus.
That is the count of ho and ed per 1000 tokens. The vertical bar marks where my ed0 ends and ed+ begins. These two charts essentially show that the bigrams ho and ed are swapping midfix dominance. All of the ed+ pages where they do swap are the ones where ed has its lowest counts. So even on pages where ed occurs only a few times, ho is still the dominant midfix.
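For anyone who wants to reproduce that kind of per-1000-token rate, here is a minimal sketch. It assumes a page is already a list of transcribed words, and it assumes "midfix" means the bigram occurs after the first and last glyph have been removed. The function name `midfix_rate_per_1000` is made up for illustration and is not taken from my actual script.

```python
def midfix_rate_per_1000(tokens, bigram):
    """Occurrences of `bigram` inside word cores, per 1000 tokens.

    Assumption: the "core" of a word is the word with its first and
    last glyph removed, so only mid-word occurrences are counted.
    """
    if not tokens:
        return 0.0
    hits = sum(word[1:-1].count(bigram) for word in tokens)
    return 1000.0 * hits / len(tokens)


# Toy page of five words, for illustration only.
page = "chol shol chedy qokedy daiin".split()
print(midfix_rate_per_1000(page, "ho"))  # 2 hits in 5 tokens -> 400.0
print(midfix_rate_per_1000(page, "ed"))  # 2 hits in 5 tokens -> 400.0
```

Running this for every page and plotting the two rates side by side would give the kind of comparison shown in the chart.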
In this chart, I've sorted by ho/ed count, with ho's highest count on the left and ed's highest count on the right. Their counts are normalized by page word count. I've shaded the background to show my ed0 in grey and ed+ in blue. Now this chart is striking. And it raises the question, "Are Currier's A and B languages and my ed0 and ed+ correct?"
Here's ho compared to ok, which is also a midfix token and has a similar word count. What's different about ho compared with the other bigrams introduced in 0ed is that its count doesn't increase with the folio word count.
Its density per folio changes.
0ED: 0.264
ED+: 0.056
Here's ed's density for comparison
0ED: 0.00068
ED+: 0.182
ho isn't the only one that changes, but it's by far the largest, and that 0ed density is exactly what makes it seem completely opposed to ed. There are other changes in bigrams, but none nearly this drastic.
In this chart, blue is the 0ed pages, orange the ed pages. What I did for this chart is chop off the unigram prefix and suffix of every word and look at what was left in the middle. I then took the top 100 cores by count and looked at letter count. H and O dominate the 0ed side. And they're very much present on the ed+ side. But e and d see a large change in count. Also notice that the counts of h and e are almost perfectly interchangeable between 0ed and ed+.
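The core extraction can be sketched roughly like this. It's a minimal sketch, assuming each word loses exactly one glyph from each end and that letter counts are weighted by how often each core occurs (the weighting is my assumption for this example). The name `letter_counts_in_top_cores` is hypothetical.

```python
from collections import Counter


def letter_counts_in_top_cores(tokens, top_n=100):
    """Count letters inside the `top_n` most frequent word cores.

    Assumption: core = word minus its first and last glyph, and each
    letter is weighted by the number of times its core occurs.
    """
    cores = Counter(word[1:-1] for word in tokens if len(word) > 2)
    letters = Counter()
    for core, count in cores.most_common(top_n):
        for glyph in core:
            letters[glyph] += count
    return letters


# Toy input: cores are "ho" (twice) and "hed" (once).
counts = letter_counts_in_top_cores(["chol", "chol", "chedy"])
print(counts.most_common())
```

Run once over the 0ed words and once over the ed+ words, the two resulting counters can be plotted side by side.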
Now, looking at the top 100 bigrams across both 0ed and ed+, you can easily see where ho got demoted and ed got promoted, along with a lot of other bigrams. None of those other bigrams work like ho. None have a doppelganger on the 0ed side (that I've found).
What Currier Saw & Theory
So, I hope this makes you ask a few questions. I can tell you that I have lots more charts that try to prove why this happened. I've looked into novelty, bigram usage, bigram depletion and just a gob more aspects of these two regimes, and I can't come up with one solid answer. In all of the tests that I've performed, nothing stood out. Well, except for one thing. Currier's eyes.
This is going to delve into theory. And I'm not saying I'm right, I'm saying this is what it suggests to me. Please don't be too brutal, my theories change on a daily basis.
I think I've already shown that Currier spotted these differences some 50 years ago and that my ed0/ed+ pretty much aligns with his language A and language B. He had the time, the desire and, I believe, some punch cards to make spotting this difference easier. I think I've also shown that this is not section specific. There are plenty of Herbal pages with the bigram ed in prominence. So, the question is, why did nobody spot this until Currier? And why has nobody been able to show this distinction to this degree until now?
Compare these two charts. The first one is split on my ed0 and ed+. The second is in original folio order.
Had the Voynich been in my 0ed/ed+ order, I "believe" that spotting this regime shift would have happened much earlier. I "believe" it would have been blatantly obvious. For 100+ pages you have gobs of ho in the middle of the word, and then it's mostly gone and ed is in the middle of lots of words.
In my first post, I demonstrated how ed pages are on specific sheets (with one low-count exception) in the herbal section. If you look at the two charts above, had they been in 0ed/ed+ order there would have been over 100 pages (2/5 of the book) with no ed, and then it would suddenly appear as a midfix and become dominant. I also demonstrated in my first post how these sheets are wrapped around sheets with 0ed. By interleaving those sheets, that 100-page gap was cut roughly in half (f26r is where ed begins in folio order).
Now, I'm going to have to freely admit, I did not have the mathematical skills to come up with this. I knew what I was looking for but the math was a bit beyond me. I asked ChatGPT to come up with a formula that would allow me to detect visual differences in pages by their text composition. Here's what it came up with and what I plugged into a Python script.
So, if you have the mathematical skills, PLEASE confirm this is a valid method. I've looked it over and it makes sense to me, but I'm not a mathematician.
GPT:
For each page, we compute five very simple surface features:
- HO density per word
- ED density per word
- Gallows density per word
- Mean token length
- Top-5 bigram concentration (how dominant the most common bigrams are)
Then we build a vector like this for each page:
[1, HO_per_word, ED_per_word, Gallows_per_word, Mean_token_length, Top5_bigram_share]
That leading 1 is just the intercept term.
Then we fit a simple linear regression:
score = w₀ + w₁·HO + w₂·ED + w₃·Gallows + w₄·Length + w₅·Top5
The weights (w’s) are solved with ordinary least squares to best separate:
- 0ED pages → target = 0
- ED+ pages → target = 1
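For anyone who wants to check the method, the steps above can be sketched as below. This is not my actual script, just a minimal stand-in under some assumptions: pages are lists of EVA-transcribed words, the gallows are taken to be the EVA glyphs k, t, p and f, and the top-5 share is computed over within-word bigrams. The names `page_features`, `fit_ols` and `score` are hypothetical. The least squares step solves the normal equations in plain Python so it runs without extra packages; `numpy.linalg.lstsq` would be the more usual choice.

```python
from collections import Counter

GALLOWS = set("ktpf")  # assumption: EVA gallows glyphs


def page_features(tokens):
    """Return [HO, ED, gallows, mean length, top-5 bigram share] for a page."""
    n = len(tokens)
    if n == 0:
        raise ValueError("page has no tokens")
    ho = sum(w.count("ho") for w in tokens) / n
    ed = sum(w.count("ed") for w in tokens) / n
    gallows = sum(ch in GALLOWS for w in tokens for ch in w) / n
    mean_len = sum(len(w) for w in tokens) / n
    bigrams = Counter(w[i:i + 2] for w in tokens for i in range(len(w) - 1))
    total = sum(bigrams.values())
    top5 = sum(c for _, c in bigrams.most_common(5)) / total if total else 0.0
    return [ho, ed, gallows, mean_len, top5]


def fit_ols(rows, targets):
    """Ordinary least squares: solve (X^T X) w = X^T y by Gauss-Jordan."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 is the intercept
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def score(weights, row):
    """score = w0 + w1*x1 + ... for one page's feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Hypothetical two-feature toy rows (HO and ED per word), not real page values.
rows = [[0.30, 0.00], [0.25, 0.01], [0.05, 0.20], [0.06, 0.15]]
targets = [0, 0, 1, 1]  # 0 = 0ED page, 1 = ED+ page
weights = fit_ols(rows, targets)
print([round(score(weights, r), 2) for r in rows])
```

With real data, each row would come from `page_features(tokens)` for one page, and the fit needs more pages than features for the weights to be determined.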
And here's the chart sorted in ed0 - ed+ order
So, it is claiming that just by measuring those surface textures with that formula, it can predict whether a page is 0ed or ed+ with an 89.8% accuracy rate.
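An accuracy figure like that is typically obtained by turning each score into a yes/no call and counting how many calls match the known label. Here is a minimal sketch, assuming a cutoff of 0.5 (halfway between the 0 and 1 targets). The cutoff and the function name `accuracy` are my assumptions for this example. Keep in mind that scoring the same pages that were used to fit the weights tends to read higher than scoring pages held back from the fit.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of pages whose thresholded score matches the 0/1 label.

    Assumption: a score at or above `threshold` is called ED+ (1),
    anything below is called 0ED (0).
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


# Toy scores and labels, for illustration only: the third page is missed.
print(accuracy([0.1, 0.7, 0.4, 0.9], [0, 1, 1, 1]))  # 3 of 4 correct -> 0.75
```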
Now, here's what the same chart looks like sorted in folio order.
So, here's my theory. The reason this regime shift wasn't detected until Currier is that it was intentionally obscured by shuffling pages around and by skipping over to other sections (pharma and zodiac) and then coming back to finish the herbal section. I believe that if they hadn't shuffled those pages, and if this were written in the 15th century, then anyone with linguistic skills, like an adept scribe or cryptographer, would have spotted it back then.
Now, this shuffling of pages in no way excludes an intentional production process with the folios mostly in chronological order, or the possibility that it resulted from the book being rebound once or twice. Right now, my Occam's Razor detector is saying obfuscation.
Conclusion
This concludes my 3 part series on the oddities of "ed". I'm hoping I've given everyone a lot to think about. I have one more series planned and I hope it's going to be no less... erm... informative. It's going to turn the focus back on repair with an attempt to reverse engineer the Voynich.
Thanks for all the great replies and good hunting.