(13-04-2026, 09:09 AM)FamagustaTed Wrote: Hi Guillaume,
This Tironian angle is really interesting and something I've not looked into yet. I read your initial post and it's a great story: the tech CTO ignoring sleep, and sometimes his wife, to pursue cracking the manuscript.
I've been working on something similar that might be related. Unfortunately I was quite careless in my presentation and rightfully ended up in the slop bucket.
My method looked at the short labels next to the herbal plants and found they appear to function as a selective notation system. Specific morphemes predict visible features of the drawings (branching stems, lobed leaves, complexity level etc.).
Would you be willing to review and test any of my paper's workings? The preprint and source tests are on Zenodo.
Best regards
Mat
This is genuinely impressive work. I want to say that upfront because I think the rigor here deserves recognition regardless of whether one agrees with every conclusion.
I've been running a parallel computational analysis (different approach, same manuscript, same frustration) and the convergence between our independent findings is striking:
Where we agree, having arrived independently:
- The text has consistent morphological structure (you find a four-layer grammar; I find prefix+root+suffix decomposition covering 67% of pharma words)
- The structure varies systematically by section (your six compositional regimes; my section-specific vocabulary, where 56% of pharma vocabulary is exclusive to that section)
- It's not a cipher, not a hoax, not random (we both eliminate these through different tests)
- The conclusion is the same uncomfortable one: we can describe what the system does far better than what it says
Your cross-modal testing (text predicting illustration features) is something I never attempted and I think it's one of the most original angles I've seen in VMS research. The idea that label morphemes predict specific plant features across independent visual channels, if it holds up under adversarial scrutiny, would be a genuine breakthrough.
Where we might learn from each other's mistakes:
I recently had to retract one of my claims (herbal roots as substrings in pharma compounds, originally p=0.002). A forum member's question made me run a proper null test controlling for block-initial character distribution, and the signal vanished completely (p=0.944). It was a gallows bias artifact: I was comparing block-initial words against random words from all positions, which is not a valid null.
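A minimal sketch of a position-matched null of this kind, assuming the bias enters through the first character of each word. The function names, the word lists, and the substring statistic are hypothetical illustrations, not the analysis code from either project.

```python
import random
from collections import defaultdict

# Hypothetical helper: fraction of words containing at least one root.
def hit_rate(words, roots):
    return sum(any(r in w for r in roots) for w in words) / len(words)

# Hypothetical sketch of a null that preserves the first-character
# distribution of the target words (e.g. block-initial words).
def matched_null_p(targets, pool, roots, n_iter=2000, seed=0):
    """One-sided permutation p-value for the hit rate of `targets`."""
    rng = random.Random(seed)
    by_initial = defaultdict(list)
    for w in pool:
        by_initial[w[0]].append(w)
    observed = hit_rate(targets, roots)
    exceed = 0
    for _ in range(n_iter):
        # Draw one pool word per target, matched on initial character;
        # fall back to the whole pool if no word shares that initial.
        sample = [rng.choice(by_initial.get(t[0], pool)) for t in targets]
        if hit_rate(sample, roots) >= observed:
            exceed += 1
    # Add-one correction so the estimate can never be exactly zero.
    return (exceed + 1) / (n_iter + 1)
```

Here `targets` would be the block-initial words and `pool` the words from all positions. Sampling from `pool` without the matching step corresponds to the invalid null described above.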
This experience made me hypersensitive to the question: what is the right null model? And I wonder if some of your results might benefit from the same adversarial treatment. For instance:
- The rho=0.600 between discourse-framing density and visual complexity: does it survive when controlling for text length? Longer text and more complex illustrations could both simply correlate with available space on the folio.
- The 91-97% grammar classification: what does it score on a shuffled text with the same character frequencies? A grammar with enough layers can capture statistical regularities that aren't morphological.
- The HMM convergence (NMI=0.181, entity purity 0.53): you present this honestly, but those numbers are weak. I'd be curious whether you've tested what NMI a random partition achieves.
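For the first question, one standard way to control for text length is a partial rank correlation. The sketch below assumes three per-folio lists (framing density, visual complexity, text length) and uses the first-order partial correlation formula on Spearman coefficients. All names are hypothetical, and it does not handle the degenerate case where the control variable is perfectly correlated with either measure.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(ranks(a), ranks(b))

# Hypothetical sketch: rho(x, y) with the rank-linear effect of z removed.
def partial_spearman(x, y, z):
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

If the partial coefficient stays close to the raw rho, text length is unlikely to be driving the association. If it collapses toward zero, the shared dependence on folio space becomes the simpler explanation.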
These aren't gotchas. I'm asking because I wish someone had asked me these questions before I published my retracted claim. Getting the null model right is the hardest part of VMS research, because the manuscript's internal statistics are deceptively regular.
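The random-partition question can be sketched the same way: shuffle one labeling, recompute NMI many times, and compare the observed value with that distribution. This is a minimal illustration using arithmetic-mean normalization. The function names are hypothetical, and returning 0.0 when both labelings have a single cluster is a convention chosen here, not a standard.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Mutual information divided by the mean of the two entropies."""
    n = len(a)
    ca, cb, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (ca[u] * cb[v]))
             for (u, v), c in joint.items())
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 0.0

# Hypothetical sketch: NMI expected when cluster sizes are kept but the
# assignment of items to clusters is random.
def shuffled_nmi_baseline(true_labels, pred_labels, n_iter=1000, seed=0):
    """Return (mean, 95th percentile) of NMI over shuffled predictions."""
    rng = random.Random(seed)
    shuffled = list(pred_labels)
    scores = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        scores.append(nmi(true_labels, shuffled))
    scores.sort()
    return sum(scores) / n_iter, scores[int(0.95 * (n_iter - 1))]
```

With many clusters and few items, shuffled labelings can score well above zero, so an observed NMI is only informative once it clearly exceeds the upper end of this baseline.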
Your "restricted technical notation" and my "personal shorthand pharmacopoeia"
I think we may be describing the same elephant from different angles. You formalize the grammar and test it against illustrations. I match root distributions against medieval pharmaceutical corpora. You find the system is "structurally technical but lexically local." I find it's consistent with Tironian-style mnemonic abbreviation. Neither of us can read it.
The difference is that your work doesn't commit to a content domain (pharmacy, botany, etc.) while mine does, which makes yours more general but also harder to falsify. My pharmaceutical hypothesis is immediately testable (more botanical identifications either strengthen or break the fingerprint method). Your model is elegant but, as you acknowledge, doesn't yield a reading.
I'd genuinely like to exchange notes if you're open to it. Specifically, your visual plant feature annotations could be valuable for testing my fingerprint method against something other than contested botanical identifications. And my corpus infrastructure (8 medieval pharmaceutical corpora, tokenized and annotated) might be useful for testing whether your compositional regimes correlate with recipe structure.
Two independent analyses converging on "structured technical notation that resists reading" is either a coincidence or a signal. I'd rather find out which one together than separately.
My work, including the retracted claim and all failures: [link not shown]