The Voynich Ninja

Full Version: Structural patterns in the VMS & Evaluating the Tironian shorthand hypothesis

I ran a computational parse of the full ZL v3b text and found something I'd like expert eyes on.

Each herbal folio starts with a unique word (f26v = `pched`; other folios open with `pcheod` and `foch`). These same roots reappear embedded inside pharma compound words:

```
Herbal f26v:  pched              (standalone, the plant's code)
Pharma:        o.pched.y          (prefix + root + suffix)
              pched.al          (root + suffix)
```

53% of pharma openers (154/286) contain a herbal root. Compounds decompose as prefix (logogram) + root (content) + suffix (grammar). This is ID-agnostic: the pattern holds regardless of which plant is which.
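
A minimal sketch of the substring check and the prefix + root + suffix split described above. The function name `decompose`, the longest-root-first rule, and the third opener (`qokeedy`, added only as a non-matching example) are assumptions for illustration, not the code used for the 154/286 figure. Dots are treated as the morpheme separators shown in the example block.

```python
def decompose(word, roots):
    """Return (prefix, root, suffix) for the longest root found inside word, else None."""
    word = word.replace(".", "")  # drop the illustrative morpheme separators
    # Try longer roots first so that 'pcheod' wins over a shorter overlapping root.
    for root in sorted(roots, key=lambda r: (-len(r), r)):
        i = word.find(root)
        if i != -1:
            return word[:i], root, word[i + len(root):]
    return None


herbal_roots = {"pched", "pcheod", "foch"}         # roots quoted in this post
openers = ["o.pched.y", "pched.al", "qokeedy"]     # last one is a hypothetical non-match

hits = [w for w in openers if decompose(w, herbal_roots) is not None]
rate = len(hits) / len(openers)
print(hits, round(rate, 2))
```

Note that a plain substring test like this will also fire on short roots by chance, which is why a null comparison matters.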

I'm aware of King-Andrisani, Lunazzi's brachigraphy, and Stolfi's PAAFU. This seems to go beyond paragraph-initial position since it links two sections structurally, but I may be missing something obvious.

Has this herbal-pharma substring link been discussed before? Does it hold up?
(12-04-2026, 09:13 PM)CorwinFr Wrote: Every herbal folio has a unique first word that acts as that plant's code.


f2v.1       kooiin
f29v.1     kooiin

f19r.1      pchor
f21r.1      pchor
f52v.1      pchor

f54r.1      podaiin
f55r.1      podaiin


(12-04-2026, 09:13 PM)CorwinFr Wrote: This happens systematically. **53% of pharma recipe openers** (154 out of 286) contain a herbal plant root as a substring.

What is a pharma recipe opener? A line? If labels are not counted there are 231 lines in the "pharma" section.
(12-04-2026, 10:52 PM)nablator Wrote: This happens systematically. **53% of pharma recipe openers** (154 out of 286) contain a herbal plant root as a substring.

What is a pharma recipe opener? A line? If labels are not counted there are 231 lines in the "pharma" section.

Good catch, let me be precise.

On "unique": 132 out of 143 herbal folios (92%) have a distinct first word. Five words are shared by 2-3 folios:

First word   Folios              Count
pchor        f19r, f21r, f52v    3
tshor        f15r, f53v          2
kooiin       f29v, f2v           2
cho          f42r, f42v          2
podaiin      f54r, f55r          2

So 11 folios share a first word with at least one other folio. "Unique" was an overstatement; 92% distinct (132/143) is the correct number.
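
The count above can be reproduced with a short grouping pass. This is a sketch on a toy subset (only first words quoted in this thread), and `shared_first_words` is a hypothetical helper name, not part of my pipeline.

```python
from collections import defaultdict


def shared_first_words(first_words):
    """Map each first word used by two or more folios to the sorted list of those folios."""
    by_word = defaultdict(list)
    for folio, word in first_words.items():
        by_word[word].append(folio)
    return {w: sorted(f) for w, f in by_word.items() if len(f) > 1}


# Toy subset: first words quoted in this thread, not the full 143-folio table.
first_words = {
    "f2v": "kooiin", "f29v": "kooiin",
    "f19r": "pchor", "f21r": "pchor", "f52v": "pchor",
    "f54r": "podaiin", "f55r": "podaiin",
    "f26v": "pched",
}

shared = shared_first_words(first_words)
n_shared_folios = sum(len(f) for f in shared.values())
n_distinct = len(first_words) - n_shared_folios
print(shared, n_shared_folios, n_distinct)

# Arithmetic from the post: 143 folios, 11 of them sharing a first word.
print(round(100 * (143 - 11) / 143))  # percentage with a distinct first word
```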

The shared cases are interesting in themselves: a pragmatic pharmacist might use the same code for plants he considers interchangeable (f54r and f55r are consecutive folios, possibly variants of the same plant; f42r and f42v are recto/verso of the same leaf).

On "pharma recipe opener": I count text blocks, not lines. In the ZL transcription, the pharma section has 286 blocks (starred paragraphs, each typically one recipe) across 1,085 lines. My "53% of 286 openers" means 154 block-initial words contain a herbal root as a substring. If you count 231 lines, that's likely the line count for a subset of the pharma folios, or a different section boundary definition. I used all folios tagged "pharma" in the ZL metadata (f103r through f116v, 24 folios).

One case supports the interchangeability idea: tshor (f15r=Sonchus, f53v=Hieracium) are both Asteraceae, latex-producing composites that a pharmacist would use as substitutes.
Two pairs are on adjacent folios (cho on f42r/v, podaiin on f54r/f55r), possibly variant preparations of related plants.
The other two (pchor, kooiin) show no obvious botanical link, though the plant identifications themselves are contested. Honestly, cho (2 chars) is probably a functional word appearing by coincidence as a folio opener, not a real plant code.

I tested phonetic mappings, co-occurrence, positional alignment, EM optimization, and constraint propagation: all failed.

This one held.
(12-04-2026, 10:52 PM)nablator Wrote:
(12-04-2026, 09:13 PM)CorwinFr Wrote: Every herbal folio has a unique first word that acts as that plant's code.
f2v.1       kooiin
f29v.1     kooiin

f19r.1      pchor
f21r.1      pchor
f52v.1      pchor

f54r.1      podaiin
f55r.1      podaiin

This is precisely the pattern that one expects if each paragraph of the Herbal section is about a different plant, and starts with the plant's name.  Many names will be just one word long, and generally will not be mentioned anywhere else in that herbal.  Or in a treatise of anatomy.  Or in astrological charts.

Some plant names will be two words or more, and then two such names may begin with the same word.

From Culpeper's herbal:

  ...
  FOXGLOVE
  FUMITORY
  GARDEN RUE
  GARDEN TANSIE
  GARDEN VALERIAN
  GARLICK
  GERMANDER
  GOLDEN MAIDENHAIR
  GOLDEN ROD
  GROMEL
  GROUNDSEL
  HAWKWEED
  ...

All the best, --stolfi
(12-04-2026, 11:09 PM)CorwinFr Wrote: If you count 231 lines, that's likely the line count for a subset of the pharma folios, or a different section boundary definition. I used all folios tagged "pharma" in the ZL metadata (f103r through f116v, 24 folios).

The pharmaceutical section is tagged $I=P. Folios 103-116 are starred paragraphs a.k.a. recipes tagged $I=S in IVTFF.
(13-04-2026, 12:15 AM)nablator Wrote: The pharmaceutical section is tagged $I=P. Folios 103-116 are starred paragraphs a.k.a. recipes tagged $I=S in IVTFF.

Thank you for the correction, I was using section labels from my own parsed JSON, not the IVTFF tags directly. I'll align my terminology. The 286 blocks I count are the starred paragraphs ($I=S) across f103r-f116v.
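
To avoid this kind of label drift, the section tag can be read straight from the page headers rather than from a derived JSON. The sketch below assumes page headers of the form `<f103r>  <! $Q=T ... $I=S ...>`; the two sample header lines and the locus line are made up for illustration and are not copied from the ZL file, and `page_variables` is a hypothetical helper name.

```python
import re

# Assumed shape of an IVTFF page header: "<folio>   <! $X=v $Y=w ...>"
PAGE_HEADER = re.compile(r"^<(f\d+[rv]\d*)>\s+<!(.*)>")
VARIABLE = re.compile(r"\$([A-Z])=(\w)")


def page_variables(line):
    """Return (folio, {variable: value}) for a page header line, else None."""
    m = PAGE_HEADER.match(line)
    if m is None:
        return None
    return m.group(1), dict(VARIABLE.findall(m.group(2)))


# Illustrative lines only; the variable values other than $I are placeholders.
sample = [
    "<f99r>      <! $Q=S $P=A $I=P $L=A $H=1>",
    "<f103r>     <! $Q=T $P=A $I=S $L=B $H=3>",
    "<f103r.1,@P0>   pchedal.shdy",  # locus line, not a page header
]

sections = {}
for line in sample:
    parsed = page_variables(line)
    if parsed is not None:
        folio, variables = parsed
        sections[folio] = variables.get("I")
print(sections)
```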


(12-04-2026, 11:28 PM)Jorge_Stolfi Wrote: This is precisely the pattern that one expects if each paragraph of the Herbal section is about a different plant, and starts with the plant's name.  Many names will be just one word long, and generally will not be mentioned anywhere else in that herbal.  Or in a treatise of anatomy.  Or in astrological charts.

Some plant names will be two words or more, and then two such names may begin with the same word.

Good challenge, Stolfi. I ran a null test: I drew 500 random sets of VMS words with the same length distribution as the herbal roots and checked each set's substring match rate against the pharma openers.


Real herbal roots: 55%. Random words of same lengths: 25% mean. p = 0.004.
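
For anyone who wants to check the procedure, here is a minimal sketch of a length-matched Monte Carlo null of this kind. The names `match_rate` and `length_matched_null`, the toy vocabulary, and the add-one p-value estimator are assumptions for illustration; this is not the exact script behind the numbers above.

```python
import random


def match_rate(roots, openers):
    """Fraction of openers that contain at least one root as a substring."""
    return sum(any(r in w for r in roots) for w in openers) / len(openers)


def length_matched_null(roots, openers, vocab, n_sets=500, seed=0):
    """Compare the observed match rate with random word sets of the same lengths."""
    rng = random.Random(seed)
    by_len = {}
    for w in vocab:
        by_len.setdefault(len(w), []).append(w)
    observed = match_rate(roots, openers)
    null = []
    for _ in range(n_sets):
        # One random vocabulary word per root, matched on length.
        # Raises KeyError if the vocabulary has no word of some root's length.
        sample = [rng.choice(by_len[len(r)]) for r in roots]
        null.append(match_rate(sample, openers))
    # Add-one estimator so the p-value is never exactly zero.
    p = (1 + sum(x >= observed for x in null)) / (n_sets + 1)
    return observed, sum(null) / n_sets, p


# Toy data only: a handful of words, not the real root list or opener list.
roots = ["pched", "foch"]
openers = ["opchedy", "pchedal", "fochy", "daiin"]
vocab = ["daiin", "chedy", "shedy", "okal", "chol", "otar"]

observed, null_mean, p = length_matched_null(roots, openers, vocab)
print(observed, null_mean, p)
```

The weak point is the choice of `vocab`: drawing from all words assumes paragraph-initial words are exchangeable with arbitrary words.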
Hi Guillaume,

This Tironian angle is really interesting and something I've not looked into yet. I read your initial post and it's a great story: the tech CTO ignoring sleep, and sometimes his wife, to pursue cracking the manuscript.

I've been working on something similar that might be related. Unfortunately I was quite careless in my presentation and rightfully ended up in the slop bucket. My method looked at the short labels next to the herbal plants and found they appear to function as a selective notation system. Specific morphemes predict visible features of the drawings (branching stems, lobed leaves, complexity level, etc.).

Would you be willing to review and test any of my paper's workings? The preprint and source tests are on Zenodo.

Best regards
Mat
(13-04-2026, 07:13 AM)CorwinFr Wrote: Real herbal roots: 55%. Random words of same lengths: 25% mean. p = 0.004.

This is not a valid method: the null hypothesis is not that the first words of paragraphs are random. It is well known that the first lines of paragraphs are special, and the first words of lines are special; see the PAAFU and LAAFU studies. By "special" I mean statistically different: mostly in the first character, but not only there. Gallows are also much more frequent in the words of the first line, and especially in the first word. They are certainly not "random" words.

You are again trusting your overconfident AI to do the work for you: you should know better and stop wasting everyone's time with unsupported theories. This is why AI theories are banned: people who are too lazy to create their own theories are also too lazy to check them by themselves, they blindly trust the AI to come up with theory, code, and results and just copy-paste the mass of hallucinated or badly supported claims verbatim to the forum. I noticed that you did exactly that in this thread, then deleted most of them (good) but it turns out that even the most reasonable-looking claim is deeply flawed.
(13-04-2026, 09:38 AM)nablator Wrote:
(13-04-2026, 07:13 AM)CorwinFr Wrote: Real herbal roots: 55%. Random words of same lengths: 25% mean. p = 0.004.



This is not a valid method: the null hypothesis is not that the first words of paragraphs are random. It is well known that the first lines of paragraphs are special, and the first words of lines are special; see the PAAFU and LAAFU studies. By "special" I mean statistically different: mostly in the first character, but not only there. Gallows are also much more frequent in the words of the first line, and especially in the first word. They are certainly not "random" words.


You're right, and that's exactly the flaw. I should have known about PAAFU/LAAFU before building a null test on the assumption that paragraph-initial words are exchangeable with random words. They're not: the gallows enrichment in first position is well documented.

The corrected test confirms what you'd expect: when controlling for block-initial character distribution, the herbal-pharma substring signal vanishes completely (p = 0.944). Retracted on GitHub with full analysis.
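
A minimal sketch of what "controlling for block-initial character distribution" can look like: null words are drawn only from a pool of paragraph-initial words, stratified by first character and length. The stratification key, the names `initial_matched_null` and `pool`, and the toy words (including the made-up `fcho`) are assumptions for illustration; the retracted analysis may differ in detail.

```python
import random
from collections import defaultdict


def match_rate(roots, openers):
    """Fraction of openers that contain at least one root as a substring."""
    return sum(any(r in w for r in roots) for w in openers) / len(openers)


def initial_matched_null(roots, openers, pool, n_sets=500, seed=0):
    """Null sets drawn from paragraph-initial words with the same first glyph and length."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for w in pool:
        strata[(w[0], len(w))].append(w)
    observed = match_rate(roots, openers)
    null = [
        # rng.choice raises IndexError if a root's stratum is empty in the pool.
        match_rate([rng.choice(strata[(r[0], len(r))]) for r in roots], openers)
        for _ in range(n_sets)
    ]
    p = (1 + sum(x >= observed for x in null)) / (n_sets + 1)
    return observed, sum(null) / n_sets, p


# Toy data only; 'pool' stands in for the list of all paragraph-initial words.
roots = ["pched", "foch"]
openers = ["opchedy", "pchedal", "fochy", "daiin"]
pool = ["pched", "pchor", "foch", "fcho"]

observed, null_mean, p = initial_matched_null(roots, openers, pool)
print(observed, round(null_mean, 3), round(p, 3))
```

With a matched pool the null rates sit much closer to the observed rate, which is the same direction of change as the p = 0.944 result.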