Dear friends,
Most cipher theories about Voynichese share a quiet assumption: if it encodes a natural language, the mapping must be one-to-many. One plaintext letter fans out into several ciphertext glyphs. The bloat has to go somewhere, and the low entropy demands it. I want to suggest the opposite direction is equally natural, and rather more interesting. What if the script is many-to-one? Not at the letter level, but at the level of grammatical function. What if each glyph encodes not a sound but an operator, and a single Voynich "word" is not a word at all but a compressed clause template? This would explain quite a lot. The low entropy. The rigid positional constraints. The fact that words look repetitive without being random. And it would explain why nobody has found a plaintext: there isn't one, not in the way we've been looking.
The method
Take each EVA glyph. Look where it appears in words (initial, medial, final). Look at what it co-occurs with and what it avoids. Look at what follows it across word boundaries. Then ask: what class of grammatical operator would produce exactly this distribution? For most glyphs, only one candidate survives elimination. Not because the answer is obvious, but because the constraints are so tight that alternatives fail specific statistical tests. I'll show some of what falls out if you take this idea seriously, but first, a disclaimer: these semantic labels are almost certainly wrong. What matters is that they're wrong in a specific and falsifiable way. The structural behaviours they describe are real. Whether "demonstrative" or "topic marker" or "record initialiser" is the right name for what q does is where the interesting argument begins. Here's what the distributional evidence suggests:
EVA ---- Proposed role ---- Why?
q → demonstrative ("this") → 99% word-initial, selects o at 97%, never repeats, nearly absent from labels
o → generic head ("thing") → Most frequent glyph, precedes determiners, more common in labels than paragraphs
k / t → definite/indefinite → Rarely co-occur (<2%); when both present, t precedes k 3:1; section-dependent ratios
f / p → relative definite/indefinite → Same k / t contrast, but license ch / sh at 3-4x the rate, i.e. open dependent clauses
ch / sh → clause openers (subordinate/main) → sh more initial (71% vs 57%), sh words shorter, sh elevates determination in the next word
e → process modifier ("how") → Strictly medial, clusters after openers, precedes verbal elements
ee → entity modifier ("what kind") → Strictly medial, clusters after determiners, shifts toward nominal position
eee → (gradient continues) → Rare, further nominal shift, increasingly closes with s
a → case linker → Never final, followed by a role marker 96% of the time
r → nominative (subject) → Precedes n 97.5% when both present, doesn't enrich k
n → oblique (object/goal) → Enriches k (k : t = 2.06), tightly a-bound (96%)
m → completive (result/into) → 95% word-final, less tightly a-bound than n (84%)
d → verbal element ("does") → Late-position, closes to y at 57%, correlates with clause openers
y → verbal closure / resumptive → 93% final; resets to q (27%) across word boundary; can also close-and-reopen within a word
l → nominal boundary ("and then") → 59% final; continues to ch / sh (37%), hands off rather than resets
s → backward anchor ("aforementioned") → Enriched 2x at line-initial, suppresses openers, avoids q
i / ii / iii → specificity gradient → Longer chains correlate with less determination (lower k : t ratio)
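For concreteness, this is roughly the kind of counting each row rests on. A minimal sketch, not the actual pipeline: it assumes a plain-text EVA transliteration flattened to one word per whitespace-separated token (the filename is a placeholder), and it handles single-character glyphs only; ch, sh, ee and friends need a proper tokenizer, like the toy one further down.

```python
# Minimal distributional profiling sketch. "eva_words.txt" is a placeholder
# for a flattened EVA transliteration, one word per token. Single-character
# glyphs only; multi-character glyphs (ch, sh, ee, ii) need a real tokenizer.
from collections import Counter

words = open("eva_words.txt").read().split()

def positional_rates(glyph):
    """Share of a glyph's occurrences that are word-initial/medial/final."""
    pos = Counter()
    for w in words:
        for i, c in enumerate(w):
            if c == glyph:
                pos["initial" if i == 0 else
                    "final" if i == len(w) - 1 else "medial"] += 1
    total = sum(pos.values()) or 1
    return {k: v / total for k, v in pos.items()}

def selection_rate(glyph, successor):
    """How often a glyph is immediately followed by `successor` within a word."""
    hits = total = 0
    for w in words:
        for i in range(len(w) - 1):
            if w[i] == glyph:
                total += 1
                hits += (w[i + 1] == successor)
    return hits / total if total else 0.0

print(positional_rates("q"))     # the table claims ~99% initial
print(selection_rate("q", "o"))  # the table claims ~0.97
```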
The decision chains behind each assignment are quite lengthy and dependent on each other, a problem I tried to describe elsewhere. Perhaps I should write a separate post about that later?
Reading some words
Once you have operators, words become readable as structural templates. Here are some of the most frequent in the manuscript.
Whatever system produces these patterns, it appears stable across sections, text types, and scribal hands. The frequencies vary but the rules don't.
daiin (834 occurrences) → d.a.ii.n → VERB.CASE.OBLIQUE(mid-specificity).
"Acts upon [oblique referent]." The single most common word is a verbal action directed at something, with the ii marking a middle level of specificity. This is the workhorse.
chedy (516) → ch.e.d.y → OPENER.PROCESS-MOD.VERB.CLOSE.
A complete minimal clause: opened, modified as to manner, predicated, closed. Done.
shedy (430) → sh.e.d.y → OPENER.PROCESS-MOD.VERB.CLOSE.
Same structure, different opener. The ch / sh distinction is real and stable across scribal hands (Currier A: ch : sh = 2.75, Currier B: 2.12). Whatever the difference is, it's grammatical, not stylistic.
qokeedy (305) → q.o.k.ee.d.y → DEM.HEAD.DEF.ENTITY-MOD.VERB.CLOSE.
"This definite [specified-kind-of] thing does." A full demonstrative clause in six glyphs. Note the ee (entity modifier) sitting between the determiner and the verb, qualifying what kind of thing, not how it acts.
ol (577) → HEAD.BOUNDARY.
"Thing;" An entity mentioned in passing, with the nominal boundary handing off to what follows. Connective tissue.
dy (244) → VERB.CLOSE.
The absolute minimum predication. "Does." Period.
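To make these templates concrete, here's a toy reader. The operator inventory and its labels are the speculative assignments from the table above (abbreviated), and the greedy longest-match split is my simplification, not a claim about how words were actually composed.

```python
# Toy decomposition of EVA words into the proposed operator labels.
# Labels abbreviate the speculative assignments above; this illustrates
# the reading, it is not an established glyph inventory. Compound gallows
# (ckh, cth, cfh, cph) are deliberately omitted and fall to the ?-branch.
OPS = [  # longest glyphs first, so greedy matching prefers eee > ee > e
    ("eee", "ENTITY-MOD+"), ("iii", "SPEC-LOW"),
    ("ee", "ENTITY-MOD"), ("ii", "SPEC-MID"),
    ("ch", "OPENER-SUB"), ("sh", "OPENER-MAIN"),
    ("q", "DEM"), ("o", "HEAD"), ("k", "DEF"), ("t", "INDEF"),
    ("f", "REL-DEF"), ("p", "REL-INDEF"), ("e", "PROCESS-MOD"),
    ("a", "CASE"), ("r", "NOM"), ("n", "OBLIQUE"), ("m", "COMPLETIVE"),
    ("d", "VERB"), ("y", "CLOSE"), ("l", "BOUNDARY"), ("s", "ANCHOR"),
    ("i", "SPEC-HIGH"),
]

def decompose(word):
    """Greedy longest-match split of a word into operator labels."""
    out, i = [], 0
    while i < len(word):
        for glyph, label in OPS:
            if word.startswith(glyph, i):
                out.append(label)
                i += len(glyph)
                break
        else:
            out.append("?" + word[i])  # glyph outside the toy inventory
            i += 1
    return ".".join(out)

print(decompose("daiin"))    # VERB.CASE.SPEC-MID.OBLIQUE
print(decompose("chedy"))    # OPENER-SUB.PROCESS-MOD.VERB.CLOSE
print(decompose("qokeedy"))  # DEM.HEAD.DEF.ENTITY-MOD.VERB.CLOSE
```

The "longest glyph first" ordering is what makes ee win over e + e, which matters for exactly the entity/process distinction discussed above.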
Fused and decomposed forms
There's a curiosity about the gallows letters that's worth pausing on. The compound gallows - ckh, cth, cfh, cph - look visually like ch or sh with the legs of k, t, f, or p threaded through them. And they behave as if they fuse an opener with a determiner into a single glyph:
Compound ---- Components ---- Initial rate ---- Cross-word effect
ckh → ch + k (subordinate + definite) → 20% → Next word strongly k-enriched
cth → ch + t (subordinate + indefinite) → 51% → Next word balanced k / t
cfh → ch + f (subordinate + relative definite) → 37% → Next word shifts to relative modes (f / p elevated)
cph → ch + p (subordinate + relative indefinite) → 58% → Next word mixed
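The last column is the sort of claim a few lines of code can check. A rough sketch of the measurement, under the same assumptions as the earlier sketch (flattened word list, placeholder filename, line breaks ignored, and no attempt to exclude the f/p inside compounds themselves):

```python
# Does the word after a compound gallows shift its determination mode?
# Naive version: substring tests on consecutive word pairs.
words = open("eva_words.txt").read().split()  # placeholder filename

def next_word_rate(trigger, targets):
    """P(next word contains any glyph in `targets` | word contains `trigger`)."""
    hits = total = 0
    for w, nxt in zip(words, words[1:]):
        if trigger in w:
            total += 1
            hits += any(g in nxt for g in targets)
    return hits / total if total else 0.0

baseline_fp = sum(any(g in w for g in "fp") for w in words) / len(words)
print("after cfh:", next_word_rate("cfh", "fp"), "baseline:", baseline_fp)
print("after ckh:", next_word_rate("ckh", "k"),
      "baseline:", sum("k" in w for w in words) / len(words))
```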
The cross-word effects are the telling part. After cfh, the following word shows elevated f and p (the relative determination modes). The compound gallows doesn't just fuse two operators; it propagates its features forward. Now here's the peculiar thing. These fused forms have decomposed equivalents. You can write fach instead of cfh, spelling out the case linker a between the relative determiner and the opener. Across 38,525 tokens, there are exactly eight such decomposed forms. Half of them are line-initial. And one of them is the manuscript's very first word:
fachys → f.a.ch.y.s → RELATIVE-DEF.CASE.OPENER.CLOSE.BACKWARD-ANCHOR.
"Which-specific [thing], [framed], [closed], of the aforementioned." The same operators that are fused into cfhys are here written out longhand, with every joint visible. As if the first word of the manuscript spells out what later becomes shorthand. Whether that's meaningful or coincidental, I genuinely don't know. But it's an interesting place to start, because the backward anchor s points at something already established, yet nothing has been established yet. Unless the first word assumes a context outside the manuscript itself: a tradition, a source text, or a body of knowledge the reader already holds. "Concerning what is [already] known..."
The sections speak differently, for structural reasons
This is where things get interesting. Different sections of the manuscript have different vocabularies, and under this reading, the differences make structural sense:
- The herbal section is dominated by daiin, backward references (s), and continuations. Lots of "does-to-oblique" actions pointing at previously established referents. Prescriptive, process-oriented language. Which is what you might expect for text accompanying drawings of plants, if the text describes what to do with them.
- The biological section looks different. More entity segments (ol), more complete clauses (shedy, chedy), more demonstratives. More descriptive, less procedural.
- The zodiac section is stranger still. It's dominated by ot- prefixed forms: HEAD.INDEFINITE patterns. Where herbal has definites and backward references, and biological has demonstratives, zodiac speaks in indefinites. It's describing things that haven't been established yet. Things being introduced for the first time.
And here's something rather striking about context-sensitivity across the whole manuscript:
- Labels: q at 1%, standalone s at 16%
- Paragraphs: q at 16%, standalone s at 7%
- Circular text: q at 2%, standalone s at 13%
- Radial text: q at 5%, standalone s at 16%
Non-paragraph contexts suppress q and elevate s, consistently, across every section. Under this reading: labels don't need to introduce new referents (the drawing already shows you what's being labelled) but need more anchoring to what's already been established. Paragraphs need more introduction and less backward reference. That's not proof, but it's a rather tidy fit.
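The bookkeeping behind that comparison is trivial once tokens carry layout tags. A sketch, assuming a hypothetical list of (context, word) pairs parsed from an interlinear transliteration, and implementing "standalone s" as an s that isn't part of the sh digraph:

```python
# Glyph rates by layout context. `tagged` is a placeholder: in practice it
# would be parsed from a transliteration whose locus codes distinguish
# labels, paragraphs, circular and radial text.
import re
from collections import defaultdict

tagged = [("label", "otaldy"), ("paragraph", "qokedy")]  # dummy examples only

def rate_by_context(pairs, predicate):
    """Per-context fraction of words satisfying `predicate`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ctx, w in pairs:
        totals[ctx] += 1
        hits[ctx] += bool(predicate(w))
    return {ctx: hits[ctx] / totals[ctx] for ctx in totals}

q_rates = rate_by_context(tagged, lambda w: w.startswith("q"))
s_rates = rate_by_context(tagged, lambda w: re.search(r"s(?!h)", w))  # s outside sh
print(q_rates, s_rates)
```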
The e-gradient is also stable across both Currier languages:
- Currier A: e 57% → ee 34% → eee 9% after ch / sh
- Currier B: e 65% → ee 27% → eee 10% after ch / sh
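Counting that gradient is mechanical. A minimal sketch, aggregate only (splitting by Currier language needs per-folio metadata, which is omitted here):

```python
# e-gradient after openers: among maximal e-runs immediately following
# ch or sh, what share are e vs ee vs eee?
import re
from collections import Counter

words = open("eva_words.txt").read().split()  # placeholder filename
runs = Counter()
for w in words:
    # match a run of 1-3 e's right after ch/sh; the lookahead rejects longer runs
    for m in re.finditer(r"(?:ch|sh)(e{1,3})(?!e)", w):
        runs[m.group(1)] += 1

total = sum(runs.values()) or 1
print({g: runs[g] / total for g in ("e", "ee", "eee")})
```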
What about Naibbe?
A fair objection: how do we know these patterns aren't just artefacts of a slot grammar, calligraphic rules for building legal-looking words rather than genuine underlying structure? Naibbe, a cipher specifically designed to produce Voynichese-looking output, does replicate the surface-level constraints: q goes to initial position, y goes to final, and the positional slots look right. But Naibbe has no memory across word boundaries, and the Voynich manuscript does: the y / l continuation asymmetry, the ch / sh determination effect on the following word, and the section-dependent k : t ratios are cross-word behaviours that a slot grammar alone cannot produce. There's also a practical problem with Naibbe-style ciphers. They're extraordinarily verbose: one plaintext letter can require six or seven Voynich glyphs. That would make the source text very short, and labels (some of which are six or seven Voynich glyphs) would be encoding one or two plaintext characters. It's hard to believe someone would design a system that verbose to label a drawing.
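Here's the shape of the simplest such test, under the same flattened-word-list assumptions as the earlier sketches. A memoryless slot grammar should produce the same opener distribution whatever the previous word's final glyph was:

```python
# Cross-word memory test: does the final glyph of a word predict the next
# word's opener? A slot grammar with no memory across word boundaries
# should show no difference between the -y and -l rows.
words = open("eva_words.txt").read().split()  # placeholder filename

def opener_rate_after(final, prefix):
    """P(next word starts with `prefix` | current word ends with `final`)."""
    hits = total = 0
    for w, nxt in zip(words, words[1:]):
        if w.endswith(final):
            total += 1
            hits += nxt.startswith(prefix)
    return hits / total if total else 0.0

for final in ("y", "l"):
    print(final,
          "-> q:", round(opener_rate_after(final, "q"), 3),
          "-> ch/sh:", round(opener_rate_after(final, "ch")
                             + opener_rate_after(final, "sh"), 3))
```

Under the claims above, the -y row should show the q reset (~27%) and the -l row the ch / sh hand-off (~37%).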
A reading of folio f68r3
Theory is pleasant. Application is where things get honest. Folio f68r3 contains a star diagram. Three visual elements have labels:
- A large star: dcholday, a structure containing two verbal events with a clause boundary between them (note y appearing mid-word as a resumptive, closing one predication before d opens another). A complete, closed statement. The big star is something, fully specified.
- A group of seven stars: doary, an entity appearing as the subject of an action, then closing. The seven stars do something, as agents.
- A curved line connecting the group to the centre: oalcheol, an open structure. Two entity segments connected by a clause opener, both ending in nominal boundaries. The connecting line relates things, and the relation remains open, pointing onward.
Now, clusters of seven stars in medieval manuscripts typically represent the Pleiades. What does this reading say? It says the seven stars are labelled as an agent, something that acts, not as a named object. If this reading has any contact with reality, the cluster isn't being told what it is. It's being told what it does. Whether that's useful or merely a beautiful hallucination, I leave as an open question.
What the grammatical reading cannot say
Under these specific assignments, the system has no dedicated operator for negation, conditionals, disjunction, comparison, temporal sequencing, numerals, or epistemic modality. No "not", no "if", no "or", no "more than", no "before", no "perhaps". What remains is a system that can do exactly three things well: point at entities, assign them roles in relations, and predicate actions upon them, all wrapped in clause frames with modification. It is a language for describing configurations: this thing, in this role, does this, to that. But these gaps are an artefact of the labels, not necessarily of the system. Change the assignments and the gaps move. If ch / sh is not subordinate/main but affirmative/negative, negation is built in. If k / t is not definite/indefinite but permitted/forbidden, you have modality. If the e-gradient is not adverb→adjective but degree of intensity or certainty, you have epistemic marking. The distributional structure stays the same; only what it "cannot say" changes. Under a grammatical reading, it looks like a language of configuration. Under a different reading, it might be considerably more expressive. What it definitely lacks is a large open-class vocabulary. Whatever this system does, it does it with roughly nineteen operators and their combinations.
What cribs become
This reframing has an uncomfortable consequence for decipherment. Under a standard cipher theory, a crib is a known plaintext word in a known position. You look for "Mars" or "Taurus" written in Voynichese. Under a many-to-one structural reading, there are no words to crib. The labels in the Zodiac section aren't names of zodiac signs, they're specifications of what those signs do, or what properties they carry, expressed in a structural notation. So the question changes. It's not "where does it say Pleiades?". It's "what kind of system describes stellar objects in terms of agent-process-closure patterns rather than names?". That's a harder question.
A convergence worth noting
Another researcher's work on a Chinese source-text alignment independently suggests daiin functions as something like "to use", a verbal or procedural marker. He arrived at this from a completely different direction: external alignment with pharmacopoeia rather than internal distributional analysis. The fact that two independent methods point toward the same functional role for the manuscript's most common word is, at minimum, intriguing. For those pursuing the Chinese reading: if d is indeed a process operator, then daiin may not be the word "to use" but rather the structural template for "applies-to-[oblique referent]". That distinction matters. It means you're not looking for a character-by-character cipher into Chinese. You're looking for a structural encoding of Chinese templates, and the grain of the encoding may be at the morpheme or phrase level, not at the character level.
What's probably wrong here
My initial thought process can be summarised as: "If it quacks like grammar and behaves like grammar, you've built a system that uses grammar, whether that was intentional or not." But I must confess, the semantic labels are almost certainly wrong. "Demonstrative" and "nominative" and "verbal element" are borrowed from grammar because grammar is the easiest framework to explain these patterns in. But the distributional facts are equally well described by medieval supposition theory (q as term-introduction operator, k / t as modes of personal supposition), by formal notation (q as record initialiser, y as a record terminator), or by several other frameworks I haven't thought of.
It's also worth noting that no known Indo-European language produces this particular fingerprint: a rigid operator-slot grammar with no agreement, no tense, no conjunction, and a closed set of ~19 functional elements. For those interested in Stolfi's Chinese direction, the closest typological fit I've found is Classical Tibetan, an SOV language with postpositional case markers, clause-final verbs, and agglutinative morphology built from a small set of grammatical particles. That's not a claim about the manuscript's language; it's an observation about what kind of grammar, if any, would actually produce these patterns.
The structural observations themselves are not in doubt; they survive regardless of what you call the operators. What I'm less sure about is what kind of system would produce exactly this pattern of constraints. But the constraints are specific enough to narrow the search considerably. Whatever the underlying system is, it must have:
- A 2x2 mode grid: four values generated from two binary contrasts, where one axis is domain-sensitive (its ratio shifts by section) and the other controls whether dependent structures open
- Two framing operators with differential forward effects on what follows
- Three distinct role markers with strict ordering constraints between them
- A scalar modifier that shifts from process-oriented to entity-oriented as it lengthens
- A closed operator set of roughly nineteen elements - no large open-class vocabulary
A natural language grammar can produce this, but awkwardly: you'd expect agreement, tense, and conjugation, none of which appear. A logical notation can produce it more naturally. A prescriptive system with recipes, procedures, and astrological specifications can produce it almost by design. The question is which kind of formal system, plausibly available in early fifteenth-century Europe, would generate exactly this fingerprint.
If you think these assignments are wrong, I'm genuinely interested in how they're wrong. Not "this can't be grammar" (I know it probably isn't, not in the linguistic sense) but "this operator would work better as X because of Y". The distributional evidence constrains the answer more tightly than you might expect. And if you think the whole many-to-one idea is wrong, I'd like to hear what other framework can explain the low entropy, the rigid positional constraints, the fact that words look repetitive without being random, and why nobody has found a plaintext.
For anyone who wants to poke holes in this theory: feel free, there will be holes.
Cheers.