06-02-2026, 01:34 AM
Dear friends,
I keep coming back to a question that won't quite leave me alone: how should we actually describe the structural behaviours we see in Voynichese "words"?
They're clearly not simple substitution outputs, and they're not random strings. That's been shown many times. But they don't behave like ordinary words in known languages either. They're constrained, but not in the ways we'd expect.
Here are some observations. I'll present the raw distributional facts first, then show why describing them is harder than it looks.
1. Rigid initial binding: The token q appears word-initially 99% of the time. It essentially never occurs mid-word or finally. When it appears, it selects o as its immediate follower at 97%. Whatever q is doing, it does it once, at the start, and it binds tightly to what follows.
2. Two boundary types with different continuations: Both y and l tend toward final positions, but they create different continuation environments. After y-final words, q appears at 27% and ch/sh appear at 17%. After l-final words, q appears at only 9% while ch/sh rise to 33%. So we have two "closers" that close in different ways. One favours continuation with q (which binds to o). The other favours continuation with ch and sh (which themselves open bounded structures).
3. Paradigmatic exclusion with ordering: The tokens k and t rarely appear together (under 2% co-occurrence). When they do co-occur, t precedes k at roughly 2:1. This looks like a paradigmatic contrast: two tokens competing for the same structural slot, with one tending to come first when both are present.
4. A suffix-like element changes selection rates: Compare k with f, or t with p. The f and p variants are followed by ch at 37–46%, compared with only 10–17% for k and t. Whatever f and p are, they license continuation with ch at three to four times the rate.
5. Reduplication creates gradients: The token e can appear singly, doubled, tripled, or quadrupled. As the chain lengthens, two things shift systematically. What precedes the chain changes: e follows ch/sh 63% of the time, ee only 28%, and eee just 9%. And what follows the chain changes: e is followed by s only 1% of the time, ee reaches 4%, and eee reaches 13%. This isn't free repetition. Longer chains shift from appearing after ch/sh to appearing after k/t, and increasingly close with s.
6. Structures can nest: Forms like ofchedy appear to embed one bounded structure inside another: the f selects ch, which opens a structure that closes with y, all sitting inside a larger unit beginning with o.
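To keep the observations separable from their interpretation, statistics like these can be computed mechanically from any transliterated word list. Here is a minimal sketch in Python, using a tiny made-up stand-in corpus of EVA-style words (the percentages quoted above come from a full transcription, not from this sample, so the toy numbers will not match):

```python
from collections import Counter

# Toy stand-in corpus of EVA-transliterated words. Purely illustrative;
# the real figures quoted above come from a full transcription.
words = ["qokeedy", "qokain", "chedy", "shedy", "daiin", "qokedy",
         "otedy", "ol", "qokaiin", "chol", "dal", "qoky"]

# (1) How often is 'q' word-initial, relative to all occurrences of 'q'?
initial = sum(w.startswith("q") for w in words)
total_q = sum(w.count("q") for w in words)
print(f"q word-initial: {initial}/{total_q}")

# (2) What follows 'q'? (conditional successor distribution)
followers = Counter(w[i + 1] for w in words
                    for i, ch in enumerate(w[:-1]) if ch == "q")
print("after q:", dict(followers))

# (3) Continuation environments: which character opens the word that
#     follows a y-final word vs. an l-final word?
cont = {"y": Counter(), "l": Counter()}
for prev, nxt in zip(words, words[1:]):
    if prev[-1] in cont:
        cont[prev[-1]][nxt[0]] += 1
print("after y-final:", dict(cont["y"]))
print("after l-final:", dict(cont["l"]))
```

Counting successors character-by-character like this is deliberately theory-neutral: it records only positions and transitions, without committing to what any token is.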
This is where it gets tricky: each of these observations can be described in multiple ways. Take the first one. The observation is: q is 99% initial and selects o at 97%. A grammarian might call this "demonstrative binding to head." A mnemonist might call it "locus anchoring." A notationalist might call it "record initialisation." A medieval logician might call it "term introduction." Or take the nesting behaviour. The observation is: f selects ch...y inside o... A grammarian sees "relative clause inside noun phrase." A mnemonist sees "room within room." A notationalist sees "sub-record inside record." A logician sees "supposition under supposition."

The problem is not that one of these is right and the others wrong. The problem is that the distributional evidence cannot distinguish between them. The constraints are real; the interpretation is not forced. Worse: interpretive choices compound. If q is a determiner, then o must be a head. If o is a head, then qo is a noun phrase. If qok is a determined noun phrase, then f must be a relative determiner. Each step feels plausible, but the whole chain rests on the first assumption. Choose a different starting point and you get a different system.
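The point that these frameworks may be isomorphic can itself be made mechanical. The sketch below builds two toy constraint systems with entirely different vocabularies (the grammatical and memory-palace labels are illustrative assumptions, not claims about what q, o, ch, or y actually are) and shows that a simple relabelling maps one onto the other, so no distributional test could separate them:

```python
# Successor constraints stated in grammatical vocabulary.
# These labels are hypothetical, chosen only to illustrate the point.
grammar = {
    "determiner": {"head"},
    "head": {"modifier", "closer"},
    "modifier": {"closer"},
    "closer": set(),
}

# The very same constraints stated in memory-architecture vocabulary.
mnemonic = {
    "locus-anchor": {"room"},
    "room": {"sub-room", "exit"},
    "sub-room": {"exit"},
    "exit": set(),
}

# A bijection between the two vocabularies...
relabel = {"determiner": "locus-anchor", "head": "room",
           "modifier": "sub-room", "closer": "exit"}

# ...under which the two constraint systems coincide exactly: the
# distributional evidence cannot tell the theories apart.
translated = {relabel[a]: {relabel[b] for b in bs}
              for a, bs in grammar.items()}
print("isomorphic:", translated == mnemonic)
```

Only the labels differ; the transition structure, which is all the distributional evidence can see, is identical.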
The medieval logicians had a phrase for this kind of contextual constraint: "Talia sunt subiecta qualia permiserint praedicata" ("The subjects are such as the predicates permit"). What can appear is governed by what surrounds it. That describes Voynichese rather well. But it doesn't tell us whether we're looking at grammar, or memory architecture, or something else entirely. There's also the curious fact that Voynichese seems to lack things we'd expect from natural language: no clear tense or aspect marking, no subject-predicate agreement, no conjunctions, no obvious truth-conditional structure. It has positional constraints without syntactic roles, paradigmatic contrasts without lexical diversity, optional length markers that seem to tune mode rather than content.
So I find myself left with a fairly basic question: How should we talk about Voynichese words or tokens at all?
Are they words? Clauses? Records? Control sequences? Loci in a memory system? Terms under modes of supposition? Or are these all just isomorphic descriptions of the same underlying structural observations, each wearing the vocabulary of a different discipline? My concern is that many similar observations may already exist across the forum, scattered under different theories (linguistic, cryptographic, logical, mnemonic), each with its own semantic framing. And that makes it hard to see what is actually shared versus what is interpretation.
I'd be very interested in thoughts on:
- how to discuss these regularities without prematurely committing to meaning,
- whether there's a neutral descriptive vocabulary that's actually usable,
- what minimal properties any explanation of Voynichese must account for, regardless of theory,
- and whether the isomorphism between these frameworks is itself telling us something.
Cheers
and doesn't really give you any benefit, just like calling them tokens, items, letter groups or whatever.

[Image: Лишишь in Russian cursive](https://upload.wikimedia.org/wikipedia/commons/3/34/%D0%9B%D0%B8%D1%88%D0%B8%D1%88%D1%8C_in_russian_cursive.jpg)
And I won't be able to recreate it unless I memorise it. You will probably have the same.