The Voynich Ninja
What are Voynichese words?


RE: What are Voynichese words? - vosreth - 06-02-2026

Thanks everyone, I already found a few interesting threads worth picking up.

First, to clarify what I was getting at: I agree we don't need to call them "vords". The question wasn't really about terminology. It was that these space-delimited units seem to have internal structure more like sentences than typical words. Sentences have openers and closers, paradigmatic choices, continuation dependencies. Words generally don't show these properties at the sub-word level, at least not to this degree. That's what I find odd.

(06-02-2026, 02:01 AM)oshfdk Wrote: I usually treat the manuscript as a cipher, so for me all of these are glyph sequences that have no semantics of their own.
On the cipher framing: I agree that treating the glyphs as having no semantics of their own is reasonable. But the structural question remains regardless. I am curious whether you see these patterns arising from a system that operates before encoding (a language, notation, or formal system), or from one that operates during encoding? And what properties would such a system need in order to reproduce the observed boundary and continuation effects?

(06-02-2026, 11:02 AM)dashstofsk Wrote: e and i are the only strokes that repeat. Many words have the format of starting as an e stroke string and continuing as an i stroke string. I mentioned something about this in previous posts. My personal conviction is that it is just a fabrication. An easy way for the writer to construct meaningless text.
dashstofsk's stroke-repetition observations are indeed something to note, and Bluetoes101's transition rules formalise similar intuitions. These are genuinely interesting frameworks. But I'm not sure "easy to repeat" accounts for everything. Take the e/ee/eee pattern: single e follows ch/sh about 63% of the time, ee only 28%, and eee just 9%. The environment shifts systematically as length increases. If this were simply about ease of repetition, why would longer chains actively avoid appearing after ch/sh? A grammatical analogy might be a derivational gradient ("quickly" → "quick" → "quickness") where longer forms occupy different structural positions. What would the ergonomic account predict here?

On context-sensitivity, here's something rather striking. Comparing different text types across the whole manuscript:
Labels: q 1.0%, standalone s 15.5%
Paragraphs: q 15.8%, standalone s 6.5%
Circular text: q 2.0%, standalone s 12.9%
Radial text: q 5.0%, standalone s 15.9%

The pattern holds across every section. Non-paragraph contexts suppress q and elevate s. One way to think about it: labels don't need to introduce new referents (the visual already shows you what's being labelled) but may need more anchoring to what's already established. Though of course that's just one possible reading.
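
For anyone who wants to recompute figures like these, here is a minimal Python sketch. It assumes an EVA transcription already split into words and grouped by text type. The function name `initial_rates`, the grouping keys, and the toy words are illustrative and are not the pipeline behind the numbers above. It also assumes that "q" means a word beginning with q and that "standalone s" means a word beginning with s that is not part of sh, which may not be exactly how the percentages were defined.

```python
def initial_rates(words_by_type):
    """Return {text_type: (q_share, s_share)} as fractions of all words.

    q_share: share of words beginning with q.
    s_share: share of words beginning with s but not with sh
             (an assumed reading of "standalone s").
    """
    rates = {}
    for text_type, words in words_by_type.items():
        words = [w for w in words if w]  # drop empty tokens
        if not words:
            continue
        q = sum(w.startswith("q") for w in words)
        s = sum(w.startswith("s") and not w.startswith("sh") for w in words)
        rates[text_type] = (q / len(words), s / len(words))
    return rates


# Toy input only; a real run would load a full transcription.
sample = {
    "paragraph": ["qokeedy", "daiin", "shedy", "qokain", "sar"],
    "label": ["otaly", "sary", "okal", "sal"],
}

for text_type, (q_share, s_share) in initial_rates(sample).items():
    print(f"{text_type}: q {q_share:.1%}, standalone s {s_share:.1%}")
```

Running the same function separately per manuscript section would be one way to check the claim that the pattern holds in every section.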

The e-gradient is also consistent across both Currier languages:
Currier A: e 57% → ee 34% → eee 9% after ch/sh
Currier B: e 65% → ee 27% → eee 10% after ch/sh

The absolute frequencies differ (Currier B has more d, famously) but the structural gradient has the same shape. So whatever system produces these patterns, it appears stable across sections, text types, and Currier languages. The frequencies vary, but the rules don't.
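
A rough sketch of how such an e-gradient could be measured, in Python. It assumes EVA-transcribed words and reads "after ch/sh" as "the e-run is immediately preceded by ch or sh within the same word". Both are assumptions, and the function name `e_gradient` and the toy word list are made up for illustration.

```python
import re
from collections import defaultdict


def e_gradient(words):
    """Map e-run length -> share of runs of that length directly preceded by ch or sh."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for word in words:
        # e+ matches maximal runs, so "eee" is counted once as length 3
        for run in re.finditer(r"e+", word):
            length = len(run.group())
            preceding = word[max(0, run.start() - 2):run.start()]
            totals[length] += 1
            if preceding in ("ch", "sh"):
                hits[length] += 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy input only; not real frequencies.
toy = ["chedy", "shey", "okedy", "cheey", "okeey", "qokeeedy"]

for length, share in e_gradient(toy).items():
    print(f"{'e' * length}: {share:.0%} after ch/sh")
```

Feeding it the Currier A and Currier B word lists separately would give the two rows above, provided the definition matches the one used for the posted figures.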

Rafal asks about proof of meaningfulness. I suspect "meaningful or not" may be less useful than asking what generative system, if any, produces these constraints. Even a hoax needs a procedure that respects the regularities. And if that procedure is consistent enough to show the same gradients in Currier A and B, the same q/s asymmetry across all text types, that's a fairly disciplined hoax.


RE: What are Voynichese words? - oshfdk - 06-02-2026

(06-02-2026, 08:20 PM)vosreth Wrote: On the cipher framing: I agree that treating the glyphs as having no semantics of their own is reasonable. But the structural question remains regardless. I am curious whether you see these patterns arising from a system that operates before encoding (a language, notation, or formal system), or from one that operates during encoding?

The latter. I expect the underlying plaintext to be quite normal descriptive language, not a poem, spells or charms. The presence of a charm on the last page of a book was, as far as I understand, one of the cultural norms of the time, so I don't think we can extrapolate from the last page to the rest of the MS. The structure of the text itself is quite normal: there are paragraphs of various lengths with short trailing lines, and there are labels of different lengths. There is no obvious visual rhythmic structure to the way paragraphs are written, so I would rather bet on some structurally unremarkable plaintext encoded in a way that produces these strange patterns.

Of course, I don't know if the above is correct or not, but this is the working hypothesis I take.

(06-02-2026, 08:20 PM)vosreth Wrote: And what properties would such a system need in order to reproduce the observed boundary and continuation effects?

I'm not sure this is the right question; I believe there are many totally different ways to create these patterns. This is, I think, the approach @magnesium took with the Naibbe cipher: recreating some of the statistical properties of Voynichese in a plausible way. Does this bring us closer to understanding how the Voynich Manuscript was created? I don't know.

One thing that is almost certain is that if Voynichese is a cipher, it's a one-to-many cipher, one that allows encoding the same plaintext in many possible ways.
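
To make "one-to-many" concrete, here is a toy Python sketch in which every plaintext letter has several possible ciphertext groups and one is picked at random. The table is invented purely for illustration. It is not Naibbe and not a proposal for how the VMS was written.

```python
import random

# Hypothetical table: each plaintext letter may be written as any of its groups.
TABLE = {
    "a": ["ol", "dy"],
    "b": ["qo", "chy"],
    "c": ["ar", "she"],
}

# Every group belongs to exactly one letter, so decoding stays unambiguous.
INVERSE = {group: letter for letter, groups in TABLE.items() for group in groups}


def encode(text, rng):
    """Encode letter by letter, choosing one of the allowed groups at random."""
    return [rng.choice(TABLE[letter]) for letter in text]


def decode(groups):
    """Map each ciphertext group back to its single plaintext letter."""
    return "".join(INVERSE[group] for group in groups)


rng = random.Random(0)
first = encode("abcab", rng)
second = encode("abcab", rng)
print(first, second)                   # two spellings of the same plaintext
print(decode(first), decode(second))   # both decode to "abcab"
```

The same plaintext can come out differently each time, yet decoding is deterministic because the groups do not overlap between letters.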


RE: What are Voynichese words? - dashstofsk - 07-02-2026

(06-02-2026, 08:20 PM)vosreth Wrote: Labels: q 1.0%, standalone s 15.5%
Paragraphs: q 15.8%, standalone s 6.5%
Circular text: q 2.0%, standalone s 12.9%
Radial text: q 5.0%, standalone s 15.9%


I can see a possible explanation for the differences between paragraph text and non-paragraph text ( labels, radials, circulars ). And it is once again consistent with the hypothesis that the manuscript is meaningless.

Labels, radial text, circular text would have been written after the drawings were completed, when a page was nearly completed. Radials and circulars would also have needed the writer to turn the page.

The writer, perhaps because drawing was not his forte or because the need for turning broke the momentum of his work, may have been in a different mindset when the time came to do the labels, radials, circulars. Perhaps he just did not have the same motivation for the astrological charts, and turned to the comfort of easy-to-write words. It is a psychological trick. When we are annoyed we do things differently.

Paragraph text, however, was less complicated. The writer just sat and wrote, line after line, a continuous stream without stopping, a whole page in one uninterrupted sitting, not having to shift and turn.

It is only the meaningful hypothesist who is troubled by this: "Words and spelling and text should be uniform everywhere, in paragraphs, labels, radials, circulars, but it is clearly not so. Why, why?"


RE: What are Voynichese words? - vosreth - 07-02-2026

(07-02-2026, 09:26 AM)dashstofsk Wrote: It is only the meaningful hypothesist who is troubled by this: "Words and spelling and text should be uniform everywhere, in paragraphs, labels, radials, circulars, but it is clearly not so. Why, why?"

Perhaps I should step back from the "meaningful or meaningless" framing, because I think it conflates several rather different things.

Consider: a medieval logician writes qokeedy to mark determinate supposition, daiin for distributed. Perfectly systematic. But unless you know what terms those markers attach to, the content can't be recovered; it is lost, and in that sense meaningless. Or another analogy: instructions for navigating a memory palace, such as "At the third arch, place the object." Grammatical, rule-governed, but completely meaningless to anyone other than one person.

So "meaningless" could mean for example:
a) Random noise
b) Systematic notation whose referents we lack
c) Deliberate deception

These different interpretations predict different statistics.

(06-02-2026, 11:01 PM)oshfdk Wrote:
(06-02-2026, 08:20 PM)vosreth Wrote: And what properties would such a system need in order to reproduce the observed boundary and continuation effects?

I'm not sure this is the right question; I believe there are many totally different ways to create these patterns. This is, I think, the approach @magnesium took with the Naibbe cipher: recreating some of the statistical properties of Voynichese in a plausible way. Does this bring us closer to understanding how the Voynich Manuscript was created? I don't know.

One thing that is almost certain is that if Voynichese is a cipher, it's a one-to-many cipher, one that allows encoding the same plaintext in many possible ways.

Wouldn't that imply that these properties have to be deliberately implemented in the encoding mechanism unless they arise from the plaintext?
To me it seems more likely that the text is encoded via a many-to-one cipher, for example with all nouns collapsing to a generic o.
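
A toy Python sketch of what "many-to-one" would mean here: different plaintext words collapse to the same written marker, so the original cannot be recovered from the ciphertext alone. Only the "nouns collapse to o" idea comes from the suggestion above. The other markers, the tiny lexicon, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical class markers; only "noun -> o" follows the suggestion in the text.
CLASS_MARKER = {"noun": "o", "verb": "ch", "modifier": "q"}

# Hypothetical mini-lexicon assigning each plaintext word to a class.
LEXICON = {"root": "noun", "leaf": "noun", "boil": "verb", "green": "modifier"}


def encode(words):
    """Replace each plaintext word by the marker of its class (lossy)."""
    return [CLASS_MARKER[LEXICON[word]] for word in words]


def preimages():
    """For each marker, list every plaintext word that would be written as it."""
    inverse = defaultdict(set)
    for word, word_class in LEXICON.items():
        inverse[CLASS_MARKER[word_class]].add(word)
    return dict(inverse)


print(encode(["boil", "green", "leaf"]))
print(encode(["boil", "green", "root"]))  # same output as the line above
print(preimages()["o"])                   # both nouns map to the same marker
```

Under a scheme like this, structural regularities of the plaintext (word order, class sequence) survive encoding while the lexical content does not.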

I ran one of Greshko's reference Naibbe ciphertexts through a similar analysis. Naibbe encrypts letter-by-letter with random table selection, implementing the Zattera slot grammar for word-internal structure. Crucially, these reference texts encrypt real prose: Dante's Divina Commedia, Pliny's Natural History, Grosseteste's De sphaera, and a medieval alchemical herbal. Meaningful Latin and Italian, not gibberish.

Results (naibbe_Cleaned_52_07_10_word_lines.txt):

Glyph-position within words:
- q at word-start: VMS 98.8%, Naibbe 100%
- qo: VMS 97.6%, Naibbe 98.9%
Both match. That's the slot grammar doing its job.

Cross-word transitions:
- VMS q: 24.4% after y-final, 8.2% after l-final
- VMS ch/sh: 17.4% after y-final, 32.0% after l-final
- Naibbe q: 19.1% After y-final, 20.2% after l-final
- Naibbe ch/sh: 17.4% after y-final, 17.4% after l-final
The ch/sh:q ratio after y-final is 0.71 in VMS, 0.91 in Naibbe. After l-final it's 3.88 in VMS, 0.86 in Naibbe.

VMS shows a fivefold shift depending on what came before. Naibbe is flat as Norfolk. Why? Naibbe doesn't know what it just wrote. Each word is encrypted fresh. The randomness washes everything out. And here's the rub: meaningful plaintext doesn't help. Dante's tercets and Pliny's botanical observations go in, flat distributions come out. The content doesn't shine through.
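
The cross-word figures can be sketched as follows in Python. The assumptions: each line of text is a list of EVA words, only pairs within the same line are counted, and each percentage is the share of words following a y-final (or l-final) word that begin with q or with ch/sh. Whether the posted numbers were computed exactly this way is not stated, and the names `next_initial_shares` and `ratio` plus the toy lines are illustrative.

```python
from collections import Counter


def next_initial_shares(lines, final):
    """Among words that follow a word ending in `final` on the same line,
    return (share starting with q, share starting with ch or sh)."""
    counts = Counter()
    for words in lines:
        for prev, nxt in zip(words, words[1:]):
            if not prev.endswith(final):
                continue
            counts["total"] += 1
            if nxt.startswith("q"):
                counts["q"] += 1
            elif nxt.startswith(("ch", "sh")):
                counts["chsh"] += 1
    if counts["total"] == 0:
        return None  # no word with this final glyph has a successor
    return counts["q"] / counts["total"], counts["chsh"] / counts["total"]


def ratio(q_share, chsh_share):
    """ch/sh : q ratio for one preceding-final context."""
    return chsh_share / q_share


# Toy lines only; not real data.
toy_lines = [
    ["daiin", "qokeedy", "qokain", "chol", "shedy"],
    ["okal", "chedy", "qokeey", "shol", "daiin"],
]
print(next_initial_shares(toy_lines, "y"))
print(next_initial_shares(toy_lines, "l"))

# Ratios recomputed from the rounded percentages quoted above: (q, ch/sh).
posted = {
    "VMS": {"y": (24.4, 17.4), "l": (8.2, 32.0)},
    "Naibbe": {"y": (19.1, 17.4), "l": (20.2, 17.4)},
}
for name, by_final in posted.items():
    for final, (q, chsh) in by_final.items():
        print(name, f"{final}-final", round(ratio(q, chsh), 2))
```

From the rounded percentages this gives 0.71 and 3.90 for the VMS and 0.91 and 0.86 for Naibbe. The small gap to the posted 3.88 is presumably because that figure was computed from unrounded counts.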

The e-gradient:
Same story. After ch/sh contexts:
- Naibbe: e 52% → ee 31% → eee 42% (no gradient; eee actually rises)
- VMS: e 65% → ee 27% → eee 10% (monotonic decrease)

What this suggests:
The VMS wasn't generated by purely local random encryption. It shows cross-word dependencies and systematic gradients that don't emerge from Naibbe-style processing, even with structured plaintext underneath. This doesn't tell us what the VMS is. But it rules out option (a). Whatever system produced it tracked context somehow. Whether that's a sophisticated cipher, a compositional system, or something else remains gloriously open.

(07-02-2026, 09:26 AM)dashstofsk Wrote: I can see a possible explanation for the differences between paragraph text and non-paragraph text ( labels, radials, circulars ). And it is once again consistent with the hypothesis that the manuscript is meaningless.

Labels, radial text, circular text would have been written after the drawings were completed, when a page was nearly completed. Radials and circulars would also have needed the writer to turn the page.

The writer, perhaps because drawing was not his forte or because the need for turning broke the momentum of his work, may have been in a different mindset when the time came to do the labels, radials, circulars. Perhaps he just did not have the same motivation for the astrological charts, and turned to the comfort of easy-to-write words. It is a psychological trick. When we are annoyed we do things differently.

Paragraph text, however, was less complicated. The writer just sat and wrote, line after line, a continuous stream without stopping, a whole page in one uninterrupted sitting, not having to shift and turn.

On scribe psychology: Naibbe shows what "local choices without context tracking" looks like. Flat. The VMS patterns hold across sections and hands. That's hard to attribute to mood, unless the mood was uncommonly methodical. If you subscribe to multiple scribes, this becomes even more uncommonly methodical.


RE: What are Voynichese words? - Rafal - 07-02-2026

Quote: VMS shows a fivefold shift depending on what came before. Naibbe is flat as Norfolk. Why? Naibbe doesn't know what it just wrote. Each word is encrypted fresh.

Naibbe is a good attempt to create a cipher whose ciphertext looks like Voynichese. But the Voynich Manuscript isn't encrypted with Naibbe, and most probably isn't encrypted with anything like Naibbe.

Naibbe is a complicated and very verbose cipher. To encode one letter of plaintext you sometimes need to write as many as 6-7 Voynich symbols. That would make the source text very short, and labels would be 1-2 letters long. I can't believe someone would use something like that to encrypt his text.

And when you get into the details, the VM behaves differently in some areas. As you say, the first letter of a word statistically depends on the last letter of the previous word.
Emma Smith has discussed this.

It's a very strange behavior which doesn't appear in any natural language that I can think of.