The Voynich Ninja
What are Voynichese words? - Printable Version


What are Voynichese words? - vosreth - 06-02-2026

Dear friends,

I keep coming back to a question that won't quite leave me alone: how should we actually describe the structural behaviours we see in Voynichese "words"?
They're clearly not simple substitution outputs, and they're not random strings. That's been shown many times. But they don't behave like ordinary words in known languages either. They're constrained, but not in the ways we'd expect.

Here are some observations. I'll present the raw distributional facts first, then show why describing them is harder than it looks.
1. Rigid initial binding: The token q appears word-initially 99% of the time. It essentially never occurs mid-word or finally. When it appears, it selects o as its immediate follower at 97%. Whatever q is doing, it does it once, at the start, and it binds tightly to what follows.
2. Two boundary types with different continuations: Both y and l tend toward final positions, but they create different continuation environments. After y-final words, q appears at 27% and ch/sh appear at 17%. After l-final words, q appears at only 9% while ch/sh rise to 33%. So we have two "closers" that close in different ways. One favours continuation with q (which binds to o). The other favours continuation with ch and sh (which themselves open bounded structures).
3. Paradigmatic exclusion with ordering: The tokens k and t rarely appear together (under 2% co-occurrence). When they do co-occur, t precedes k at roughly 2:1. This looks like a paradigmatic contrast: two tokens competing for the same structural slot, with one tending to come first when both are present.
4. A suffix-like element changes selection rates: Compare k with f, or t with p. The f and p variants are followed by ch at 37–46%, compared with only 10–17% for k and t. Whatever f and p are, they license continuation with ch at three to four times the rate.
5. Reduplication creates gradients: The token e can appear singly, doubled, tripled, or quadrupled. As the chain lengthens, two things shift systematically. What precedes the chain changes: e follows ch/sh 63% of the time, ee only 28%, and eee just 9%. And what follows the chain changes: e is followed by s only 1% of the time, ee reaches 4%, and eee reaches 13%. This isn't free repetition. Longer chains shift from appearing after ch/sh to appearing after k/t, and increasingly close with s.
6. Structures can nest: Forms like ofchedy appear to embed one bounded structure inside another: the f selects ch, which opens a structure that closes with y, all sitting inside a larger unit beginning with o.
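
For anyone who wants to check figures like these against their own transliteration, here is a minimal sketch of how the first two kinds of count could be made. It is not the method behind the percentages above. The word list is a tiny hypothetical EVA-style sample chosen only to make the functions runnable, the function names are illustrative, and treating every single character as a glyph (so that ch and sh are handled only as word onsets) is a simplifying assumption.

```python
from collections import Counter

# Hypothetical EVA-style sample, one list of words per manuscript line.
# This is NOT corpus data; it only makes the functions below runnable.
SAMPLE_LINES = [
    ["qokeedy", "chedy", "qokal", "shedy", "daiin"],
    ["okeey", "qotedy", "chol", "shol", "qokeey"],
]

def initial_rate(lines, glyph):
    """Share of all occurrences of `glyph` that are word-initial."""
    total = initial = 0
    for line in lines:
        for word in line:
            for i, ch in enumerate(word):
                if ch == glyph:
                    total += 1
                    initial += (i == 0)
    return initial / total if total else 0.0

def follower_rates(lines, glyph):
    """Distribution of the character immediately after `glyph` within a word."""
    counts = Counter()
    for line in lines:
        for word in line:
            for a, b in zip(word, word[1:]):
                if a == glyph:
                    counts[b] += 1
    n = sum(counts.values())
    return {g: c / n for g, c in counts.items()} if n else {}

def next_onset_rates(lines, final_glyph, onsets=("q", "ch", "sh")):
    """For words ending in `final_glyph`, how the next word on the same line begins."""
    counts = Counter()
    n = 0
    for line in lines:
        for word, nxt in zip(line, line[1:]):
            if word.endswith(final_glyph):
                n += 1
                for onset in onsets:
                    if nxt.startswith(onset):
                        counts[onset] += 1
    return {o: counts[o] / n for o in onsets} if n else {}

if __name__ == "__main__":
    print(initial_rate(SAMPLE_LINES, "q"))
    print(follower_rates(SAMPLE_LINES, "q"))
    print(next_onset_rates(SAMPLE_LINES, "y"))
    print(next_onset_rates(SAMPLE_LINES, "l"))
```

Counting only within a line (never across line breaks) is a design choice here, since whether line-final and line-initial words belong to the same sequence is itself an open question.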

This is where it gets tricky: Each of these observations can be described in multiple ways. Take the first one. The observation is: q is 99% initial and selects o at 97%. A grammarian might call this "demonstrative binding to head." A mnemonist might call it "locus anchoring." A notationalist might call it "record initialisation." A medieval logician might call it "term introduction."

Or take the nesting behaviour. The observation is: f selects ch...y inside o... A grammarian sees "relative clause inside noun phrase." A mnemonist sees "room within room." A notationalist sees "sub-record inside record." A logician sees "supposition under supposition."

The problem is not that one of these is right and the others wrong. The problem is that the distributional evidence cannot distinguish between them. The constraints are real; the interpretation is not forced.

Worse: interpretive choices compound. If k is a determiner, then o must be a head. If o is a head, then qo is a noun phrase. If qok is a determined noun phrase, then f must be a relative determiner. Each step feels plausible, but the whole chain rests on the first assumption. Choose a different starting point and you get a different system.

The medieval logicians had a phrase for this kind of contextual constraint: "Talia sunt subiecta qualia permiserint praedicata" ("The subjects are such as the predicates permit"). What can appear is governed by what surrounds it. That describes Voynichese rather well. But it doesn't tell us whether we're looking at grammar, or memory architecture, or something else entirely. There's also the curious fact that Voynichese seems to lack things we'd expect from natural language: no clear tense or aspect marking, no subject-predicate agreement, no conjunctions, no obvious truth-conditional structure. It has positional constraints without syntactic roles, paradigmatic contrasts without lexical diversity, optional length markers that seem to tune mode rather than content.

So I find myself left with a fairly basic question: How should we talk about Voynichese words or tokens at all?

Are they words? Clauses? Records? Control sequences? Loci in a memory system? Terms under modes of supposition? Or are these all just isomorphic descriptions of the same underlying structural observations, each wearing the vocabulary of a different discipline? My concern is that many similar observations may already exist across the forum, scattered under different theories (linguistic, cryptographic, logical, mnemonic), each with its own semantic framing. And that makes it hard to see what is actually shared versus what is interpretation.

I'd be very interested in thoughts on:
- how to discuss these regularities without prematurely committing to meaning,
- whether there's a neutral descriptive vocabulary that's actually usable,
- what minimal properties any explanation of Voynichese must account for, regardless of theory,
- and whether the isomorphism between these frameworks is itself telling us something.

Cheers


RE: What are Voynichese words? - oshfdk - 06-02-2026

Hi,

(06-02-2026, 01:34 AM)vosreth Wrote: I'd be very interested in thoughts on:
- how to discuss these regularities without prematurely committing to meaning,

I see no issue with calling space-separated chunks of text "words". For me this is no more committing to any meaning than calling addressable byte sequences "words" in computer memory architectures.

Some people use "vords" for these; I don't really think this is needed.

(06-02-2026, 01:34 AM)vosreth Wrote: - whether there's a neutral descriptive vocabulary that's actually usable,
- what minimal properties any explanation of Voynichese must account for, regardless of theory,
- and whether the isomorphism between these frameworks is itself telling us something.

I usually treat the manuscript as a cipher, so for me all of these are glyph sequences that have no semantics of their own.


RE: What are Voynichese words? - dashstofsk - 06-02-2026

(06-02-2026, 01:34 AM)vosreth Wrote: The token e can appear singly, doubled, tripled, or quadrupled.

It and i are the only strokes that repeat. Many words have the format of starting as an e-stroke string and continuing as an i-stroke string. I mentioned something about this in previous posts. My personal conviction is that it is just a fabrication. An easy way for the writer to construct meaningless text.


RE: What are Voynichese words? - dashstofsk - 06-02-2026

(06-02-2026, 01:34 AM)vosreth Wrote: Both y and l tend toward final positions

y and l, together with q, r, n and s, are characters that seem to prefer to come either at the start of a word or at the end. This is curious. But I can see a pattern, and I hope you will be able to see it too if you look at how these letters are written. They usually end with a marked downstroke or with a backswing. And if you look at the words in the manuscript you will see plenty of this: first and last characters exaggerated, characters mid-word not so much.


                           


It suggests to me that this is just a preference of the writer. He likes to start a word with a bang, then continues with simple characters, then likes to end with a bang. Having to write a character with an exaggeration mid-word would have the effect of upsetting the momentum of the writing.

And do you wonder why q and f never come at the end? I have an idea why this is so. It is because these characters are awkward to write. Too many strokes of the pen. Once the writer is approaching the end of a word he wants to finish it as quickly as possible and then move on. It is just how he is driven.

This is all consistent with the hypothesis that the manuscript is meaningless, a hoax, an artificial construction. The writer, having pages of bogus text to write, would have wanted to simplify the task. And, either intentionally or by accident, he adopted a writing style that made the task of writing less demanding.


RE: What are Voynichese words? - eggyk - 06-02-2026

(06-02-2026, 11:02 AM)dashstofsk Wrote:
(06-02-2026, 01:34 AM)vosreth Wrote: The token e can appear singly, doubled, tripled, or quadrupled.

It and i are the only strokes that repeat. Many words have the format of starting as an e-stroke string and continuing as an i-stroke string. I mentioned something about this in previous posts. My personal conviction is that it is just a fabrication. An easy way for the writer to construct meaningless text.

I find it noteworthy that the only strokes that repeat happen to be very common sequences of repeated similar strokes within medieval manuscripts (m, n, i, u / e, c, t, r). The Voynich author could have repeated any type of fake symbol over and over to fabricate text, but settled on the familiar symbols that commonly have multiple meanings.

[Image not preserved]

Instead of something like this:
[Image not preserved]


RE: What are Voynichese words? - nablator - 06-02-2026

(06-02-2026, 12:38 PM)eggyk Wrote: Instead of something like this:

That's exactly what Luigi Serafini chose.


RE: What are Voynichese words? - Rafal - 06-02-2026

I would just call them words.
As was said, some people call them "vords", but for me it is "veird" ;) and it doesn't really give you any benefit, just like calling them tokens, items, letter groups or whatever.

They were certainly designed to look like words, but in theory it could be a trick. They could be just syllables, or spaces could be unimportant or be yet another symbol in the cipher. People have explored these ideas, but they didn't really bear fruit.

Quote:what minimal properties any explanation of Voynichese must account for, regardless of theory,

It should be consistent and universal for all the text. You cannot give a reading of 20 or even 100 words and ignore the fact that your system doesn't work for the rest of the words.

And another thing is claiming that the text is meaningless. While some people suspect it, we have no idea what would count as definitive proof.


RE: What are Voynichese words? - agalakhov - 06-02-2026

(06-02-2026, 11:02 AM)dashstofsk Wrote: It and i are the only strokes that repeat. Many words have the format of starting as an e-stroke string and continuing as an i-stroke string. I mentioned something about this in previous posts. My personal conviction is that it is just a fabrication. An easy way for the writer to construct meaningless text.
Or they may be parts of more complex glyphs. For example, even modern Russian cursive has similar properties:

[Image: %D0%9B%D0%B8%D1%88%D0%B8%D1%88%D1%8C_in_...ursive.jpg]
A similar medieval example is provided by [link not preserved].
While I don't agree with Prof. Bax in general, I believe he was right about the existence of two distinct glyphs, i and ii, and that the combination iin actually consists of two symbols, not three.


RE: What are Voynichese words? - dashstofsk - 06-02-2026

(06-02-2026, 01:49 PM)Rafal Wrote: the definitive proof


The text of the manuscript is so odd that it is clear that there will never be proof. There are just too many oddities, irregularities, anomalies.

We need instead to look at what is logical, possible, probable, plausible.

It is my belief that people today are thinking too hard about the manuscript. The solution is probably simple.

It is easier for those who think that it is meaningless. The meaningful hypothesist however has to take the anomalies and try to find meaning for them. Every oddity will have to have some explanation. Perhaps the transliteration was wrong? Perhaps the writer made a mistake? Is it in some known language? In shorthand, or coded? What is the grammar, syntax? Is it continuous narrative, prose, an inventory, incantations? Where did it come from?

But the meaningless hypothesist is not troubled by such questions. The writers did not need to be too precise with their 'method'. Knowing that no-one would ever understand the text, and so long as it did not deviate from the goal of making the text appear genuine, they could fill the text with peculiarities, misspellings, misformed letters and their personal foibles. Their method didn't have to be perfect, correct, the best. It just had to be good enough. The 'mistakes' seemed not to have troubled them or been obvious to their patrons.


RE: What are Voynichese words? - Rafal - 06-02-2026

Quote:But the meaningless hypothesist is not troubled by such questions. The writers did not need to be too precise with their 'method'.

I agree.
If I say that it is gibberish, people may object that it is not random.
Then I could say that it is "structured gibberish".
Then people could ask me to show the method, the algorithm used to create the text.

But probably there wasn't any precise method. The author could have had some tricks but used them inconsistently. Sometimes he followed the guidelines, sometimes he improvised. For example, sometimes he copied a word from a prepared list of common words, sometimes he copied a neighbouring word, sometimes he altered and copied a neighbouring word, and sometimes he created a new word. And we are unable to recreate this process.
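
To make the idea concrete, here is a toy sketch of such a mixed process. It is only an illustration, not a claim about how the manuscript was written: the four moves are the ones listed above, but choosing between them with equal probability, altering exactly one letter, and making new words 3 to 6 letters long are all assumptions, and the function name is hypothetical. The stock list is taken from the gibberish sample in this post.

```python
import random

def improvised_text(n_words, stock, seed=0):
    """Generate words by mixing four moves: take a stock word, copy the
    previous word, alter the previous word, or invent a new word.
    Equal move probabilities are an assumption made for illustration."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(stock)))
    out = []
    for _ in range(n_words):
        # The first word has no neighbour yet, so it always comes from the stock list.
        move = rng.choice(["stock", "copy", "alter", "new"]) if out else "stock"
        if move == "stock":
            word = rng.choice(stock)
        elif move == "copy":
            word = out[-1]
        elif move == "alter":
            prev = out[-1]
            i = rng.randrange(len(prev))
            # Replace one letter of the previous word (it may stay the same by chance).
            word = prev[:i] + rng.choice(alphabet) + prev[i + 1:]
        else:
            length = rng.randint(3, 6)
            word = "".join(rng.choice(alphabet) for _ in range(length))
        out.append(word)
    return out

if __name__ == "__main__":
    stock = ["botol", "aranan", "sedu", "alpero", "exim"]
    print(" ".join(improvised_text(16, stock, seed=1)))
```

Even with fixed rules like these, the output depends entirely on the sequence of choices, which is the point: knowing the tricks would not let us reproduce the text.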

You may do a little personal experiment. Try to write down some gibberish.
Here is a sample of my gibberish for you:
botol aranan sedu alpero exim estrano bonas aquello nes padis manu limeli parde sem ater fennis

I have no idea myself why I created such words and no others :) And I won't be able to recreate it unless I memorise it. You will probably find the same.