The Voynich Ninja - An Artificial Construction


We're talking about writing, rather than language. The character shapes may have been adapted at word ends, and the rules may just reflect that (to some extent).

There is quite a bit of circumstantial (non-compelling) evidence that such writing features are in play.
n really looks like a word-final version of i
I would not be at all surprised if r turned out to be the same.
m and g are almost certainly special shapes that prefer line ends.
The precise meaning of f and p still eludes us.

These last two have nothing to do with word spaces, but fit the general suggestion of 'writing features'.

That is indeed a problem too, and there are many more problems like this.

(16-05-2026, 05:21 AM)JoJo_Jost Wrote: I don't mean to be a pain, but that doesn't solve the problem of word boundaries... I think this applies to Chinese characters and, of course, VMS, because the spaces here are most likely not normal spaces, at least not all of them... So what exactly are you comparing here? That's the interesting question...

This is a good question. The approach I took should work regardless of the script conventions used. Since it can match Chinese characters (no spaces, non-phonetic) and pinyin (phonetic, with spaces), it should work regardless of what the spaces and abbreviations are, as long as they are used according to some universal scheme and not randomly. If the spaces are predictable, they will be merged with other tokens.

Here is a side-by-side comparison of the computation run on an English text and on Voynichese, with spaces and with all spaces removed before processing. As you can see, for the 5 most frequent tokens and their combinations there is not much difference between the two Voynichese versions. English shows a lot of unbalanced prefix-suffix pairs (mostly just missing from the text), while Voynichese only shows a few.

[attachment=15601]

In simple terms, this means that in most natural languages, if you take some of the most common tokens (for example, "and", "the", "ing"), they won't normally appear in texts in all kinds of pairwise combinations. There will be little or no "the and" or "and ing" or "the ing", even though the reverse combinations would be frequent. There would be little of "the the" or "ing ing". According to the charts, this is observed in English, Latin, Arabic, Chinese expressed as characters, Chinese expressed as pinyin, and English with spaces removed. However, this is not observed in Voynichese, spaces or no spaces. If there is some very popular token like daiin or chol, there would be some chol.daiin and there would be some daiin.chol in the text.
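The pairwise check described above can be sketched roughly as follows. This is only an illustration on a pre-split token list, not the actual computation behind the charts; the function names `top_pair_counts` and `missing_pairs` and the parameter `k` are made up for this example.

```python
from collections import Counter

def top_pair_counts(tokens, k=5):
    """Count adjacent ordered pairs formed by the k most frequent tokens."""
    freq = Counter(tokens)
    top = [t for t, _ in freq.most_common(k)]
    top_set = set(top)
    pairs = Counter(
        (a, b)
        for a, b in zip(tokens, tokens[1:])
        if a in top_set and b in top_set
    )
    return top, pairs

def missing_pairs(tokens, k=5):
    """List ordered pairs of top-k tokens that never occur adjacently."""
    top, pairs = top_pair_counts(tokens, k)
    return [(a, b) for a in top for b in top if pairs[(a, b)] == 0]

# Toy English-like input: "and the" occurs, "the and" does not.
sample = "the cat and the dog and the bird".split()
print(missing_pairs(sample, k=2))
```

In a text where tokens are placed nearly independently of each other, the list of missing pairs among the most frequent tokens would be expected to be short; in natural-language text it tends to be long.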

The only difference between this example and the real algorithm is that the text is split into tokens using a variation of the BPE algorithm, without relying on any predefined boundaries (like spaces or anything else).
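The exact BPE variation is not described in the post, so the following is only plain textbook byte-pair encoding run on a raw character stream, to show how tokens can emerge without predefined boundaries. The name `bpe_tokenize` and the stopping rule (stop when no pair occurs at least twice) are assumptions for this sketch.

```python
from collections import Counter

def bpe_tokenize(text, num_merges):
    """Plain BPE: repeatedly merge the most frequent adjacent symbol pair.

    Spaces are treated like any other character, so a merged token
    may include or span a space.
    """
    symbols = list(text)
    for _ in range(num_merges):
        pair_counts = Counter(zip(symbols, symbols[1:]))
        if not pair_counts:
            break
        (a, b), count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # assumed stopping rule: nothing left worth merging
        merged = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_tokenize("abababab", 1))
```

Because the space is just another symbol here, predictable spaces end up absorbed into larger tokens, which is the behaviour described above.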

Another thing that follows from this, which I think I already mentioned somewhere, is that I don't think Voynichese can be described as "repetitive". It's just that natural languages are "antirepetitive", while Voynichese is better described as random. We don't see instances of "the the the" in English because English tokens are not selected independently of one another. There would absolutely be examples of "the the the" and "and and and" if we just pulled English words from a shuffled deck.
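The "shuffled deck" comparison can be tried with a small sketch like the one below. It counts immediate word repeats in a token list and compares that with the average count over random shuffles of the same tokens. The function names, the number of trials and the fixed seed are assumptions chosen for illustration.

```python
import random

def adjacent_repeats(tokens):
    """Number of positions where a token is immediately repeated."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)

def shuffled_repeat_baseline(tokens, trials=200, seed=0):
    """Average number of immediate repeats over random shuffles of the tokens."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(tokens)
    total = 0
    for _ in range(trials):
        rng.shuffle(pool)
        total += adjacent_repeats(pool)
    return total / trials

# Toy English-like text with frequent "the" but no "the the".
sample = ("the cat sat on the mat and the dog saw the cat " * 20).split()
print(adjacent_repeats(sample), shuffled_repeat_baseline(sample))
```

A text whose observed count is far below the shuffled baseline is "antirepetitive" in the sense used above; a text whose count is close to the baseline behaves more like independently drawn tokens.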

(16-05-2026, 05:56 PM)oshfdk Wrote: Another thing that follows from this, which I think I already mentioned somewhere, is that I don't think Voynichese can be described as "repetitive". It's just that natural languages are "antirepetitive", while Voynichese is better described as random. We don't see instances of "the the the" in English because English tokens are not selected independently of one another. There would absolutely be examples of "the the the" and "and and and" if we just pulled English words from a shuffled deck.

Voynichese is also "antirepetitive": there are more repetitions at distance 2 than 1. Voynichese seems mostly random, more random than languages, but not completely random, as shown by word pair statistics: "skewed pairs" etc.
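For anyone who wants to check the distance claim on a transcription of their choice, a minimal sketch is below. It counts how often the same word reappears exactly d positions later; the name `repeats_at_distance` is made up, and the input is assumed to be a flat list of words with line and paragraph breaks ignored.

```python
def repeats_at_distance(tokens, d):
    """Count positions i where tokens[i] == tokens[i + d]."""
    return sum(1 for a, b in zip(tokens, tokens[d:]) if a == b)

# Toy sequence where repeats at distance 2 outnumber repeats at distance 1.
sample = "a b a b a c c".split()
for d in (1, 2, 3):
    print(d, repeats_at_distance(sample, d))
```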

(16-05-2026, 05:32 PM)oshfdk Wrote: If there is some very popular token like daiin or chol, there would be some chol.daiin and there would be some daiin.chol in the text.

That's right. And what's more, this cannot be fully explained with boustrophedon hypotheses, or some simple line reversal scheme, due to the vord length autocorrelation. Short-long word interspersion (observed in many candidate natural languages) should still hold if some lines were arbitrarily penned in reverse order.

(16-05-2026, 06:48 PM)RadioFM Wrote: That's right. And what's more, this cannot be fully explained with boustrophedon hypotheses, or some simple line reversal scheme, due to the vord length autocorrelation.

Why not? As you said, line reversal has no impact on vord length autocorrelation.
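The invariance under line reversal can be illustrated with a small sketch. It computes a lag-1 autocorrelation of word lengths using only adjacent pairs within a line and a global mean and variance, which is one possible definition and not necessarily the one used in earlier posts. The sample lines are invented strings of common vords, not a real transcription, and the function name is hypothetical.

```python
import math

def length_autocorrelation(lines):
    """Lag-1 autocorrelation of word lengths over within-line adjacent pairs.

    Assumes at least one line has two or more words and that
    word lengths are not all identical.
    """
    lengths = [[len(w) for w in line.split()] for line in lines]
    flat = [n for row in lengths for n in row]
    mean = sum(flat) / len(flat)
    var = sum((n - mean) ** 2 for n in flat) / len(flat)
    products = [
        (a - mean) * (b - mean)
        for row in lengths
        for a, b in zip(row, row[1:])
    ]
    return sum(products) / (len(products) * var)

# Invented sample lines; the second one is reversed in the comparison.
lines = ["daiin ol chedy ar qokeedy", "ol shedy qokain or chol daiin"]
reversed_second = [lines[0], " ".join(reversed(lines[1].split()))]
print(length_autocorrelation(lines))
print(length_autocorrelation(reversed_second))
print(math.isclose(length_autocorrelation(lines),
                   length_autocorrelation(reversed_second)))
```

Each adjacent pair contributes a product that does not depend on the order of its two members, so reversing any subset of lines leaves this statistic unchanged up to floating-point rounding.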

(16-05-2026, 05:32 PM)oshfdk Wrote: If there is some very popular token like daiin or chol, there would be some chol.daiin and there would be some daiin.chol in the text.

Often yes, but the frequencies are often skewed, more than in word-shuffled text, and there are counterexamples:
no chedy aiin
no chor Shol
no Shol chor

(16-05-2026, 07:19 PM)nablator Wrote: Often yes, but the frequencies are often skewed, more than in word-shuffled text, and there are counterexamples:

True. I'm not arguing it's perfectly random, it's just that individual tokens appear much more independent compared to any natural language, regardless of any assumptions about the word boundaries.
