The Voynich Ninja

Hi all,

My name is Takeshi, and I’m part of a university research group in Japan. I wanted to ask whether anyone here has explored something similar, or whether there are older threads or papers I should know before going further.

I’m not working on decipherment, and I’m not claiming that positional structure in Voynichese is new. My question is narrower.

I’ve been testing whether a reduced family-based representation can recover local structure better than flat EVA-token sequences.

At the moment I’m grouping tokens into a small set of recurring families, mainly:

AIIN
CHEDY
CHOL
QOKAIN
QOKE
plus a residual OTHER class

Then I look at local windows of four adjacent positions, asking whether the order of families across the four slots captures short-range structure more clearly than raw EVA alone.
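
To make the setup concrete, here is a minimal Python sketch of the family reduction and the four-slot windowing. The matching rule (a plain substring test, longest family name first) and the example tokens are illustrative assumptions only, not the actual family definitions; `family_of` and `four_slot_windows` are hypothetical helper names.

```python
# Family labels taken from the list above; anything else falls into OTHER.
FAMILIES = ["AIIN", "CHEDY", "CHOL", "QOKAIN", "QOKE"]

def family_of(token: str) -> str:
    """Map an EVA token to a family label (assumed rule: substring match)."""
    t = token.upper()
    # Check longer family names first so QOKAIN is tried before AIIN or QOKE.
    for fam in sorted(FAMILIES, key=len, reverse=True):
        if fam in t:
            return fam
    return "OTHER"

def four_slot_windows(tokens):
    """Return every window of four adjacent family labels, order preserved."""
    fams = [family_of(t) for t in tokens]
    return [tuple(fams[i:i + 4]) for i in range(len(fams) - 3)]

# Made-up token sequence, used only to show the shape of the output.
example_tokens = ["qokain", "chedy", "qokeey", "daiin", "chol", "shedy"]
example_windows = four_slot_windows(example_tokens)
```

Each window is an ordered 4-tuple of family labels, so two windows with the same families in a different order stay distinct.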

Very cautiously, the reason I think this may be worth pursuing is that a few things seem to hold:

some local classes show at least modest out-of-sample recoverability
ablation suggests the family-positional layer may carry more signal than raw EVA alone
some classes remain visible under multiclass classification
and most importantly, unsupervised clustering of the strongest windows shows partial alignment with those inferred classes

What I found especially interesting is that this alignment weakens sharply under slot-wise shuffling, even when slot marginals are preserved. So the signal may lie less in simple family frequency than in cross-slot combinatorial order.
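
A slot-wise shuffle of this kind can be sketched as follows, assuming the windows are stored as 4-tuples of family labels. Each slot is permuted independently across windows, so the per-slot family counts stay exactly the same while the cross-slot combinations are broken. The function names are hypothetical and the tiny input is made up.

```python
import random
from collections import Counter

def slotwise_shuffle(windows, seed=0):
    """Permute each slot column independently across all windows."""
    rng = random.Random(seed)
    # Transpose: one list per slot position.
    columns = [list(col) for col in zip(*windows)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back into windows.
    return [tuple(row) for row in zip(*columns)]

def slot_marginals(windows):
    """Family counts per slot position."""
    return [Counter(col) for col in zip(*windows)]

# Made-up windows, only to show that the marginals survive the shuffle.
toy_windows = [
    ("QOKE", "CHEDY", "AIIN", "CHOL"),
    ("QOKE", "CHEDY", "AIIN", "OTHER"),
    ("CHOL", "OTHER", "QOKAIN", "AIIN"),
    ("OTHER", "CHEDY", "CHOL", "AIIN"),
]
toy_shuffled = slotwise_shuffle(toy_windows)
```

Any statistic computed on the real windows can then be compared with the same statistic on many shuffled copies with different seeds.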

I’m also starting to wonder whether some of these recurrent window types may support very cautious functional descriptions: not meanings, but local roles such as compact unit, opening, development, turn, or closure-like transition. I want to be careful not to overread this, though. The signal also weakens a lot when the order of families inside the four-slot window is shuffled, which suggests that what matters is not only which families appear, but also the local order in which they appear.
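
For the within-window control, a minimal sketch might look like this, again assuming windows are 4-tuples of family labels. Each window keeps exactly the same families but loses their order; `within_window_shuffle` is a hypothetical name and the input is made up.

```python
import random

def within_window_shuffle(windows, seed=0):
    """Shuffle the order of families inside each window separately."""
    rng = random.Random(seed)
    shuffled = []
    for window in windows:
        slots = list(window)
        rng.shuffle(slots)  # same families, random order
        shuffled.append(tuple(slots))
    return shuffled

# Made-up windows for illustration only.
sample_windows = [
    ("QOKE", "CHEDY", "AIIN", "CHOL"),
    ("CHOL", "OTHER", "QOKAIN", "AIIN"),
]
sample_shuffled = within_window_shuffle(sample_windows)
```

If a classifier or clustering score drops on the shuffled windows, the drop can only come from order, because the family content of every window is unchanged.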

At the moment I also have the impression that the strongest short-range sequential structure may be denser in recipe-like material than in botanical material, though I’m still unsure whether that reflects a real difference or just a bias introduced by the current windowing choice.

So I wanted to ask:

Has anyone here tried something like family reduction + short local windows before?
Do four-slot windows sound like a reasonable exploratory unit, even if only provisionally?
Does the possible recipe / botanical asymmetry sound plausible, or more likely methodological?

To be clear, I’m not claiming decipherment, lexical values, or a full model of the manuscript. At most, I think this may be recovering a small layer of local combinatorial structure: recurring short-range family patterns whose order appears to matter more than their simple frequency.

If anyone knows relevant prior work, older forum discussions, or obvious pitfalls in this approach, I’d be very grateful. And of course, feel free to contact me if you would like to discuss this!

Thank you and nice to meet you all!

Take

This seems at least adjacent to Torsten Timm's diagrams of word connectivity.

Looking at your short list, my quick impression is that QOKAIN seems a bit out of place, or at least referring to a much smaller group.

(10-03-2026, 08:14 AM) ReneZ wrote:
This seems at least adjacent to Torsten Timm's diagrams of word connectivity.

Looking at your short list, my quick impression is that QOKAIN seems a bit out of place, or at least referring to a much smaller group.

Dear ReneZ,

Thanks, that’s a really helpful observation!

Yes, we were partly influenced by Timm’s work when thinking about local dependencies, so I agree that there is definitely some adjacency there.

But it is not exactly the same thing. Our angle is a bit different in at least three ways: we are not mainly trying to model Voynichese as a word-generation or self-citation process; we are reducing tokens into family-level classes and looking at their positional behavior; and we are testing fixed short windows under several controls rather than focusing only on connectivity patterns as such.

So I’d say: adjacent, yes, but not equivalent.

And your point about QOKAIN is very interesting! It may mean that it really is a smaller and more marked family than the others, or it may mean that we are currently treating it too narrowly and that it should eventually be merged, redefined, or handled differently. We are still trying to understand that part better.

So far, what seems encouraging on our side is that the reduced family-positional layer appears to recover at least some local structure that becomes weaker when the order inside the window is disturbed. We’re still being cautious about what that means, but that is the direction that currently looks most promising to us. The pattern also holds up across the whole manuscript in the permutation and ablation checks we run to guard against ad hoc mistakes.

Thanks again, this is exactly the kind of comment that helps us refine the approach!

Hi,

This is very interesting. I have a paper I'm finalising at the moment on exactly this point.

Concretely, it's a 4-slot model.

Draft is on Zenodo and I'm finalising editing at the moment.

Happy to share findings.

Thanks

Ed

Uroboros

ReneZ

Uroboros

DG97EEB