10-03-2026, 07:28 AM
Hi all,
My name is Takeshi, and I’m part of a university research group in Japan. I wanted to ask whether anyone here has explored something similar, or whether there are older threads or papers I should know about before going further.
I’m not working on decipherment, and I’m not claiming that positional structure in Voynichese is new. My question is narrower.
I’ve been testing whether a reduced family-based representation can recover local structure better than flat EVA-token sequences.
At the moment I’m grouping tokens into a small set of recurring families, mainly:
AIIN
CHEDY
CHOL
QOKAIN
QOKE
plus a residual OTHER class
Then I look at local windows of four adjacent positions, asking whether the order of families across the four slots captures short-range structure more clearly than raw EVA alone.
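To make the setup concrete, here is a minimal sketch in Python. It is illustrative only: the substring rule in `to_family` is a placeholder assumption (the actual family definitions are not spelled out in this post), the function names are hypothetical, and the sample line is a toy sequence rather than a real transcription. It also assumes windows do not cross line boundaries.

```python
# Family names from the list above; longer names are checked first so that
# a QOKAIN-like token is not absorbed by the shorter QOKE pattern.
FAMILIES = ["QOKAIN", "QOKE", "AIIN", "CHEDY", "CHOL"]

def to_family(token):
    """Placeholder rule (assumption): first family whose name occurs in the token."""
    t = token.lower()
    for fam in FAMILIES:
        if fam.lower() in t:
            return fam
    return "OTHER"

def four_slot_windows(families, size=4):
    """All runs of `size` adjacent family labels within one line."""
    return [tuple(families[i:i + size]) for i in range(len(families) - size + 1)]

# Toy line, not taken from the manuscript.
toy_line = "qokain chedy qokeey daiin chol okal".split()
toy_families = [to_family(tok) for tok in toy_line]
toy_windows = four_slot_windows(toy_families)

print(toy_families)
print(toy_windows[0])  # ('QOKAIN', 'CHEDY', 'QOKE', 'AIIN')
```

A line with n tokens yields n - 3 windows, so lines shorter than four tokens contribute nothing under this assumption.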
Very cautiously, the reason I think this may be worth pursuing is that a few things seem to hold:
some local classes show at least modest out-of-sample recoverability
ablation suggests the family-positional layer may carry more signal than raw EVA alone
some classes remain visible under multiclass classification
and most importantly, unsupervised clustering of the strongest windows shows partial alignment with those inferred classes
What I found especially interesting is that this alignment weakens sharply under slot-wise shuffling, even when slot marginals are preserved. So the signal may lie less in simple family frequency than in cross-slot combinatorial order.
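One way to read "slot-wise shuffling with slot marginals preserved" is to permute each slot (column) independently across windows. The sketch below follows that reading, which is an assumption about the control rather than a description of the exact procedure used; `slotwise_shuffle` and `slot_marginals` are hypothetical names, and the windows are toy data.

```python
import random
from collections import Counter

def slotwise_shuffle(windows, seed=0):
    """Permute each slot independently across windows.

    Per-slot family counts stay identical, but the pairing of families
    across slots within a window is broken.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*windows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]

def slot_marginals(windows):
    """Family counts for each of the slots."""
    return [Counter(col) for col in zip(*windows)]

# Toy windows, not real data.
example_windows = [
    ("QOKAIN", "CHEDY", "QOKE", "AIIN"),
    ("CHOL", "AIIN", "CHEDY", "OTHER"),
    ("QOKE", "CHEDY", "CHOL", "AIIN"),
    ("OTHER", "QOKAIN", "AIIN", "CHEDY"),
]
shuffled_windows = slotwise_shuffle(example_windows, seed=1)

# The marginals are unchanged by construction.
print(slot_marginals(example_windows) == slot_marginals(shuffled_windows))  # True
```

Any statistic that survives this shuffle can be explained by per-slot frequencies alone; anything that drops depends on cross-slot combinations.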
I’m also starting to wonder whether some of these recurrent window types may support very cautious functional descriptions: not meanings, but local roles such as compact unit, opening, development, turn, or closure-like transition. I want to be careful not to overread this. A related observation is that the signal also weakens considerably when the order of families inside the four-slot window is shuffled, which suggests that what matters is not only which families appear, but also the local order in which they appear.
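The within-window control can be sketched in the same spirit. This is an assumed form of the shuffle, not necessarily the one actually run: each window keeps exactly the same families but in a random order, so only the local ordering is destroyed. The function name and the toy windows are hypothetical.

```python
import random

def within_window_shuffle(windows, seed=0):
    """Randomly reorder the families inside each window.

    The multiset of families per window is preserved; only order changes.
    """
    rng = random.Random(seed)
    shuffled = []
    for window in windows:
        items = list(window)
        rng.shuffle(items)
        shuffled.append(tuple(items))
    return shuffled

# Toy windows, not real data.
ordered = [
    ("QOKAIN", "CHEDY", "QOKE", "AIIN"),
    ("CHOL", "AIIN", "CHEDY", "OTHER"),
]
reordered = within_window_shuffle(ordered, seed=2)

# Same families in every window, possibly in a different order.
print(all(sorted(a) == sorted(b) for a, b in zip(ordered, reordered)))  # True
```

Unlike the slot-wise shuffle, this control changes the per-slot marginals, so the two are complementary: one holds slot frequencies fixed, the other holds window content fixed.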
At the moment I also have the impression that the strongest short-range sequential structure may be denser in recipe-like material than in botanical material, though I’m still unsure whether that reflects a real difference or just a bias introduced by the current windowing choice.
So I wanted to ask:
Has anyone here tried something like family reduction + short local windows before?
Do four-slot windows sound like a reasonable exploratory unit, even if only provisionally?
Does the possible recipe / botanical asymmetry sound plausible, or more likely methodological?
To be clear, I’m not claiming decipherment, lexical values, or a full model of the manuscript. At most, I think this may be recovering a small, limited layer of local combinatorial structure: recurring short-range family patterns that appear to matter more in their order than in their simple frequency.
If anyone knows of relevant prior work, older forum discussions, or obvious pitfalls in this approach, I’d be very grateful. And of course, feel free to contact me if you would like to discuss this!
Thank you and nice to meet you all!
Take