The Voynich Ninja
Proposed architecture of the Voynich system - Printable Version




Proposed architecture of the Voynich system - Labyrinthinesecurity - 21-05-2026

The IVTFF transliteration format cleanly separates captions (text next to drawings) from the rest of the text.

I wondered whether captions might be semantically rich, and perhaps grammatically different from the rest of the text.

Looks like they are.

Captions overwhelmingly follow this pattern: o-K-V-F (o + stop + vowel + final consonant), repeated 1-2 times. The ch/sh and e slots are used sparingly. What's more, captions don't carry the A/B distinction signal.
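
A minimal sketch of how the o-K-V-F claim could be checked against a list of label words. The glyph classes below are assumptions, not taken from the post: stops are read as EVA t/k/p/f/d, vowels as o/a, and finals as l/r/n/m/y/s. The function names are hypothetical.

```python
import re

# Assumed EVA glyph classes (the post does not list them explicitly).
STOPS = "tkpfd"
VOWELS = "oa"
FINALS = "lrnmys"

# o + stop + vowel + final consonant, repeated once or twice.
OKVF = re.compile(rf"(?:o[{STOPS}][{VOWELS}][{FINALS}]){{1,2}}")


def matches_okvf(word: str) -> bool:
    """True if the whole word is one or two o-K-V-F units."""
    return OKVF.fullmatch(word) is not None


def okvf_share(labels) -> float:
    """Fraction of label words that match the o-K-V-F pattern."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(matches_okvf(w) for w in labels) / len(labels)
```

Running `okvf_share` separately on labels and on running text would show whether "overwhelmingly" holds, and how sensitive the result is to the choice of glyph classes.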

So... what if captions were surfacing semantically meaningful words, whereas words not containing o-K-V-F were just "elaboration"?

We would then have two channels in Voynich.

I ran some stats, and look at the results:
Semantic channel (o, a, t, k, p, f, d, l, r, n, m, y, s): stable across sections, preserved in captions, 73% of all glyphs
Elaboration channel (ch, sh, e, ee, q, i, ii, cXh): varies by line position, varies by A/B "language," largely absent from captions.
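
A rough sketch of how the glyph share of each channel could be counted from EVA words. It assumes that cXh means the benched gallows cth/ckh/cph/cfh, that ee and ii each count as a single unit, and that anything outside both lists goes into an "other" bucket. None of these choices is stated in the post.

```python
import re
from collections import Counter

SEMANTIC = set("oatkpfdlrnmys")
ELABORATION = {"ch", "sh", "e", "ee", "q", "i", "ii",
               "cth", "ckh", "cph", "cfh"}

# Longest alternatives first, so "cth" wins over "c" and "ee" over "e".
GLYPH = re.compile(r"c[tkpf]h|ch|sh|ee|ii|.")


def channel(glyph: str) -> str:
    """Assign one glyph unit to a channel."""
    if glyph in SEMANTIC:
        return "semantic"
    if glyph in ELABORATION:
        return "elaboration"
    return "other"


def channel_shares(words) -> dict:
    """Share of glyph units per channel over a list of EVA words."""
    counts = Counter()
    for word in words:
        for glyph in GLYPH.findall(word):
            counts[channel(glyph)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total
            for name in ("semantic", "elaboration", "other")}
```

For example, `channel_shares(["daiin"])` splits the word into d, a, ii, n and reports three semantic units and one elaboration unit. Counting ee and ii as two glyphs each would shift the percentages, so the tokenization should be fixed before comparing numbers.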

Semantic words have 29% redundancy => reasonable, close to natural language 
Elaboration has only 3.2% redundancy => it's essentially memoryless. It barely depends on the previous elaboration. This is consistent with elaboration being either random padding, a simple positional marker, or an independent cipher layer.
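
The post does not say how redundancy was computed. One common reading, assumed here, is the fraction of a symbol's entropy that is explained by the previous symbol: 1 - H(next | previous) / H(next). A sketch under that assumption, with hypothetical function names:

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of an empirical distribution."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def sequential_redundancy(seq) -> float:
    """1 - H(next | prev) / H(next) for a sequence of symbols.

    0.0 means the next symbol is independent of the previous one
    (memoryless); 1.0 means it is fully determined by it.
    """
    seq = list(seq)
    if len(seq) < 2:
        return 0.0
    pairs = Counter(zip(seq, seq[1:]))
    prev = Counter(seq[:-1])
    nxt = Counter(seq[1:])
    h_next = entropy(nxt)
    if h_next == 0:
        return 0.0
    # Chain rule: H(next | prev) = H(prev, next) - H(prev)
    h_cond = entropy(pairs) - entropy(prev)
    return 1 - h_cond / h_next
```

The sequence would be the semantic cores (or the elaboration parts) of consecutive words. These plug-in estimates are biased upward on small samples, so a value of a few percent should be compared against the same statistic on a shuffled sequence.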

Entropy Comparison
Each Voynich word carries approximately:
  • 7.73 bits of semantic core information (the message)
  • 2.78 bits of elaboration information (position + dialect + some morphology)
  • Total: 10.51 bits per token

The elaboration's 2.78 bits decompose further into:
  • PREFIX (~1.76 bits): primarily encodes line position (ch/sh at start, ∅ at end, q in middle)
  • INFIX (~3.38 bits): encodes section dialect and some core-specific morphology
These two sub-channels share only 0.147 bits of mutual information (8.4% of prefix entropy), they are nearly independent.
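
The 8.4% figure is consistent with normalizing the mutual information by the prefix entropy. Writing P for the prefix slot and X for the infix slot (symbols introduced here for illustration only):

```latex
I(P;X) = H(P) + H(X) - H(P,X), \qquad
\frac{I(P;X)}{H(P)} \approx \frac{0.147}{1.76} \approx 0.084
```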

Let me make two conjectures: 
1) the Voynich has two layers: a semantic core (73% of glyphs, 49% of vocabulary) and an elaboration layer (27% of glyphs, carrying almost no sequential information)
2) proposed word architecture: [ELAB_PREFIX] + [SEMANTICAL_PREFIX] + [SEMANTICAL_STEM] + [SEMANTICAL_SUFFIX] + [ELAB_INFIX] + ...
   ELAB_PREFIX:       ch/sh/q
   SEMANTICAL_PREFIX: o/a/∅
   SEMANTICAL_STEM:   k/da/ka/...
   SEMANTICAL_SUFFIX: y/n/l/r/m/∅
   ELAB_INFIX:        ii/ee/e/i

Example decompositions:
Full word   Elabprefix  SemPrefix  Stem  SemSuffix  Elabinfix
chol        ch          o          -     l          -
okaiin      -           o          ka    -          ii ...
daiin       -           ∅          da    -          ii ...
shedy       sh          ∅          -     -          e + (d=stem, y=suffix)
qokeedy     q           o          k     -          ee ...

Thoughts?


RE: Proposed architecture of the Voynich system - rikforto - 21-05-2026

(21-05-2026, 04:43 PM)Labyrinthinesecurity Wrote: o + stop + vowel + final consonant

I would be mighty curious how you figured out phonemic features here, as that is a much more consequential claim than another sketched word grammar.


RE: Proposed architecture of the Voynich system - eggyk - 21-05-2026

(21-05-2026, 04:43 PM)Labyrinthinesecurity Wrote: Elaboration channel (ch, sh, e, ee, q, i, ii, cXh): varies by line position, varies by A/B "language," largely absent from captions.

Unless I'm misunderstanding what constitutes a label, is this not simply untrue? 

Here is f68r2 with those characters highlighted:

   


RE: Proposed architecture of the Voynich system - Labyrinthinesecurity - 21-05-2026

(21-05-2026, 07:15 PM)eggyk Wrote:
(21-05-2026, 04:43 PM)Labyrinthinesecurity Wrote: Elaboration channel (ch, sh, e, ee, q, i, ii, cXh): varies by line position, varies by A/B "language," largely absent from captions.

Unless I'm misunderstanding what constitutes a label, is this not simply untrue? 

Here is f68r2 with those characters highlighted:

As I said, the ch/sh and e slots are used sparingly. It doesn't mean never. This page is a (very interesting) exception. In all these words, ch is an elaboration prefix, if the word architecture is to be believed.


RE: Proposed architecture of the Voynich system - eggyk - 21-05-2026

(21-05-2026, 08:27 PM)Labyrinthinesecurity Wrote: As I said, the ch/sh and e slots are used sparingly. It doesn't mean never. This page is a (very interesting) exception. In all these words, ch is an elaboration prefix, if the word architecture is to be believed.

Is it an exception? Which pages actually follow what you are saying? The only pages I found that seem to follow that were on f101v. 

From what I can see, ch, sh, e, ii are all very common in labels across the entire manuscript. Just type ch or sh into voynichese.com and it becomes quite clear.


RE: Proposed architecture of the Voynich system - Labyrinthinesecurity - 21-05-2026

(21-05-2026, 09:11 PM)eggyk Wrote:
(21-05-2026, 08:27 PM)Labyrinthinesecurity Wrote:
As I said, the ch/sh and e slots are used sparingly. It doesn't mean never. This page is a (very interesting) exception. In all these words, ch is an elaboration prefix, if the word architecture is to be believed.

Is it an exception? Which pages actually follow what you are saying? The only pages I found that seem to follow that were on f101v. 

From what I can see, ch, sh, e, ii are all very common in labels across the entire manuscript. Just type ch or sh into voynichese.com and it becomes quite clear.

I should have made it clear that the A/B markers I'm mentioning are sh/ch followed by o or e. These are only present in about 10% of the labels. But the word architecture is definitely wrong: the elaboration should be che, cho, she or sho, not just ch or sh.

Why these 4 markers? Because they predict the Currier language of each folio with 96% accuracy.
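
The post does not give the prediction rule. One simple possibility, sketched below purely as an assumption, is a majority vote: a folio is guessed to be Currier A when cho/sho outnumber che/she, and B otherwise. The function names and the folio dictionaries are hypothetical.

```python
import re
from collections import Counter

# The four proposed markers: ch or sh followed by o or e.
MARKER = re.compile(r"(?:ch|sh)[oe]")


def marker_counts(words) -> Counter:
    """Count che/cho/she/sho occurrences in a list of EVA words."""
    counts = Counter()
    for word in words:
        counts.update(MARKER.findall(word))
    return counts


def guess_language(words):
    """Assumed rule: more cho/sho than che/she means A, the reverse B."""
    c = marker_counts(words)
    o_side = c["cho"] + c["sho"]
    e_side = c["che"] + c["she"]
    if o_side == e_side:
        return None  # no evidence either way
    return "A" if o_side > e_side else "B"


def accuracy(folios: dict, truth: dict) -> float:
    """Share of folios whose guess matches a reference A/B label.

    folios maps a folio id to its word list, truth maps it to "A" or "B".
    """
    scored = [f for f in folios if f in truth]
    if not scored:
        return 0.0
    hits = sum(guess_language(folios[f]) == truth[f] for f in scored)
    return hits / len(scored)
```

Note that under this rule a word such as cheol counts toward the che side, so the outcome depends on how such words are treated.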


RE: Proposed architecture of the Voynich system - Dunsel - 21-05-2026

(21-05-2026, 04:43 PM)Labyrinthinesecurity Wrote: I ran some stats, and look at the results:
Semantic channel (o, a, t, k, p, f, d, l, r, n, m, y, s): stable across sections, preserved in captions, 73% of all glyphs
Elaboration channel (ch, sh, e, ee, q, i, ii, cXh): varies by line position, varies by A/B "language," largely absent from captions.

Thoughts?

If you've read my current post you know I'm working on a sheet-based copy/mutate theory. I found your idea curious, so I fired up codex and, using some of my data, tested your findings across the astrological section (f67–f73). I can confirm the same general pattern: labels/captions are dramatically more “core-heavy” and contain far less of the proposed elaboration layer than running text. The same effect also appears manuscript-wide.

Subset                      Tokens   Semantic %   Elaboration %
Astrological labels            496        82.6%           14.9%
Astrological running text    2,130        73.7%           24.7%
All manuscript labels          575        83.0%           14.5%
All running text            32,982        70.4%           26.6%

Then I tested the proposed “core vs elaboration” idea by stripping the suggested elaboration glyphs/sequences (ch, sh, q, e, ee, i, ii) from the text and rerunning structural comparisons on Scribe 1.  The stripped forms actually collapsed MORE cleanly onto prior material. Exact/prior coverage increased substantially, edit distance 1 recovery improved, the number of source sheets needed dropped, and label vocabulary became noticeably closer to running text.

Test                                    Raw     Stripped
Scribe 1 exact/prior coverage           67.8%   80.5%
Scribe 1 exact+ED1 coverage             92.2%   96.5%
Sheets needed for 80% token coverage    3       2
Label/running-text vocabulary overlap   47.1%   56.9%
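
For readers who want to try something similar, here is a sketch of the stripping step and of an exact / edit-distance-1 coverage measure. It is not the code behind the table: "prior vocabulary" is taken here to mean the set of word forms seen on earlier sheets, which is an assumption, and the function names are hypothetical.

```python
import re

# Elaboration sequences listed in the post: ch, sh, q, e, ee, i, ii.
ELAB = re.compile(r"ch|sh|ee|ii|q|e|i")


def strip_elaboration(word: str) -> str:
    """Remove the proposed elaboration glyphs from an EVA word."""
    return ELAB.sub("", word)


def within_one_edit(a: str, b: str) -> bool:
    """True if the Levenshtein distance between a and b is at most 1."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution
    return a[i:] == b[i + 1:]          # one insertion into a


def coverage(tokens, prior_vocab, allow_ed1: bool = False) -> float:
    """Share of tokens found in prior_vocab, exactly or within one edit."""
    tokens = list(tokens)
    if not tokens:
        return 0.0

    def covered(token: str) -> bool:
        if token in prior_vocab:
            return True
        return allow_ed1 and any(within_one_edit(token, p) for p in prior_vocab)

    return sum(covered(t) for t in tokens) / len(tokens)
```

Comparing `coverage` on raw tokens and on `strip_elaboration` output gives the Raw and Stripped columns of such a comparison. Since stripping shortens words and merges forms, some increase in coverage is expected regardless of the hypothesis, so a control that strips a random glyph set of the same size would help interpret the gain.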

I don't think this proves semantics. But it does suggest there may really be a relatively stable underlying word-family backbone underneath a more variable elaboration layer.

And, that fits quite well with the constrained copy/mutate production process that I'm seeing. The “core” may simply represent inherited word-family anchors used during copying. While elaboration represents local mutation freedom.  In other words, the scribes are not preserving semantics, they are preserving recognizable production continuity.  That's a working theory mind you...

So, a very interesting find.


RE: Proposed architecture of the Voynich system - eggyk - 21-05-2026

(21-05-2026, 09:41 PM)Labyrinthinesecurity Wrote: I should have made it clear that the A/B markers I'm mentioning are sh/ch followed by o or e. These are only present in about 10% of the labels. But the word architecture is definitely wrong: the elaboration should be che, cho, she or sho, not just ch or sh.

Why these 4 markers? Because they predict the Currier language of each folio with 96% accuracy.

What about chc? That's the most important to explain.
You stated that the "elaboration channel" is sparingly used in labels, but the symbols you associate with that channel are common in labels. So I still don't understand.


RE: Proposed architecture of the Voynich system - ReneZ - 21-05-2026

Using cho vs. che as a way to distinguish between Currier A and B is an oversimplification, and it is incomplete.
Words beginning cheo (or Sheo) are common in A language.
Caption words (more commonly called labels) have a much lower use of initial ch.
That is a long-known feature of the text (though not properly understood ... ).



RE: Proposed architecture of the Voynich system - Labyrinthinesecurity - 22-05-2026

(21-05-2026, 11:42 PM)ReneZ Wrote:
Using cho vs. che as a way to distinguish between Currier A and B is an oversimplification, and it is incomplete.
Words beginning cheo (or Sheo) are common in A language.
Caption words (more commonly called labels) have a much lower use of initial ch.
That is a long-known feature of the text (though not properly understood ... ).

It's well known, but is it exploited? I'd be curious to know, because that would help a lot. I propose using this odd label structure as a hint for separating the two channels.