The Voynich Ninja

(27-11-2025, 01:24 PM) magnesium Wrote: My open-access paper describing the Naibbe cipher is now out in Cryptologia.

Thank you all very much for letting me present the cipher at VMD and for all your insightful feedback on the forum. It greatly improved the final paper.

Hello Michael, here are my thoughts about your interesting paper.

OVERVIEW

This paper attempts to investigate whether the Voynich Manuscript (VMS) could be a ciphertext by developing what the author calls the "Naibbe cipher"—a verbose homophonic substitution cipher designed to encrypt Latin and Italian text while reproducing VMS statistical properties. While the work demonstrates creativity and shows that certain VMS properties can be replicated through deliberate cipher design, it suffers from fundamental methodological flaws, circular reasoning, internal contradictions, and incomplete engagement with VMS properties.

PRINCIPAL FINDINGS

What the Paper Demonstrates:

A verbose homophonic substitution cipher can be designed to match several VMS statistical properties simultaneously
Such a cipher can be executed with 15th-century materials (playing cards, pen, paper)
The resulting ciphertexts remain decipherable and maintain plaintext letter sequence
Certain combinations of entropy, token length, and word grammar can coexist in cipher output

What the Paper Claims to Demonstrate (But Does Not):

That the Naibbe cipher "reliably replicates" VMS properties (it replicates properties it was fitted to)
That this provides evidence for the ciphertext hypothesis (circular reasoning)
That the cipher is historically plausible (significant unexplained complexity)
That it can explain line- and position-dependent effects (geometrically impossible)

CRITICAL METHODOLOGICAL FLAWS

1. Circular Reasoning in Validation (Fatal Flaw)

The Problem:

Cipher tables populated with actual VMS words and affixes (p. 6)
Table selection ratios (5:2:2:2:1:1) fitted to VMS word frequency distribution (p. 7, Supp. Mat. 3)
"Snake eyes" rule to achieve 47.2%-52.8% split between unigrams and bigrams (p. 4)
Paper then reports success at matching VMS word frequencies as validation (p. 17, Fig. 4)

Why This Is Invalid: This is equivalent to:

Measuring a target
Building a model to match that target
Testing the model against the same target
Declaring success

The paper uses the same data for both parameter fitting and validation, which violates basic scientific methodology. Figure 4's R² ≈ 0.9 correlation is presented as evidence of success, but it is merely confirmation that the implementation works as designed.

What Should Have Been Done:

Split-sample validation (fit to Currier A, test on Currier B)
Clear separation of fitted vs. emergent properties
Honest acknowledgment that matching fitted properties is expected, not surprising
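A minimal sketch of split-sample validation on frequency-rank data, assuming whitespace-tokenized transcriptions. The helper names and the two toy token lists are hypothetical stand-ins for, e.g., a Currier A sample used for fitting and a Currier B sample held out for testing:

```python
from collections import Counter

def rank_frequency(tokens, k):
    """Proportional frequencies of the k commonest word types, in rank order."""
    counts = Counter(tokens)
    return [n / len(tokens) for _, n in counts.most_common(k)]

def r_squared(observed, predicted):
    """Coefficient of determination of `predicted` against `observed`."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot

# Hypothetical toy samples standing in for a fitting set and a held-out set.
fit_sample = "daiin ol chedy daiin ar ol daiin chey ol qokeedy".split()
test_sample = "daiin chedy ol daiin ar qokeedy ol daiin ar shedy".split()

k = 4
model = rank_frequency(fit_sample, k)      # "fitted" on the first sample only
held_out = rank_frequency(test_sample, k)  # data the fit never saw

print(r_squared(model, model))     # 1.0: scoring against the fitting data itself
print(r_squared(held_out, model))  # the only score that says anything about the model
```

Scoring the fitted distribution against the very data it was fitted to returns R² = 1 by construction; only the held-out score carries information about the model.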

2. Internal Design Contradictions

A. Randomness vs. Flexibility Paradox:

The paper simultaneously claims:

Playing cards provide random table selection (p. 7-8)
Scribes can vary table selection "nonrandomly" with personal "biases" (p. 27)

These are contradictory. If randomness is cryptographically important, scribes cannot vary it. If scribe variation is acceptable, randomness serves no purpose. The author never resolves which purpose the card-drawing mechanism actually serves.

Simpler Alternative: Deterministic table cycling (α, β₁, β₂, β₃, repeat) would also achieve the same purpose without requiring playing cards, would be easier to execute, and would be more historically plausible for a manuscript cipher where reproducibility matters.
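As written, a plain α, β₁, β₂, β₃ cycle would use each table equally often. To keep the fitted 5:2:2:2:1:1 proportions, a deterministic cycle needs a 13-step schedule. A minimal sketch; the ordering within the schedule is arbitrary, and the names gamma1 and gamma2 for the two remaining tables are assumptions:

```python
from collections import Counter
from itertools import cycle, islice

# Hypothetical 13-step schedule realizing the fitted 5:2:2:2:1:1 ratios.
# The order is arbitrary; the names gamma1 and gamma2 are assumptions.
SCHEDULE = ["alpha", "beta1", "alpha", "beta2", "alpha", "beta3", "alpha",
            "gamma1", "beta1", "alpha", "beta2", "beta3", "gamma2"]

def table_sequence(n):
    """Tables used for the first n plaintext letters, cycling deterministically."""
    return list(islice(cycle(SCHEDULE), n))

counts = Counter(table_sequence(13 * 1000))
print(counts.most_common())
```

Because the schedule is fixed, every table choice is reproducible from the letter's position alone, with no cards or dice involved.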

B. Ambiguity Handling:

The paper proposes that when bigrams accidentally encrypt as unigram word types, scribes should "re-encrypt the bigram (by redrawing cards)" (p. 12-13).

This is unnecessarily complex. The paper already demonstrates mechanisms to distinguish prefix glyphs from suffix glyphs (Sec. 2.5). These mechanisms should have been extended to make unigrams structurally distinct from bigrams, eliminating ambiguity by design rather than requiring trial and error during encryption and, even more importantly, during decryption.

Why This Matters: Would 15th-century cryptographers deliberately choose a design requiring probabilistic re-encryption rather than deterministic disambiguation?

INCOMPLETE COVERAGE OF VMS PROPERTIES

3. Word Type Coverage Inconsistency

A. The Numbers:

47.2% of tokens are unigrams (p. 4)
Unigram word types come directly from VMS
Yet only 29±1% of Naibbe word types exist in VMS (p. 13-14)
Only 45% of VMS word types can be generated (p. 22-23)

The Problem: If nearly half of all tokens are unigrams drawn from VMS word types, and there are only ~138 possible unigram word types, why is the type overlap so low? The paper never explains this apparent discrepancy.

Either:

The bigram word type generation is creating massive numbers of non-VMS types (undermining replication claims)
Or the reported statistics are inconsistent

This requires clarification and may indicate fundamental problems with the cipher's ability to replicate VMS word diversity.

A possible explanation might be that the cipher uses two distinct encryption modes:

Unigrams: Fixed word types (138 total, high frequency)
Bigrams: Combinatorial (prefix + suffix, thousands of combinations, lower frequency)
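The counts used in this review (138 unigram types, 19,044 theoretical prefix-suffix combinations, 18 options per letter) follow from simple arithmetic. A minimal sketch, assuming a 23-letter plaintext alphabet (implied by 138 / 6) and one unigram, one prefix and one suffix per letter in each of the six tables:

```python
LETTERS = 23  # assumed plaintext alphabet size, implied by 138 / 6
TABLES = 6

unigram_types = LETTERS * TABLES        # one unigram string per letter per table
prefixes = LETTERS * TABLES             # one prefix per letter per table
suffixes = LETTERS * TABLES             # one suffix per letter per table
bigram_types = prefixes * suffixes      # free combination: any prefix with any suffix
options_per_letter = 3 * TABLES         # unigram + prefix + suffix in each table

print(unigram_types, bigram_types, options_per_letter)  # 138 19044 18
```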

If the VMS were a Naibbe-style cipher, this might offer a way to distinguish unigrams from bigrams:

Build word frequency table → identify likely unigrams (high frequency)
Analyze structural patterns → confirm unigram/bigram classification
For bigrams, separate prefix from suffix using grammar rules
Analyze prefix and suffix frequencies independently

This would create detectable statistical signatures, allowing word types to be classified as unigrams or bigrams with an estimated 70-80% accuracy based on frequency and structure.
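The first step of this procedure can be sketched in a few lines. A minimal sketch, assuming a whitespace-tokenized transcription; the cut-off `top_n` is a hypothetical parameter (138 would match the Naibbe unigram inventory), and no structural confirmation is attempted:

```python
from collections import Counter

def classify_by_frequency(tokens, top_n):
    """Label the top_n commonest word types as likely unigrams, all others as likely bigrams."""
    ranked = [word for word, _ in Counter(tokens).most_common()]
    likely_unigrams = set(ranked[:top_n])
    return {w: ("unigram" if w in likely_unigrams else "bigram") for w in ranked}

# Toy tokens for illustration only.
tokens = "daiin ol daiin ar ol daiin qokeedy chedy ar daiin".split()
labels = classify_by_frequency(tokens, top_n=2)
print(labels)
```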

B. The Free Prefix-Suffix Combination Creates a Predictability Mismatch:

VMS exhibits high within-word glyph predictability (Zattera 2022, Bowern & Lindemann 2021), suggesting constraints on glyph co-occurrence.
The Naibbe cipher allows free combination of any prefix with any suffix (138 prefixes × 138 suffixes = 19,044 theoretical combinations), with no constraints on co-occurrence. This generates:

Much higher bigram word type diversity than the VMS (in the VMS, not all observed prefixes combine with all observed suffixes)
Lower within-word predictability than VMS
Structural pattern (compositional bigrams) not evident in VMS

To genuinely replicate VMS predictability requires:

Prefix-suffix compatibility constraints
Glyph co-occurrence rules
Reduction of combinatorial space

Such rules are not implemented, presumably because they would make the cipher even more complex and historically implausible.
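Within-word predictability is commonly quantified as conditional character entropy: the lower the value, the more predictable the next glyph. A minimal sketch of a maximum-likelihood estimate from adjacent glyph pairs, assuming a plain string of single-character glyph codes; computing it for VMS text and for Naibbe output would test the predictability mismatch directly:

```python
import math
from collections import Counter

def conditional_entropy(text):
    """Estimate H(next glyph | current glyph) in bits from adjacent pairs in text."""
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])  # how often each glyph starts a pair
    total = len(text) - 1
    h = 0.0
    for (a, _b), n in pair_counts.items():
        p_pair = n / total
        p_next_given_a = n / first_counts[a]
        h -= p_pair * math.log2(p_next_given_a)
    return h

print(conditional_entropy("abababab"))  # 0.0: each glyph fully predicts the next
print(conditional_entropy("aabbabba"))
```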

4. Long-Range Correlations (Acknowledged Failure)

The paper acknowledges (p. 23-24, Fig. 8) that the cipher fails to reproduce VMS long-range correlations, which are well-documented key properties (Schinner 2007, Matlach et al. 2022, Montemurro & Zanette 2013).

Proposed Explanations:

"Nonrandom biases" during manual encryption
"Non-stochastic bursts of table use"
Quire-based construction effects

The Problem: These are post-hoc speculations never demonstrated to work. The fundamental issue is that random plaintext respacing and random table selection explicitly prevent long-range correlations by design. The proposed fixes contradict the core cipher structure.

Why This Is Critical: The paper frames this as a "limitation" (Sec. 4.2) suggesting it could be addressed. But this may be a falsification of this cipher type, not just a limitation of this implementation. Long-range correlations are not peripheral—they're fundamental VMS properties.
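Long-range correlations are typically probed by turning a text into a random walk and measuring how its fluctuations grow with window size. A minimal sketch of a simplified, non-detrended fluctuation analysis (not necessarily the exact procedure of the cited studies); the ±1 steps are a hypothetical encoding, for example +1 for a token containing a given glyph and -1 otherwise. Independent draws give an exponent near 0.5, the behavior expected from independent respacing and table selection:

```python
import math
import random

def fluctuation_exponent(steps, window_sizes):
    """Estimate alpha in F(l) ~ l**alpha for a sequence of +1/-1 steps.

    F(l) is the root-mean-square fluctuation of the walk's displacement over
    non-overlapping windows of length l. alpha close to 0.5 indicates no
    long-range correlation; alpha clearly above 0.5 indicates persistence.
    """
    log_sizes, log_f = [], []
    for size in window_sizes:
        sums = [sum(steps[i:i + size]) for i in range(0, len(steps) - size + 1, size)]
        mean = sum(sums) / len(sums)
        variance = sum((s - mean) ** 2 for s in sums) / len(sums)
        log_sizes.append(math.log(size))
        log_f.append(0.5 * math.log(variance))  # log of the RMS fluctuation
    # Least-squares slope of log F(l) against log l.
    mx = sum(log_sizes) / len(log_sizes)
    my = sum(log_f) / len(log_f)
    num = sum((x - mx) * (y - my) for x, y in zip(log_sizes, log_f))
    den = sum((x - mx) ** 2 for x in log_sizes)
    return num / den

rng = random.Random(0)
steps = [rng.choice((1, -1)) for _ in range(100_000)]  # independent, memoryless draws
alpha = fluctuation_exponent(steps, [10, 20, 40, 80, 160, 320])
print(round(alpha, 2))
```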

GEOMETRIC AND LOGICAL IMPOSSIBILITIES

5. Line-Level Structure Claims Are Mathematically Impossible

The Claim (p. 25-26): "If plaintext respacing were instead deterministic [...] then the line-level structure of the ciphertext would much more reliably vary with the plaintext's line-level structure"

The Geometric Problem:

Plaintext: ~57,000-76,000 letters (paper's calculation)
Ciphertext: ~38,000 tokens (known VMS length)
Encryption: 1.5-2 letters per token (paper's calculation)
Token size: 4-6 glyphs (VMS-like)

Mathematical Impossibility:

Line breaks in plaintext occur at specific letter positions
Line breaks in ciphertext occur at specific token positions
With fewer tokens than plaintext letters, and multi-glyph tokens, these cannot correspond

Concrete Example:
Plaintext (20 letters, 2 lines of 10):
Line 1: A R M A V I R U M Q
Line 2: U E C A N O T R O I

After respacing (first 16 letters): AR M AV I R UM QU E CA N O

After encryption (illustrative tokens; at most 6 tokens per line):
Line 1: <yteeor><qokar><chckh><otedy><okeedy><pdor>
(= AR, M, AV, I, R, UM: 9 plaintext letters)
Line 2: <ofeed><lched><qofch><daiin><y>
(= QU, E, CA, N, O: 7 plaintext letters)

The plaintext line break falls after the 10th letter (Q), inside the bigram QU, while the ciphertext line break falls after the 9th letter (the end of UM). No structural correspondence can exist between line boundaries.
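The letter bookkeeping behind such an example can be checked mechanically. A minimal sketch, taking the plaintext as the opening of the Aeneid (ARMAVIRUMQUECANOTROI, from "Arma virumque cano, Troiae") in two 10-letter lines, the respacing AR M AV I R UM QU E CA N O, and 6 tokens on the first ciphertext line and 5 on the second:

```python
from itertools import accumulate

plaintext_lines = ["ARMAVIRUMQ", "UECANOTROI"]
chunks = ["AR", "M", "AV", "I", "R", "UM", "QU", "E", "CA", "N", "O"]
tokens_per_line = [6, 5]  # ciphertext tokens on each line of the example

# Plaintext letter offsets at which a ciphertext token ends.
token_ends = set(accumulate(len(c) for c in chunks))
# Plaintext letter offsets at which a plaintext line ends (all but the last line).
plain_breaks = list(accumulate(len(line) for line in plaintext_lines))[:-1]

# Plaintext letters covered by each ciphertext line.
letters_per_line, start = [], 0
for n in tokens_per_line:
    letters_per_line.append(sum(len(c) for c in chunks[start:start + n]))
    start += n

print(letters_per_line)                          # [9, 7]
print([b in token_ends for b in plain_breaks])   # [False]
```

The `False` means the plaintext line break does not even coincide with a token boundary, let alone with a ciphertext line break.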

6. Position-Dependent Effects: Superficial and Inaccurate Treatment

The Paper's Treatment (p. 26):

Mentions <p> "often begins" paragraphs → suggests decorative null
Mentions <m> "often appears" at line-end → suggests padding null

Actual VMS Behavior (Not Mentioned):

Paragraph-initial patterns:

Not random <p> and <f> insertions but specific word types (e.g., <pchol>, <olpchedy>, <ofaiin>, <ofaiiin>, <tsheoarom>, <pcheoldom>)
These words contain gallows (<p>, <f>) and rarely appear in non-initial lines
This is a word-selection pattern, not glyph-level ornamentation of paragraph-initial words

Line-final <m> patterns:

Only appears ~62% of the time (see Timm 2014, p. 19)
In Currier A: some pages have zero instances of <m>
In Currier A: <m> is used word-finally rather than line-finally
Varies significantly by section
<am> and <dam> are common legitimate word types, not limited to line-final positions

Additional patterns completely ignored:

Line-first words are on average slightly longer than expected (see Elmar Vogt 2012)
Line-second words are on average slightly shorter than expected
Line-first words prefer certain glyphs in initial position (see Emma May Smith 2015)
Gallows have different distributions at different lines and line positions
Patterns vary between Currier A and B
etc.
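The word-length effects listed above are straightforward to measure. A minimal sketch, assuming a transcription already split into lines of whitespace-separated tokens; the toy lines are hypothetical and chosen only to exercise the function. Running the same function on Naibbe ciphertext laid out in lines would test whether the cipher reproduces the effect:

```python
from statistics import mean

def mean_length_by_position(lines):
    """Mean token length (in glyphs) for line-first, line-second and all later tokens."""
    first = [len(line[0]) for line in lines if len(line) >= 1]
    second = [len(line[1]) for line in lines if len(line) >= 2]
    rest = [len(tok) for line in lines for tok in line[2:]]
    return mean(first), mean(second), mean(rest)

# Hypothetical toy lines of EVA-style tokens.
lines = [
    "pchedal ol daiin chedy qokeedy".split(),
    "tshedy ar chol daiin".split(),
    "kcheody or shey qokar".split(),
]
first, second, rest = mean_length_by_position(lines)
print(first, second, rest)
```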

Why This Is Critical:

These are word-level and word-length effects. The Naibbe cipher:

Operates at word-level (encrypts unigrams/bigrams)
Has no line-position awareness
Generates word lengths from plaintext identity and table selection
Cannot produce position-dependent word-length or word-selection effects

This is not a limitation that could be addressed—it's a fundamental incompatibility between the cipher type and observed VMS properties.

HISTORICAL PLAUSIBILITY ISSUES

7. Backwards-Engineered Design Rationale

Playing Cards Example (p. 7-8):

The paper argues:

Table ratios happen to sum to 13
Playing card decks are multiples of 13
Therefore cards are "especially convenient"

This is backwards reasoning. This only makes sense if you start from the ratio requirement (which was fitted to VMS). A historical cipher designer would:

Choose ratios based on cryptographic needs
Use the simplest available method
Not rely on the coincidence that needed ratios match card deck structure
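The arithmetic behind the card argument is easy to check: 5 + 2 + 2 + 2 + 1 + 1 = 13, the number of ranks assumed per suit, so every complete suit, and therefore a whole deck, reproduces the ratio exactly. A minimal sketch; the rank-to-table assignment, the suit names and the table names gamma1/gamma2 are hypothetical and not taken from the paper:

```python
import random
from collections import Counter

# Hypothetical assignment of the 13 ranks of a suit to the six tables (5:2:2:2:1:1).
RANK_TO_TABLE = {
    1: "alpha", 2: "alpha", 3: "alpha", 4: "alpha", 5: "alpha",
    6: "beta1", 7: "beta1", 8: "beta2", 9: "beta2",
    10: "beta3", 11: "beta3", 12: "gamma1", 13: "gamma2",
}
SUITS = ["Cups", "Coins", "Swords", "Batons"]  # illustrative Italian-style suits
DECK = [(rank, suit) for suit in SUITS for rank in range(1, 14)]

# Each suit repeats all 13 ranks, so the full deck keeps the exact ratio.
print(Counter(RANK_TO_TABLE[rank] for rank, _ in DECK))

def draw_table(rng):
    """Draw one card with replacement (i.e. reshuffle every time) and return its table."""
    rank, _suit = rng.choice(DECK)
    return RANK_TO_TABLE[rank]
```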

Complexity Gap (p. 28-30):
The paper acknowledges "the Naibbe cipher would represent a major leap in complexity over known 15th-century ciphers" but argues this is acceptable.

The Gap:

Known 15th-century ciphers: 2-4 homophonic options per letter
Naibbe cipher: 18 options per letter (6 unigrams + 6 prefixes + 6 suffixes)
Plus: playing card apparatus, respacing procedure, ambiguity checking

The Proposed Evolution (p. 29-30): A 4-step speculative process for how the cipher could have evolved.

Problem: This is entirely speculative, with no historical evidence. Each step requires insights not evident in the historical record. The paper presents this as a plausibility argument, but it is actually a just-so story.

8. The Practical Burden of the Cipher

Let me trace what a scribe must do to encrypt ONE plaintext letter:

Example: Encrypting the letter "A" in position "first in bigram"

Step 1: Determine if unigram or bigram

Roll a die OR follow a predetermined respacing scheme
Decision: bigram

Step 2: Determine position in bigram

Decision: first position

Step 3: Draw a card

Shuffle deck (or draw from remaining cards)
Draw: "7 of Cups"

Step 4: Consult card-to-table mapping

7 of Cups → Table β₁

Step 5: Consult Table β₁

Find row for letter "A"
Find column "Bigram start"
Result: <yt>

Step 6: Write <yt> in scratch encryption

Wait for second letter of bigram

Step 7: (After encrypting second letter)

Check whether the resulting word matches a unigram word type
If yes: RESTART from Step 3 (redraw card, re-encrypt)
If no: Accept and continue

Keep in mind that these steps would have been necessary at least 57,000 times (~38,000 tokens ≈ 57,000-76,000 letters).
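The per-letter workflow traced above can be condensed into a short program, which also makes the rejection step explicit. A minimal sketch with hypothetical two-table, two-letter toy tables (not the paper's tables); `rng.choices` with weights stands in for drawing a card:

```python
import random

# Hypothetical toy tables: letter -> (unigram, prefix, suffix) per table.
TABLES = {
    "alpha": {"a": ("ar", "qo", "r"), "r": ("or", "o", "l")},
    "beta1": {"a": ("al", "cho", "ar"), "r": ("dar", "sh", "ey")},
}
WEIGHTS = {"alpha": 5, "beta1": 2}  # hypothetical selection weights
UNIGRAMS = {row[0] for table in TABLES.values() for row in table.values()}

def draw_table(rng):
    """Steps 3-4: draw a 'card' and map it to a table according to WEIGHTS."""
    names = list(WEIGHTS)
    return rng.choices(names, weights=[WEIGHTS[n] for n in names])[0]

def encrypt_bigram(first, second, rng):
    """Steps 3-7 for one bigram: prefix + suffix, redrawing on a unigram collision."""
    while True:
        prefix = TABLES[draw_table(rng)][first][1]   # column: bigram start
        suffix = TABLES[draw_table(rng)][second][2]  # column: bigram end
        token = prefix + suffix
        if token not in UNIGRAMS:  # Step 7: reject collisions with unigram types
            return token

rng = random.Random(42)
print([encrypt_bigram("r", "a", rng) for _ in range(5)])
```

With these toy tables, "r" + "a" drawn from alpha twice yields "or", the alpha unigram for r, so that draw is rejected and repeated; with weights of 5:2 this happens about half the time for this particular bigram.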

Question: Could you achieve comparable security with less complexity?

Example - Simplified Viable Alternative:
Materials: One table with 3 homophonic options per letter
Process:
1. Look up letter
2. Cycle through options (option 1, 2, 3, 1, 2, 3...)
3. Write symbol
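The simplified alternative is short enough to state as code. A minimal sketch, assuming the cycle is kept separately for each letter (the description does not say whether cycling is per letter or global) and using placeholder digits as cipher symbols:

```python
from collections import defaultdict

# Hypothetical single-table cipher: three homophones per letter (placeholder symbols).
HOMOPHONES = {"a": "123", "b": "456", "c": "789"}
REVERSE = {sym: letter for letter, syms in HOMOPHONES.items() for sym in syms}

def encrypt_cycling(plaintext):
    """Replace each letter by its next homophone, cycling 1, 2, 3, 1, ... per letter."""
    used = defaultdict(int)  # how often each letter has been enciphered so far
    out = []
    for letter in plaintext:
        options = HOMOPHONES[letter]
        out.append(options[used[letter] % len(options)])
        used[letter] += 1
    return "".join(out)

def decrypt(ciphertext):
    """Each symbol belongs to exactly one letter, so decryption is a single lookup."""
    return "".join(REVERSE[sym] for sym in ciphertext)

print(encrypt_cycling("abacab"))  # 142735
```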

The Naibbe cipher is much more complex for unclear security benefit.

The fundamental issue:

The Naibbe cipher could have been done in the 15th century in the narrow technical sense that the materials existed and the process is executable.
However, it is implausible that medieval cryptographers would have designed such a system because:

Inefficiency: The cipher is far too complex, and therefore too slow, to encrypt a whole book.
High error rate: Difficult to verify correctness
Training burden: It takes a long time to train someone to encrypt or decrypt a message
Maintenance issues: Playing cards, extensive tables
No historical precedent: The complexity gap is enormous

In my view, the cipher looks as if it were designed by someone trying to match VMS statistics, not by someone trying to encrypt messages efficiently.

WHAT THE PAPER ACTUALLY CONTRIBUTES

Legitimate Contribution:

Demonstrating that a verbose homophonic substitution cipher can simultaneously satisfy multiple VMS constraints (entropy, token length, word grammar) is intellectually interesting. It shows that these properties are not individually impossible to replicate.

Not Demonstrated:

That this specific cipher (with VMS-derived parameters) is anything more than elaborate curve-fitting
That such a cipher is historically plausible
That it provides evidence for the ciphertext hypothesis
That it can explain key VMS properties (position effects, long-range correlations, word type diversity)

COMPARISON WITH STATED CRITERIA

The paper claims to meet the Davis (2020a) criteria for acceptable VMS solutions. Assessment:

Met:

Uses known 15th-century cipher type (homophonic substitution)
Can be done with 15th-century materials
Derivation is reproducible
Maintains plaintext letter sequence

Not Met:

"Historically plausible" - questionable given complexity gap and unexplained design choices
"Reliably yield minimally ambiguous decryptions" - ambiguity handling is complex and unexplained
Replicates VMS properties - only replicates properties it was explicitly fitted to; fails on others

(27-11-2025, 01:24 PM) magnesium Wrote: My open-access paper describing the Naibbe cipher is now out in Cryptologia.

Thank you all very much for letting me present the cipher at VMD and for all your insightful feedback on the forum. It greatly improved the final paper.

I don't like your theory, Magnesium, but I sincerely admire your approach: you had an idea, you developed it, and you published an article. Voynich researchers who publish outside of forums and blogs are truly few and far between.

P.S. I'd like to know: did you manage to decipher the text? The article is long, but I haven't been able to find the deciphered text. What's it about?

Hi Ruby Novacna,
The Naibbe cipher takes some Latin / Italian plaintext, that is, normal, readable, understandable text, and when put through the Naibbe cipher method, that normal plaintext is converted into an approximation of Voynichese.
There is nothing to decipher, unless you have some previously Naibbe-enciphered text.

Hi Torsten,
Are these your own thoughts? You are supposed to acknowledge when you have used AI, you know :P

On the whole I agree with your post, though there are some nit-picking and hair-splitting arguments in there.

Hopefully, in the future, as magnesium notes, researchers can take the pros and cons of the Naibbe cipher and create a “Better…Stronger…Faster” version.

The Naibbe cipher has some issues, some acknowledged by magnesium and some pointed out by Torsten. Now that the Naibbe cipher is officially published and publicly available, it can function as a referenceable stepping stone for future research.

First off, I want to publicly thank Torsten for his thoughtful comments, both now and in earlier correspondence. Here is my reply, organized by the theme of Torsten's critique:

1. Scope of the paper and intended claims

Torsten tends to evaluate the Naibbe cipher as if I were proposing the exact historical cipher used to write the VMS, or as if I were claiming that my cipher fully explains all known VMS properties. However, the paper’s stated goal is narrower: to construct a fully specified, hand-executable substitution cipher, using materials available in early 15th-century Europe, that:

Preserves the full sequence of plaintext letters in order;
Produces decipherable ciphertext; and
Simultaneously reproduces many well-characterized statistical properties of Voynich B when encrypting a range of Latin and Italian texts.

The paper repeatedly disavows the idea that Naibbe is the exact VMS cipher and even takes pains to avoid any declaration that the VMS definitively is a ciphertext. The Naibbe cipher is best understood as a proof of concept. I devote an entire section of the paper, Section 4, to calling attention to several of the cipher’s existing failures, including its absence of long-range correlations, the lack of a mechanism for line-position effects, and incomplete word coverage.

Many of Timm’s most severe conclusions rest on reading the paper as an exacting historical reconstruction, rather than as what the Naibbe cipher is: a feasibility demonstration of what a letter-preserving substitution cipher can do. Under the more modest claim actually made in the paper, most criticisms are either already acknowledged as limitations or point the way toward further tests of this general cipher architecture.

2. “Circular reasoning” and curve-fitting

Torsten’s claim: The tables are populated with VMS words and affixes, table ratios are fitted to Voynich B’s frequency distribution, and the unigram-bigram split is tuned to match common word frequencies. Using these same data to “validate” the cipher is circular.

I agree that it’s important to distinguish between properties that are explicitly targeted and those that are emergent.

The design process of the Naibbe cipher began with an analysis of ensembles of hundreds of randomly generated substitution ciphers that map plaintext n-grams to strings of Voynichese glyphs formed by sweeping across the Zattera (2022) slot grammar. If one attempts to map plaintext letters to Voynichese glyph strings while treating the Voynichese “word grammar” (as defined by Zattera) as a binding empirical constraint, which kinds of glyph-letter mappings and distributions of plaintext n-gram lengths most reliably allow for the replication of the VMS’s observed entropy, token length distribution, and word type length distribution?

This analysis, described in the main text and Supplementary Material 2, establishes that within a verbose substitution scheme—an entropically essential feature of any VMS-mimic cipher encrypting on a letter-by-letter basis—simultaneous and extremely reliable replication of those VMS properties requires a plaintext consisting mostly of unigrams and bigrams. That’s not to say these properties are impossible to achieve while encrypting longer n-grams. But it’s much more probable to achieve these properties without any fine-tuning of the specific glyph-letter mapping if the unigram-bigram constraint is obeyed. In this setup, the Voynichese word grammar is not “fine-tuning”: It’s an important empirical constraint.
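A slot grammar of this kind treats a word as an ordered series of optional slots, each filled by at most one glyph string. A minimal sketch that compiles such a grammar to a regular expression; the slot inventory below is a made-up toy, not Zattera's (2022) actual grammar:

```python
import re

# Hypothetical toy slot grammar: slots must be filled in this fixed order,
# each with at most one of its glyph strings. Illustrative only.
SLOTS = [
    ["q", "s", "d"],
    ["o", "y"],
    ["k", "t", "p", "f"],
    ["ch", "sh"],
    ["e", "ee"],
    ["d", "s"],
    ["a", "o"],
    ["l", "r", "m", "in", "iin"],
    ["y"],
]

# One optional group per slot; longer alternatives are listed first.
PATTERN = re.compile(
    "".join(
        "(?:" + "|".join(sorted(map(re.escape, slot), key=len, reverse=True)) + ")?"
        for slot in SLOTS
    )
)

def is_grammatical(word):
    """True if the word can be read as an in-order fill of the slots."""
    return word != "" and PATTERN.fullmatch(word) is not None

print([is_grammatical(w) for w in ["qokeedy", "daiin", "chedy", "lo"]])
```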

In addition, quasi-Zipfian distributions naturally arise when unigram-bigram plaintexts are encrypted using a homophonic substitution scheme on a letter-by-letter basis. The Naibbe cipher’s specific distribution is fitted to Voynich B; the cipher’s ability to exhibit a quasi-Zipfian distribution is emergent, arising from the general choice to apply a homophonic substitution cipher on a letter-by-letter basis to a unigram-bigram plaintext. The reason I proceeded with constructing the Naibbe cipher at all was because I was surprised to see a quasi-Zipfian distribution appear within Voynichesque ciphertexts, where practically all of the commonest word types encrypted unigrams.

As described in Supplementary Material 3, the Naibbe cipher is fine-tuned to the proportional frequency-rank distribution of Voynich B’s 70 commonest words, under the assumption that all of these words represent standalone alphabet letters under a random respacing scheme. The model used to fit this frequency-rank distribution assumes three inputs: plaintext alphabet letter frequencies; a globally constant number of substitution options per letter; and globally average proportions in which those substitution options are applied (i.e., the commonest option is always chosen X% of the time for every single alphabet letter).

These simplified modeling assumptions led to the six tables and the approximately 5:2:2:2:1:1 proportions in which they are applied on a letter-by-letter basis. This modeling also implied that within this scheme, unigrams cannot make up much more than 50% of the text; if unigrams were 100% of the text, the absolute frequency-rank distribution would overshoot Voynich B's observed one by approximately a factor of 2. If only 50% of the plaintext can be unigrams, Voynichesque (Supplementary Material 2) strongly implies that the other 50% would most likely be bigrams. One natural way to encode a bigram as a Voynichese word type is to split the Voynichese word type down the middle and develop grammatically valid “prefix” and “suffix” inventories, which aligns with the strict glyph sequencing rules observed in Voynichese.
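The frequency-rank model described above can be written in a few lines. A minimal sketch, assuming (as stated) globally constant table weights applied identically to every letter; the function name and the three-letter alphabet with its frequencies are hypothetical:

```python
def predicted_unigram_proportions(letter_freqs, table_weights, unigram_share):
    """Predicted token proportion of each unigram word type, sorted by rank.

    Under the stated assumptions, the unigram for (letter, table) occurs with
    proportion unigram_share * f(letter) * w(table) / sum(w).
    """
    total_w = sum(table_weights)
    props = [
        unigram_share * f * (w / total_w)
        for f in letter_freqs.values()
        for w in table_weights
    ]
    return sorted(props, reverse=True)

# Hypothetical letter frequencies for a toy 3-letter alphabet.
freqs = {"e": 0.5, "a": 0.3, "i": 0.2}
weights = [5, 2, 2, 2, 1, 1]

half = predicted_unigram_proportions(freqs, weights, unigram_share=0.5)
full = predicted_unigram_proportions(freqs, weights, unigram_share=1.0)
print(half[:3])
print(full[0] / half[0])  # 2.0: every predicted proportion scales with the unigram share
```

Because every predicted proportion is linear in the unigram share, moving from 50% to 100% unigrams doubles the whole absolute frequency-rank curve, which is the overshoot described above.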

Thus, I fully agree that:

The table number, probabilities, and assignment of specific Voynichese strings to table slots are fitted to Voynich B, as described in detail in the paper and its supplementary materials; and
It is unsurprising that this leads to reasonable agreement on some directly targeted metrics, such as the most common word types.

That said, the paper’s central contribution is not that the Naibbe cipher magically re-discovers that which was explicitly fitted. Rather, it shows that given these fitted components and the Voynichese word grammar, a large bundle of other features—e.g., character entropy, conditional character entropy, glyph and glyph-pair frequencies, token and type length distributions—reliably fall into place as emergent consequences, and that this behavior is stable across multiple, stylistically distinct Latin and Italian plaintexts, tens of thousands of ciphertext tokens at a time. In addition, multiple unmodeled properties of the VMS also emerge, such as the presence of skewed pairs in the ciphertext.

I do not accept the conclusion that the presence of any fitted components renders the exercise “mere curve-fitting” or methodologically invalid. In the context of a singular artifact like the VMS, it is reasonable to ask: Is there any plausible letter-preserving substitution scheme that can reliably reach this statistical regime at all? Naibbe answers that question in the affirmative.

3. Randomness versus scribal “bias”: alleged contradiction

Torsten’s claim: The paper both emphasizes random table selection (via cards) and later speculates that non-random scribal biases and table “bursts” could explain long-range correlations, which Torsten considers to be contradictory.

The paper uses dice and playing cards in two roles:

As a clean, reproducible baseline for respacing and table selection in modern experiments; and
As one historically plausible implementation of the required table probability distributions.

Section 2.3 explicitly states that “any random or even non-random source” that yields the same approximate ratios on average would suffice, suggesting letter-based rules as an alternative to playing cards. Cards are historically attested, and I personally found them to be experimentally convenient, but they are not a doctrinal requirement of the cipher. Section 4, in turn, shows that the pseudorandom baseline provided by the card mechanism fails to reproduce long-range correlations. As a result, the paper then proposes non-random deviations (scribal habits, line-by-line reuse, bursts of table use) as candidate mechanisms to add on top of the baseline.

I see no logical contradiction here. The Naibbe cipher is essentially a modeling exercise with a simple pseudorandom core, one provided in this instantiation by drawn playing cards. The cipher’s mismatches with the VMS suggest the need for alternative and/or additional mechanisms. I agree with Torsten that extra mechanisms would need to be implemented to make a more conclusive claim that this class of cipher can generate the VMS, which is why I state in the paper that:

“...[I]n its current form, the Naibbe cipher fails in several major ways to replicate key properties of the VMS.”
“The Naibbe cipher cannot be exactly how the VMS was created.”
“I do not assert that the Naibbe cipher precisely reflects how the VMS was created, nor do I assert that the VMS even is a ciphertext.”
“...[T]he Naibbe cipher’s incomplete replication of Voynich B’s properties underscores the difficulty of achieving a comprehensive cipher-based model for VMS text generation.”

4. Ambiguity and bigram/unigram collisions

Torsten’s claim: Allowing bigram tokens to coincide with unigram tokens and then managing this ambiguity via collision avoidance and re-encryption is unnecessarily complex and historically implausible.

I agree that the ambiguity-management machinery is not especially elegant. However, ambiguity management is a design compromise, not an oversight, and the paper proposes re-encryption as a way to keep decryptions reliably recoverable. The goal here was not to identify the one true VMS solution; it was to build a fully functional cipher that statistically mimics the VMS with high reliability while encrypting a meaningful plaintext. After experimentally testing the cipher, I found collision avoidance to be a practical solution.

The procedure is invoked only in the specific case where a bigram token accidentally lands on a unigram word type. The paper’s decoding strategy—treat tokens as unigrams if they match, otherwise parse as bigrams—takes advantage of the fact that such collisions are relatively rare, especially if an experienced scribe mitigates collisions during encryption.

The most relevant question here is not whether the design is mathematically pristine, but whether a reasonably trained scribe can reliably decrypt Naibbe ciphertext with the aid of tables. The worked example in Section 2.6, the decoding tables in Tables 7–9, and the decryption exercises posed by Figure 5 and the final line of the paper demonstrate that they can. And for what it’s worth, I am completely open to variants of the cipher that enforce stronger structural distinctions between unigram and bigram tokens, thereby reducing the need for re-encryption. I see this as an area for experimentation, not as a fatal flaw.
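The unigram-first decoding rule can be sketched directly. A minimal sketch with hypothetical two-table, two-letter toy tables (not the paper's Tables 7–9); it returns every candidate reading so that residual ambiguity stays visible:

```python
# Hypothetical toy tables: letter -> (unigram, prefix, suffix) for each table.
TABLES = {
    "alpha": {"a": ("ar", "qo", "r"), "r": ("or", "o", "l")},
    "beta1": {"a": ("al", "cho", "ar"), "r": ("dar", "sh", "ey")},
}
UNIGRAM = {row[0]: letter for t in TABLES.values() for letter, row in t.items()}
PREFIX = {row[1]: letter for t in TABLES.values() for letter, row in t.items()}
SUFFIX = {row[2]: letter for t in TABLES.values() for letter, row in t.items()}

def decrypt_token(token):
    """Read the token as a unigram if it matches one; otherwise parse it as prefix + suffix."""
    if token in UNIGRAM:
        return [UNIGRAM[token]]
    return [
        PREFIX[token[:i]] + SUFFIX[token[i:]]
        for i in range(1, len(token))
        if token[:i] in PREFIX and token[i:] in SUFFIX
    ]

print(decrypt_token("or"), decrypt_token("oar"), decrypt_token("shar"))
```

Here "or" could also be parsed as the prefix o (r) plus the suffix r (a); the unigram-first rule reads it as r, which is exactly why a bigram ra that happens to encrypt to "or" has to be re-encrypted.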

5. Word-type coverage and combinatorial morphology

Torsten’s claim: The Naibbe cipher produces only ~45% of Voynich B word types and, with effectively free prefix–suffix combination, generates many non-VMS types. This suggests a mismatch between Naibbe’s combinatorial freedom and the VMS’s constrained morphology.

I agree that this is a genuine limitation of the Naibbe cipher, and I already say as much in the paper. Section 4 notes that limiting valid plaintext n-grams to only unigrams and bigrams necessarily bounds the generable word-type space, particularly for producing rare long word types. Relaxing this cap could expand coverage but would also increase ambiguity. This is already framed as a tunable tradeoff.

On the “free combination” point, it is important to emphasize that Naibbe does not treat prefixes and suffixes as fully unconstrained:

All affixes must be valid under a Zattera-style slot grammar;
Certain glyphs are confined to either type-1 affixes or type-2 affixes, which correspond roughly with Zattera’s slots 0-5 or slots 6-11;
Nearly all prefixes are type-1 affixes, and nearly all suffixes are type-2 affixes, which meaningfully reduces the effective morphological design space; and
Table selection probabilities are strongly skewed, further concentrating realized combinations.
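How much the skewed 5:2:2:2:1:1 probabilities concentrate the realized combinations can be quantified with perplexity, the exponential of the Shannon entropy. A minimal sketch, assuming that the prefix table and the suffix table of a bigram are drawn independently:

```python
import math

weights = [5, 2, 2, 2, 1, 1]  # fitted table ratios
probs = [w / sum(weights) for w in weights]

# Perplexity = exp(Shannon entropy): the "effective" number of equally likely tables.
entropy = -sum(p * math.log(p) for p in probs)
effective_tables = math.exp(entropy)

# Prefix and suffix tables are drawn independently, so table pairings multiply.
effective_pairs = effective_tables ** 2
print(round(effective_tables, 2), round(effective_pairs, 1), len(weights) ** 2)  # 5.08 25.8 36
```

For any given letter pair, the 36 possible table pairings therefore behave like roughly 26 equally likely ones. This captures only the effect of the weights, not the additional slot-grammar constraints on affixes.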

I freely concede that the combinatorial space is not perfectly aligned with the VMS’s observed word types and agree that this deserves more quantitative treatment, as I say in the paper. I deliberately did not go too far down the path of quantitatively optimizing the grammar or the specific affix-letter assignments, as doing so felt far too uncomfortably like a decryption attempt that assumed, almost certainly incorrectly, that the VMS was not only a ciphertext but also that I had found the VMS’s exact cipher architecture.

Again, Naibbe is not positioned as a generative model for all VMS properties. We must remember that the Naibbe cipher is focused on a more modest claim: demonstrating the existence of a class of substitution cipher whose ciphertexts can live within the statistical neighborhood of Voynich B when encrypting real plaintexts.

6. Long-range correlations

Torsten's claim: The Naibbe cipher fails to reproduce well-documented long-range correlations in the VMS; this may falsify the entire Naibbe class of ciphers, not just the current implementation.

I fully agree that the current Naibbe implementation does not reproduce long-range correlations. I discuss this failure at length in Section 4, reporting it explicitly using random-walk analysis and highlighting it as a major limitation. Where I disagree is the leap from:

“This particular version of the Naibbe cipher, which uses random respacing and independent table draws, fails when encrypting a compound plaintext made of diverse Latin and Italian sources”

to

“This entire class of verbose homophonic substitution ciphers is incompatible with long-range correlations across all possible plaintexts.”

Torsten does not prove general incompatibility. Section 4 of the paper sketches out several candidate mechanisms that could potentially introduce correlations that operate on top of the cipher’s existing structure. I see implementation and testing of such mechanisms to be fruitful avenues of future research. Generally, I think it would be fascinating to see whether and how a Naibbe-like cipher could be hybridized with the rules explored in the self-citation algorithm, such as line-by-line reuse. One important thing to look at in the testing of such a hybrid cipher would be the nature of the modeled plaintext. Is there any configuration that can reliably accommodate a plaintext written in prose, or does the only probabilistically favored type of plaintext read as gibberish? Let’s go out and test it!

7. Line-level structure and position-dependent effects

Torsten’s claim: Because Naibbe is position-agnostic at the line level, it cannot reproduce observed line-initial and line-final effects, gallows distributions, and related positional phenomena, which are “fundamental VMS properties.” Torsten also characterizes the paper’s gesture at line-initial <p> glyphs and line-terminal <m> glyphs as “superficial and inaccurate.”

The paper calls attention to the cipher's inability to replicate positional effects, and my suggested modifications are deliberately modest. I did not and do not claim to reproduce the full suite of positional phenomena documented for the VMS. Fully enumerating and grappling with these properties was beyond the scope of an already dense first paper. In addition, nothing in Naibbe’s design prevents positional behavior being layered on, though as I note in the paper, it would be inelegant to bolt many additional rules onto the current Naibbe cipher to force this outcome.

If the VMS does represent a ciphertext, the encoding method must naturally produce line-position effects, a property that the Naibbe cipher currently lacks. So I agree with Torsten in spirit that positional effects are important to study further. I disagree that these effects’ current absence from the Naibbe cipher proves broad structural incompatibility with a Naibbe-class verbose homophonic cipher.

8. Historical plausibility and practical burden

Torsten’s claim: Naibbe is far more complex than known 15th-century ciphers, imposes a heavy operational burden (especially if used for an entire book), and is therefore historically implausible. In addition, the paper’s stepwise evolution sketch is “just-so.”

I agree that Naibbe sits at the high end of plausible complexity; the paper says as much. Section 4 explicitly notes that Naibbe “would represent a major leap in complexity over known 15th-century ciphers.”

However, several points mitigate the charge of implausibility:

Survival bias in the cipher record. Our surviving corpus of 15th-century ciphers is small and heavily weighted toward diplomatic and mercantile use and thus cannot represent the long tail of all ciphers designed or used within medieval Europe. By all appearances, the VMS is a one-off, as is its exact method of text generation, regardless of whether it’s meaningful or meaningless.

Matching effort to artifact. The VMS itself clearly represents a substantial investment of time and resources. It is reasonable to explore the upper edge of what a determined person (or group of people) could have done over a span of months to years. What’s more, more than a century of failed VMS decipherments strongly suggests that if the VMS contains meaning in the form of an encoded message, the cipher must deviate from mainstream cryptographic history in a serious and fairly elaborate way. Otherwise, the VMS would have been cracked by now.

Collective effort. I know that Torsten prefers the solo-author hypothesis, but his “encrypting 57,000 letters alone” scenario does not jibe with the recent paleographic suggestion of multiple scribes. Naibbe-style encryption could have been distributed across several people. And having written in the cipher myself, my personal suspicion is that a single experienced scribe could pull off a VMS-length ciphertext in several months of full-time work, on the order of but probably longer than Gordon Rugg’s (2004) suggested timeline with Cardan grilles. In both cases, using external physical objects to facilitate affix selection—grilles and tables in Rugg’s case, tables and playing cards in my own—makes the procedure much less of a cognitive burden than one might initially suspect.

I agree that the stepwise evolutionary sketch in Section 4 is speculative; it is labeled as such and is not offered as direct historical evidence. I do not claim to have documentary proof of Naibbe-like systems existing in the 15th century. Having said that, I find it at least interesting that my modeling efforts preferred a substitution scheme acting on a unigram-bigram plaintext, and advanced ciphers from approximately the time and region of the VMS’s creation are attested as substituting the same plaintext units, albeit with far less systemization.
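Splitting a plaintext into a "unigram-bigram plaintext" can be pictured as below: walk through the letters and, at each step, take either one letter or two. The default `p_unigram=0.472` borrows the 47.2%/52.8% split quoted later in this thread and treats it as a per-step probability, which is an assumption; dropping plaintext spaces before splitting is also an assumption of this sketch, not a statement of the paper's exact respacing rule.

```python
import random

def respace(plaintext, p_unigram=0.472, rng=None):
    """Split a letter string into 1- and 2-letter chunks, left to right.
    Non-letters (including spaces) are dropped first, so chunk boundaries
    ignore the plaintext's word breaks."""
    rng = rng or random.Random()
    letters = [c for c in plaintext.lower() if c.isalpha()]
    chunks, i = [], 0
    while i < len(letters):
        # A lone final letter can only become a unigram.
        if i == len(letters) - 1 or rng.random() < p_unigram:
            chunks.append(letters[i])
            i += 1
        else:
            chunks.append(letters[i] + letters[i + 1])
            i += 2
    return chunks
```

Whatever the split, joining the chunks back together recovers the letter sequence, which is the property the encryption step then has to preserve.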

From the standpoint of the paper’s core question—“are substitution ciphers in principle capable of producing VMS-like text under 15th-century constraints?”—it is appropriate, in my view, to push into ambitious territory. If an elaborate system straining the upper limits of historical plausibility could not even partially match the VMS, that would be strong evidence against the ciphertext hypothesis. The fact that it can is therefore informative.

9. Paper's own criteria and actual contribution

Torsten concludes by evaluating the Naibbe cipher against the criteria mentioned in the paper, based on my own criteria and those of Lisa Davis. I believe Naibbe meets more of these criteria than he credits:

Uses a known 15th-century cipher type (homophonic substitution): Yes.
Can be done with 15th-century materials (tables, cards, dice, wax tablets): Yes.
Derivation is reproducible and fully specified: Yes.
Preserves the plaintext letter sequence in order: Yes (a strict requirement I built in by design).
Reliably yields minimally ambiguous decryptions: I would say yes, with caveats and explicit procedures; Torsten regards this as more problematic. (There are two decryption exercises in the paper itself, for those who want to try it for themselves.)
Historically plausible: Unquestionably on the high end of complexity but arguable.
Replicates VMS properties: Many, but not all, with clear acknowledgment of several key missing ones (notably long-range correlations and positional effects).
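To illustrate what "preserves the plaintext letter sequence" and "minimally ambiguous decryption" mean operationally, here is a toy verbose homophonic substitution over unigram and bigram units with six tables weighted 5:2:2:2:1:1. The glyph strings are randomly generated placeholders, not the Naibbe tables, and the actual cipher assembles its words from letter-specific affixes rather than from a flat codebook; this sketch only shows why a codebook in which no word is shared between units round-trips exactly. All names here are hypothetical.

```python
import random

GLYPHS = "acdehiklmnopqrsty"   # EVA-like letters, used only as placeholders
LETTERS = "abcdefghijklmnopqrstuvwxyz"
WEIGHTS = [5, 2, 2, 2, 1, 1]   # table weights quoted in this thread

def build_tables(seed=0):
    """Hypothetical codebook: one unique glyph string per (table, unit).
    Units are single letters and letter pairs; unigram words are shorter."""
    rng = random.Random(seed)
    units = list(LETTERS) + [a + b for a in LETTERS for b in LETTERS]
    used, tables = set(), []
    for _ in WEIGHTS:
        table = {}
        for unit in units:
            while True:
                length = rng.randint(2, 4) if len(unit) == 1 else rng.randint(4, 7)
                word = "".join(rng.choice(GLYPHS) for _ in range(length))
                if word not in used:   # global uniqueness keeps decryption unambiguous
                    used.add(word)
                    table[unit] = word
                    break
        tables.append(table)
    return tables

def encrypt(chunks, tables, rng):
    """Pick a table per chunk with the 5:2:2:2:1:1 weights and emit its word."""
    picks = rng.choices(range(len(tables)), weights=WEIGHTS, k=len(chunks))
    return [tables[t][c] for t, c in zip(picks, chunks)]

def decrypt(words, tables):
    """Invert all tables at once; each word maps back to exactly one unit."""
    inverse = {w: unit for table in tables for unit, w in table.items()}
    return "".join(inverse[w] for w in words)
```

The same plaintext encrypts to different word sequences on different runs, yet every run decrypts to the same letters in the same order.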

It’s also worth noting that Torsten summarizes the paper’s main takeaway accurately and fairly:

“Demonstrating that a verbose homophonic substitution cipher can simultaneously satisfy multiple VMS constraints (entropy, token length, word grammar) is intellectually interesting. It shows that these properties are not individually impossible to replicate.”

That is precisely the level of contribution I intended Naibbe to make: not a definitive solution to the VMS, but an existence proof that a letter-preserving, hand-doable substitution cipher can live deep inside the VMS’s statistical envelope.

In summary, I agree with Torsten where he identifies genuine limitations of Naibbe 1.0. I disagree with his conclusions that frame these open issues as proof of the VMS’s fundamental incompatibility with Naibbe-class ciphers. If anything, going out and testing these open issues could place very tight constraints on how a substitution cipher specifically could or could not generate fully VMS-like text. I also disagree with readings of the paper that treat Naibbe as anything more than a demonstration of feasibility.

(08-12-2025, 12:04 AM) RobGea Wrote: Hi Ruby Novacna,
The Naibbe cipher takes some Latin / Italian plaintext, that is, normal, readable, understandable text, and when put through the Naibbe cipher method, that normal plaintext is converted into an approximation of Voynichese.
There is nothing to decipher, unless you have some previously Naibbe-enciphered text.

Phew, I'm relieved!
So we could calmly get on with our translations/deciphering.

Dear Michael,

Thank you for your response. While I appreciate your engagement, I don't find that your response successfully addresses the fundamental issues raised in my review.

Specifically:

1. Scope of the paper and intended claims

I appreciate your clarification that the Naibbe cipher is intended as a "proof of concept" rather than a definitive historical solution.

However, you state that the paper's goal is to demonstrate that a substitution cipher "can live deep inside the VMS's statistical envelope." This formulation is fundamentally circular when the cipher is explicitly constructed to match VMS statistics. A proof of concept must demonstrate something that was previously uncertain or unknown. Here, the outcome is predetermined by design. This reasoning structure conflates construction with discovery. A genuine proof of concept would demonstrate that when one applies general cryptographic principles (not fitted to the VMS), VMS-like statistics emerge unexpectedly. Instead, the Naibbe cipher demonstrates only that one can deliberately engineer a system to produce predetermined outputs.

2. Fitted vs. Emergent Properties

Your defense hinges on distinguishing "explicitly targeted" properties from "emergent" ones. This distinction deserves careful scrutiny.

You list several features as emergent: Character entropy, Conditional character entropy, Glyph and glyph-pair frequencies, Token and type length distributions, Skewed pairs

Character entropy: this is not truly emergent when:

Glyph-to-letter mappings are selected from VMS vocabulary
Table structures follow VMS word grammar (Zattera 2022)
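For reference, the two entropy measures in question are typically computed from a transliterated text roughly as follows: first-order character entropy, and the conditional entropy of a character given the one before it. This is a minimal sketch with illustrative helper names; whether word spaces count as characters, and which transliteration is used, are choices that change the numbers.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def char_entropies(text):
    """Return (h1, h2): first-order character entropy and the conditional
    entropy H(X_n | X_{n-1}) = H(X_{n-1}, X_n) - H(X_{n-1}), in bits."""
    h1 = entropy(Counter(text))
    h_pairs = entropy(Counter(zip(text, text[1:])))
    h_prev = entropy(Counter(text[:-1]))   # marginal of the preceding character
    return h1, h_pairs - h_prev
```

A strictly alternating string such as "abababab" has one bit of first-order entropy but zero conditional entropy, since each character fully predicts the next.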

Quasi-Zipfian distribution: You state this "naturally arises" from the unigram-bigram structure. However:

The unigram-bigram ratio (47.2%-52.8%) was explicitly tuned to match VMS
The table ratio (5:2:2:2:1:1) was fitted to VMS frequency distribution
The claim that quasi-Zipfian distributions are "emergent" from letter-by-letter homophonic substitution requires demonstration with unfitted parameters
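One way to put a number on "quasi-Zipfian", so that runs with fitted and unfitted parameters could be compared as suggested above, is the slope of a log-log rank-frequency fit. This is a simple least-squares sketch with hypothetical helper names; published Zipf analyses often use more careful estimators, and the optional `max_rank` cut-off is an assumption for excluding the noisy tail.

```python
import math
from collections import Counter

def zipf_slope(tokens, max_rank=None):
    """Least-squares slope of log(frequency) against log(rank).
    A slope near -1 is the classic Zipf pattern."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if max_rank:
        freqs = freqs[:max_rank]
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)
```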

Token and type length distributions: These follow directly from:

The Zattera slot grammar (taken from VMS)
The unigram/bigram split (fitted to VMS)
The affix inventory lengths (derived from VMS)

A truly emergent property would be one that appears despite not being directly targeted. For example, if the cipher unexpectedly reproduced VMS long-range correlations without being designed to do so, that would constitute genuine emergence.

3. Randomness, Determinism, and Mechanism Design

You state there is "no logical contradiction" between using cards for pseudorandomness and proposing non-random scribal biases to explain long-range correlations. I disagree. This reveals a fundamental ambiguity about the cipher's purpose.

If cryptographic security is the goal, then:

Randomness is essential (it prevents pattern-based attacks)
Scribal "biases" and "bursts" are security vulnerabilities
Non-random deviations should be prevented, not encouraged

If VMS replication is the goal, then:

The specific randomization mechanism is irrelevant
Any table-selection method achieving the target ratios suffices
Cards are not "historically convenient" but rather unnecessarily complex

You cannot have it both ways. Either this is a serious historical cipher proposal (in which case the security properties matter), or it is a statistical replication exercise (in which case the cards are theater).

I previously suggested that deterministic table cycling (α, β₁, β₂, β₃, repeat) would achieve better results without requiring any physical apparatus. You do not address this point. If the goal is historical plausibility, why choose the more complex option?

The most parsimonious explanation: cards were chosen because they provide a veneer of historical authenticity to what is fundamentally a modern statistical modeling exercise.
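To make the comparison in this section concrete: a 13-card deck split 5:2:2:2:1:1 and a fixed repeating schedule with the same composition both hit the target table ratio; they differ only in whether the sequence is predictable. Note that a deterministic cycle reproducing 5:2:2:2:1:1 exactly needs a period of at least 13 steps (the sum of the weights). The deck, the schedule order, drawing with replacement, and the labels `t5` and `t6` for the two weight-1 tables are hypothetical; the paper's actual card-to-table mapping may differ.

```python
import random
from collections import Counter

# Hypothetical labels; only alpha and beta1-beta3 are named in this thread.
TABLES = ["alpha", "beta1", "beta2", "beta3", "t5", "t6"]
WEIGHTS = [5, 2, 2, 2, 1, 1]

def card_draws(n, rng):
    """Draw n times, with replacement, from a 13-card deck split 5:2:2:2:1:1."""
    deck = [t for t, w in zip(TABLES, WEIGHTS) for _ in range(w)]
    return [rng.choice(deck) for _ in range(n)]

def cyclic_schedule(n):
    """Repeat a fixed 13-step order with the same 5:2:2:2:1:1 composition."""
    period = ["alpha", "beta1", "alpha", "beta2", "alpha", "beta3", "t5",
              "alpha", "beta1", "alpha", "beta2", "beta3", "t6"]
    return [period[i % len(period)] for i in range(n)]

def shares(seq):
    """Fraction of selections that went to each table, rounded to 3 places."""
    counts = Counter(seq)
    return {t: round(counts[t] / len(seq), 3) for t in TABLES}
```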

6. Long-range correlations

You acknowledge the cipher's failure to reproduce long-range correlations and frame this as a "limitation" that could potentially be addressed with "non-random deviations." This significantly understates the problem.

Long-range correlations represent a significant feature of the VMS script. The Naibbe cipher faces two fundamental problems in accounting for them:

First, the structural impossibility: N-gram substitution encryption operates locally on plaintext, processing one letter or letter-pair at a time. The cipher's random table selection and random plaintext respacing explicitly destroy any sequential dependencies present in the source text. No plausible mechanism exists for long-range correlations to emerge as an epiphenomenon from these local, randomized operations.

Second, the detectable primary table dominance: Even without long-range correlations, the Naibbe cipher creates its own problematic statistical signature. The cipher maps Latin letter frequencies to VMS token frequencies through six tables weighted 5:2:2:2:1:1. This weighting means the primary table (α) is used 38.5% of the time, while the five secondary tables are each used far less frequently (15.4%, 15.4%, 15.4%, 7.7%, 7.7%). This creates strong, detectable correlations between plaintext letters and specific ciphertext tokens, which is precisely what a robust cipher should prevent: in a longer ciphertext, the most frequent tokens would always represent letters encrypted by the primary table, preserving their rank-order frequency from the plaintext.

The mathematical basis: Latin 'e' appears in ~12-13% of letters. When encrypted via the primary table α (38.5% of the time), this produces α's 'e' token at a frequency of ~4.8% of all tokens. The next most frequent Latin letter 't' (~9%) produces α's 't' token at only ~3.5% frequency. Secondary table 'e' tokens each appear at ~1.9% frequency. Therefore, α's 'e' token is very likely to be the single most frequent token type in typical ciphertexts.
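The arithmetic in the preceding paragraph can be reproduced as follows. This follows the paragraph's own simplification: it treats every letter as enciphered on its own and tables as chosen independently of the letter, whereas the cipher as discussed in this thread also enciphers letter pairs, which would redistribute these shares. The 12.5% and 9% inputs are the paragraph's approximate letter frequencies, and `t5`/`t6` are placeholder labels.

```python
# Table weights quoted above; t5 and t6 are hypothetical labels.
WEIGHTS = {"alpha": 5, "beta1": 2, "beta2": 2, "beta3": 2, "t5": 1, "t6": 1}

def expected_share(letter_freq, table):
    """Share of all ciphertext tokens that are `table`'s token for one letter,
    assuming single-letter encryption and letter-independent table choice."""
    return letter_freq * WEIGHTS[table] / sum(WEIGHTS.values())
```

With these assumptions, `expected_share(0.125, "alpha")` is about 0.048, `expected_share(0.09, "alpha")` about 0.035, and `expected_share(0.125, "beta1")` about 0.019, matching the figures above.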

The cryptographic implication: This violates a fundamental principle of homophonic substitution ciphers: homophones should distribute frequencies evenly to obscure plaintext letter frequencies. The Naibbe cipher's weighted tables achieve the opposite—they concentrate frequency, making the most common tokens strong indicators of plaintext identity.

You propose speculative mechanisms (non-random biases, quire-level effects) that might induce long-range correlations, but these remain untested and potentially contradictory to the cipher's design. More critically, addressing the correlation deficit would not resolve the primary table dominance problem—indeed, these two issues may be fundamentally in tension with each other.

7. Position-Dependent Effects

You state that "nothing in Naibbe's design prevents positional behavior being layered on" but acknowledge it would be "inelegant." This significantly understates the challenge.

The core problem is that the Naibbe cipher operates at the word level:

Input: plaintext letters → Output: complete Voynichese words
No awareness of line position during encryption
Word lengths determined by plaintext identity and table selection

VMS position effects include:

Paragraph-initial words from specific vocabulary subset
Line-first words are slightly longer on average
Line-second words are slightly shorter
Gallows distribution varies by line position
Line-final <m> appears ~62% of time (with variation by section)
Different patterns in Currier A vs. B
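Several of these positional statistics are straightforward to measure on any transliteration, or on Naibbe output laid out into lines, which is what a position-aware variant would have to match. A minimal sketch, assuming each line is given as space-separated EVA-style words with any uncertain-space or comment markup already removed; the helper names are hypothetical.

```python
def line_final_rate(lines, glyph="m"):
    """Fraction of non-empty lines whose last word ends in `glyph`."""
    finals = [line.split()[-1] for line in lines if line.split()]
    return sum(w.endswith(glyph) for w in finals) / len(finals)

def position_mean_length(lines, position):
    """Mean word length (in glyphs) at a 0-based position within the line."""
    words = [line.split()[position] for line in lines if len(line.split()) > position]
    return sum(len(w) for w in words) / len(words)
```

Comparing `position_mean_length(lines, 0)` and `position_mean_length(lines, 1)` against the overall mean word length is one simple check for the line-first and line-second effects listed above.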

To retrofit position awareness would require:

During encryption: Scribe must know future line breaks before determining word selection
Line-break awareness: Scribe must track token count and estimate when line break will occur
Position-conditional word selection: Different word types for different positions
Interaction with respacing: Random respacing must somehow be overridden or constrained by position requirements

This is not "layering on" a feature—it's redesigning the entire cipher architecture to be position-aware from the ground up.

The Naibbe cipher's architecture is specifically incompatible with position effects because it makes word-level decisions independently of positional context.

8. Historical Plausibility: The Complexity Gap Remains

You acknowledge Naibbe is at "the high end of plausible complexity" but argue this is acceptable given:

Survival bias in cipher record
VMS represents substantial investment
Multiple scribes could distribute effort
Elaborate cipher needed to resist century of decryption attempts

I find each of these arguments unpersuasive.

Survival Bias:
Your argument is that our 15th-century cipher corpus is small and biased. However:

We have hundreds of diplomatic ciphers from this period
We have treatises describing cipher systems
We have cipher nomenclators from multiple European courts
The progression from simple substitution → homophonic substitution → nomenclators is well-documented

The gap between attested systems (2-4 homophones per letter) and Naibbe (18 options per letter + apparatus) is not explained by sampling bias—it's an order-of-magnitude difference in complexity.

You are arguing that the VMS's substantial investment justifies exploring complex ciphers. This reverses the causal logic.

The "Resistance to Decryption" Argument

You claim that failed decryption attempts suggest the cipher must be elaborate. This is backwards reasoning:

Failed decryption could indicate the text is not enciphered
Steganography, glossolalia, or constructed languages might resist decryption without requiring complex cryptographic design

Collective Effort and Timeline

You suggest multiple scribes could have shared encryption duties. However:

Training burden: Each scribe needs to learn the complex system
Consistency requirement: All scribes must follow identical table-selection ratios
Error propagation: Any mistake affects plaintext recoverability
Coordination complexity: How do scribes maintain consistent table usage?

The paleographic evidence for multiple hands might actually argue against cipher complexity, as it increases the training and coordination burden.

The Stepwise Evolution Speculation

You acknowledge the evolutionary sketch (p. 29-30) is speculative but defend it as "at least interesting" that modeling preferred unigram-bigram structure. This defense is insufficient:
The fact that modern optimization within VMS-constrained parameter space yields unigram-bigram structure tells us nothing about historical development. This is, again, finding that VMS-fitted parameters produce VMS-like results.

9. The Proof-of-Concept Claim Revisited

You state: "From the standpoint of the paper's core question—'are substitution ciphers in principle capable of producing VMS-like text under 15th-century constraints?'—it is appropriate, in my view, to push into ambitious territory."

I agree this is a legitimate question. However, the paper does not successfully answer it because:

What Was Actually Demonstrated?

A cipher explicitly fitted to VMS can produce some VMS properties (those it was fitted to)
Materials existed in the 15th century (not disputed)
The process is executable (not disputed)

What Was Not Demonstrated

That such a cipher would arise from cryptographic first principles
That such a cipher is historically plausible given the complexity gap
That the cipher can produce properties it was not fitted to (it fails on long-range correlations and position effects)
That the emergent properties are truly independent of the fitted ones

The Epistemological Problem: A valid proof of concept would demonstrate that plausible historical constraints lead to VMS-like output as an unintended consequence. Instead, the Naibbe cipher demonstrates that modern reverse-engineering can produce predetermined results.

The core disagreement can be summarized:

Your position: "I have demonstrated that it is possible in principle for a substitution cipher to produce VMS-like statistics using 15th-century materials."
My position: "You have demonstrated that by deliberately fitting parameters to VMS, you can build a system that reproduces some (but not all) VMS properties. This does not constitute evidence for the ciphertext hypothesis because the success was predetermined by design."

Prior to Naibbe: It was already known that:

Homophonic substitution existed in the 15th century
Verbose ciphers are theoretically possible
Deliberate system design can produce complex outputs

After Naibbe: We now additionally know that:

A cipher explicitly reverse-engineered from VMS can match some VMS properties
Such a cipher fails on other properties (long-range correlations, position effects)
The required complexity significantly exceeds historical attestation

The question is: does this increase or decrease our confidence that the VMS is enciphered text?

I argue it provides evidence against the ciphertext hypothesis because:

The fitted parameters make success unsurprising (no predictive power)
The failures on unfitted properties suggest structural mismatch
The complexity gap raises serious plausibility concerns
Alternative explanations (constructed language, glossolalia, text generated by self-citation) remain equally viable and more parsimonious

The Naibbe cipher is an impressive technical achievement demonstrating considerable ingenuity and effort. However, it does not successfully answer the question it poses because the methodology conflates engineering demonstration with scientific evidence. A true proof of concept would require that VMS-like properties emerge from historically plausible cryptographic principles applied without specific fitting to VMS statistics.

I maintain that the most honest characterization of this work is: "A demonstration that one can deliberately engineer a complex system to match predetermined statistical targets, which tells us more about the flexibility of cipher design than about the VMS itself."

The path forward requires either:

Demonstrating that the fitted components are the unique or strongly preferred solution from historical-cryptographic principles (not modern optimization)
Providing independent validation that the cipher works on VMS data not used during parameter selection
Developing concrete mechanisms for the missing properties (correlations, position effects) rather than speculating they could be added later

Without these elements, the Naibbe cipher remains an interesting intellectual exercise but does not constitute meaningful evidence for or against the ciphertext hypothesis.

Best regards,
Torsten

The clear next step, as I see it, is to learn from the current version of the Naibbe cipher and attempt to develop methods that address what it cannot currently replicate, with an initial focus on long-range correlations. I have said from the very beginning that the Naibbe cipher is a place to start. More generally, it seems useful to have a VMS benchmark that preserves meaning, to be tested alongside the self-citation algorithm and Markov chain models of the VMS text across a wider range of statistical tests.

Even as we disagree, this is exactly the kind of conversation I wanted to spark: We are talking about the concrete properties of a concretely defined and falsifiable class of substitution cipher. As I say in the paper, all models are wrong, but some are useful. The Naibbe cipher is almost certainly wrong in important ways. That doesn't make it useless as a point of reference.

Good to see that people can disagree in a very polite, non-offensive way! :)

I am excited and pleased to discover this Naibbe cipher you've constructed. It is clearly the result of careful study and worthy of attention. I am now thinking I should develop some kind of artwork to celebrate the Naibbe, perhaps some poetry enciphered, or an embroidery, or, well, something. But beyond that, the careful responses you provide here to the questions and provocations, even to the extent of including some m-dashes in your reply up there, well, that takes the cake. A demonstration of superior patience and gentle and very modern ape-ness. Good for you.

Live Science has an article on the 'Naibbe cipher'
"Mysterious Voynich manuscript may be a cipher, a new study suggests"

It basically says Medieval hoax theories are gaining ground but this cipher method demonstrates a way plaintext could be plausibly encoded.
