07-12-2025, 02:02 AM
(27-11-2025, 01:24 PM)magnesium Wrote: My open-access paper describing the Naibbe cipher is now out in Cryptologia.
Thank you all very much for letting me present the cipher at VMD and for all your insightful feedback on the forum. It greatly improved the final paper.
Hello Michael, here are my thoughts about your interesting paper.
OVERVIEW
This paper attempts to investigate whether the Voynich Manuscript (VMS) could be a ciphertext by developing what the author calls the "Naibbe cipher"—a verbose homophonic substitution cipher designed to encrypt Latin and Italian text while reproducing VMS statistical properties. While the work demonstrates creativity and shows that certain VMS properties can be replicated through deliberate cipher design, it suffers from fundamental methodological flaws, circular reasoning, internal contradictions, and incomplete engagement with VMS properties.
PRINCIPAL FINDINGS
What the Paper Demonstrates:
- A verbose homophonic substitution cipher can be designed to match several VMS statistical properties simultaneously
- Such a cipher can be executed with 15th-century materials (playing cards, pen, paper)
- The resulting ciphertexts remain decipherable and maintain plaintext letter sequence
- Certain combinations of entropy, token length, and word grammar can coexist in cipher output
What the Paper Claims to Demonstrate (But Does Not):
- That the Naibbe cipher "reliably replicates" VMS properties (it replicates properties it was fitted to)
- That this provides evidence for the ciphertext hypothesis (circular reasoning)
- That the cipher is historically plausible (significant unexplained complexity)
- That it can explain line- and position-dependent effects (geometrically impossible)
CRITICAL METHODOLOGICAL FLAWS
1. Circular Reasoning in Validation (Fatal Flaw)
The Problem:
- Cipher tables populated with actual VMS words and affixes (p. 6)
- Table selection ratios (5:2:2:2:1:1) fitted to VMS word frequency distribution (p. 7, Supp. Mat. 3)
- "Snake eyes" rule to achieve 47.2%-52.8% split between unigrams and bigrams (p. 4)
- Paper then reports success at matching VMS word frequencies as validation (p. 17, Fig. 4)
Why This Is Invalid: This is equivalent to:
- Measuring a target
- Building a model to match that target
- Testing the model against the same target
- Declaring success
The paper uses the same data for both parameter fitting and validation, which violates basic scientific methodology. Figure 4's R² ≈ 0.9 correlation is presented as evidence of success, but it is merely confirmation that the implementation works as designed.
What Should Have Been Done:
- Split-sample validation (fit to Currier A, test on Currier B)
- Clear separation of fitted vs. emergent properties
- Honest acknowledgment that matching fitted properties is expected, not surprising
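A minimal sketch of the split-sample check suggested above. It assumes two token lists (hypothetically, cipher output fitted on one half of the corpus and a transcription of the other half); the function names and the toy data are illustrative only and are not taken from the paper.

```python
from collections import Counter
import math

def relative_freqs(tokens):
    """Map each word type to its relative frequency in the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

def held_out_r2(model_tokens, test_tokens):
    """Compare log frequencies of the word types shared by both samples.

    model_tokens: output of a cipher whose parameters were fitted on one
    half of the corpus; test_tokens: the other half, never used for fitting.
    """
    fm, ft = relative_freqs(model_tokens), relative_freqs(test_tokens)
    shared = sorted(set(fm) & set(ft))
    if len(shared) < 2:
        raise ValueError("need at least two shared word types")
    xs = [math.log(fm[w]) for w in shared]
    ys = [math.log(ft[w]) for w in shared]
    return r_squared(xs, ys)

# Toy data only: identical relative frequencies in both samples.
model = ["daiin"] * 8 + ["qokar"] * 4 + ["otedy"] * 2
held_out = ["daiin"] * 16 + ["qokar"] * 8 + ["otedy"] * 4
score = held_out_r2(model, held_out)
```

Only a score computed on data that played no role in fitting would count as validation; the same number computed on the fitting data says nothing beyond "the implementation works".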
2. Internal Design Contradictions
A. Randomness vs. Flexibility Paradox:
The paper simultaneously claims:
- Playing cards provide random table selection (p. 7-8)
- Scribes can vary table selection "nonrandomly" with personal "biases" (p. 27)
These are contradictory. If randomness is cryptographically important, scribes cannot vary it. If scribe variation is acceptable, randomness serves no purpose. The author never resolves which purpose the card-drawing mechanism actually serves.
Simpler Alternative: Deterministic table cycling (e.g., α, β₁, β₂, β₃, repeat) would achieve the same purpose without playing cards, would be easier to execute, and would be more historically plausible for a manuscript cipher where reproducibility matters.
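As a rough illustration of deterministic cycling: a fixed 13-slot schedule reproduces the 5:2:2:2:1:1 proportions exactly, with no cards involved. The table labels T1..T6 and the slot order are hypothetical placeholders, not the paper's names or procedure.

```python
from collections import Counter
from itertools import cycle, islice

# Hypothetical table labels; weights follow the 5:2:2:2:1:1 ratio from the text.
WEIGHTS = {"T1": 5, "T2": 2, "T3": 2, "T4": 2, "T5": 1, "T6": 1}

def build_schedule(weights):
    """Round-robin over the tables until each has used up its weight."""
    remaining = dict(weights)
    schedule = []
    while any(remaining.values()):
        for name in weights:  # one pass gives at most one slot per table
            if remaining[name] > 0:
                schedule.append(name)
                remaining[name] -= 1
    return schedule

SCHEDULE = build_schedule(WEIGHTS)  # 13 slots in total

def table_sequence(n):
    """First n table choices, obtained by repeating the fixed schedule."""
    return list(islice(cycle(SCHEDULE), n))

usage = Counter(table_sequence(26))  # two full cycles
```

After any whole number of cycles the table usage matches the ratio exactly, whereas random card draws match it only on average.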
B. Ambiguity Handling:
The paper proposes that when bigrams accidentally encrypt as unigram word types, scribes should "re-encrypt the bigram (by redrawing cards)" (p. 12-13).
This is unnecessarily complex. The paper already demonstrates mechanisms to distinguish prefix glyphs from suffix glyphs (Sec. 2.5). These could have been extended to make unigrams structurally distinct from bigrams, eliminating ambiguity by design rather than requiring trial and error during encryption and, even more importantly, during decryption.
Why This Matters: Would 15th-century cryptographers deliberately choose a design requiring probabilistic re-encryption rather than deterministic disambiguation?
INCOMPLETE COVERAGE OF VMS PROPERTIES
3. Word Type Coverage Inconsistency
A. Low Type Overlap Despite VMS-Derived Unigrams:
The Numbers:
- 47.2% of tokens are unigrams (p. 4)
- Unigram word types come directly from VMS
- Yet only 29±1% of Naibbe word types exist in VMS (p. 13-14)
- Only 45% of VMS word types can be generated (p. 22-23)
The Problem: If nearly half of all tokens are unigrams drawn from VMS word types, and there are only ~138 possible unigram word types, why is the type overlap so low? The paper never explains this apparent discrepancy.
Either:
- The bigram word type generation is creating massive numbers of non-VMS types (undermining replication claims)
- Or the reported statistics are inconsistent
This requires clarification and may indicate fundamental problems with the cipher's ability to replicate VMS word diversity.
A possible explanation might be that the cipher uses two distinct encryption modes:
- Unigrams: Fixed word types (138 total, high frequency)
- Bigrams: Combinatorial (prefix + suffix, thousands of combinations, lower frequency)
If the VMS were a Naibbe-style cipher, this would offer a way to distinguish unigrams from bigrams:
- Build word frequency table → identify likely unigrams (high frequency)
- Analyze structural patterns → confirm unigram/bigram classification
- For bigrams, separate prefix from suffix using grammar rules
- Analyze prefix and suffix frequencies independently
This creates detectable statistical signatures that should allow word types to be classified by frequency and structure, perhaps with 70-80% accuracy.
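The first step of that procedure could be sketched as follows. The default cutoff of 138 mirrors the unigram inventory size mentioned above; treating the most frequent types as unigram candidates is an assumption of this sketch, not a tested classifier.

```python
from collections import Counter

def likely_unigram_types(tokens, max_types=138):
    """Return the most frequent word types as candidate unigram encodings.

    Frequency alone is only a heuristic; the structural checks described
    above would still be needed to confirm each candidate.
    """
    counts = Counter(tokens)
    return [word for word, _ in counts.most_common(max_types)]

# Toy data only.
sample = ["daiin"] * 3 + ["qokar"] * 2 + ["otedy"]
candidates = likely_unigram_types(sample, max_types=2)
```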
B. The Free Prefix-Suffix Combination Creates Predictability Mismatch:
VMS exhibits high within-word glyph predictability (Zattera 2022, Bowern & Lindemann 2021), suggesting constraints on glyph co-occurrence.
The Naibbe cipher allows free combination of any prefix with any suffix (19,044 theoretical combinations), with no constraints on co-occurrence. This generates:
- Much higher bigram word type diversity than VMS (in the VMS, not all observed prefixes combine with all observed suffixes)
- Lower within-word predictability than VMS
- A structural pattern (compositional bigrams) not evident in VMS
To genuinely replicate VMS predictability requires:
- Prefix-suffix compatibility constraints
- Glyph co-occurrence rules
- Reduction of combinatorial space
Such rules are not implemented, presumably because they would make the cipher even more complex and historically implausible.
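To make the predictability argument concrete, here is a small sketch that measures how predictable the suffix is once the prefix is known, using conditional entropy. The prefix and suffix strings and the compatibility rule are toy placeholders, not entries from the cipher tables or measurements of the VMS.

```python
from collections import Counter
from itertools import product
import math

def conditional_entropy(pairs):
    """H(suffix | prefix) in bits for a list of (prefix, suffix) pairs."""
    pair_counts = Counter(pairs)
    prefix_counts = Counter(p for p, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (p, _), c in pair_counts.items():
        h -= (c / total) * math.log2(c / prefix_counts[p])
    return h

prefixes = ["qo", "ch", "ot", "ok"]      # toy placeholders
suffixes = ["edy", "aiin", "ar", "ol"]   # toy placeholders

# Free combination: every prefix pairs with every suffix equally often.
free = list(product(prefixes, suffixes))

# Constrained: each prefix is compatible with only two of the four suffixes.
constrained = [
    (p, s)
    for i, p in enumerate(prefixes)
    for s in (suffixes[i], suffixes[(i + 1) % len(suffixes)])
]

h_free = conditional_entropy(free)                # 2 bits in this toy case
h_constrained = conditional_entropy(constrained)  # 1 bit in this toy case
```

Lower conditional entropy means the second half of a word is easier to predict from the first half, which is the direction in which free combination and the VMS are said to differ.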
4. Long-Range Correlations (Acknowledged Failure)
The paper acknowledges (p. 23-24, Fig. 8) that the cipher fails to reproduce VMS long-range correlations, which are well-documented key properties (Schinner 2007, Matlach et al. 2022, Montemurro & Zanette 2013).
Proposed Explanations:
- "Nonrandom biases" during manual encryption
- "Non-stochastic bursts of table use"
- Quire-based construction effects
Why This Is Critical: The paper frames this as a "limitation" (Sec. 4.2) suggesting it could be addressed. But this may be a falsification of this cipher type, not just a limitation of this implementation. Long-range correlations are not peripheral—they're fundamental VMS properties.
GEOMETRIC AND LOGICAL IMPOSSIBILITIES
5. Line-Level Structure Claims Are Mathematically Impossible
The Claim (p. 25-26): "If plaintext respacing were instead deterministic [...] then the line-level structure of the ciphertext would much more reliably vary with the plaintext's line-level structure"
The Geometric Problem:
- Plaintext: ~57,000-76,000 letters (paper's calculation)
- Ciphertext: ~38,000 tokens (known VMS length)
- Encryption: 1.5-2 letters per token (paper's calculation)
- Token size: 4-6 glyphs (VMS-like)
Mathematical Impossibility:
- Line breaks in plaintext occur at specific letter positions
- Line breaks in ciphertext occur at specific token positions
- With fewer tokens than plaintext letters, and multi-glyph tokens, these cannot correspond
Concrete Example:
Plaintext (20 letters, 2 lines of 10):
Line 1: A R M A V I R U M Q
Line 2: U E C A N O T R O I
After respacing (first 16 letters): AR M AV I R UM QU E CA N O
After encryption (assume up to 6 tokens/line):
Line 1: <yteeor><qokar><chckh><otedy><okeedy><pdor>
(=AR, M, AV, I, R, UM - 9 plaintext letters)
Line 2: <ofeed><lched><qofch><daiin><y>
(=QU, E, CA, N, O - 7 plaintext letters)
The plaintext line break after letter 10 (Q) falls inside the bigram QU, which is a single ciphertext token. No structural correspondence can exist between line boundaries.
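The mismatch can be checked mechanically. The sketch below takes the respaced units from the example and a plaintext line length of 10 letters, and reports where the plaintext line break lands relative to the token boundaries. The function name is illustrative.

```python
from itertools import accumulate

def break_position(units, letters_per_line):
    """Locate a plaintext line break relative to ciphertext token boundaries.

    units: respaced plaintext chunks, one per ciphertext token.
    Returns (token_index, inside_token): the index of the token holding the
    last letter of the first plaintext line, and whether the break splits
    that token instead of falling on its boundary.
    """
    ends = list(accumulate(len(u) for u in units))  # cumulative letter counts
    for i, end in enumerate(ends):
        if end >= letters_per_line:
            return i, end != letters_per_line
    raise ValueError("plaintext is shorter than one line")

UNITS = ["AR", "M", "AV", "I", "R", "UM", "QU", "E", "CA", "N", "O"]
token_index, inside_token = break_position(UNITS, 10)
```

With these units, the break after the 10th letter falls inside the seventh token (QU), so the plaintext line end is not even a token boundary in the ciphertext.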
6. Position-Dependent Effects: Superficial and Inaccurate Treatment
The Paper's Treatment (p. 26):
- Mentions <p> "often begins" paragraphs → suggests decorative null
- Mentions <m> "often appears" at line-end → suggests padding null
Actual VMS Behavior (Not Mentioned):
Paragraph-initial patterns:
- Not random <p> and <f> insertions but specific word types (e.g., <pchol>, <olpchedy>, <ofaiin>, <ofaiiin>, <tsheoarom>, <pcheoldom>)
- These words contain gallows (<p>, <f>) and rarely appear in non-initial lines
- This is a word-selection pattern, not glyph-level ornamentation for paragraph initial words
Line-final <m> patterns:
- Only appears ~62% of the time (see Timm 2014, p. 19)
- In Currier A: some pages have zero instances of <m>
- In Currier A: <m> is used word-final rather than line-final
- Varies significantly by section
- <am> and <dam> are common legitimate word types, not limited to line-final positions
Additional patterns completely ignored:
- Line-first words are on average slightly longer than expected (see Elmar Vogt 2012)
- Line-second words are on average slightly shorter than expected
- Line-first words prefer certain glyphs in initial position (see Emma May Smith 2015)
- Gallows have different distributions at different lines and line positions
- Patterns vary between Currier A and B
- etc.
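A position effect such as the word-length pattern can be measured with a few lines of code. This is only a sketch: it assumes the text is available as a list of lines, each a list of transcribed word tokens, and it counts transcription characters rather than glyphs.

```python
from collections import defaultdict

def mean_length_by_position(lines, max_pos=3):
    """Mean token length at each of the first max_pos positions in a line.

    lines: list of lines, each a list of word tokens (strings).
    Returns a dict mapping position (0 = line-first) to mean length.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        for pos, token in enumerate(line[:max_pos]):
            sums[pos] += len(token)
            counts[pos] += 1
    return {pos: sums[pos] / counts[pos] for pos in sorted(counts)}

# Toy lines only, reusing tokens from the example above.
toy_lines = [["okeedy", "y", "daiin"], ["yteeor", "qokar"]]
means = mean_length_by_position(toy_lines)
```

Running the same measurement on cipher output and on the manuscript, ideally against a baseline with shuffled word order, would show whether the cipher produces any position dependence at all.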
Why This Is Critical:
These are word-level and word-length effects. The Naibbe cipher:
- Operates at word-level (encrypts unigrams/bigrams)
- Has no line-position awareness
- Generates word lengths from plaintext identity and table selection
- Cannot produce position-dependent word-length or word-selection effects
This is not a limitation that could be addressed—it's a fundamental incompatibility between the cipher type and observed VMS properties.
HISTORICAL PLAUSIBILITY ISSUES
7. Backwards-Engineered Design Rationale
Playing Cards Example (p. 7-8):
The paper argues:
- Table ratios happen to sum to 13
- Playing card decks are multiples of 13
- Therefore cards are "especially convenient"
This is backwards reasoning. It only makes sense if you start from the ratio requirement (which was fitted to VMS). A historical cipher designer would:
- Choose ratios based on cryptographic needs
- Use the simplest available method
- Not rely on the coincidence that needed ratios match card deck structure
Complexity Gap (p. 28-30):
The paper acknowledges "the Naibbe cipher would represent a major leap in complexity over known 15th-century ciphers" but argues this is acceptable.
The Gap:
- Known 15th-century ciphers: 2-4 homophonic options per letter
- Naibbe cipher: 18 options per letter (6 unigrams + 6 prefixes + 6 suffixes)
- Plus: playing card apparatus, respacing procedure, ambiguity checking
The Proposed Evolution (p. 29-30): A 4-step speculative process for how the cipher could have evolved.
Problem: This is entirely speculative, with no historical evidence. Each step requires insights not evident in the historical record. The paper presents this as a plausibility argument, but it is actually a just-so story.
8. The Practical Burden of the Cipher
Let me trace what a scribe must do to encrypt ONE plaintext letter:
Example: Encrypting the letter "A" in position "first in bigram"
Step 1: Determine if unigram or bigram
- Roll dice OR follow predetermined respacing scheme
- Decision: bigram
Step 2: Determine position in bigram
- Decision: first position
Step 3: Draw a card
- Shuffle deck (or draw from remaining cards)
- Draw: "7 of Cups"
Step 4: Consult card-to-table mapping
- 7 of Cups → Table β₁
Step 5: Consult Table β₁
- Find row for letter "A"
- Find column "Bigram start"
- Result: <yt>
Step 6: Write <yt> in scratch encryption
- Wait for second letter of bigram
Step 7: (After encrypting second letter)
- Check if resulting word matches unigram word type
- If yes: RESTART from Step 3 (redraw card, re-encrypt)
- If no: Accept and continue
Keep in mind that these steps would have been necessary for every plaintext letter, i.e., at least 57,000 times (~38,000 tokens ≈ 57,000-76,000 letters).
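The loop in Steps 3 to 7 can be sketched in a few lines. Everything in the tables below is a hypothetical placeholder except "yt", which is the example result from Step 5. The sketch also assumes that a restart redraws the cards for both letters, which the steps above do not spell out.

```python
# Toy tables: all entries are hypothetical placeholders except "yt" (Step 5).
TABLES = {
    "b1": {"prefix": {"A": "yt"}, "suffix": {"R": "eeor"}},
    "a": {"prefix": {"A": "qok"}, "suffix": {"R": "ar"}},
}
# Pretend, for illustration, that this word is also a unigram encoding.
UNIGRAM_TYPES = {"yteeor"}

def encrypt_bigram(first, second, draws, tables=TABLES, unigrams=UNIGRAM_TYPES):
    """Encrypt a two-letter unit, redrawing while the result is ambiguous.

    draws: iterator of table names standing in for successive card draws.
    Returns (word, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        t1, t2 = next(draws), next(draws)        # Steps 3-4: one draw per letter
        word = tables[t1]["prefix"][first] + tables[t2]["suffix"][second]  # Step 5
        if word not in unigrams:                 # Step 7: ambiguity check
            return word, attempts

# The first pair of draws collides with a unigram type, the second does not.
word, attempts = encrypt_bigram("A", "R", iter(["b1", "b1", "a", "a"]))
```

Even in this toy form, the scribe needs two lookups, two draws and a membership check against the whole unigram inventory for every bigram.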
Question: Could you achieve comparable security with less complexity?
Example - Simplified Viable Alternative:
Materials: One table with 3 homophonic options per letter
Process:
1. Look up letter
2. Cycle through options (option 1, 2, 3, 1, 2, 3...)
3. Write symbol
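For comparison, the three-step alternative fits in a few lines. The table entries are placeholder symbols, and cycling separately per letter (rather than with one global counter) is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical table: three placeholder symbols per plaintext letter.
TABLE = {
    "A": ["a1", "a2", "a3"],
    "M": ["m1", "m2", "m3"],
    "R": ["r1", "r2", "r3"],
}

def encrypt(plaintext, table=TABLE):
    """Encrypt letter by letter, cycling through each letter's homophones."""
    next_option = defaultdict(int)  # per-letter position in the cycle
    out = []
    for letter in plaintext:
        options = table[letter]                                   # 1. look up letter
        out.append(options[next_option[letter] % len(options)])   # 2. cycle options
        next_option[letter] += 1
    return out                                                    # 3. symbols written

symbols = encrypt("ARMA")
```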
The Naibbe cipher is much more complex, with no clear security benefit.
The fundamental issue:
The Naibbe cipher could have been done in the 15th century in the narrow technical sense that the materials existed and the process is executable.
However, it is implausible that medieval cryptographers would have designed such a system because:
- Inefficiency: the cipher is far too complex, and therefore too slow, for encrypting a whole book
- High error rate: correctness is difficult to verify
- Training burden: training someone to encrypt or decrypt a message takes a lot of time
- Maintenance issues: playing cards, extensive tables
- No historical precedent: the complexity gap is enormous
In my view, the cipher looks like it was designed by someone trying to match VMS statistics, not by someone trying to encrypt messages efficiently.
WHAT THE PAPER ACTUALLY CONTRIBUTES
Legitimate Contribution:
Demonstrating that a verbose homophonic substitution cipher can simultaneously satisfy multiple VMS constraints (entropy, token length, word grammar) is intellectually interesting. It shows that these properties are not individually impossible to replicate.
Not Demonstrated:
- That this specific cipher (with VMS-derived parameters) is anything more than elaborate curve-fitting
- That such a cipher is historically plausible
- That it provides evidence for the ciphertext hypothesis
- That it can explain key VMS properties (position effects, long-range correlations, word type diversity)
COMPARISON WITH STATED CRITERIA
The paper claims to meet the Davis (2020a) criteria for acceptable VMS solutions. Assessment:
Met:
- Uses known 15th-century cipher type (homophonic substitution)
- Can be done with 15th-century materials
- Derivation is reproducible
- Maintains plaintext letter sequence
Not Met:
- "Historically plausible" - questionable given complexity gap and unexplained design choices
- "Reliably yield minimally ambiguous decryptions" - ambiguity handling is complex and unexplained
- Replicates VMS properties - only replicates properties it was explicitly fitted to; fails on others

