[Poll] What *are* vords?
Posted by: RadioFM - 30-06-2025, 12:20 AM - Forum: Analysis of the text - Replies (11)
Poll no. 1 (see above): Nature of vords
When decoded, most (if not all) Voynich vords will turn out to be... (Choose 1 option)
a) Words or almost whole words
b) Syllables, bigrams, n-grams
c) Single letters (or phonemes)
Assume any dummy/null vords or characters are removed.
Poll no. 2: Nature of the cipher
I believe the bulk of the text was ciphered using... (Multiple choices allowed)
☐ Dummy (null) characters or strokes
☐ Dummy (null) words
☐ Transposition within words
☐ Transposition within lines
☐ Indexed codebook (not just for a minority of words, but for the bulk of text)
☐ Auxiliary devices for encoding (wheels, matrices)
☐ State or context-dependent encoding
☐ Something else
Considering that good progress has been made in showing that the VMS is certainly not Latin, Italian, or German ciphered through simple substitution, I was wondering what you think VMS vords will look like when decoded.
I'm aware of the many nuances you all may hold about differences in Currier/RZ languages, topics, dummy words and padding text, etc. I'd appreciate it if you could try to cast your vote within the (limited) options given and explain further in the comments.
I'm interested in polling those who hold the more "traditional" views, namely that it's likely ciphered Latin, Romance, Germanic, or the like.
|
|
|
The Voynich Manuscript revealed
Posted by: Torsten - 28-06-2025, 10:51 PM - Forum: News - Replies (2)
Garry Shaw: "The Voynich Manuscript revealed: five things you probably didn't know about the Medieval masterpiece"
Quote: Gibberish
A recent experiment in which volunteers were asked to write pages of gibberish produced texts with similar characteristics to the Voynich Manuscript. The volunteers tended to intersperse a string of long words with a string of short words, chose short words beside illustrations according to the available space, and, in headings, used variations of the title words in the text below. Significantly, the volunteers invented gibberish using a process called self-citation, in which new words largely adapt those written earlier. Scholars have previously proposed this as the method used by Voynich scribes. Is the Voynich Manuscript therefore… meaningless?
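For intuition, the self-citation process the article describes is easy to simulate. Below is a minimal toy sketch (my own illustration, not the cited experiment's procedure): each new "word" is a lightly mutated copy of a randomly chosen earlier word, so vocabulary grows by adapting what was already written.

```python
import random

def self_cite(seed_words, n_words, alphabet="aeioudklrsty", rng=None):
    """Generate gibberish by self-citation: each new word is a copy of
    a randomly chosen earlier word with one small random mutation."""
    rng = rng or random.Random(0)
    text = list(seed_words)
    for _ in range(n_words):
        w = list(rng.choice(text))          # pick an earlier word to copy
        i = rng.randrange(len(w))
        op = rng.choice(["swap", "insert", "delete"])
        if op == "swap":
            w[i] = rng.choice(alphabet)     # replace one character
        elif op == "insert":
            w.insert(i, rng.choice(alphabet))
        elif len(w) > 1:                    # delete, but keep words non-empty
            del w[i]
        text.append("".join(w))
    return text

words = self_cite(["daiin", "okeey"], 20)
print(words)
```

Running this produces exactly the kind of family-resemblance vocabulary (many near-duplicates of earlier words) that self-citation predicts.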
Identifying paragraphs in the Starred Parags section
Posted by: Jorge_Stolfi - 27-06-2025, 09:26 PM - Forum: Analysis of the text - Replies (2)
I am trying to figure out the paragraph breaks in the Starred Parags (aka Recipes) section.
I will use these terms:
- parag: short for paragraph.
- head of a parag: its first line.
- tail of a parag: its last line.
- puff: a one-legged gallows, either {p} or {f}, with or without the platform slash.
- margin: the mostly text-free space between an edge of the page and the text.
- left rail: the ideal mostly vertical and straight line that runs just to the left of the majority of lines of a page, separating the left margin from the text.
- right rail: the ideal mostly vertical and possibly wavy but fairly smooth line that runs just to the right of the ends of most lines of a page, separating the text from the right margin.
- long line: a text line that starts at the left rail and ends at or beyond the right rail.
- short line: a text line that starts at the left rail but ends well before the right rail.
- baseline: the ideal usually smooth curved line that runs just below the glyphs of a text line, excluding the tails of {y}, {m}, {l}, etc.
- linegap: the vertical distance between baselines of successive lines; which often varies over the width of the text.
- wider linegap: a line gap that is wider than normal, at least in some part of the lines (e.g. left side, right side, or middle).
- topline: an ideal line parallel to the baseline, such that the distance between the two is the height of an EVA {o} in the line's handwriting.
- midline: an ideal line parallel to the baseline and the topline, equidistant from the two.
- starlet: a star in the margin that has been assigned to a unique line, like a bullet in an item list.
The positions and even the count of stars on each page are not reliable, since they sometimes do not match the obvious paragraph breaks. Thus the assignment of starlets to lines is to be determined as part of identifying the parag breaks. However, I will assume that every starlet should be assigned to a different line.
That said, a paragraph should ideally be a bunch of consecutive lines with all of the following properties:
- P1. The first of these lines follows a short line (or is the first line in the SPS, or follows a "title");
- P2. The last of these lines is short (or is the last line of the SPS, or precedes a "title").
- P3. All lines other than the last one are long lines.
- P4. There are no puffs in any of these lines except possibly in the first of them.
- P5. The first of those lines has an assigned starlet.
- P6. None of these lines, except the first one, has an assigned starlet.
I will call a set of lines with all these properties a perfect parag. I will assume that they are indeed paragraphs as intended by the Author.
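The six properties translate directly into a predicate over a list of line records. A minimal sketch (the field names and line representation are my own, and the "title" clauses of P1/P2 are folded into the section-boundary flags):

```python
def is_perfect_parag(lines, i, j, first_in_sps, last_in_sps):
    """Check properties P1-P6 for the candidate paragraph lines[i:j].

    Each line is a dict with boolean keys: 'long', 'short',
    'has_puff' (contains a one-leg gallows) and 'starlet'
    (has an assigned margin star)."""
    block = lines[i:j]
    p1 = first_in_sps or lines[i - 1]["short"]        # P1: follows a short line
    p2 = block[-1]["short"] or last_in_sps            # P2: ends with a short line
    p3 = all(ln["long"] for ln in block[:-1])         # P3: all but last are long
    p4 = not any(ln["has_puff"] for ln in block[1:])  # P4: puffs only in first line
    p5 = block[0]["starlet"]                          # P5: first line has a starlet
    p6 = not any(ln["starlet"] for ln in block[1:])   # P6: no other starlets
    return all([p1, p2, p3, p4, p5, p6])
```

Counting perfect parags on a page is then a matter of searching over candidate (i, j) splits with this predicate.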
The following table gives some relevant statistics per page, with a tentative assignment of starlets:
- Stars: Number of stars in the page.
- ShLns: Number of short lines in the page.
- Puffd: Number of lines that contain puffs (one-leg gallows).
- PerfP: Number of perfect parags in the page.
Code:
 page | Stars | ShLns | Puffd | PerfP
------+-------+-------+-------+-------
f103r | 19 | 18 | 14 | 15
f103v | 14 | 12 | 14 | 9
f104r | 13 | 13 | 13 | 13
f104v | 13 | 13 | 8 | 11
f105r | 10 | 11 | 15 | 6
f105v | 10 | 14 | 20 | 3
f106r | 16 | 15 | 17 | 13
f106v | 14 | 16 | 16 | 14
f107r | 15 | 15 | 13 | 10
f107v | 15 | 15 | 13 | 14
f108r | 16 | 17 | 13 | 8
f108v | 16 | 5 | 8 | 1
f111r | 17 | 10 | 7 | 4
f111v | 19 | 8 | 11 | 6
f112r | 12 | 11 | 13 | 8
f112v | 13 | 15 | 14 | 12
f113r | 16 | 16 | 17 | 12
f113v | 15 | 15 | 16 | 15
f114r | 13 | 11 | 13 | 11
f114v | 12 | 11 | 12 | 9
f115r | 13 | 13 | 12 | 12
f115v | 13 | 13 | 12 | 12
f116r | 10 | 8 | 10 | 5
------+-------+-------+-------+-------
TOTAL | 324 | 295 | 301 | 223
As can be seen, on one page (f104r) the counts of stars, short lines, and puffed lines all match, and the whole text consists of perfect parags. On other pages there are lines which cannot be placed in perfect parags. I will have to compromise on one or more of the criteria above. Stay tuned...
How multi-character substitution might explain the Voynich's strange entropy
Posted by: quimqu - 27-06-2025, 10:34 AM - Forum: Analysis of the text - Replies (23)
Correction
Originally, I described the transformation used as a homophonic cipher, but that label is misleading. What I actually applied was a form of multi-character substitution, where each letter in the original word is replaced by a randomly chosen variant (e.g., a0, a1, a2), simulating a kind of randomized expansion at the character level. This isn't a true homophonic cipher in the historical sense — which typically replaces plaintext characters with multiple possible cipher symbols without increasing the total character count. My version expanded the text significantly and altered its structure.
Despite the naming inaccuracy, the method did reproduce an entropy curve similar to the Voynich CUVA profile, especially in the characteristic “bump” around n=3–6. The results still support the hypothesis that some kind of structured substitution — possibly at the syllable or morph level — could account for the entropy behavior in the Voynich manuscript. However, any conclusions should be interpreted with this clarification in mind.
You can also check this post of mine, where you can see the entropy bump comparing the MS in EVA and in CUVA versus natural-language texts.
Maybe by accident, I’ve pulled on a thread worth following — I’ll keep exploring what really generates the bump.
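Schematically, the expansion described above can be reproduced in a few lines (an illustrative sketch, not the original script; the variant count k is a parameter):

```python
import random

def multi_char_substitute(text, k=3, rng=None):
    """Replace each letter by a randomly chosen numbered variant,
    e.g. 'a' -> one of 'a0', 'a1', 'a2' for k=3.  Unlike a classical
    homophonic cipher, this doubles the character count of the text."""
    rng = rng or random.Random(42)
    return "".join(c + str(rng.randrange(k)) if c.isalpha() else c
                   for c in text)

print(multi_char_substitute("docta ignorantia"))
```

Because every letter becomes a two-character token, n-gram statistics shift outward, which is one plausible source of the bump moving to higher n.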
------------------------------------------
In this experiment, I tried to simulate how different historical ciphers affect the entropy profile of a text, and compare the results to the Voynich CUVA (explained by René Zandbergen). The idea was to test whether the statistical behavior of the Voynich text—especially its distinctive “entropy bump”—could emerge from known cipher types.
Method
I took the Latin text De Docta Ignorantia and applied the following classical cipher transformations, all likely known or possible in the 15th century:
- Syllabic substitution
- Homophonic cipher
- Caesar cipher
- Grammatical expansion
- Transposition cipher
- Contextual substitution
- Polyalphabetic cipher
- Cardano grille
- Relative-position encoding
For each version, I measured n-gram entropy per word (resetting after every word) from n=1 to n=14.
I then plotted these values against the Voynich CUVA section.
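The measurement itself is small; below is a minimal version under my reading of "resetting after every word" (no n-gram spans a word boundary). This is an illustrative sketch, not the original code:

```python
from collections import Counter
from math import log2

def ngram_entropy_per_word(words, n):
    """Shannon entropy of character n-grams, with n-grams restarted
    at each word boundary (no n-gram spans two words)."""
    counts = Counter(
        w[i:i + n] for w in words for i in range(len(w) - n + 1)
    )
    total = sum(counts.values())
    if total == 0:          # n longer than every word: no n-grams at all
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())

words = "qokedy shedy qokal ol dal dain".split()
for n in range(1, 5):
    print(n, round(ngram_entropy_per_word(words, n), 3))
```

For large n the curve necessarily falls to zero once n exceeds the typical word length, which is why per-word resetting shapes the tail of the plot.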
![Image: uLOSZCq.png](https://i.imgur.com/uLOSZCq.png)
This graph shows that most cipher types produce entropy curves that drop steeply after n=3–5, while the Voynich text declines gradually and smoothly. This is already unusual.
But there's one exception...
Homophonic cipher anomaly
Only the homophonic cipher (3+ variants tested) produces an entropy “bump” that matches the Voynich profile. Specifically, when using a homophonic cipher with 3 or 4 characters per symbol, the entropy curve is smoother and shows a slow decay, similar to the CUVA data.
This raises two hypotheses:
- A system with homophonic encoding of syllables or morphs could recreate a Voynich-like structure.
- The smoothness of the curve may suggest internal rules or language constraints, not just random substitution.
![Image: 4aEWNbM.png](https://i.imgur.com/4aEWNbM.png)
Notice how the 3- and 4-character homophonic ciphers almost replicate the Voynich curve — both in shape and range. The 2-character version decays a bit faster but still mimics the bump.
Natural text vs. Voynich
To test if this was just a quirk of De Docta Ignorantia, I took four different natural texts (Latin, French, English):
- Ambrosius Mediolanensis, In Psalmum David CXVIII Expositio (Latin)
- La reine Margot (French)
- Romeo and Juliet (English)
- De Docta Ignorantia again
Each was encrypted with a 3-character homophonic cipher and compared to Voynich CUVA.
![Image: kSTbMuI.png](https://i.imgur.com/kSTbMuI.png)
Interestingly, when using a 3-character homophonic cipher on natural texts (Latin, French, English), the entropy curves become much smoother and more sustained. For several of them, the n-gram entropy remains high up to n=6–7, and only drops significantly past n=8 or n=9.
The curve shapes are now visibly closer to Voynich CUVA, with the most similar being De Docta Ignorantia and Romeo and Juliet. However, the Voynich text still has:
- A slightly smoother and more consistent decay, without sudden drops
- A more gradual “tail” beyond n=9, where the other texts flatten out or drop to zero (except Romeo and Juliet)
This supports the idea that some homophonic structure — perhaps morph- or syllable-based — could explain the entropy shape. But it also reinforces the notion that Voynich words follow a more regulated internal logic, possibly due to morphological templates or position-based constraints.
Interpretation
There are two key features that stand out:
- The “Voynich bump” (sustained entropy around n=3–6) is only replicated by homophonic substitution.
- The smoothness of the curve in CUVA suggests an underlying linguistic system — natural or artificially constructed — rather than arbitrary encoding.
This doesn’t prove the Voynich uses a homophonic cipher, but it does suggest that such systems can generate statistically similar profiles, especially when applied at the syllable or morph level.
It may also support theories that posit an artificial language, a constructed morphology, or template-driven word generation, all of which maintain internal consistency over longer n-grams.
Was the VM a failure?
Posted by: Bernd - 26-06-2025, 02:31 PM - Forum: Voynich Talk - Replies (28)
Many theories about the Voynich Manuscript portray it as an ingeniously clever cipher or a novel method to encode a foreign or constructed language, often claimed to be invented by a famous person. Indeed, all (serious) attempts to make sense of the text have utterly failed so far.
But is this hypothesis really feasible? Despite the countless things we do not know about the VM, we can make three statements with great confidence:
1) The mechanism by which the VM text was created did not gain traction and become widely used around the time the VM was made in the 15th century.
2) No even remotely similar encoding mechanism evolved in the ~600 years since.
3) Despite countless attempts since Wilfrid Voynich's time a century ago, the 'code' remains uncracked.
This should raise some serious doubts.
While examples of brilliant inventions that were lost in time exist (the Antikythera mechanism), I do not think this is a parsimonious hypothesis. Given the overall rather amateurish and provincial look of the VM, I think it is far more likely that:
- If the VM contains enciphered information, the encipherment process is probably too cumbersome and ineffective for most scenarios.
- The VM served a very narrow and probably personal purpose that did not require the decipherment process to be practical for a wider audience, perhaps relying on a-priori knowledge of the contents, like a mnemonic aid.
Or: the VM text was never meant to contain any information at all and was created for some different purpose altogether. Again, we fail to find comparable examples.
I am not fond of deliberate hoax hypotheses, simply because of the almost fractal complexity and level of detail we can see in the VM text and imagery, unnecessary for a hoax. But it certainly cannot be ruled out. Yet no even remotely complex hoax document has ever been uncovered.
Regardless of the intention behind the VM, what we can say for sure is that its creation process wasn't a success story that was frequently repeated. It may have served a purpose for the author or a very small circle, and I do believe it was important for its creator(s) because of the sheer work involved, but it appears unlikely the project was of any broader significance beyond that.
Had it been a ground-breaking and practical invention, it would either have spread fairly quickly or re-evolved in the next hundred years. I think we should keep that in mind.
Phonetic Borderland Hypothesis
Posted by: Oliver Martin Rarrek - 26-06-2025, 11:31 AM - Forum: Theories & Solutions - Replies (14)
# A Multilingual Recipe Structure in the Voynich Manuscript
Dear all,
I'd like to share a new approach I’ve been exploring with assistance from a language model. It’s based on the hypothesis that the Voynich Manuscript might be written in a **phonetically encoded contact language** from a multilingual **border region**, possibly located in Central Europe (e.g., the Alps–Adriatic or Pannonian area). The working theory is that the text is **written as spoken**, without conforming to standardized spelling conventions of any known medieval language.
---
## Key Hypothesis
- The text reflects a **mixed oral vernacular** influenced by Romance, Slavic, and Germanic elements.
- The Voynichese words may be **phonetic spellings** (or ciphered approximations) of these spoken forms.
- Especially in the **recipe-like sections**, the internal structure mirrors known medieval formats:
- `Ingredients → Preparation → Medium → Application`
---
## Example Segment
**Voynich (EVA):** `qokedy shedy qokal ol dal dain`
**Phonetic reconstruction:** `koket skedna kocha ol daal dain`
**Possible interpretation:** *“Cook (it), strain (it), give (it) then (on), divide (it) finely”*
This structure is surprisingly similar to entries in early recipe books like the *Liber de Coquina*, Czech and Slavic herbal traditions, and entries in the CoReMA database.
---
## Why it might be promising
- **Word-length distribution** of the reconstruction aligns well with medieval medical/cooking texts.
- **Consistent morphological markers** appear at word endings (e.g., functional endings for verbs or instructions).
- **Segmented text structure** makes semantic patterns more recognizable (like recipes).
- The phonetic base allows **plausible natural language patterns** without invoking random text generation.
---
## Full PDF Summary
I’ve compiled an exploratory paper outlining the hypothesis, methodology, and examples here:
[Download PDF: *Phonetic Borderland Hypothesis*](sandbox:/mnt/data/Voynich_Grenzraum_Hypothese.pdf)
---
## Call for collaboration / feedback
I’d be very interested in your thoughts on:
- The linguistic plausibility of a Central European oral vernacular base
- Any parallels to known dialects, Glagolitic, or early Germanic–Romance scripts
- Ideas for testing this on larger sections of the manuscript
- Collaborative work on identifying candidate vocabulary using phonetic heuristics
Thanks for reading!
Looking forward to your insights and constructive critique.
BR
Oliver
Need advice for testing of hypotheses related to the self-citation method
Posted by: nablator - 25-06-2025, 06:14 PM - Forum: Analysis of the text - Replies (91)
In the "self-citation" (with modification) method, the choice of a previous word may be non-deterministic and more or less random, but humans don't act like computers, so there would certainly be a human bias in the selection. Is that bias detectable in the text of the VM? And would detecting such a bias be good evidence for the "self-citation" (with modification) method? I'm not sure.
Likely patterns for the selection of source-target words could be: use two (or more) consecutive words together for the next two generated words (in any order), or skip one word (or two) while reading, because it's easier to read several close words together than just one and then go to a totally different area of the page and read the next word.
Type 1a:
... source1 source2 ...
...
... target1 target2 ...
Type 1b:
... source2 source1 ...
...
... target1 target2 ...
Type 2a:
... source1 skipped source2 ...
...
... target1 target2 ...
Type 2b:
... source2 skipped source1 ...
...
... target1 target2 ...
I've only tested these patterns, with source words on the same line, target words on the same line, all on the same page. They create many more "hits" (possible source-target locations by small Levenshtein distance) than in a word-shuffled page. But we already know that word order is not random in the VM, so I'm not sure if the statistics really show bias in selection of source-target or if the word ordering biases (such as the known y_q affinity and others, known and unknown) get transferred to the similar but modified words. I would appreciate advice on how to resolve the issue. 
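The hit counting described above can be sketched roughly as follows (an illustration, not the actual test code; a real scan would also record positions and apply the Type 1/2 positional patterns):

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def count_hits(page_words, max_dist=1):
    """Count (source, target) pairs within a small edit distance,
    where the target occurs after the source on the page."""
    return sum(
        1
        for i, src in enumerate(page_words)
        for tgt in page_words[i + 1:]
        if src != tgt and levenshtein(src, tgt) <= max_dist
    )

page = "daiin daiiin qokedy qokeedy ol dal".split()
print(count_hits(page))   # 2 hits: daiin/daiiin and qokedy/qokeedy
```

The comparison with a word-shuffled page then amounts to running the same count on random permutations of the page and looking at how far the observed count sits in that null distribution.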
A more general human bias for the selection of the next source word(s) would be the close proximity on the page, not necessarily on the same line: proximity between source and target or proximity between sources.
Note to self: I need to try various patterns, expected and unexpected, and explain better which hypotheses I'm trying to test.
Why the Voynich Manuscript Might Not Be a Real Language
Posted by: quimqu - 24-06-2025, 09:57 PM - Forum: Analysis of the text - Replies (26)
In this post, I’ll walk you through a machine learning approach I used to analyze the Voynich Manuscript using character-level n-gram language models, a simple but powerful way to measure how predictable a text is. My goal was not to decode the Voynich, but to compare its statistical structure to that of other known texts — including literary works, religious treatises, and artificially encrypted versions — to see if it behaves like a natural language, a cipher, or something entirely different.
What Are Character-Level N-grams and Perplexity?
Before diving into the results, let’s quickly explain two key concepts:
- Character-level n-grams: These are sequences of n consecutive characters. For example, in the word "language", the 3-grams (trigrams) are lan, ang, ngu, gua, uag, age. An n-gram model learns the likelihood of seeing a particular character given the previous n-1 characters.
- Perplexity: This is a measure of how well a model predicts a sequence. Low perplexity means the model can easily predict the next character — the text is “regular” or “learnable.” High perplexity means the text is less predictable, like a noisy or complex system. It’s often used to evaluate how well a language model fits a dataset.
The Experiment
I trained simple n-gram models (from 1-gram to 9-gram) on the following types of texts:
- Classical literature (e.g., Romeo and Juliet, La Reine Margot)
- Religious and philosophical texts (e.g., Ambrosius Mediolanensis, De Docta Ignorantia), with dates of creation similar to the MS
- Ciphered texts using a Trithemius-style letter substitution
- The Voynich Manuscript, transcribed using the EVA alphabet
For each text, I split it into a training and validation set, trained n-gram models by character, and computed the perplexity at each n-gram size. I plotted these to visualize the predictability curves.
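A minimal version of that loop might look like this (my own reconstruction, using add-one smoothing on a toy corpus; the original setup may differ):

```python
from collections import Counter
from math import log2

def train_ngram(text, n):
    """Count n-grams and their (n-1)-character contexts."""
    grams, ctxs = Counter(), Counter()
    for i in range(len(text) - n + 1):
        grams[text[i:i + n]] += 1
        ctxs[text[i:i + n - 1]] += 1
    return grams, ctxs

def char_perplexity(text, n, grams, ctxs, vocab_size):
    """Per-character perplexity with add-one (Laplace) smoothing."""
    logp, count = 0.0, 0
    for i in range(len(text) - n + 1):
        g = text[i:i + n]
        p = (grams[g] + 1) / (ctxs[g[:-1]] + vocab_size)
        logp += log2(p)
        count += 1
    return 2 ** (-logp / count)

text = "qokedy shedy qokal ol dal dain " * 50   # toy stand-in corpus
split = int(0.8 * len(text))
train, valid = text[:split], text[split:]
vocab = len(set(text))
for n in (2, 3, 4):
    grams, ctxs = train_ngram(train, n)
    print(n, round(char_perplexity(valid, n, grams, ctxs, vocab), 2))
```

On this highly repetitive toy corpus the perplexity stays low even at higher n, which is exactly the qualitative behavior discussed below for formulaic texts.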
![Image: Y93Ys1l.png](https://i.imgur.com/Y93Ys1l.png)
What Did I Find?
The results were surprising:
- The Voynich Manuscript exhibits surprisingly low perplexity for high n-grams (n=7 to n=9) — much lower than expected for a truly random or strongly encrypted text.
- Its perplexity curve closely resembles that of religious or philosophical medieval texts, such as De Docta Ignorantia and Ambrosius Mediolanensis. These texts also show low perplexity at high n-grams, reflecting strong internal regularity and repetitive patterns.
- In contrast, literary texts like Shakespeare or Dumas show a sharp increase in perplexity for high n-grams, indicating a richer and more unpredictable sequence of characters.
- Artificially encrypted texts using simple substitution ciphers (like Trithemius-style transformations) show consistently high perplexity, since character distributions are scrambled.
Interpretation
This suggests something important: The Voynich Manuscript does not behave like a substitution cipher or a natural literary language. Instead, it statistically resembles structured, repetitive writing such as liturgical or philosophical works.
This does not mean it’s meaningful — but it does imply that the text might have been designed to look structured and formal, mimicking the style of medieval sacred or scholarly texts.
Its internal predictability could arise from:
- Repeated formulas or ritualistic phrases
- A constrained or templated grammar
- Artificial generation using consistent rules (even if meaningless)
Conclusion
While many have tried to translate the Voynich Manuscript into known languages or decode it with cipher-breaking techniques, this analysis suggests that a direct translation approach may be futile. The manuscript’s character-level structure mirrors that of repetitive, highly formalized texts rather than expressive natural language or encrypted writing.
Any attempt to decipher it without first understanding its generative rules — or lack thereof — is likely to miss the mark.
That said, its statistical behavior is not unique. Other texts from the same era show similar n-gram patterns. So perhaps the Voynich isn’t a hoax — it might just be mimicking the structure of sacred or scholarly texts we no longer fully understand.