Could synthetic “fake Voynich” tests tell us what we’re dealing with?
Posted by: Phoenixjn - 09-09-2025, 03:34 PM - Forum: Analysis of the text - Replies (4)
Hello,
Rather than trying to decode the VMS outright, I think we can certainly answer other questions about the MS with the help of AI. For example, we should be able to decide with high probability whether the MS has real meaning or not. And I propose a system for doing that.
My own take: what best fits the evidence is that someone commissioned the VMS to be exactly what it is: a mysterious, impressive book that nobody can read because it contains no meaning. This is highly plausible and fits the physical, linguistic, and cultural evidence of the time. I believe it's a pseudo-text, made to look like a language but without actual meaning, created to impress people, and I think AI can help us decide one way or the other.
The question to explore is what kind of system is this text most like?
The idea is to generate large sets of synthetic manuscripts under different assumptions and see which "universe" the Voynich statistically belongs to. For example:
- Ciphered Latin/Hebrew/Italian texts (various substitution styles)
- Real languages reformatted to look Voynich-like
- Constructed languages with invented grammar
- Structured pseudo-languages (rules for prefixes/suffixes, but no meaning)
- Shorthand/abbreviation systems treated as full glyph sets
Then we can measure each synthetic universe against a Voynich "fingerprint panel" (word shapes, entropy, Zipf’s law, affix patterns, section differences, etc.). Rather than asking “what does it say?”, this approach asks “what system is it most like?” If structured pseudo-language consistently fits better than ciphered Latin or conlang universes, that's powerful evidence.
This wouldn’t solve the translation, but it would be an important step in understanding the MS and it would be one box checked off.
Does this kind of “synthetic benchmarking” sound worth trying? Has anyone attempted something like this at scale?
Anyway, here's where AI did a lot of the work in building an outline for how the experiment might go with only off-the-shelf tools. The goal is to see which universe (ciphered language, real language, conlang, structured pseudo-language, shorthand/abbreviation, etc.) best reproduces the Voynich’s full statistical “fingerprint.”
No, I don't have expertise in this kind of research. I'm only seeing where AI can point to help us check off some boxes and let those with the expertise run with it.
1) Define the universes (generate many fakes)
Make 200–2,000 synthetic manuscripts, each matched in length and page/line structure to the VM. Each fake follows one hypothesis with tunable knobs:
A. Ciphered Natural Language
- Source corpora: medieval Latin, Italian, Occitan, Hebrew (public domain).
- Ciphers to implement:
- Simple monoalphabetic substitution
- Homophonic substitution (n symbols per plaintext char)
- Syllabic substitution (map digraphs/trigraphs)
- Nomenclator (frequent words → special symbols)
- Extras: line-initial/line-final rules, abbreviation expansion (Latin breviographs), occasional nulls.
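To make Universe A concrete, here is a minimal sketch of the first two cipher styles (my own illustration, not an existing implementation; the symbol inventory for the homophonic variant is invented):

```python
import random

def mono_substitution(text, alphabet="abcdefghijklmnopqrstuvwxyz", seed=0):
    """Monoalphabetic substitution: one fixed glyph per plaintext letter."""
    rng = random.Random(seed)
    glyphs = list(alphabet)
    rng.shuffle(glyphs)
    key = dict(zip(alphabet, glyphs))
    # Characters outside the alphabet (spaces, punctuation) pass through.
    return "".join(key.get(c, c) for c in text.lower())

def homophonic_substitution(text, n_homophones=3, seed=0):
    """Homophonic substitution: each letter maps to several symbols chosen
    at random, which flattens the output frequency distribution."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # Invented symbol inventory: letter + index, e.g. 'a0', 'a1', ...
    key = {c: [f"{c}{i}" for i in range(n_homophones)] for c in alphabet}
    return " ".join(rng.choice(key[c]) for c in text.lower() if c in key)
```

The same skeleton extends to syllabic substitution (map digraphs/trigraphs instead of single letters) and a nomenclator (pre-scan for frequent words, assign them dedicated symbols).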
B. Real Language (no cipher) shaped to VM layout
- Rewrap real Latin/Italian texts into Voynich-like lines/paragraphs and enforce a few layout quirks (e.g., frequent “q-” line-initial tokens) to probe the effect of mise-en-page alone.
C. Conlang (meaningful but invented)
- Generators:
- Finite-state morphology (prefix–stem–suffix classes).
- PCFG (probabilistic context-free grammar) with phonotactics.
- Dialects: Currier-A/B style parameter shifts (suffix set, token length).
D. Structured Pseudo-Language (no semantics)
- Automata with rules like:
- prefix ∈ {qo, q, o, ch, …}, stem ∈ Σ{a,i,e,o,y}, suffix ∈ {dy, n, in, ain, aiin, …}
- position-dependent variants (line-initial gets more “q”)
- tunable affix productivity, stem entropy, and run-lengths
- Also include a human-plausible generator: Markov/HMM with simple constraints to simulate “fast fluent scribbling.”
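A toy version of the Universe D automaton might look like this (illustrative only; the affix sets are lifted from the rules above, the biases and ranges are placeholder knobs):

```python
import random

def pseudo_word(rng, prefixes=("qo", "q", "o", "ch", ""),
                vowels="aieoy", suffixes=("dy", "n", "in", "ain", "aiin", "")):
    """One meaningless but rule-governed word: prefix + vowel-rich stem + suffix."""
    stem = "".join(rng.choice(vowels) for _ in range(rng.randint(1, 3)))
    return rng.choice(prefixes) + stem + rng.choice(suffixes)

def pseudo_line(rng, n_words, line_initial_q_bias=0.6):
    """A line of pseudo-words; the first token is biased toward q-prefixes,
    mimicking the position-dependent 'line-initial gets more q' rule."""
    words = [pseudo_word(rng) for _ in range(n_words)]
    if rng.random() < line_initial_q_bias and not words[0].startswith("q"):
        words[0] = "q" + words[0]
    return " ".join(words)

# A seeded "page": reproducible, tunable, and entirely meaning-free.
rng = random.Random(42)
page = [pseudo_line(rng, rng.randint(5, 10)) for _ in range(8)]
```

Every argument here (prefix set, stem length range, line-initial bias) is one of the "tunable knobs" the ABC step in section 3 would treat as a prior.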
E. Shorthand/Abbreviation Universe
- Start with Latin prose; compress using a learned set of ~30–60 breviographs/abbreviations (e.g., -us, -rum, -que), then hide the mapping (treat breviographs as glyphs). Vary aggressiveness.
2) Build the Voynich “fingerprint panel”
Compute the same metrics for the true VM and for every synthetic manuscript:
Token/Type structure
- Zipf slope & curvature; Heaps’ law α, K
- Word-length distribution; KS/AD distance vs VM
- Vocabulary growth by page/quire
Local dependencies
- Char bigram/trigram distributions; JS divergence
- Conditional entropy H(Xₙ|Xₙ₋₁) and H(wordₙ|wordₙ₋₁)
- Mutual information vs distance (Hilberg-style curve)
Morphology & segmentation
- Morfessor/BPE: number of subunits, affix productivity, stem/affix ratio
- Family proliferation: counts of {dain, daiin, daiiin}-type ladders
Positional/structural signals
- Line-initial vs line-final token profiles
- Paragraph-initial bias
- Page/section (Herbal/Astro/Balne/Recipes) drift metrics (KL divergence)
Compressibility / model fit
- LZMA/PPM/ZPAQ ratios
- n-gram perplexity (n=3..7) trained on one half, tested on the other
- Tiny Transformer perplexity (character-level) trained on each universe, tested on VM (cross-perplexity)
Clustering/embedding
- UMAP/t-SNE on character n-gram vectors; silhouette vs VM cluster
- Rank-order correlation (Kendall τ) of frequency lists
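Three of these metrics fit in a few lines of stdlib Python; this sketch (my illustration, assuming whitespace-tokenized input) shows a Zipf slope, a character-bigram Jensen-Shannon divergence, and an LZMA compression ratio:

```python
import lzma
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) vs log(rank); needs >= 2 ranks."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def bigram_js_divergence(a, b):
    """Jensen-Shannon divergence (bits) between char-bigram distributions."""
    def dist(s):
        c = Counter(s[i:i + 2] for i in range(len(s) - 1))
        t = sum(c.values())
        return {k: v / t for k, v in c.items()}
    p, q = dist(a), dist(b)
    keys = set(p) | set(q)
    m = {k: (p.get(k, 0) + q.get(k, 0)) / 2 for k in keys}
    def kl(x):
        return sum(x[k] * math.log2(x[k] / m[k]) for k in keys if x.get(k, 0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def compression_ratio(text):
    """Compressed/raw size; repetitive, low-entropy text scores lower."""
    raw = text.encode("utf-8")
    return len(lzma.compress(raw)) / len(raw)
```

Each function returns one number per manuscript, so a fingerprint is just a dict of metric name → value, computed identically for the VM and every synthetic.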
3) Scoring: which universe fits best?
Use multiple, complementary criteria:
- Distance aggregation: Normalize each metric to z-scores, then compute a weighted composite distance of each synthetic to the VM. Rank universes by median distance.
- Model selection: Approximate Bayesian Computation (ABC): treat generator knobs as priors, accept parameter settings whose synthetic stats fall within ε of the VM. Compare posterior mass across universes.
- Held-out validation: Fit knobs on half the VM; test distances on the other half (and per section).
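The distance-aggregation step can be sketched like this (my illustration, with uniform weights and mean-absolute-z distance as placeholder choices):

```python
import statistics

def composite_distances(vm_metrics, synthetic_sets):
    """Z-normalize each metric over the whole synthetic pool, score every
    synthetic by mean |z(synthetic) - z(VM)|, then rank each universe by
    the median composite distance of its synthetics."""
    pool = [s for synths in synthetic_sets.values() for s in synths]
    metrics = list(vm_metrics)
    mu = {m: statistics.mean(s[m] for s in pool) for m in metrics}
    sd = {m: statistics.pstdev(s[m] for s in pool) or 1.0 for m in metrics}
    z = lambda m, v: (v - mu[m]) / sd[m]
    def dist(s):
        return sum(abs(z(m, s[m]) - z(m, vm_metrics[m]))
                   for m in metrics) / len(metrics)
    return {u: statistics.median(dist(s) for s in synths)
            for u, synths in synthetic_sets.items()}
```

Feeding in `{"pseudo": [...], "ciphered_latin": [...]}` dicts of per-synthetic metric values yields the "league table" described in section 7; per-metric weights would replace the uniform average.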
4) Robustness checks
- Ablations: remove line-position rules or suffix ladders—does fit collapse?
- Overfitting guard: ensure no generator is trained directly on VM tokens (only statistics), and verify generalization across sections.
- Adversarial baseline: try to force ciphered Latin to match VM—if it still lags pseudo-language on multiple metrics, that’s strong evidence.
5) Tooling (all off-the-shelf)
- Python: numpy, pandas, scikit-learn, matplotlib, networkx
- NLP/stat: morfessor, sentencepiece (BPE), nltk for n-grams
- Compressors: built-in lzma, bz2, zlib; optional PPMd via a Python wrapper
- Dimensionality reduction: umap-learn, scikit-learn (t-SNE/UMAP)
- Lightweight Transformers (optional): transformers with a tiny char-LM
6) Workflow & timeline (lean team)
Week 1–2: Data wrangling (VM EVA, Latin/Italian corpora), page/line schema, metric code scaffolding
Week 3–6: Implement generators A–E; unit tests; produce first 500 synthetics
Week 7–8: Compute full fingerprint panel; initial ranking
Week 9–10: ABC fitting per universe; robustness/ablations
Week 11–12: Write-up, plots, release code & datasets (repro pack)
7) Readouts you can trust (what “success” looks like)
- A league table: per-universe composite distance to VM (with error bars)
- Posterior plots: which parameter regions (e.g., high suffix productivity, low stem entropy) best match VM
- Confusion matrix from a classifier trained to tell universes apart using the fingerprint; if VM gets classified as “structured pseudo-language” with high confidence, that’s decisive.
8) “Citizen-science” version (solo, laptop-friendly)
- Implement Universe D (pseudo-language) and Universe A(1) (mono-substitution over Latin).
- Compute a mini fingerprint: Zipf slope, word-length KS, bigram JS, compression, Morfessor affix productivity.
- Generate 100 synthetics for each universe; plot distance distributions vs VM.
- If pseudo beats ciphered Latin on 4/5 metrics, you’ve got a publishable note.
9) Pitfalls & how to avoid them
- Layout leakage: VM line/page structure matters—always replicate it in synthetics.
- Cherry-picking metrics: pre-register the metric set; report all.
- Over-tuning: do ABC on one half; evaluate on the other.
- Section bias: score by section and overall; the winner should be consistent.
Textual Complexity vs. Visual Simplicity
Posted by: quimqu - 08-09-2025, 08:28 AM - Forum: Imagery - Replies (33)
One of the things that intrigues me most about the Voynich manuscript is the poor quality of its drawings. Whenever I’ve seen manuscripts from the 15th century or even later, the clumsiness of the Voynich illustrator is striking. The drawings are made almost in a single stroke, with no attempt at shading or adding the slightest grace to the figures. From the plants to the nymphs or the zodiac signs, they look like something a child of six or seven could have drawn.
I understand that by that time art had already reached quite a high level of quality; the Renaissance was just emerging in Italy in the 15th century. So whoever produced the illustrations must have been an amateur, and quite a poor one at that.
What is even more surprising is the contrast between the complexity of the text (whether it is an actual cipher or an invented script) and the low quality of the images. One senses the ambition to depict grand ideas, such as the elaborate foldout diagrams or the roses, yet the final result feels clumsy and impoverished when compared with the artistic standards of the time. I understand that the Renaissance was not accessible to everyone, but even the humblest artistic traditions of the period offered a more faithful representation of reality than what we see in the Voynich.
It is also striking that researchers have identified different scribes at work in the text, while the illustrations — at least the thematic ones — seem to share the same hand and style. It is difficult to imagine two people independently drawing the nymphs, for example, in exactly the same (and equally unconvincing) manner with respect to human anatomy. This further reinforces the impression that the text and the images may have followed very different logics of production.
That said (and I hope not to offend, I’m a Voynich enthusiast myself!), I wonder whether any other manuscripts from the period show a similar poverty of representation, with drawings that are unrealistic and poorly proportioned.
Brainstorming Session: Mapping Voynich with Graphs
Posted by: quimqu - 07-09-2025, 04:50 PM - Forum: Analysis of the text - Replies (5)
Hi everyone.
I’m Joaquim Quadrada—Quim for short (that’s the reason for my nickname quimqu). I’m 51, from Barcelona, a native Catalan speaker. I was a mechanical engineer and moved into data science two years ago, finishing a master’s that lasted three years. Even though we covered the linguistic side of data science during the master’s, I’m not a linguist. I open this thread because I think graph methods can serve the Voynich community as a practical, transparent way to poke at structure and test ideas, and graphs are quite a new area to investigate.
By “graph” we understand a set of points and lines. The points (nodes) can be words or bits of metadata such as “section”, “Currier hand”, “writing hand”, or “position in line”. The lines (edges) connect things that co-occur or belong together. We can also add information to the edges, such as direction and weights that characterize the relationship between the nodes. Once you cast the transliteration as one or more graphs (and yes, we can join graphs), you can ask graph-native questions: which links are unexpectedly strong once chance is controlled, which words act as bridges between otherwise separate clusters, which small patterns (A→B→C chains or tight triangles) recur at line starts, how closely word communities align with metadata nodes (sections, hands, line-position), and whether any directed paths repeat often enough to count as reusable templates. None of this decides whether the text is language or cipher, but it can highlight stable regularities, quantify them, and rank hypotheses for experts to examine.
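The basic construction is easy to sketch (my illustration, not quimqu's actual code; the tokens are toy EVA-like strings, not real VM lines):

```python
from collections import Counter

def cooccurrence_edges(lines, window=2):
    """Weighted co-occurrence edges: each key is a sorted word pair, each
    value counts how often the pair appears within `window` tokens on a line."""
    edges = Counter()
    for line in lines:
        tokens = line.split()
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if w != v:
                    edges[tuple(sorted((w, v)))] += 1
    return edges

# Toy example: two short "lines" of EVA-like tokens.
lines = ["qokeedy daiin chedy", "daiin chedy qokaiin"]
edges = cooccurrence_edges(lines)
```

The resulting pair→weight mapping loads directly into a graph library (e.g. networkx's `Graph.add_edge(u, v, weight=w)`) for the community and bridge analyses described above.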
I’d like to open a brainstorming thread to push ideas that are worth trying on top of these graphs.
As a concrete example, I started with the first lines of paragraphs (what I call L1) and compared them to all other lines. Building co-occurrence graphs, the L1 network consistently comes out denser and more tightly clustered. When I switch to a small sliding window (a “micro-syntax” view), the L1 graph splits into more distinct communities, which is what you’d expect if opening material uses more fixed combinations. I also looked for actual line-start bigrams that repeat. A couple of pairs do appear at L1 and not elsewhere, but the evidence is thin; they behave more like soft habits than hard formulas. To see broader context, I built a bipartite graph that connects words to their metadata (position, section, hand). Projecting this graph shows a clear cohort of words that lean toward L1, and it also shows which sections and Currier hands share that opening behavior. All of this is descriptive and testable; nothing here presumes a linguistic reading or a cipher.
This, for example, is the graph for the first lines of the paragraphs:
[Image: https://i.imgur.com/mZs74BP.png]
To illustrate what I mean by opening units at L1, here’s a table with the two bigrams that pass the defined thresholds: they have positive ΔPMI versus the rest of the text (ΔPMI > 0 means the bigram is more tightly associated in L1 than in the other lines) and they always occur at the start of a line. I’ve added short KWIC (Key Word in Context) snippets for context.
Bigram | Count | ΔPMI | Line-start | KWIC (examples) |
polor sheedy | 2 | 6.234 | 100% | f112v: [polor sheedy] … sheedar … | f115v: [polor sheedy] … qokechy … |
tshor shey | 2 | 4.799 | 100% | f15r: [tshor shey] … chtols … daiin | f53v: [tshor shey] … oltshey … qopchy |
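For readers who want to reproduce the ΔPMI column, a minimal version of the computation could look like this (my sketch, assuming whitespace-tokenized lines; thresholds and smoothing are left out):

```python
import math
from collections import Counter

def pmi_table(lines):
    """PMI (bits) of adjacent word pairs within a set of lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in lines:
        tokens = line.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {bg: math.log2((c / n_bi) /
                          ((unigrams[bg[0]] / n_uni) * (unigrams[bg[1]] / n_uni)))
            for bg, c in bigrams.items()}

def delta_pmi(bigram, l1_lines, other_lines):
    """PMI in first lines minus PMI in the rest; positive means the pair
    is more tightly associated in L1. None if the pair is absent either side."""
    p1, p2 = pmi_table(l1_lines), pmi_table(other_lines)
    if bigram not in p1 or bigram not in p2:
        return None
    return p1[bigram] - p2[bigram]
```

A shuffle test (recompute ΔPMI after permuting line labels) would then tell you whether a positive value survives chance.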
What I need now are linguists’ eyes and instincts. If you can suggest a better unit than whole EVA “words” (for example, splitting gallows and benches, or collapsing known allographs), I will rebuild the graphs and quantify how the patterns change. If you have candidates for discourse-like items that might prefer line starts, I can measure their positional bias, role in the network, and the contexts they pull in. If there are section or hand contrasts you care about, I can compare their “profiles” in the bipartite projection and report which differences are solid under shuffles and which are noise.
I’ll keep my end practical: small, readable tables; KWIC lines for anything we flag; and ready-to-open graph files. If this sounds useful, I’ll post the current outputs and code, and then iterate with your guidance on segmentation, normalization, and targets worth testing.
My only goal is to make a common playground where your expertise drives what we measure.
f67v2 - Some comparisons
Posted by: Bluetoes101 - 06-09-2025, 10:39 PM - Forum: Imagery - Replies (4)
Egerton MS 845 (F.21 v)
Probably just a coincidence, but I thought the top left triangle design and top right cross design (scroll) might be noteworthy given the general similarities between the two images. The overall layout is also quite similar. I can't find scans for this MS anymore, though it was linked here years ago and noted as "first half of 15c" in 2021.
On a side note - The image also shows up as being part of Harley 2407 which was 15c and contained many later notes by readers, the most famous of which were Dee and Ashmole.
The other is from Michael Scot's Liber introductorius, which is a very interesting work by a very interesting person (if you like rabbit holes).
This is a 14c copy (around 1320).
The images concern eclipses, though I found the general way they were drawn to have some similarities with the VM images, even if the meaning is seemingly different.
There are many great images in this MS: if you browse backwards from this page you will find all sorts of illustrations of planets and zodiac signs, some more relatable to the VM than others.
Bonus merlons and weird sun face for Koen
I don't really have much to claim on these, just thought I'd share and not let it rot in the list of things I'll probably forget about.
My Theory: RITE — Ritual Instrument of Textual Esoterica
Posted by: GrooveDuke - 06-09-2025, 09:40 PM - Forum: Theories & Solutions - Replies (18)
[Theory] RITE — Ritual Instrument of Textual Esoterica
Thesis. The Voynich Manuscript is an authentic early-15th-century object whose unreadability was the point. It functioned as a Ritual Instrument of Textual Esoterica (RITE)—a performative prop that looked like language and conferred authority on its owner in consultations/rites—rather than a book intended for general decoding.
That's my TL;DR. If it intrigues you or reminds you of something that has come before of which I am unaware, read on...
Motive. The same motive as many medieval forgeries: money.
How I got here. Statistical work persuaded me the text isn’t straightforward natural language. A recent Voynich Day 2025 talk by Michael (“Magnesium”) showed that historically plausible 15th-century methods can generate Voynich-like strings from meaningful text. That demonstrates feasibility of a ciphered surface. It does not establish an intent to decode. My claim: unreadability was a feature for performance, not a bug to be solved. (Modern analogue: Joseph Smith’s plates—power via exclusive “translation.”)
Function, not content. I use “rites” broadly—any performative act (divination, healing, religious consultation) where the owner interprets an unreadable authority object for a client. Images anchor recognition; unreadable text supplies mystery; performance supplies authority.
Historical timeline (why RITE fits the period)
- Creation (1404–1438 vellum window). Late-medieval Europe supported markets for “books of secrets,” astrological images, and esoteric medicine. A convincing pseudo-language manuscript could be produced relatively cheaply yet serve high-value ritual and consulting roles.
- Use phase (15th century). The manuscript shows wear consistent with handling. Unreadable/arcane texts could operate openly: monastic settings, itinerant healers, court astrologers.
- Shift and decline (post-1517). The Protestant Reformation and the Catholic Counter-Reformation hardened attitudes toward “magical” books and spurious relics. An unreadable prop without sanctioned theology becomes risky for Protestants and Catholics.
- Afterlife (16th–17th centuries). As “duping the locals” gets harder—and penalties greater—the manuscript’s functional value collapses. It survives as an exotic curiosity, eventually sold when it’s no longer useful (or safe) as a working prop.
Section-by-section: how RITE could operate in practice
- Herbal (plants/roots). Healer points to a strange plant with dense glyphs: “The remedy is written here.” A potion is prescribed. Plant imagery makes it feel concrete; the text signals hidden expertise.
- Astrological (zodiac, stars). For divination: gesture to a zodiac wheel—“Your house this year aligns thus; the text confirms it.” Recognizable symbols guide the client’s imagination; unreadable labels make it authoritative.
- Balneological (nude figures, pipes, baths). Esoteric therapies: “These are purifications for health/fertility.” The imagery implies procedure; the script implies exact doctrine only the interpreter can unlock.
- Pharmaceutical (jars, roots, compound lists). “Codified pharmacy.” Point to a jar vignette, narrate a “translation,” mix a preparation. The book serves as the credential behind the recipe.
- Recipes/Stars (short paragraphs, star markers). Performative instruction: “Each star is a step.” Trace lines as though following a protocol, then deliver a chant, cure, or prognosis. Structure without readability.
What would falsify RITE
- A coherent, page-level decipherment that preserves known statistics and yields content matching the imagery domains—showing the book was meant to be read beyond its maker.
- Provenance tying it to a didactic or bureaucratic purpose inconsistent with staged opacity.
- Material/ink sequencing inconsistent with extended practical handling.
What would strengthen RITE
- Documents describing unreadable “books of secrets” used in healing/divination ca. 1400–1500 (especially Central Europe).
- Inventories, bans, or trials referencing pseudo-alphabet manuscripts pre-/post-Reformation.
- Close analogues (e.g., Trithemius/Dee) where cipher-like text doubled as a ritual prop.
Bottom line. RITE doesn’t deny the possibility of meaning; it argues the manuscript’s purpose was performative unreadability. It was used for something—authority in rites—then lost cultural utility as the religious/intellectual climate changed.
I looked for earlier posts beyond the “medieval forgery” umbrella and didn’t find this exact framing. If RITE (performative unreadability; prop-cipher use) has already been proposed, please link threads/papers/blogs—happy to read, credit, and continue there. Mods: fine to merge if redundant.
Most helpful feedback right now:
- Pointers to prior art on “prop/ritual use” models of the Voynich.
- Counter-arguments/falsifiers I haven’t considered.
- Historical breadcrumbs (inventories, bans, trials, testimonies) mentioning unreadable “books of secrets” c. 1400–1500.
- Method ideas to test RITE vs. alternatives (e.g., wear patterns, long-range glyph correlations, material sequencing).
Thanks in advance for links, critiques, and corrections. If this is old hat, I’ll fold into the existing discussion and refine accordingly.
Resonance
Posted by: InkandGrace - 06-09-2025, 03:16 AM - Forum: Theories & Solutions - Replies (4)
You can translate all day, and if you aren't a medieval music major you won't accomplish much.
It's the same exact format as many chants and mantras the world over. Lines 4 and 5 carry the chorus fingerprint (Parsons code): the 8-unit cell is RURUUDR and the full chorus is RUDRUUDRRUDRUUDRRUDRUUDR.
That is as far as I have gotten.
[split] Word length autocorrelation
Posted by: ReneZ - 05-09-2025, 12:27 AM - Forum: Analysis of the text - Replies (22)
(04-09-2025, 09:59 PM)quimqu Wrote: In real languages, the correlation is usually slightly positive: long words tend to follow long words, short after short. But in the Voynich, it’s negative (about –0.07). That means the text tends to alternate — long words are followed by short words, and short by long. It gives the text a zig-zag rhythm.
If that is a 'normalised' correlation, i.e. not a covariance, then the value -0.07 means 'no correlation'.
So, it depends on how it was calculated.
(04-09-2025, 09:59 PM)quimqu Wrote: When I scrambled the text within lines as a test, the alternation got even stronger, which proves the manuscript isn’t random, but it’s still unlike natural language.
This is not expected, and suggests that both cases effectively show 'no correlation'.
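The distinction matters, so here is one way the normalized (Pearson) version and a shuffle baseline could be computed (my sketch, not the code either poster used):

```python
import random
import statistics

def lag1_word_length_corr(text):
    """Normalized lag-1 Pearson correlation between consecutive word lengths.
    Values near 0 mean 'no correlation'; negative means alternation."""
    lens = [len(w) for w in text.split()]
    x, y = lens[:-1], lens[1:]
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # constant word length: correlation undefined, report 0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

def shuffled_baseline(text, n=200, seed=0):
    """Mean correlation over word-order shuffles: the 'no structure' reference."""
    rng = random.Random(seed)
    words = text.split()
    vals = []
    for _ in range(n):
        rng.shuffle(words)
        vals.append(lag1_word_length_corr(" ".join(words)))
    return statistics.mean(vals)
```

Comparing the observed value against the spread of the shuffled baseline (rather than against 0 directly) is what would settle whether –0.07 is a real signal or effectively no correlation.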
Combination of two manuscripts as a way of reading the text?
Posted by: sfiesta - 03-09-2025, 07:01 PM - Forum: Analysis of the text - Replies (3)
Hello everyone.
Do you think it is possible that two manuscripts were used to read the text: one known to us as the Voynich Manuscript, and a second, lost one, which contained the same illustrations and diagrams but a different text? For example, the words of manuscript №2 could consist of ordinary but cleverly mixed Latin letters, and the sequences of symbols in the Voynich Manuscript could be a visual instruction for bringing those words into readable form. This could explain the strange statistical distribution of words in the Voynich Manuscript: in the hypothetical manuscript №2, the same normal word of a European language could be encoded by many variants of mixing the original letters, so the visual instruction for decoding the same word could differ in different places of the manuscript.
I understand that this theory is not the path to solving the mystery, but I believe it also has a right to exist.