| Possibility that the VM author was colorblind? |
Posted by: hatoncat - 02-04-2026, 08:59 AM - Forum: Theories & Solutions - Replies (1)
"The situation is much more complex than just an overenthusiastic blue painter:
- there is also too much white (blank)
- frequent, seemingly intentional combination of blue and blank elements
- there is too much yellow in the stems, a color notably lacking from the flowers
- calyxes in other manuscripts are green as a rule. While almost every large-plant page has green available to it (for the leaves), a disproportionate amount of other colors is preferred for calyxes."
After reading this post, I wondered whether the blue flowers in the manuscript might actually have appeared green to the illustrator. This might not make a lot of sense, but according to a Google search for "what do color blind people see blue as":
"People with blue-yellow color blindness generally see blue as green, light blue as grey, or find that blues appear much darker and less vibrant. While rare, this type of deficiency specifically makes it difficult to differentiate blue from green and yellow from violet or pink."
Could this explain why they thought they were adding pink when they were using a yellow pigment, or a green ink when it was actually blue?
| Voynich text features summary file |
Posted by: quimqu - 02-04-2026, 08:13 AM - Forum: Analysis of the text - Replies (5)
Hello all,
To all of us who are digging into the analysis of the MS text, I think it might be interesting to have a brief summary of the confirmed text features. That's why I have created this file: [link removed]
It is open to be improved with your thoughts (proven facts); in order to keep a proper view, it is not open to edit. I think it may be a useful tool for keeping the analysis going without forgetting already proven features (and without repeating too much testing).
Feel free to comment, add features (via reply in the thread), and join the discussion.
(By the way, if you think the two lines I added to the "generative facts" page are not confirmed or are uncertain, please tell me.)
Thank you
| Cross-model notation structure in the herbal section - a statistical approach (prepr) |
Posted by: FamagustaTed - 01-04-2026, 07:00 PM - Forum: The Slop Bucket - Replies (2)
Hi Everyone, I'm new here. I've been working on a structural analysis of the Voynich manuscript and wanted to share the results.
The paper takes a statistical approach to the herbal section, treating the script as a system to be characterised rather than a language to be identified. The core finding is that label morphemes encode plant-architecture features (stem type, root form, leaf shape, complexity) that can be tested against the illustrations - and that prose behaves differently depending on how much descriptive work the labels already do.
It's built on falsification throughout - every claim has a permutation test, several hypotheses were killed along the way (including my original language candidate), and there's an explicit claims ledger in the appendix showing what survived and what didn't.
What the paper does NOT claim: decipherment, a source language, or readings. The last line is literally "What it does not yield - and may never yield through structural analysis alone - is a reading."
Happy to take questions or pushback - that's what it's for.
Paper description
This paper presents a falsifiable structural model of the Voynich manuscript (Beinecke MS 408), based on computational analysis of the complete ZL IVTFF 2b transcription (36,234 tokens, 226 folios, 8 sections). Rather than attempting to identify an underlying natural language, the study asks what kind of system the manuscript implements, and answers through holdout-validated formal analysis, independent unsupervised confirmation, and cross-modal testing against the manuscript's illustrations.
The model establishes five principal findings. First, a four-layer morphological grammar classifies 91-97% of tokens across six stratified holdout blocks spanning five manuscript sections, three or more scribal hands, and both Currier languages, with zero stacking-order violations in any block and no parameter adjustment after model freeze. Second, the invariant formal system is deployed in at least six distinct compositional regimes -- loop-based prose, topic-dominant chaining, nominal labelling, weakened-loop variant, closure-weighted operational mode, and balanced connective mode -- varying systematically by section and hand. Two regimes were discovered only upon unsealing the sealed reserve holdout, demonstrating that the taxonomy expands under evaluation. Third, discourse-framing density in text predicts visual complexity of herbal illustrations (Spearman rho = 0.600, p < 0.0001, n = 43), confirmed by pre-registered holdout with minimal attenuation. At the label level, specific morphemes predict specific plant features across five independent visual channels, and morpheme bundles predict multi-feature plant profiles compositionally (LOO AUC p = 0.0006). Fourth, a 17-mapping codebook decodes plant architecture from herbal labels at 58.5% accuracy across 72 folios and is bidirectional: image features recover label morpheme sets above chance (p < 0.0001), with forward-greater-than-inverse asymmetry diagnostic of selective encoding rather than cipher. Labels and prose perform complementary, load-balanced functions confirmed by an adaptive compensation mechanism (rho = -0.337, p = 0.011). Fifth, the system meets 8 of 10 criteria for restricted technical notation while failing the criterion most diagnostic of natural language: lexical recoverability.
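The pre-registered permutation testing behind correlations like the Spearman rho reported above can be sketched in miniature. Everything below is illustrative: the toy x/y series merely stand in for framing density and visual complexity, and none of the paper's actual data, features, or thresholds are reproduced.

```python
import random

def rank(xs):
    # average ranks (1-based), ties share the mean of their positions
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation of the rank vectors
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

def permutation_p(x, y, n_perm=2000, seed=0):
    # one-sided p: how often does a shuffled pairing match the observed rho?
    rng = random.Random(seed)
    observed = spearman(x, y)
    y2 = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y2)
        if spearman(x, y2) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The +1 correction in the p-value keeps the estimate conservative, which matters when, as here, few or no permutations beat the observed statistic.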
These findings are independently triangulated: a rule-based grammar, holdout replication across two evaluation stages, and unsupervised HMM recovery of grammar classes from suffix sequences alone (NMI = 0.181, entity purity 0.53) converge on the same structural conclusions. The architecture is inconsistent with simple cipher, random generation, hoax, or classical mnemonic systems.
The study also situates the manuscript within the documented manuscript ecology of the eastern Mediterranean, presenting quantitative visual comparisons against six comparator manuscript traditions. The herbal section aligns closely with early encyclopedic Qazwini copies (Euclidean distance 2.37), while the zodiac section occupies a distinct visual regime matching no tested tradition, combining Latin computational diagram architecture with Byzantine Greek medico-astrological content and a unique figurative encoding system.
The manuscript is best understood as a structured, sectionally differentiated technical system with partially recoverable semantics -- structurally technical but lexically local. Its grammar is real and invariant. Its regimes are real and section-specific. Its text and images interact. Its labels carry structured semantic content. What it does not yield through structural analysis alone is a reading.
This deposit includes the pre-submission draft (v5.0), analysis scripts, data files, and figures.
| The structure of the Voynich text and how it may be generated |
Posted by: quimqu - 01-04-2026, 12:16 PM - Forum: Analysis of the text - Replies (19)
Most discussions about the Voynich already agree on a few basic points: the text is not random, similar words tend to appear near each other, and position within the line matters. What is less clear is how this structure is actually generated.
I have been working on a pipeline that lets me analyze the structure of the MS. The goal of this analysis was not just to describe patterns, but to test concrete generative hypotheses against the data. The question was always the same: if this were the real mechanism, would it reproduce what we observe? Running different models through the same pipeline makes it possible to discard entire classes of explanations, not just speculate about them.
| Hypothesis | Expected behavior | Observed failure |
| --- | --- | --- |
| Random / weak structure | No stable local similarity or positional effects | Strong clustering and positional patterns persist |
| Sequential (Markov-like) | Next token predictable from previous ones | Bigram/HMM models add little or collapse |
| Copy–modify (parent-based) | Clear local derivations, strong nearest neighbor | Generative models produce too much similarity |
| Single dominant parent | One best local candidate per token | Multiple candidates with similar scores, no clear winner |
The important point is not just that these models fail, but how they fail. Copy-and-modify mechanisms generate too much similarity, producing tight chains of derived forms that are not observed in the real text. Sequential models fail in the opposite direction, missing most of the structure entirely. The idea of a single dominant parent breaks down because the local neighborhood is too ambiguous: for most tokens, several nearby forms are equally plausible, with no clear winner. These are structural mismatches, not minor errors, and they rule out a large class of simple generative explanations.
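The sequential-model failure described above can be probed with a very small experiment: fit a bigram (first-order Markov) model and measure how often the modal next token is actually correct. The token lists below are invented for illustration, not drawn from the manuscript; the point is only the shape of the test.

```python
from collections import Counter, defaultdict

def bigram_model(tokens):
    # successor counts: nxt[a][b] = how often b follows a
    nxt = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        nxt[a][b] += 1
    return nxt

def predict_accuracy(model, tokens):
    # fraction of positions where the modal successor is the true next token
    hits = total = 0
    for a, b in zip(tokens, tokens[1:]):
        if model[a]:
            total += 1
            hits += (model[a].most_common(1)[0][0] == b)
    return hits / total
```

A strictly alternating sequence is fully predictable (accuracy 1.0); once several successors are roughly equally likely, as the post argues is the case locally in the Voynich, accuracy drops well below 1 even when the model is evaluated on its own training data.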
At the same time, some effects are very robust. Local similarity is real and strong: words share substrings and cluster in form space. Position within the line has a clear impact on length, prefixes, and suffixes. But these signals do not translate into a simple mechanism where one word determines the next. Token-level models struggle precisely because the system is not organized primarily as a chain of local decisions.
The structure becomes clearer when moving to the level of the full line. If lines are represented as whole objects, using their internal properties (number of tokens, length distributions, entropy, positional patterns), they fall into a small number of latent types. These types are not imposed manually, but learned directly from the text. They correspond broadly to different functional roles, but also reveal variation within them. These data-driven line types also show persistence across consecutive lines, suggesting that the manuscript is organized as sequences of line-level states, not just as a stream of loosely connected tokens.
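The line-level representation described above can be sketched as follows. This is a toy stand-in for the post's pipeline: invented pseudo-Voynich lines, three simple per-line features (token count, mean token length, character entropy), and a minimal deterministic 2-means clustering rather than whatever latent-type model the author actually used.

```python
import math

def line_features(line):
    # represent a whole line by a few internal properties
    toks = line.split()
    chars = line.replace(" ", "")
    freqs = [chars.count(c) / len(chars) for c in set(chars)]
    entropy = -sum(p * math.log2(p) for p in freqs)
    return [len(toks), sum(map(len, toks)) / len(toks), entropy]

def two_means(points, iters=20):
    # minimal 2-means with deterministic initialisation (first and last point)
    centers = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With clearly different line populations (short-token lines vs. long-token lines), even this crude clustering recovers the two groups, which is the sense in which line types can be "learned directly from the text" rather than imposed.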
To summarize the findings:
- The text is not governed by simple sequential rules. Token-to-token models fail to capture the structure, even when extended beyond basic Markov assumptions.
- It is not generated by copy-and–modify or parent-based derivation. These mechanisms overproduce similarity and impose chains that are not present in the data.
- There is no single dominant local source for most tokens. The local neighborhood is too ambiguous, with multiple equally plausible candidates.
- The strongest and most stable structure appears at the level of the full line. Lines form a small number of latent types with distinct formal profiles and non-random sequencing.
A useful way to think about it is the following:
- The line type defines a space of possible forms.
- The local context restricts this space further by favoring forms that are compatible with nearby words.
- But within that constrained space, the final choice is weakly determined. Many candidates are acceptable, and no single one is strongly preferred.
This explains why local similarity is strong but does not translate into clear parent-child relationships, and why token-level models struggle while line-level structure is much more stable.
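The three-step picture above (type defines a space, context restricts it, selection within it is weak) can be written down as a toy generator. Everything here is invented for illustration: the line types, vocabularies, and compatibility weighting are placeholders, not the post's actual model.

```python
import random

# hypothetical line types, each defining a space of admissible forms
LINE_TYPES = {
    "opening": ["daiin", "dain", "saiin", "dair"],
    "body":    ["qokeedy", "qokedy", "chedy", "shedy", "qokaiin"],
}

def compatibility(candidate, prev_word):
    # toy local-compatibility field: count characters matching at the
    # same positions as the previous word
    if prev_word is None:
        return 1.0
    shared = sum(1 for a, b in zip(candidate, prev_word) if a == b)
    return 1.0 + shared  # a mild preference, not a hard constraint

def generate_line(line_type, length, rng):
    words, prev = [], None
    for _ in range(length):
        pool = LINE_TYPES[line_type]                       # 1. type defines the space
        weights = [compatibility(w, prev) for w in pool]   # 2. context restricts it
        prev = rng.choices(pool, weights=weights)[0]       # 3. weak, sampled selection
        words.append(prev)
    return words
```

Because selection is sampled from weights rather than taking an argmax, nearby words share material without forming the tight parent-child chains that copy-and-modify models overproduce.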
With this analysis I am not trying to show that the Voynich is structured (that was already suspected), but to narrow down the class of mechanisms that can plausibly generate it. Simple sequential models and naive copy-and-modify processes do not fit. Models that operate at the level of line-level states, combined with a local compatibility field and weak selection within that field, are much more consistent with the data.
| A thought |
Posted by: Tessa9 - 01-04-2026, 03:39 AM - Forum: Theories & Solutions - Replies (3)
This might be a dumb question, but does anyone have an answer?
I was just thinking: is Voynichese a character system or a letter system? Sorry if my terms aren't correct, but here's an example.
In languages like Latin, English, German, etc., we have an alphabet (ABCD...) whose letters combine to form words, whereas in languages like Chinese, Japanese, Thai, etc., one or more characters (你, 我, 对...) stand for a whole word.
Sorry if this doesn't make sense; I can elaborate, but if you understand and can give me an answer, please do. Thank you!
| Memory Palace Theory |
Posted by: Ace369 - 31-03-2026, 01:49 PM - Forum: The Slop Bucket - Replies (2)
Disclaimer: I used AI for grammar and to calculate the frequency of certain symbols, so the numbers might not be accurate; the proposal is mostly just the idea.
Most attempts to decode the Voynich assume it's a language, encrypted, unknown, or fabricated. But what if that assumption is the problem?
Here's an alternative framing worth considering: the Voynich might be a correspondence notation system, a structured tool for mapping relationships between three domains of knowledge rather than a text meant to be read linearly. Not a book. A paper computer.
The intellectual context
Medieval and Renaissance natural philosophy was built on a tripartite model of reality, the Celestial, the Terrestrial, and the Human. Everything in one domain was believed to have a correspondent in the others. Specific plants corresponded to specific planets, which corresponded to specific body parts and humors. This wasn't metaphor, it was the operating model of reality for educated people of that era.
The three major sections of the Voynich map suspiciously cleanly onto this framework:
- Herbal → Terrestrial (plants, material substances)
- Astronomical → Celestial (stars, cycles, time)
- Balneological → Human (body, fluids, vitality)
The manuscript wouldn't be three separate topics — it would be one unified system expressed through three lenses.
What the data suggests
Running statistical analysis on the IVTFF transliteration corpus produces some structural patterns that are hard to explain with cipher or natural language theory:
Labels behave like unique identifiers, not words
Across every section — pharmaceutical jars, astronomical stars, zodiac nymphs, herbal plants — label positions show vocabulary uniqueness ratios of 0.82–0.91. Paragraph text sits at 0.22–0.41. Labels aren't words being used repeatedly. They're names. Node identifiers in a system.
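The vocabulary uniqueness ratio used above is just a type/token ratio computed per locus. A minimal sketch, with invented toy tokens rather than real transliteration data:

```python
def uniqueness_ratio(tokens):
    # distinct forms divided by total tokens (type/token ratio):
    # near 1.0 means almost every token is unique (name-like),
    # low values mean heavy reuse (word-like)
    return len(set(tokens)) / len(tokens)
```

Label fields with no repeated forms score 1.0; paragraph text, where a few forms recur constantly, scores much lower, which is the contrast the 0.82–0.91 vs. 0.22–0.41 figures describe.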
The qok- prefix behaves like a relational operator
Base words appear as labels. The same words with qok- prefix appear in paragraph text. And crucially — qok- is almost entirely absent from pure label fields across every section.
This is consistent with qok- functioning as a correspondence prefix — something like "of/in/belonging to" — turning an identifier into a coordinate. "keedy" names a node. "qokeedy" locates something in relation to it.
Locus type predicts text structure perfectly
Labels, paragraph text, circular text, radial text — each has its own statistical fingerprint. Word length, vocabulary density, daiin frequency, qok- density all vary systematically by structural position. That's not a cipher. Ciphers scramble content, they don't architect it.
daiin density tracks domain, not grammar
If daiin were a function word like "the" it would distribute evenly. Instead it's densest in Herbal A (10–14%) and nearly absent from zodiac labels (1–2%). It appears to mark a specific coordinate axis that some sections invoke heavily and others barely at all.
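The density comparison above is a per-section token frequency. A sketch with invented counts (the real Herbal A and zodiac figures are not reproduced here):

```python
from collections import Counter

def token_density(tokens, target="daiin"):
    # share of tokens in this section that are the target form
    return Counter(tokens)[target] / len(tokens)
```

A function word in the "the" sense would show roughly even density across sections; a domain or axis marker would spike in some sections and vanish in others, which is the pattern the post reports.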
The picture that emerges
The manuscript might be structured as follows:
- Labels = unique node identifiers (names for things in the system)
- qok- + word = correspondence coordinate ("this node relates to this domain")
- daiin = primary axis marker (appears where the main correspondence axis is being invoked)
- Section vocabulary = different domains of the same underlying relational system
- Dense text pages = possibly the index or query interface, how you navigate the system
Under this reading, the "language" resists decipherment because it was never a language. It's a notation system — personal to its author, built for navigation not reading, and only meaningful to someone who already had the underlying correspondence model internalized.
The symbols aren't words waiting to be translated. They're addresses in a system whose map was always meant to be carried in the mind.
This is speculative — the framework fits but hasn't been formally tested against competing hypotheses. Posting here to see if the structural evidence holds up to scrutiny or if there are obvious gaps in the reading.
What does this community think? Has the memory palace / correspondence notation angle been seriously explored before?