Introduction:
I am sharing a new discovery regarding the Rosettes Map (Folio 86v). My research shows that this map is not just a drawing; it is a geometric layout of the world and a warning about the end of times.
The Discovery of "T" and "Code 4":
The T-Axis: If you look at the nine rosettes, you can see an invisible T-shape that connects them. The vertical line connects the top and bottom, and the horizontal line connects the sides. This "T" is the key to the whole map.
The Tilt (Karkata): Notice that the T-axis is not perfectly straight; it is tilted. In geometry, a tilted axis means the world is out of balance. This shows that power has shifted to one side.
Code 4: This "T" divides the circular world into 4 parts. These represent the 4 corners of the earth (North, South, East, and West). The "Code 4" shows how a central power (the fortified city) is trying to control all four corners of the world at once.
Fikra (Insight):
Everything in the Voynich Manuscript is visible if you look at the shapes. You don't need to read the letters to see the truth. The author had a vision of the future, showing a city with high towers that would dominate the world. The T-axis shows us the direction, the Code 4 shows us the global scale, and the Tilt shows us that the end-time sequence has begun.
Conclusion:
This is a "Warning Map." The geometry doesn't lie. I invite you to look at the angles and the 4-part division to see the reality for yourselves. This is not just history; it is what is happening now.
If you haven't read my first post on this, here's the link:
[link]
It'll explain a lot about what I'm going to present.
I don't believe God plays dice.
In the last post I had just split the Voynich into my 0ed (Currier A) and ed+ (Currier B) pages. This post is going to be less chart pretty and more statistics. Prepare to be bored mindless with numbers.
The Voynich
Total Pages
0ED pages: 104
ED+ pages: 121
Tokens per page (mean / median)
0ED: 84.6 / 80
ED+: 211.2 / 145
Unique tokens per page (mean / median)
0ED: 66.8 / 65
ED+: 140.4 / 110
Global hapax ratio (mean / median)
0ED: 0.146 / 0.142
ED+: 0.144 / 0.131
Reuse ratio (mean / median) - How often words repeat.
0ED: 0.725 / 0.755
ED+: 0.784 / 0.800
Variance of token length (mean / median)
0ED: 2.69 / 2.64
ED+: 2.77 / 2.68
Proportion of long tokens (length ≥ 7)
0ED: 0.145
ED+: 0.182
Unique bigrams per page (mean / median)
0ED: 60.5 / 61
ED+: 80.9 / 79
Bigram repetition rate
(1 − unique_bigrams / total_bigrams)
0ED: 0.791
ED+: 0.861
TLDR; ED+ pages
Are much longer (Baneo and Recipe influenced)
Introduce more total vocabulary
Reuse prior vocabulary more than 0ED pages
Contain longer tokens
Use more bigram types
Repeat bigrams more heavily.
Have roughly the same hapax creation rate.
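For anyone who wants to recompute the per-page figures, here is a minimal sketch. It is not the code used for this post. It assumes "bigram" means a character bigram inside a token and that a page is just a list of EVA tokens; `char_bigrams` and `page_stats` are made-up names. The hapax and reuse ratios are left out because their exact definitions aren't spelled out above.

```python
def char_bigrams(token):
    """Adjacent character pairs inside one token, e.g. 'chedy' -> ch, he, ed, dy."""
    return [token[i:i + 2] for i in range(len(token) - 1)]


def page_stats(tokens):
    """Per-page counts for a non-empty list of tokens (assumed EVA strings)."""
    bigrams = [bg for tok in tokens for bg in char_bigrams(tok)]
    unique_bigrams = set(bigrams)
    return {
        "tokens": len(tokens),
        "unique_tokens": len(set(tokens)),
        "long_token_share": sum(len(tok) >= 7 for tok in tokens) / len(tokens),
        "unique_bigrams": len(unique_bigrams),
        # Formula from the post: 1 - unique_bigrams / total_bigrams
        "bigram_repetition": 1 - len(unique_bigrams) / len(bigrams) if bigrams else 0.0,
    }


# Toy page, not taken from the transcription.
stats = page_stats(["daiin", "daiin", "chedy", "qokeedy"])
```

On this toy page there are 18 bigrams in total and 12 distinct ones, so the repetition rate is 1 − 12/18 ≈ 0.33.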
Sometimes things have to break before you can fix them.
So, there are a lot of statistics that can show that ED+ pages are significantly different from ED0 pages, and much of this has been discovered and mulled over since Currier spotted that difference. But, what I'm about to show you will, I think, make you reconsider that difference.
I was working on splat repair. It occurred to me that some 2,000 splats exist in Takahashi. If the Voynich contained information, that was a lot of lost information. So, I started working on a suite of repair tools, some bits stolen from OCR, some from spell checking. In that suite I did something different. I did 'leak free' testing. I would train with tokens on herbal and then I would test on recipe. Or, I'd alternate folios. Train on one, test on the other. This allowed comparisons between two sections of the Voynich without one set of data contaminating the other. So when I spotted ed0 and ed+, the repair thing popped right up.
When I trained my model on 0ed pages and tested on ed+ pages, here's what I found.
OOV tokens:
(OOV = out of vocabulary)
10,389 - That's the number of tokens seen on ed+ pages that were not seen on ed0 pages. A fairly large number.
Bigram-illegal tokens:
170 - That's how many tokens on ed+ pages had bigrams that were not seen on the 0ed pages.
Repairability of OOV tokens:
SUB or DEL (token-level): 82.15%
SUB or DEL (type-level): 66.49%
SUB/DEL/INS (token-level): 82.69%
SUB/DEL/INS (type-level): 67.33%
This is the big one. Of those 10,389 tokens that were found on the ed+ pages that didn't exist on ed0 pages, 82% could be made an ed0 token with one simple character deletion or substitution. These two sets of pages are not that different.
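The train/test idea above can be sketched roughly like this. It is an illustration under assumptions, not the actual repair suite: "repair" is taken to mean editing the test token until it equals some training token, bigrams are character bigrams, and the function names are hypothetical. Token-level counts every OOV occurrence; type-level counts each distinct OOV token once.

```python
def char_bigrams(token):
    """Set of adjacent character pairs inside one token."""
    return {token[i:i + 2] for i in range(len(token) - 1)}


def one_edit_repair(token, target, allow_insert=False):
    """True if `token` becomes `target` via one SUB or DEL (or INS when allowed)."""
    if len(token) == len(target):
        return sum(x != y for x, y in zip(token, target)) == 1
    if len(token) == len(target) + 1:  # delete one character from token
        return any(token[:i] + token[i + 1:] == target for i in range(len(token)))
    if allow_insert and len(token) + 1 == len(target):  # insert one character
        return any(target[:i] + target[i + 1:] == token for i in range(len(target)))
    return False


def oov_report(train_tokens, test_tokens, allow_insert=False):
    vocab = set(train_tokens)
    legal = set().union(*(char_bigrams(t) for t in vocab))
    oov = [t for t in test_tokens if t not in vocab]
    illegal = [t for t in test_tokens if char_bigrams(t) - legal]
    fixable = {t for t in set(oov)
               if any(one_edit_repair(t, v, allow_insert) for v in vocab)}
    return {
        "oov_tokens": len(oov),
        "bigram_illegal_tokens": len(illegal),
        "token_level": sum(t in fixable for t in oov) / len(oov) if oov else 0.0,
        "type_level": len(fixable) / len(set(oov)) if oov else 0.0,
    }


# Toy data, not taken from the transcription.
train = ["daiin", "chol", "chor", "shol"]
test = ["chedy", "daiin", "dain", "dain", "chool", "qokeedy"]
sub_del = oov_report(train, test)
sub_del_ins = oov_report(train, test, allow_insert=True)
```

In the toy data, "chool" is one deletion away from "chol", while "dain" only becomes "daiin" when insertions are allowed, which is why the two reports differ.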
Next, I reversed that test. I trained on ed+ pages and tested on ed0 pages.
OOV tokens (Vocab-OOV):
1,714 - That's a huge difference. Over 10,000 tokens were seen on ed+ pages that didn't exist on ed0, but only 1,714 tokens were on ed0 that were not on ed+
Bigram-illegal tokens:
17 - Only 17 tokens in ed0 had bigrams that were not on ed+.
When you compare those numbers based on total tokens in the test,
0ED → ED+
OOV tokens: 10,389
Test tokens: 25,554
OOV ratio: 40.66%
ED+ → 0ED
OOV tokens: 1,714
Test tokens: 8,797
OOV ratio: 19.48%
And, 82.21% of ed+ tokens could be repaired to make ed0 tokens with a single substitution or deletion.
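As a quick check on the normalization, the two OOV ratios above follow directly from the counts given. The snippet only redoes that arithmetic; `oov_ratio` is a made-up helper name.

```python
def oov_ratio(oov_tokens, test_tokens):
    """OOV tokens as a percentage of all tokens in the test set."""
    return 100 * oov_tokens / test_tokens


forward = oov_ratio(10_389, 25_554)  # train on 0ED, test on ED+
reverse = oov_ratio(1_714, 8_797)    # train on ED+, test on 0ED
print(f"{forward:.2f}% vs {reverse:.2f}%")  # 40.66% vs 19.48%
```

So the raw counts differ by about a factor of six, but once test-set size is accounted for the gap is closer to a factor of two.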
More notes:
High-frequency backbone
In 0ED
Top 10 most frequent tokens → 100% shared with ED+
Top 20 → 100% shared
Top 50 → 100% shared
In ED+:
Top 10 → 70% shared
Top 20 → 75% shared
Top 50 → 78% shared
Exclusive bigrams
0ED-only bigrams: 12
ED+-only bigrams: 60
TLDR2;
0ED vocabulary is largely contained within ED+.
ED+ expands well beyond 0ED.
Every high-frequency backbone token in 0ED exists in ED+.
195 of 207 0ED bigrams survive in ED+
That is 94% retention
ED+ expands the bigram alphabet
Adds 60 new bigrams
Bigram space grows from 207 → 255
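The bigram numbers hang together: 207 − 12 = 195 shared, 195 / 207 ≈ 94%, and 195 + 60 = 255. A rough sketch of how the backbone and bigram overlap could be measured is below. It assumes character bigrams and plain token lists, and the function names are hypothetical rather than anything from the original tooling.

```python
from collections import Counter


def top_n_shared(source_tokens, other_tokens, n):
    """Share of the n most frequent source tokens that also occur in other_tokens."""
    top = [tok for tok, _ in Counter(source_tokens).most_common(n)]
    other_vocab = set(other_tokens)
    return sum(tok in other_vocab for tok in top) / len(top)


def bigram_inventory(tokens):
    """All distinct character bigrams found inside the tokens."""
    return {tok[i:i + 2] for tok in tokens for i in range(len(tok) - 1)}


def bigram_overlap(tokens_a, tokens_b):
    a, b = bigram_inventory(tokens_a), bigram_inventory(tokens_b)
    return {
        "a_only": len(a - b),
        "b_only": len(b - a),
        "shared": len(a & b),
        "retention_a": len(a & b) / len(a),
    }


# Toy data, not taken from the transcription.
side_a = ["chol", "chol", "chol", "chor", "chor", "daiin"]
side_b = ["chol", "chedy", "daiin", "shedy"]
backbone = top_n_shared(side_a, side_b, 2)
overlap = bigram_overlap(side_a, side_b)
```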
And here's the first chart. This is the vocabulary growth between 0ED and ED+. This is a very smooth growth rate. This suggests that there was no big shift between 0ED and ED+. Despite all of those differences above, it's still the same base "engine" chugging along with no dramatic change.
Sometimes, broken things deserve to be repaired.
In ED+ there are 4,260 unique tokens that do not exist in 0ED pages (different from the OOV above). If we take those tokens and measure the edit distance to the nearest token in 0ED:
2,870 are edit distance 1 away.
1,123 are edit distance 2 away.
218 are edit distance 3 away.
49 are more than edit distance 3 away.
I gotta bold this to make sure it's seen
Around 94% of ED+-only vocabulary is within edit distance ≤2 of 0ED.
Let me put that another way.
Out of 4,260 unique tokens in ED+ pages
3,993 can be made a 0ED token by changing 2 characters or fewer.
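The 94% figure is (2,870 + 1,123) / 4,260 = 3,993 / 4,260 ≈ 93.7%. A minimal sketch of the nearest-neighbour measurement follows, assuming plain Levenshtein distance (insert, delete, substitute, each costing 1). The post doesn't say which edit distance was used, so treat that and the function names as assumptions.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def distance_histogram(new_types, base_vocab, cap=3):
    """Count new types by their distance to the nearest base token."""
    hist = Counter()
    for tok in new_types:
        d = min(levenshtein(tok, base) for base in base_vocab)
        hist[d if d <= cap else f">{cap}"] += 1
    return hist


# Toy data, not taken from the transcription.
base = {"chol", "daiin", "shol"}
new = {"chor", "dain", "chedy", "qokeedy"}
hist = distance_histogram(new, base)
```

This brute-force version compares every new type against every base token, which is fine for a few thousand types on each side.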
I had to keep repairing things...
I set up a chain: I would take all of the tokens that were edit distance 1 from a 0ED token and compare them against all of those that were edit distance >= 2.
By repeating this chain of checking and rechecking edit distance, I came to an abrupt stop at gen 6.
Gen 1: 2,337
Gen 2: 630
Gen 3: 160
Gen 4: 30
Gen 5: 9
Gen 6: 1
1,093 tokens were still unreachable. So, I relaxed the rules a bit. I started allowing 2 edit distance.
And then allowed edit distance 3.
After 2 rounds of editing like this, I was left with 15 tokens that could not be chained back to 0ED. And every single one of those tokens was length >8. I then considered those to be possible transcription errors in which two words had been run together. I compared them to shorter tokens and was able to split each of them into 2 words. All of those were then 1 or 2 edit distance from a 0ED token or a previously repaired token.
Ok, so every single token on ED+ could be chained into an edit distance of 3 or less into a 0ED token.
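One way to read the generation chain is as a breadth-first search: gen 1 is everything one edit from a 0ED token, gen 2 is everything one edit from a gen 1 token, and so on until a round reaches nothing new. The sketch below follows that reading. It is an interpretation, and the function names are hypothetical.

```python
def one_edit_apart(a, b):
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def chain_generations(base_vocab, new_types):
    """Return (list of generations, unreachable tokens)."""
    frontier = set(base_vocab)
    remaining = set(new_types) - frontier
    generations = []
    while remaining:
        reached = {t for t in remaining
                   if any(one_edit_apart(t, f) for f in frontier)}
        if not reached:
            break  # the chain stops: nothing left is one edit from the last generation
        generations.append(reached)
        remaining -= reached
        frontier = reached  # older generations were already checked in earlier rounds
    return generations, remaining


# Toy data, not taken from the transcription.
gens, unreachable = chain_generations({"chol"}, {"chor", "char", "shar", "qokeedy"})
```

Here "chor" is gen 1, "char" is gen 2, "shar" is gen 3, and "qokeedy" is left unreachable, which is the kind of leftover that the relaxed rounds then pick up.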
I checked Zandbergen/Landini.
99.07% of the ED+-only vocabulary was absorbed within ≤3 edits.
I had 45 tokens left over. 45 / 45 (100%) have a split where both halves were within ≤3 edits of another checked token.
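The splitting step for the leftover long tokens might look something like the sketch below: try every cut point, and keep the cut where both halves sit closest to a known token. The minimum half length, the tie-breaking, and the names are all assumptions for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def best_split(token, vocab, max_dist=3, min_part=2):
    """Best (total distance, left, right) split, or None if no cut qualifies."""
    best = None
    for cut in range(min_part, len(token) - min_part + 1):
        left, right = token[:cut], token[cut:]
        d_left = min(levenshtein(left, v) for v in vocab)
        d_right = min(levenshtein(right, v) for v in vocab)
        if d_left <= max_dist and d_right <= max_dist:
            candidate = (d_left + d_right, left, right)
            if best is None or candidate < best:
                best = candidate
    return best


# Toy data, not taken from the transcription.
vocab = {"chol", "daiin", "shedy"}
split = best_split("choldaiin", vocab)
no_split = best_split("qqqqqqqqqq", vocab, max_dist=1)
```

Short halves are easy to match at distance 3, so the threshold matters a lot here; tightening `max_dist` shows how many of the splits survive.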
Ok,... that can't possibly be right. It means I can edit any word a few times... ok 6 at most, maybe 7, and I can make every single word from one half a book match the other half.
Voynich words have a greater similarity than Latin or English. Now to old hands at Voynich, this is no huge surprise. But, it does show that after editing what appears to be two very different sections of the book, that similarity is just a few edit distances away.
Conclusion.
I'm likely going to get beat up over this but, here goes:
Currier Language A and B are not distinct languages and he noted that.
0ED pages were likely created prior to the ED+ pages. I said likely! I don't have solid proof but the difference in vocabulary and bigrams suggests it.
0ED and ED+ are not behaving like normal text. Well, the whole Voynich doesn't behave like normal text so no surprise there.
0ED and ED+ look like two regimes, but not two vocabularies. ED+ is almost entirely built from 0ED by tiny edits. The same "engine", different settings.
So, I hope I've given enough evidence to show how these two regimes are different, but the same underlying system. I'll be interested in hearing your thoughts.
The big question now is:
Why does a lexical "engine" make a drastic switch like "ed" if the vocabulary isn’t actually changing much?
I think I can answer that. But that's for another post.
Disclaimer: I have tried to review all of these numbers and I believe them to be reasonably accurate. I may have missed some but hopefully nothing drastic.
Functional Resolution: The Reactive Geometric Labeling Model (RGLM) and the Entropy Anomaly
Hi everyone,
I have been working on a functional approach to the MS 408 text-image relationship, and I am excited to share a formal model that provides a reproducible explanation for the manuscript's low entropy.
Instead of looking for a natural language or a complex cipher, my research focuses on Reactive Geometric Labeling (RGLM / MEGR in Spanish). The core thesis is that the "labels" and text blocks are isomorphic to the visual morphology, density, and spatial distribution of the illustrations.
Key findings of the model:
• Isomorphic Determinism: The word length and prefix/suffix distribution (like the D/C and O/SH families) correlate directly with the geometric complexity of the drawing.
• Entropy Resolution: The low entropy isn't a linguistic feature but a functional one; the "vocabulary" is constrained by the recurring visual patterns it labels.
• Predictability: The model allows us to predict certain lexical clusters based on the specific arrangement of botanical or pharmaceutical elements in the folios.
I have registered the full methodology and the preliminary report on Zenodo to ensure open access and peer review.
Full Paper / DOI: [link]
I would love to hear your thoughts, especially from those focused on computational linguistics and pattern recognition. I am open to testing the model against specific folios suggested by the community.
Best regards,
Emmanuel Jiménez
Independent Researcher
"Figure 1: Application of the Reactive Geometric Labeling Model (RGLM) on folio 2r. Note the correlation between visual complexity and specific lexical clusters."
New member here. I want to be upfront about a few things before presenting what I've been working on, because I know this community has seen a lot of "I've cracked it" posts, and I don't want to waste your time.
What this is not:
- A decipherment
- A translation
- A claim that I know what language the VM is written in
What this is:
- A morphological model that decomposes ~30% of the VM vocabulary cleanly using a prefix-root-suffix system
- A hypothesis (the "corrupted copy" hypothesis) that explains why the remaining 70% resists decomposition
- A set of testable structural predictions, some of which I've checked against known data and which seem to hold
How it started:
This came out of a completely unrelated discussion about the Phaistos Disc, where I was exploring structural-functional approaches to undeciphered texts with an AI language model (Claude). On a whim, I tried applying the same approach to the VM — building a synthetic grammar from scratch based purely on internal structure, with no prior assumption about what language family it belongs to. I expected it to fail. It didn't fail as completely as I expected.
The core idea in brief:
The VM text behaves like an agglutinative language with stable prefixes (qo-, ch-/che-, sh-, da-, ol-/o-), productive roots (ke-, te-, ka-), and grammatically meaningful suffixes (-dy, -y, -in, -ain). These decompose the highest-frequency words cleanly: qokedy, qokeedy, qokeey, chedy, shedy, daiin, dain, and their families.
The model makes specific positional predictions that appear to hold:
- Words ending in -in/-ain avoid line-final position (they mark continuation)
- Words ending in -y can close lines (terminal)
- da- words (daiin, dair) are strongly line-initial (>20% of occurrences)
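These positional predictions are easy to check mechanically. Below is a minimal sketch, assuming the transcription has already been parsed into lines of tokens; the parsing itself, the function name and the toy lines are all assumptions, not data from the manuscript.

```python
def position_rates(lines, predicate):
    """Line-initial and line-final shares for tokens matching `predicate`."""
    total = first = last = 0
    for line in lines:
        for idx, tok in enumerate(line):
            if predicate(tok):
                total += 1
                first += idx == 0
                last += idx == len(line) - 1
    if total == 0:
        return None
    return {"n": total, "initial": first / total, "final": last / total}


# Toy lines for illustration only.
lines = [
    ["daiin", "chedy", "qokain", "shedy"],
    ["dair", "okaiin", "chol", "qokedy"],
    ["chol", "daiin", "qoky"],
]
in_words = position_rates(lines, lambda t: t.endswith("in"))
y_words = position_rates(lines, lambda t: t.endswith("y"))
da_words = position_rates(lines, lambda t: t.startswith("da"))
```

To judge a result like ">20% line-initial", it helps to run the same function with a predicate that matches every token, which gives the baseline rate set purely by line length.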
I then tested this against the full [link] transcription (Takahashi version from the Stolfi interlinear) line by line. Results: roughly 30% clean decomposition, 38% partial, 32% fail. The failures concentrate on gallows characters, which the model doesn't address at all; that's the biggest gap.
The "corrupted copy" hypothesis:
The reason I think the remaining 70% is noisy rather than wrong: the VM may not be an original composition. If a 15th-century European scribe copied an older text in a language they couldn't read — character by character, purely as visual patterns — you'd expect exactly the pattern we see: high-frequency morphemes preserved (because the scribe's hand learned them as motor patterns), interior morpheme boundaries smeared, rare forms absorbed into common ones, and vowel distinctions (the e/ee/eee system) rendered inconsistently.
This would also explain why the VM has natural-language statistics but resists decipherment: the source was a natural language, but the copying process added a layer of systematic noise.
Typological direction (most speculative part):
The morphological profile — agglutinative, prefix-based, ergative-looking alignment (da- as possible ergative marker), SOV-compatible word order, r/l alternation in the ol/or/al/ar system — doesn't match any European language. It does align typologically with Hurrian, Urartian, and Northeast Caucasian languages. I'm not claiming the VM is in Hurrian. I'm saying the type of grammar matches that corridor better than anything in Europe. The Diakonoff-Starostin "Alarodian" connection means these structural features are shared across multiple families in the region.
What I'm looking for from this community:
1. Has anyone tested positional constraints of suffixes systematically? The -in/-ain line-avoidance and da- line-initial preference are the model's strongest testable predictions. If these are already known/published, I'd like to know.
2. Gallows integration. The model completely ignores gallows characters (~15-20% of the text). If anyone has ideas about how cth/ckh/cph/cfh might fit into an agglutinative prefix system, I'm very interested.
3. Currier A vs B. The model currently treats the text as uniform. If A and B have different morphological profiles, that's important — it could mean different source texts, different scribal hands, or dialectal variation.
4. Where am I reinventing the wheel? I'm new to VM research specifically. If someone has already proposed an agglutinative model, or tested prefix/suffix positional behavior, or explored Caucasian typological parallels, please point me to their work. I'd rather build on existing research than duplicate it.
5. Where am I obviously wrong? I can take it. That's why I'm here.
Full paper attached. It includes the complete morpheme inventory, decomposition test results with line-by-line [link] analysis, the corrupted copy argument, typological comparison with Hurrian/NEC languages, historical transmission scenarios, and full references.
Transparency note: This was developed collaboratively with Claude (Anthropic's AI). The hypothesis and direction are mine; the systematic testing, frequency analysis, and typological comparison were done with AI assistance. The AI was also used to stress-test the model — the initial assessment was actually quite harsh, identifying major gaps (gallows, semantic unfalsifiability, cherry-picked examples) before we refined the framework. I mention this because I think the methodology is legitimate and worth being honest about.
Thanks for reading. Looking forward to being told why I'm wrong.
I am new, so please forgive my lack of completely understanding the current state of the research.
I was wondering what the current status is of the hypothesis that Voynichese is a delta cipher, in other words, the idea that it is the transition from one word to the next that encodes information -- which characters are dropped and added and where, etc. This could encode plaintext letters or the numbers of an intermediate Polybius Square. I know papers like those of Timm and Schinner looked at word similarity in transitions, but I couldn't tell the degree to which this sort of encoding was ruled out. I also would have expected modern professional cryptography to have cracked it by now if this were the case, but perhaps I have too much faith in that.
Ok, so I have this paper I've been working on and I have a very rough draft on Zenodo. I've decided to put the things I've been digging into on ninja in the hope that additional pairs of eyes will clue me in to things I've been missing before I make a complete fool of myself and submit it for peer review. For all of these tests, I've used the EVA Takahashi (I'm old and have used it for years) with cross verification against the EVA Zandbergen/Landini.
I'm going to try to break all of this down into multiple posts because I have a bunch of territory to cover. Each will refer back to previous ones. Much of what I'll cover won't be new territory to the old hands at the Voynich. Some may be.
The bigram "ed"
It's been known for many years that the bigram "ed" is just plain odd. It occurs in the Voynich as a midfix 4,474 times and as a suffix 186 times. Never as a prefix. That may not sound that striking but this chart shows just how striking it is.
That is "ed" compared to the top 100 bigrams by total count and percentage of pages. It occurs on roughly 56% of pages but is in the top 10 for total bigram count (#9).
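For reference, the prefix / midfix / suffix tally can be reproduced with something like the sketch below. It assumes "prefix" means the bigram starts the token and "suffix" means it ends the token; the function name is made up, and a bare token "ed" would be counted as a prefix under this rule.

```python
from collections import Counter


def bigram_positions(tokens, bigram="ed"):
    """Count where a bigram sits inside tokens: prefix, midfix or suffix."""
    counts = Counter()
    for tok in tokens:
        start = tok.find(bigram)
        while start != -1:
            if start == 0:
                counts["prefix"] += 1
            elif start + len(bigram) == len(tok):
                counts["suffix"] += 1
            else:
                counts["midfix"] += 1
            start = tok.find(bigram, start + 1)
    return counts


# Toy tokens for illustration only.
positions = bigram_positions(["chedy", "qokeedy", "shed", "daiin", "edy"])
```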
Currier and "ed"
Currier noticed a difference when he described his language A and language B. He could never quite put his finger on all of the differences between the two. I'll suggest that the big thing he noticed was the bigram "ed".
This chart shows the locations of the bigram "ed" with the background shaded to represent Currier A and Currier B. The dot colors represent: blue = no ed bigrams on the page, orange = 1 ed bigram on the page, and green = 2+ ed bigrams on the page.
Side note: You'll notice 2 orange dots early in the herbal section, [link] and f11r. On both of those pages, ed occurs once and it's inside a hapax token. The total number of pages where ed only occurs once is 19. On 6 of those 19 pages, it's in a hapax token.
So, just from comparing Currier to ed, we see there's a very close match. He apparently never fully defined the zodiac section as either so it has the white background.
"ed" by section
The first thing I noticed was, the first 25 folios only have those 2 occurrences of ed. That seemed pretty odd for a bigram that's one of the top 10 by count. So, I decided to dig further.
This chart shows the bigram ed by section. I lumped the ed's into buckets. No ed on the page, 1 ed per page and a low, medium and high bucket that split the ed per-folio count into 3 groups of around 40 pages each. This chart is also normalized by folio word count to show the differences even better than the previous chart. On the left, you'll notice again, the first 25 folios, only the two hapax token ed occurrences. At f26r, ed gets introduced. But not all at once. It skips around between pages with ed and no ed. The pharma section does the same thing. Some have ed, some do not. The same for zodiac. About half either have no ed or 1 ed. Baneo, rosette and recipes all have the highest count and ratio of ed in the entire Voynich.
"ed" by sheet?
Now here's where things get a bit strange. I'm not going to interject my theory here. I'm going to be really interested in hearing yours.
I downloaded the quire diagram from Voynich.nu and converted it into a csv that I could import into my python. I then changed the background color to match the quire sheet number. With one exception, 27v, you will notice that all of the pages where the bigram ed is the highest in the herbal section, they're all on the same sheet. But, they're intermixed with sheets that contain no ed.
F26 and F31 are on sheet 2
F33 and F41 are on sheet 1
F34 and F40 are on sheet 2
F41 and F48 are on sheet 1
F43 and F46 are on sheet 3
F50 and F55 are on sheet 1
You can also see a similar pattern in pharma. All have a relatively low ed count with those in the middle having a higher normalized count. Those appear on sheets marked as sheet 1. Again, no theory, but if the Voynich is in some semblance of a chronological order, this, combined with the no ed pages in other sections made me seriously scratch my head.
Which came first?
One thing Dr. Davis has mentioned in some of her talks is that she believes the folios are not in original order (I can't wait to see the results of that!). And looking at these charts, it struck me as interesting that the ed bigram appears to be in clusters and groups. Not so much by region as by quire sheet. Since we truly have no idea what order this book was written in, I developed a theory. Assume that all of the pages where ed never occurred or was in a hapax token were created first and that the bigram ed was brought into prominence later (or the reverse of that). What kind of differences would they have? So, I split the Voynich into 2 "halves". The 0ed half, which included pages where it never occurred or it occurred once in a hapax token, and the ed+ "half" where it occurred at least once and was not in a hapax token.
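The split rule can be written down as a short sketch. It assumes "hapax" means a token that occurs exactly once in the whole corpus and that pages arrive as a folio-to-token-list mapping; the function name and the toy folio labels are hypothetical.

```python
from collections import Counter


def classify_pages(pages, bigram="ed"):
    """Label each page '0ed' or 'ed+' following the rule described above."""
    corpus_counts = Counter(tok for toks in pages.values() for tok in toks)
    labels = {}
    for folio, toks in pages.items():
        carriers = [t for t in toks if bigram in t]
        hits = sum(t.count(bigram) for t in carriers)
        # A single occurrence inside a corpus-wide hapax still counts as 0ed.
        lone_hapax = hits == 1 and corpus_counts[carriers[0]] == 1
        labels[folio] = "0ed" if hits == 0 or lone_hapax else "ed+"
    return labels


# Toy pages for illustration only; labels are not real folios.
pages = {
    "pageA": ["daiin", "chol"],
    "pageB": ["chol", "okedal"],
    "pageC": ["chedy", "shedy", "daiin"],
    "pageD": ["chedy"],
}
labels = classify_pages(pages)
```

In the toy data, pageB stays 0ed because its only "ed" sits in a hapax, while pageD is ed+ because "chedy" also appears elsewhere.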
Here's a csv list of the pages I identified and began classifying as 0ed and ed+ pages.
So, this is how I entered the rabbit hole. There's a bit to digest here when you consider the implications so I'll end the post here. But, there's also lots more to pile on top of this so I'll be referring back to this post. I'll be sure to link it when I continue this in a new thread in the near future™.
Thanks for looking it over and I'm eager to hear opinions.
I wanted to open a discussion on a specific possibility that seems to be gaining traction with the recent studies coming out. We have all seen the discussion around Greshko's "Naibbe Cipher" and how it generates text that statistically resembles the Voynich.
Then, Pincar’s model identifies a very specific dependency in the text. Essentially, he demonstrates that the ciphertext depends not just on the current symbol, but on the previous one as well, meaning the system has "memory" or context. And let's be clear, he himself highlights the following in his article: “This model identifies the structure, but not the content. We cannot determine: the identity of the source language, the semantic meaning of any word, or whether the manuscript contains meaningful information.”
If we start from that basis, that there is a mechanical dependency between the previous state and the current one, could we be looking at the text result of a three part volvelle or cipher disk?
A device with concentric rings (Outer, Middle, Inner) would naturally force the rigid Prefix + Root + Suffix word structure that we see throughout the manuscript. Something like this: [link]
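To make the idea concrete, here is a toy sketch of a three-ring device. Everything in it is an assumption for illustration: the ring contents are placeholders, and the rule that each word is produced by rotating the rings from wherever the previous word left them is just one simple way to give the device "memory".

```python
# Placeholder ring contents (assumptions, not a reconstruction of any real device).
OUTER = ["qo", "ch", "sh", "o"]
MIDDLE = ["ke", "te", "ka"]
INNER = ["dy", "y", "in", "ain"]
RINGS = (OUTER, MIDDLE, INNER)


def turn_rings(steps, start=(0, 0, 0)):
    """Rotate the rings step by step; each word depends on the previous position."""
    position = list(start)
    words = []
    for step in steps:
        position = [(p + s) % len(ring)
                    for p, s, ring in zip(position, step, RINGS)]
        words.append("".join(ring[p] for ring, p in zip(RINGS, position)))
    return words


words = turn_rings([(1, 0, 1), (0, 2, 0), (3, 1, 3)])
```

Because every word is read off as outer + middle + inner, the slot order can never vary, and with these ring sizes the device can only ever produce 4 × 3 × 4 = 48 distinct words.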
I am curious to hear your thoughts. At first glance, what jars you about this idea? Does a mechanical "wheel" explanation fail to account for any specific linguistic features you've noticed?
I would like to learn about astrology so I can have a better understanding of the Zodiac pages in the Voynich Manuscript. I know Zodiac "Sign" is based on date of birth but I am clueless about Zodiac "Houses" and Zodiac "Stations" and a lot of other terms. Can anyone point me to a good (free) online source to learn astrology/zodiac that is based on historical beliefs and not mysticism leaning?
I know the Zodiac "Sign" is based on date of birth, where the sun is at the time of birth. Gives temperament, motivations, strengths, etc.
I believe Zodiac "House" adds time of day and place of birth. I'm still confused about the 12 "Houses". No idea about Zodiac "Stations". How were "Houses" and "Stations" generally illustrated in the early 15th century?
Each of the circles of Quire 9, f67r1 and f67r2 has 12 sections. I wonder if these could be the 12 Zodiac "Houses". I don't know why they would put "Houses" in Quire 9 before the "Signs".
Or, maybe the "baskets" the nymphs are in on the regular Zodiac "Sign" pages are Zodiac "Houses" but they don't line up to 12. Not all Zodiac nymphs are in "baskets". Nymph in a "basket" or not in a "basket" must have some meaning.
Unfortunately I couldn't just propose this in the original thread (now in the AI Slop Jar). In [link] Joshwaful said:
Quote:I think there's a language barrier between us.
"shol" is frequent because it is a Phonetic Homonym
Latin: Sol- (Sol-u-tio)
German: Schal- (Schal-e)
Czech: Sol- (Sol-i)
The Scribe's Brain: He hears the sound /sol/ in his head. Whether he means "Salt" or "Basin," he writes the same phonetic cluster. It's the same as today's doctors.
I hereby propose that every time we're subjected to a theory like this because the creators of EVA thought making Voynichese pronounceable was important Rene should have to put a Euro in the EVA phonetics based theory jar. Every time the jar gets full enough we'll have a pizza party or donate the money to Doctors Without Borders or something...
To me, it kind of looks like the official insignia of the [link] (especially the legs and orientation, but without the tail):
This is interesting because the classical Voynich crown, i.e.,
is a match for that of Queen [link],