One Hand, Five Labels: A Critical Examination of the Five-Scribe Hypothesis for the Voynich Manuscript
[link] by Torsten Timm
Quote:This paper presents a critical examination of the five-scribe hypothesis proposed for the Voynich Manuscript. The analysis suggests that the supposed distinctions between scribes may instead reflect a continuous development of a single evolving hand. It offers detailed observations on glyph forms, patterns of handwriting variation, and their broader implications for interpreting the manuscript’s text.
The x 'picnic table' glyph is similar to the [link].
Does this point to the possible inclusion of [link] in the cipher manuscript?
GERSHA v9.2 — Functional Model of the Voynich Manuscript with External Control and Structural Analogues
I've been working on a corpus-based structural analysis of the Voynich manuscript for the past year. Before posting I want to be clear about what this is and what it isn't. What this is not: a decipherment. I make no claims about the language, the author, or the meaning of individual tokens. What this is: a reproducible functional model of the manuscript's architecture, verified on the full ZL v3b corpus (202 folios, 35,049 tokens), with an external control corpus and a decoded structural analogue.
The core claim (Level 1 — fully reproducible):
The manuscript is structured as a production cycle + influence calendar. The {P}-formant { -edy | -eedy | -ody | -eody } behaves as a process marker: it rises sharply in Balneo (23.2%), falls in Pharma (7.7%), and correlates with figures acting as agents of active processes rather than as catalogue markers. (A minimal sketch of the {P}-rate and TTR metrics follows the findings list below.)
Three findings are confirmed, one was refuted and corrected in v9.2:
— Balneo TTR 0.424–0.475 vs Macer Floridus baseline 0.715+ (structural chasm, confirmed independently of counting method)
— Production chain H→B→P confirmed on full corpus
— Zandbergen cross-references H↔P: +14.4 pp -ol on 8/8 paired roots
— {P}≈0% for Astro was refuted: Astro {P} = 10.3%, TTR = 0.744. Corrected in v9.2.
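For concreteness, here is a minimal sketch of the two headline metrics in Python. The suffix set and section labels come from above; the token lists are placeholders, since the sketch does not load the ZL v3b transliteration itself.

```python
# Hypothetical input: tokens grouped by section label. In practice these
# would come from the ZL v3b transliteration, which this sketch does not load.
sections = {
    "Balneo": "qokeedy shedy qokeedy chedy ol shedy".split(),
    "Pharma": "daiin chor cthy okody daiin".split(),
    "Astro":  "otaiin okeody char oteey".split(),
}

P_SUFFIXES = ("edy", "eedy", "ody", "eody")  # the {P}-formant set

def p_rate(tokens):
    """Share of tokens ending in one of the {P} suffixes."""
    return sum(t.endswith(P_SUFFIXES) for t in tokens) / len(tokens)

def ttr(tokens):
    """Type-token ratio: distinct tokens over total tokens."""
    return len(set(tokens)) / len(tokens)

for name, toks in sections.items():
    print(f"{name}: {{P}} = {p_rate(toks):.1%}, TTR = {ttr(toks):.3f}")
```

Note that raw TTR is sensitive to sample length, so the Balneo vs Macer Floridus comparison is only meaningful on samples of comparable size (or with a length-normalised variant).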
External control: Linear B (DAMOS)
The second document uses Linear B tablets from DAMOS (University of Oslo) as a decoded structural reference. Seven tablets across six series reveal a six-type document typology based on a single key finding: to-so marks a closed, totaled list regardless of series. Its absence is functional, not accidental.
This resolves a structural question about both Indus Script Type B inscriptions and the Voynich Zodiac section — both now have deciphered analogues in Linear B Series V (divine register, presence markers rather than quantities).
Voynich Balneo does not fit the accounting family (TTR 0.65–0.85, fixed positional architecture). It fits nothing in our Linear B sample. The process/instructional interpretation remains the only model consistent with all three comparisons simultaneously.
Known limitations (honest):
— HMM transition matrix built on f75r/f78r only — full ZL v3b verification pending
— Geometric binding of {P} to illustrations not confirmed via line-position metric (r = −0.076)
— All cross-system parallels are Level 2 (structural hypotheses, not semantic claims)
— Everything depends on EVA as a correct morphological boundary
Documents attached:
— GERSHA Protocol v9.2 (full corpus analysis, tables, verification status)
— GERSHA Comparative Framework v1.0 (Linear B structural control, 7 tablets, 6-type typology)
I'm genuinely interested in criticism of the method — particularly the EVA dependency and the Astro reclassification. Happy to share raw numbers.
Grisha H.G. | February 2026
I've been working on something... And yes, AI has helped a lot, but the alternative was a team of PhDs, which I don't have.
It's emphatically not a decipherment, and it's not an origin claim.
So forgive me if I leave these here for a few days while I write everything up and test it a few more ways, including with some actual computational linguists. I wouldn't be writing this if I didn't think it held.
The Voynich Manuscript (MS 408) is written in a polyphonic constructed syllabary, a deliberately designed, internally consistent cipher encoding medieval Italian botanical and medical knowledge within a Galenic framework. The full transcript is done. Here is where to start, if you want to keep the mystery and fun going.
1. 18 confirmed plant identifications (folios 1v-10r) with direct EVA → Italian decodings validated against botanical illustrations and Galenic medical properties
2. 20 predicted additional plants (folios 10v-21r) using the same syllabary system, character-by-character decodings, and medical coherence constraints
3. Phonological class analysis showing the author systematized Italian phonemes into 7 consonantal classes + 1 vowel class, assigning one primary EVA character to each
4. Galenic template validation proving 90%+ adherence to the standard medical template (quality → description → indication → preparation → dosage)
5. Cross-section vocabulary bridges confirming that plant names, quality words, and medical terms appear consistently across all sections (botanical, pharmaceutical, astronomical, biological)
Translation, Line 1:
Viola, cold and moist.
The leaves of Viola are round and tender, with veins. The flowers are purple, growing singly on long stems. The root is small and fibrous, white inside.
It is good for all hot fevers and inflammations of the chest. It cools the blood and reduces hot bile. It is useful also for burns and hot wounds.
Prepare by infusing the flowers in water or making a syrup with honey. Give one spoonful in the morning and evening. Or make a poultice of the leaves for wounds.
Galenic Verification: Viola odorata documented in Dioscorides, Galen, Savonarola as Cold 1°, Wet 1°; primary use for fever, chest inflammation, and wounds.
Hi all, I'm incredibly excited to share the findings of my research, and present a tool for the community to try out the method for yourselves!
In a nutshell:
We have created a method that uses data compression to mathematically map the most statistically optimal repeating building blocks within a closed text.
Constellation Analogy:
Imagine you've been asked to draw a shape with the stars in the night sky, without being told what the shape was. Hundreds of different shapes might seem to fit perfectly, making it impossible to know which pattern is real just from looking at it.
Our method ignores all of those imaginary shapes. Instead it measures gravity to prove exactly which stars are linked together in the same solar system. By mathematically mapping the true structural clusters first, no one would waste time trying to connect stars that are actually millions of miles apart.
In the context of the manuscript, those millions of stars are the raw characters of the EVA transcription. It's incredibly easy to group the wrong letters together because they happen to look similar to a word in, say, Latin or Hebrew.
So instead of guessing, our engine uses a math rule called Minimum Description Length (MDL) as our "gravity". We can mathematically measure exactly which letters are structurally bound together to form highly stable candidate morphs (structural prefixes, word-cores, and suffixes).
It's not translation; translators still have the hard job of deciphering the meaning. But at least now you can test your translation theories against mathematically verified structural boundaries rather than waste time on statistically random clusters of letters.
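To illustrate the MDL idea (this is a minimal sketch, not our engine): a candidate morph counts as structurally bound if adding it to a dictionary and replacing its occurrences shrinks the total bit count. The sketch below uses a crude two-part code with a flat one-symbol code per morph; a real MDL implementation would assign code lengths from frequencies.

```python
def description_length(text, dictionary):
    """Crude two-part code: bits for the dictionary entries, plus bits for
    the text with each dictionary morph replaced by a one-symbol code."""
    BITS_PER_CHAR = 8
    dict_cost = sum(len(m) + 1 for m in dictionary) * BITS_PER_CHAR
    body = text
    for i, morph in enumerate(dictionary):
        body = body.replace(morph, chr(1 + i))  # placeholder code symbol
    return dict_cost + len(body) * BITS_PER_CHAR

def mdl_gain(text, candidate):
    """Positive gain means adding the morph shrinks the total encoding."""
    return description_length(text, []) - description_length(text, [candidate])

text = "qokeedy qokeedy chedy ol chedy qokain chedy daiin cthar cthar"
for cand in ["chedy", "qokeedy", "hedy q", "kain"]:
    print(f"{cand!r}: gain = {mdl_gain(text, cand)} bits")
```

Frequent morphs like "chedy" pay for their dictionary entry many times over; a one-off chunk like "hedy q" costs more to store than it saves, so it is rejected.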
How does this differ from existing MDL/entropy techniques (simplified)?
If you're not familiar with MDL, or want to skip some of the more technical details, feel free to scroll past this and the next section.
Traditional entropy tools like standard n-gram analysis treat text as fixed blocks of data. They identify a character/token by how often it appears, or use a cryptographic hash. Basically, they count things.
Our proprietary engine (patent pending) maps data into a 2D bit-array. We identify a token by its geometric centroid (its centre of gravity), not just as a list of characters. This allows us to recognise patterns even if the transcription varies, so long as the geometric shape is stable.
A classic example of this is "csheedy" and "sheedy".
Try this for yourself with the Word Shape Visualiser tool here: [link]
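Since the engine itself is proprietary, here is a toy illustration of what "identifying a token by the geometric centroid of a 2D bit-array" could mean; the ASCII-bit encoding below is an assumption for the sketch, not the engine's actual representation. The point is only that "csheedy" and "sheedy" share most of their grid, so their centroids land close together.

```python
def bit_grid(word):
    """Toy 2D bit-array: one row per character, eight columns for the
    bits of its ASCII code (an assumed encoding, not the engine's)."""
    return [[(ord(c) >> b) & 1 for b in range(8)] for c in word]

def centroid(grid):
    """Centre of gravity (mean row, mean column) of the set bits."""
    pts = [(r, c) for r, row in enumerate(grid)
                  for c, bit in enumerate(row) if bit]
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)

for w in ("csheedy", "sheedy"):
    print(w, centroid(bit_grid(w)))
```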
MDL Comparison Continued:
For those who might have more questions on this part, here are a couple of additional comparisons:
1. Traditional methods use fixed-window chunking, e.g. 8-bit or 16-bit. This runs the risk of cutting words in the middle. Our engine uses a unique "optimal symbol length" calculation derived from the data itself via Shannon entropy minimisation. This makes segmentation data-driven instead of a hard-coded assumption. (A toy illustration of this calculation follows the list.)
2. Traditional compression tools are often lossy or purely mathematical. This makes inspecting or replaying the data in its original form difficult. Our engine is a reversible ledger and returns the exact original bits. This means the patterns found are actually representing the original data, not just mathematical abstractions.
3. Traditional statistical tools will find patterns in random noise, since random data naturally produces frequency spikes. Our engine runs the data it ingests against null controls such as Shuffle, Markov and Random samples before it considers a structure to be statistically significant. This means that if a pattern appears in the text at the same frequency as in randomised text, it is identified as noise, not structure.
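Here is a minimal sketch of one plausible reading of the "optimal symbol length" calculation from point 1: compute Shannon entropy per character for a range of chunk lengths and prefer the length that minimises it. This is an assumption about the method, not the engine's code; and on short samples, long chunks look spuriously cheap (few repeats), which is exactly why the null controls in point 3 matter.

```python
import math
from collections import Counter

def entropy_rate(text, k):
    """Shannon entropy per character when the text is cut into
    non-overlapping chunks of length k."""
    chunks = [text[i:i + k] for i in range(0, len(text) - k + 1, k)]
    counts = Counter(chunks)
    total = sum(counts.values())
    h = -sum(n / total * math.log2(n / total) for n in counts.values())
    return h / k  # bits per character

text = "qokeedy chedy ol chedy qokain qokeedy chedy ol " * 4
for k in range(1, 9):
    print(f"k={k}: {entropy_rate(text, k):.3f} bits/char")
```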
How to mathematically verify what the engine finds is valid
It's important that anyone can verify the method otherwise you'd have no reason to take anything the engine gives you as "truth". We verify this process through a couple of important, well-known mathematical principles/laws.
I've written a guide that you can follow along to conduct your own math experiment, using some EVA text of your own choosing, and verify your own results! Head over to the guide here: [link]
If you can't be bothered going through the guide, here is the bottom line:
We mathematically prove that the text is being compressed, which is a "trick" that only works if you have found real, repeating patterns.
You cannot compress random data. If every piece of data is unique, you cannot describe it using fewer letters without losing information. Finding these patterns reveals a hidden "Lego set" used to build the words. While we can only guess where the Lego bricks click together, the maths proves that they only click together when doing so makes the dictionary smaller, not larger. The maths tests every single letter, so if we cut just one letter to the left or right, the file size would increase, not decrease.
If the engine shows that sh always connects to edy to maximise compression, that is no longer a statistical coincidence; it is a definitive, physical property of the manuscript's dataset.
This engine is helpful because humans tend to see patterns where they do not exist. We can see a face in the clouds, but maths has no brain and cannot be fooled. In short: if the maths can make the dictionary smaller, the pattern is physically real.
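You can check this bottom line yourself with any general-purpose lossless compressor standing in for our engine (zlib here is just a stand-in, not the engine's reversible ledger): compress the text, then compress a character-shuffled copy with identical frequencies, and compare.

```python
import random
import zlib

text = "qokeedy chedy ol chedy qokain qokeedy chedy ol " * 20

def compressed_size(s):
    """Bytes needed by a generic lossless compressor at max effort."""
    return len(zlib.compress(s.encode(), 9))

chars = list(text)
random.shuffle(chars)              # same characters, structure destroyed
shuffled = "".join(chars)

print("original:", compressed_size(text), "bytes")
print("shuffled:", compressed_size(shuffled), "bytes")
```

The shuffle still compresses a little (the character frequencies alone are skewed), but the original compresses far better; that gap is the repeating structure, not the letter counts.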
USE THE ENGINE YOURSELF
Now for the fun part: we have put together several tools to help enthusiasts test their translation theories in different ways:
Link to the tool: [link]
Pattern Decoder [link]: Find repeating patterns at character, prefix, word and phrase levels.
1. Paste in an EVA transcription. You can use the Quick Load buttons to use a sample Folio.
2. Change the Lens based on whether you want to focus on individual characters, structural-prefixes, repeating word-cores etc.
3. Click Analyse and see the patterns found in that EVA text. If there is a particular root or phrase you're interested in, you can click Find Context which takes you to the next available tool:
[link]: Investigate grammatical roles without guessing the meaning.
1. Paste in a pattern you are interested in and again provide the Corpus you wish to scan it against. If you used the Find Context button from the Pattern Decoder, it will have done this and performed the analysis for you automatically.
2. It will show you which words contain your pattern, as well as the words that come before and after it (and their frequency).
Word Shape Visualiser [link]: Compare the 2D bit structure of EVA words to see how word prefixes and roots differ in their geometric representation.
1. Enter in 2 EVA words. By default the Lens is set to candidate syllables, prefixes and word-cores.
2. Try using "csheedy" and "sheedy" as a test to see how they share a common structural root.
[link]: Compare a highly compressible EVA block with what you think the English translation is, to test your theory.
1. Provide your EVA text and English word, try "ol" and "the" as an example.
2. Enter in a Corpus (Folio) to compare them against and Run the Audit.
3. The engine will replace the EVA word with the English word against the Corpus you selected. It will then evaluate it according to its grammatical position, entropy impact and morphological family coherence.
4. It will give you a score for you to review. There are various guides on the page to help you interpret the results.
SUMMARY
I would love to get the community's feedback. I'm happy to share more information, provide more detail on any particular area, collaborate, help test theories, provide more tools, and more. There is an FAQ section which you can also view here: [link]
I. Core Hypothesis
The Voynich manuscript is a 15th-century European copy of a 9th–12th century Andalusian/Islamic field guide. The original text documented the flora, astronomy, geography, and medical practices of the Americas (Mesoamerica/Mississippian cultures) using unpointed Arabic shorthand (Rasm). The manuscript's bizarre cipher is the direct result of a European scribe erroneously copying a Right-To-Left (RTL) Arabic text from Left-To-Right (LTR).
II. The Historical Vector (Transatlantic Contact)
The Explorer: The geographical knowledge originates from early Islamic transatlantic navigation, mirroring the documented 889 AD voyage of the Andalusian navigator Khashkhash ibn Saeed ibn Aswad from Cordoba, who crossed the Atlantic and returned with strange botanical specimens.
The Documentary Evidence (Al-Masudi): Khashkhash's transatlantic voyage was formally recorded by the renowned 10th-century Arab historian Al-Masudi in his 947 AD encyclopedia, Muruj adh-Dhahab (The Meadows of Gold).
The Botanical Proof: Folio 93r features an anatomically accurate drawing of a wild sunflower (Helianthus annuus), a plant strictly native to the Americas, definitively proving transatlantic access to New World flora centuries before Columbus.
III. The Geographical & Cosmological Proof
The Rosettes Map (The City on Water): Folio 86v contains a massive fold-out map of a central circular city surrounded by water, heavily segmented and connected by stone causeways. This structurally mirrors early maps of the Aztec capital, Tenochtitlan, or a major Mississippian coastal trade hub, rather than any known European city.
The Pleiades & The New Fire: Folio 68r maps a central celestial body surrounded by the Pleiades star cluster. In Mesoamerican culture, the zenith of the Pleiades (Tianquiztli) was the survival-critical anchor for their 52-year calendar cycle and the "New Fire" ceremony.
Equatorial Navigation: Later cosmological fold-outs depict swirling, amorphous star clusters perfectly matching the Magellanic Clouds, visible only when navigating deep southern or equatorial transatlantic trade winds.
IV. The Linguistic Mechanism (The LTR Copyist Error)
The Reversal: The Voynich suffix syllables (like -al and -ar) are actually the Arabic definite article prefix "Al-" (ال) copied backward by the European scribe.
The Benches & Tails: The Voynich alphabet is a visual tracing of dotless Andalusi Arabic. Sweeping Voynich tails map to Arabic terminal letters (Nun, Ra, Ya), while Voynich "benches" map to unpointed medial teeth (Ba, Ta, Tha).
The Gallows Characters: The massive Voynich gallows letters occurring at paragraph starts are misinterpretations of the towering vertical stalks of the Arabic Lam-Alif ligatures and the word "Allah" (الله) from the Bismillah invocation.
V. The Pharmaceutical & Industrial Evidence
The Apothecary Jars (Trade Pottery): Unlike standard 15th-century European glassware, the heavily painted jars in the Pharmaceutical section visually match the geometric, bulbous styles of Mesoamerican or Mississippian trade pottery. The scribe faithfully copied the exotic containers drawn by the explorer.
The Andalusian Integration (Ibn Juljul): Exotic medicines brought back by explorers were processed by elite Andalusian pharmacists in Cordoba, such as the master botanist Ibn Juljul, who documented how to extract chemical properties from newly discovered foreign plants.
Reverse-Translated Terminology: When reading Voynich labels backward through the Rasm mapping, exact medieval apothecary terms emerge. The jar label chol reverses to the Arabic root L-W-Q (لعوق / La'uq - medicinal syrup), and the plumbing label olad reverses to D-L-W (دلو / Dalw - water bucket).
Tawkid Lafdhi (Verbal Emphasis): The Voynich word reduplication (e.g., chol chol) perfectly mirrors classical Arabic grammatical repetition used in Islamic medical formularies to stress exact measurements.
VI. The Scribal Europeanization
Because the 15th-century Italian or German scribe did not understand the indigenous subjects in the master codex, they updated the illustrations with European elements: placing Italian Ghibelline swallowtail merlons on the walls of the Tenochtitlan map, drawing early 15th-century crossbows, and giving European medieval hairstyles to the bathing nymphs.
Hi everyone, I'm excited to be here. Let me be upfront: I have used AI to help me write this post, as I want to ensure that what I've done is clearly communicated and understood here. But please do not just assume it's AI slop; I've worked hard on this, and I am happy to share the extensive, rigorous testing that's gone into getting to this point.
I am introducing a new computational tool to the community. I want to be clear upfront: this is not a translation attempt. Instead, it is a way to mathematically verify the structural patterns we often talk about in EVA, making those claims objective and reproducible.
The Problem with leaping to translation
Translation attempts usually fail when they pick meanings too early. If you try to map EVA strings to a real language, you need hard constraints first.
Think of deciphering an unknown language like trying to build a jigsaw puzzle where all the pieces are blank. Normally, cryptographers have to guess the shape of the pieces. Where does a prefix end? What is the basic sentence structure? Is this a compound word? This tool maps the exact shapes of those puzzle pieces so that when you do try to assign meaning, you have a mathematical rulebook you must follow.
How the tool works
I have built a structural analyser that treats the Takahashi EVA transcription strictly as raw data. It does not use dictionaries, semantic guessing, or AI models. It is a deterministic mathematical engine. It scans the raw binary data of the file through different structural lenses to find exact, unbroken repeating sequences.
A practical example: The Balneological Section
To show you what I mean, let us look at a famously repetitive block from the balneological section.
I ran the following EVA text through the engine:
Quote:qokeedy qokeedy chedy ol chedy qokain chedy
daiin cthar cthar dan syaiir sheky or ykaiin
chedy ol chedy qokain qokeedy chedy ol
otaiin or okan o oiin oteey oteos roloty
shar are cthar cthar dan syaiir sheky
qokeedy chedy ol chedy qokain qokeedy
Here is what the engine blindly discovered about the grammar, purely by calculating repeating geometric data:
1. Finding Roots and Affixes (16-bit lens)
When forced to look at short chunk lengths, the engine output these highly recurring structural formulas:
("chedy ")
("edy qo")
("qokeed")
What this means: The engine mathematically isolated "chedy" as a foundational root. More importantly, it mapped exactly how modifiers bind to it. It proved that "qo" acts as a prefix that reliably attaches to form "qokeedy". It defined word boundaries and morphology without actually knowing what a word is.
2. Finding Phrase Syntax (32-bit lens)
When we zoomed out to look for longer phrase-level chunks, the engine isolated exact, unbroken twelve-character sequences that repeat verbatim:
("har cthar da")
("cthar dan sy")
("okeedy chedy")
What this means: In most natural languages, finding an exact twelve-character phrase repeating multiple times in a single paragraph is statistically rare. Here, the engine proved it is the core syntax. It mathematically flagged the massive chaining sequence "cthar dan syaiir sheky" as a strictly bound grammatical unit. It also perfectly captured the manuscript's famous reduplication, proving that the repetition of "cthar cthar" is an intentional, permitted grammatical rule, not a transcription error.
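Since the engine is not public, here is a minimal re-creation of the lens idea under stated assumptions: a plain sliding window over the characters (rather than the engine's bit-level representation), counting every length-k substring that repeats verbatim in the quoted block. Window lengths 6 and 12 are chosen to roughly match the two lenses above.

```python
from collections import Counter

eva = ("qokeedy qokeedy chedy ol chedy qokain chedy "
       "daiin cthar cthar dan syaiir sheky or ykaiin "
       "chedy ol chedy qokain qokeedy chedy ol")

def repeated_windows(text, k, min_count=2):
    """Every length-k substring (sliding one character at a time)
    that repeats verbatim, most frequent first."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return [(s, n) for s, n in counts.most_common() if n >= min_count]

for k in (6, 12):  # roughly the short and phrase-level lenses above
    print(f"window length {k}:")
    for s, n in repeated_windows(eva, k)[:5]:
        print(f"  {s!r} x{n}")
```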
How this actually leads to translation
We now have a verifiable structural fingerprint. We do not know what "cthar" means, but we know exactly how it behaves.
If a researcher theorises that "cthar" is a noun meaning "water", we no longer have to guess whether that fits. We can query the database to see if "cthar" behaves like a noun across the full corpus. Does the prefix it takes match the rules for an adjective or article in your proposed target language? If your translation requires "qokeedy" and "chedy" to be entirely unrelated words, but the UFM database proves they are the same root sharing a strict morphological link, then the translation theory is invalid.
This tool reduces the infinite possibilities of translation down to a highly constrained set of grammatical rules that any proposed solution must obey.
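As a minimal sketch of that kind of behavioural query (run here on only the quoted paragraph above, not a full-corpus database), this is how one could profile the immediate neighbours of "cthar"; a token hypothesised to be a noun should show a stable profile of preceding and following words.

```python
from collections import Counter

tokens = ("qokeedy qokeedy chedy ol chedy qokain chedy daiin cthar cthar "
          "dan syaiir sheky or ykaiin chedy ol chedy qokain qokeedy "
          "chedy ol").split()

def context_profile(tokens, target):
    """Count the tokens immediately before and after each occurrence."""
    before, after = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            if i > 0:
                before[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                after[tokens[i + 1]] += 1
    return before, after

b, a = context_profile(tokens, "cthar")
print("before 'cthar':", dict(b))
print("after  'cthar':", dict(a))
```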
Looking for feedback and collaboration
I am a data and systems person, not a historical linguist. To prove these patterns are real, the tool generates synthetic control texts matching character frequencies, and requires the real text to statistically beat the fakes.
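Here is a minimal sketch of one such control generator, under the assumption that "matching character frequencies" means a character-level Markov model: it preserves n-gram statistics up to a chosen order while destroying longer-range structure, so any pattern the real text beats it on cannot be explained by those frequencies alone.

```python
import random
from collections import defaultdict

def markov_control(text, order=2, length=None, rng=random.Random(0)):
    """Frequency-matched synthetic control: preserves character n-gram
    statistics up to `order`, destroys longer-range structure."""
    follow = defaultdict(list)
    for i in range(len(text) - order):
        follow[text[i:i + order]].append(text[i + order])
    state = text[:order]
    out = list(state)
    for _ in range((length or len(text)) - order):
        nxt = rng.choice(follow.get(state, list(text)))
        out.append(nxt)
        state = (state + nxt)[-order:]
    return "".join(out)

real = "qokeedy chedy ol chedy qokain qokeedy chedy ol " * 4
print(markov_control(real)[:60])
```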
I would love feedback from this community:
Are there specific folios or sections you want me to run through the phrase finder?
If anyone wants the raw CSV data exports to cross reference with their own linguistic theories, please let me know.
Thanks for reading. I hope this can be a useful, objective tool to help ground pattern claims in reproducible data.