Hi everyone,
I've been running a computational linguistic analysis on the VMS text (58 iterative Python runs over the IVTFF H transcription, 37,025 words) and wanted to share the results for discussion and criticism.
**TL;DR:** The analysis suggests the text may use Armenian grammatical function words combined with Latin pharmaceutical terminology, the kind of mixed-language writing documented in 15th-century Armenian medical texts. With prefix/suffix decomposition, 67% of words can be mapped to identifiable Armenian/Latin forms. In a blind test using a simpler matcher, the real text scores 44.8% against a 6% frequency-matched random baseline (7.4x), so it's not just pattern-matching noise.
## The core findings
### 1. Eight matches with Classical Armenian function words (seven exact)
| Decoded | Armenian | Meaning | Frequency |
|---------|----------|---------|-----------|
| vor | vor | who/which | 2x |
| zi | zi | because | 103x |
| or | or | day | 448x |
| gal | gal | to come | 9x |
| tal | tal | to give | ~10x |
| ban | ban | thing/word | ~10x |
| am | am | year / instrumental | 619x |
| ce | ker | food/preparation | ~400x |
The last one (`ce` = Bedrosian Dictionary's `ker` = "food") is particularly interesting because `cocei` (451x) and `coced` (604x) — the two most common prefixed words — decompose as `co` + `ce` + suffix, functioning as the main recipe instruction verb.
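The `co` + `ce` + suffix split can be checked mechanically. Below is a minimal sketch, not the actual run scripts: `decompose`, `PREFIXES`, `STEMS` and `SUFFIXES` are hypothetical names, and the small inventories are copied from the tables in this post purely for illustration.

```python
# Hypothetical inventories, taken from the prefix/stem/suffix tables in this post.
PREFIXES = ("co", "oc", "o", "")          # "" means "no prefix"
SUFFIXES = ("an", "am", "ar", "ed", "al", "i", "d", "n", "")
STEMS = {"ce", "ol", "col", "cor", "sal", "sol", "can", "car", "dar"}


def decompose(word):
    """Return (prefix, stem, suffix) for the first split whose stem is known, else None."""
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            # Strip the suffix (if any) and test what is left against the stem list.
            stem = rest[: len(rest) - len(suffix)] if suffix else rest
            if stem in STEMS:
                return prefix, stem, suffix
    return None


print(decompose("coced"))  # ('co', 'ce', 'd')
print(decompose("cocei"))  # ('co', 'ce', 'i')
```

Because the function returns the first split it finds, the result depends on the order of the prefix and suffix lists. That is one place where overfitting could creep in, so it is worth testing alternative orders.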
### 2. Latin material names (not Armenian)
| Decoded | Latin | Meaning |
|---------|-------|---------|
| ol | oleum | oil (692x) |
| col | cola | strain (574x) |
| cor | cortex | bark (257x) |
| sal | sal | salt (75x) |
| sol | solve | dissolve (196x) |
| can | canna | reed/tube (163x) |
| car | caro | flesh/meat (85x) |
The split is consistent: **grammar = Armenian, ingredients = Latin**. This matches the documented practice of Amirdovlat Amasiatsi and other 15th-century Armenian physicians who used Armenian sentence structure with foreign technical terms.
### 3. The EVA triple consonant system matches Armenian phonology
The EVA ligature system (k/ch/kch, p/ch/pch, t/ch/tch) encodes the Armenian three-way stop distinction (voiceless/aspirated/voiced). This matches the Cilician Middle Armenian consonant shift documented by Vardanyan (1999):
- EVA `kch` → /g/ (Armenian voiced)
- EVA `pch` → /b/ (Armenian voiced)
- EVA `tch` → /j/ (Armenian voiced)
### 4. Two different ligatures for two different suffixes
EVA `aiin` and `ain` — previously treated as identical — show different distributional patterns:
- `ain` → `-an` (genitive/dative, grammatical contexts)
- `aiin` → `-am` (instrumental, appears next to measurement terms)
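One way to quantify the "appears next to measurement terms" claim is to compare how often each ending sits beside a dosage token. This is only a sketch under stated assumptions: `suffix_context_rates` and `MEASURES` are hypothetical names, the measurement set is copied from the dosage table in this post, and the token lists in the usage lines are toy data, not corpus counts.

```python
from collections import Counter

# Decoded dosage tokens from the measurement table (assumed to be the relevant set).
MEASURES = {"d", "i", "s", "sd", "si", "gd"}


def suffix_context_rates(eva_tokens, decoded_tokens, suffixes=("aiin", "ain")):
    """For each EVA ending, return the share of tokens with a measurement neighbour."""
    totals, near = Counter(), Counter()
    for idx, token in enumerate(eva_tokens):
        for suffix in suffixes:
            if token.endswith(suffix):
                totals[suffix] += 1
                # Look one token to the left and one to the right.
                neighbours = decoded_tokens[max(idx - 1, 0):idx] + decoded_tokens[idx + 1:idx + 2]
                if any(n in MEASURES for n in neighbours):
                    near[suffix] += 1
                break
    return {s: near[s] / totals[s] for s in suffixes if totals[s]}


# Toy example (invented tokens, for illustration only).
eva = ["daiin", "dy", "sol", "okain", "ol"]
decoded = ["dam", "d", "sol", "ocan", "ol"]
print(suffix_context_rates(eva, decoded))  # {'aiin': 1.0, 'ain': 0.0}
```

To judge whether the two endings really differ, the two rates would need to be compared against the overall rate at which any token has a measurement neighbour.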
### 5. Medieval pharmaceutical number system
| Decoded | Meaning | Frequency |
|---------|---------|-----------|
| d | 1 dose (℥j) | 946x |
| i | 1 unit | 621x |
| s | half (semis) | 292x |
| sd | half-dose (℥ss) | 548x |
| si | 1½ | 453x |
| gd | drop (gutta) | 170x |
These appear adjacent to material names in exactly the positions expected for recipe dosages.
### 6. Blind test validation
| Test | Recognition |
|------|-------------|
| Real Voynich | 44.8% |
| Random text (same char frequencies) | 6.0% ± 0.1% |
| Random text (uniform) | 2.0% |
Gap: +38.8 pp (7.4x). The method captures real structure, not noise.
### 7. All sections use the same vocabulary
Every section tested — herbal, biological/"bathing", pharmaceutical, recipes, astronomical, cosmological — uses identical vocabulary (62-80% recognition). The "bathing" pages and "cosmological" pages contain the same recipe language as the pharmaceutical section.
## Sample translation (f75r line 36, 100% recognized)
**EVA:** `sol keedy qokeedy qokey okar otar dar dar dy`
**Decoded:** `sol ced coced cocei ocar odar dar dar d`
**English:** DISSOLVE your-food! PREPARE-your-food! PREPARE! To-here the-medicine. Medicine, medicine, ℥j (one dose).
## What I'm NOT claiming
- This is not a complete decipherment. ~33% of words remain unidentified.
- The exact phonetic values are not all finalized (particularly t→d vs t→t).
- Sentence-level coherent translation is only partial.
- I cannot identify which plants or diseases the recipes describe.
- The word `bor` (wine?) matches neither Latin nor Armenian.
## What I am claiming
- The text contains real linguistic structure (validated by blind test)
- Armenian function words appear at statistically significant rates
- The mixed Armenian grammar + Latin vocabulary pattern matches documented 15th-century Armenian pharmaceutical writing practice
- The text reads as pharmaceutical recipes: materials + preparation + dosage
## Reproducibility
All 58 Python scripts, the full IVTFF data, output files, the Bedrosian dictionary extract, and the Amirdovlat research compilation are available. Happy to share the GitHub repo if there's interest.
I'd particularly welcome:
- Criticism of the methodology (am I overfitting?)
- Input from anyone who reads Classical Armenian
- Comparison with other decipherment attempts
- Statistical critique of the blind test
Thanks for reading. Looking forward to the discussion.
# Appendix: Full decoding rules, vocabulary, and sample translations
## A. Complete Decoding Rules (EVA → phonetic value)
Processing order matters — longer sequences are matched first:
```
LIGATURES (multi-character):
chckh → tsh chcth → tst chck → tsh chct → tst
kch → g pch → b tch → j lch → gh
dch → dj fch → v cth → th
aiin → am iin → in
chee → e che → (silent) cho → kho chy → i
ch → h
qo → co ok → oc ot → ot ol → ol
da → da dy → d ai → a ar → ar
am → am ed → ed ee → e he → (silent)
sh → z in → n pl → pl
SINGLE CHARACTERS:
y → i k → c t → d h → (silent)
All other letters (a, e, i, o, l, d, s, p, r, n, m, c, f, g, u, b, v) → unchanged
```
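For anyone who wants to test the table, here is a minimal sketch of one way to apply it. It assumes the rules are applied in the order listed (left to right, top to bottom) and that text produced by one rule is never touched by a later rule. The names `RULES` and `decode_word` are hypothetical and this is not the original script.

```python
# Rules in the order they appear in the table above; "" means the sequence is silent.
RULES = [
    ("chckh", "tsh"), ("chcth", "tst"), ("chck", "tsh"), ("chct", "tst"),
    ("kch", "g"), ("pch", "b"), ("tch", "j"), ("lch", "gh"),
    ("dch", "dj"), ("fch", "v"), ("cth", "th"),
    ("aiin", "am"), ("iin", "in"),
    ("chee", "e"), ("che", ""), ("cho", "kho"), ("chy", "i"),
    ("ch", "h"),
    ("qo", "co"), ("ok", "oc"), ("ot", "ot"), ("ol", "ol"),
    ("da", "da"), ("dy", "d"), ("ai", "a"), ("ar", "ar"),
    ("am", "am"), ("ed", "ed"), ("ee", "e"), ("he", ""),
    ("sh", "z"), ("in", "n"), ("pl", "pl"),
    ("y", "i"), ("k", "c"), ("t", "d"), ("h", ""),
]


def decode_word(eva):
    """Apply RULES in priority order; already-decoded pieces are protected."""
    segments = [(eva, False)]  # (text, already_decoded)
    for source, target in RULES:
        updated = []
        for text, done in segments:
            if done:
                updated.append((text, True))
                continue
            # Split the raw text on this rule's source and mark each hit as decoded.
            for i, part in enumerate(text.split(source)):
                if i > 0:
                    updated.append((target, True))
                if part:
                    updated.append((part, False))
        segments = updated
    # Letters never matched by any rule pass through unchanged.
    return "".join(text for text, _ in segments)


line = "sol keedy qokeedy qokey okar otar dar dar dy"
print(" ".join(decode_word(w) for w in line.split()))
# sol ced coced cocei ocar otar dar dar d
```

Under these assumptions `shey` decodes to `si` and `daiin` to `dam`, which agrees with the vocabulary tables. With the table exactly as written, `otar` comes out as `otar`, whereas the sample translation has `odar`. That is the t→d vs t→t question listed as unresolved.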
**Key insight:** The `aiin` vs `ain` distinction is critical. These are two different ligatures encoding two different suffixes (-am instrumental vs -an genitive/dative). Previous transcription analyses treated them as identical.
## B. Complete Identified Vocabulary
### Stems — Armenian origin (8 matches, 7 exact)
| Decoded | Armenian | English | Evidence |
|---------|----------|---------|----------|
| vor | vor | who/which | Relative pronoun, exact match |
| zi | zi | because | Conjunction, 103x, exact match |
| or | or | day | 448x, exact match |
| gal | gal | to come | Verb, exact match |
| tal | tal | to give | Verb, exact match |
| ban | ban | thing/word | Noun, exact match |
| am | am | year / with (INSTR) | 619x, exact match |
| ce | ker | food/preparation | Bedrosian Dict. confirms |
### Stems — Armenian near-matches (4)
| Decoded | Armenian | English | Difference |
|---------|----------|---------|------------|
| dar | derman | medicine/remedy | dar ≈ derm (abbreviation?) |
| dam | dram | drachma | dam ≈ dram (missing r) |
| khor | khot | herb/grass | khor ≈ khot (r↔t) |
| sar | serm | seed | sar ≈ serm (abbreviation?) |
### Stems — Latin origin (10)
| Decoded | Latin | English | Freq |
|---------|-------|---------|------|
| ol | oleum | oil | 692x |
| col | cola | strain (verb) | 574x |
| cor | cortex | bark | 257x |
| sal | sal | salt | 75x |
| sol | solve | dissolve (verb) | 196x |
| can | canna | reed/tube | 163x |
| car | caro | flesh/meat | 85x |
| cal | calidus | warm/hot | 49x |
| cear | cera | wax | ~30x |
| ceol | cera+oleum | wax-oil | ~35x |
### Other identified stems
| Decoded | Meaning | Notes |
|---------|---------|-------|
| zol | sap/liquid | 176x, dominant in Herbal section |
| lc | milk | 86x (inflected), lac? |
| bor | wine | 30x, NOT Latin/Armenian — Hungarian/Turkic? |
| bol | bolite/bolus | Armenian bole (medicinal clay) |
| opi | opium | 9x |
| dol | dose (unit) | 239x |
| ded | they give | 73x |
| com | mix (verb) | 16x |
| ad | add (verb) | 5x |
| tor | grind (verb) | ~20x |
| dal | give (verb) | 322x |
### Suffixes (case system)
| Suffix | EVA ligature | Function | Armenian parallel |
|--------|-------------|----------|-------------------|
| -an | ain | genitive/dative ("to/for") | Grabar GEN/DAT -an |
| -am | aiin | instrumental ("with/by") | Middle Armenian INSTR |
| -ar | ar | allative ("toward") | Word-formative suffix |
| -ed | edy/eedy | uncertain ("also"? "and"?) | Debated |
| -i | y | genitive ("of") | Grabar GEN -i |
| -d | dy | possessive ("your") | Grabar POSS |
| -n | in | definite article | Middle Armenian DEF |
| -al | al | infinitive ("-ly"/"-ing") | Uncertain |
### Prefixes
| Prefix | EVA | Function | Evidence |
|--------|-----|----------|----------|
| co- | qo | imperative ("prepare!") | 6,951x; almost exclusively before material nouns/verbs |
| oc- | ok | demonstrative ("this/that") | 2,350x; almost exclusively before case suffixes |
| o- | o | accusative ("the...[object]") | 4,894x; before inflected stems |
**Evidence for distinct prefix functions (Run42 discovery):**
`co-` + ced(604x), can(578x), car(174x), cal(197x) — but oc+ced = 0x, oc+can = 0x
`oc-` + an(365x), ed(235x), ar(148x), ol(86x) — but co+an = 41x, co+ed = 26x
The distribution is almost perfectly complementary.
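The counts above can be recomputed from any list of decoded words with a simple tally. This is a sketch only: `prefix_stem_counts` is a hypothetical helper, and the word list in the usage lines is a toy example rather than real corpus data.

```python
from collections import Counter


def prefix_stem_counts(decoded_words, prefixes=("co", "oc")):
    """Count (prefix, remainder) pairs for words that start with one of the prefixes."""
    counts = Counter()
    for word in decoded_words:
        for prefix in prefixes:
            # Require something after the prefix so bare "co"/"oc" are not counted.
            if word.startswith(prefix) and len(word) > len(prefix):
                counts[(prefix, word[len(prefix):])] += 1
                break
    return counts


# Toy input (invented), just to show the shape of the output.
words = ["coced", "coced", "cocan", "ocan", "ocar", "sol"]
counts = prefix_stem_counts(words)
print(counts[("co", "ced")], counts[("oc", "ced")])  # 2 0
```

A complementary distribution then shows up as pairs such as `("oc", "ced")` staying at or near zero while `("co", "ced")` is large.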
### Measurements
| Decoded | Medieval equiv. | Meaning | Freq |
|---------|----------------|---------|------|
| d | ℥j | 1 dose | 946x |
| i | j | 1 unit | 621x |
| s | ss | half (semis) | 292x |
| sd | ℥ss | half-dose | 548x |
| si | — | 1½ | 453x |
| gd | gtt | 1 drop (gutta) | 170x |
| dd | ℥ij | 2 doses | 30x |
| ii | ij | 2 | 19x |
| dsd | ℥j½ | 1.5 doses | 66x |
| dsdi | ℥j½+1 | 1.5 doses + 1 | 86x |
## C. Sample Translations (10 pages, best lines)
### Biological section (74.7% recognized)
**Line 22 (100%):**
EVA: `odar shey qokain chedyor shey kar chedy sar`
→ `the-medicine 1½ PREPARE-in-reed! part 1½ meat ℥j seed`
**Line 36 (100%):**
EVA: `sol keedy qokeedy qokey okar otar dar dar dy`
→ `DISSOLVE! food.POSS PREPARE-yours! PREPARE! to-here the-medicine medicine medicine ℥j`
**Line 38 (83%):**
→ `PREPARE-yours! PREPARE-yours! PREPARE-yours! PREPARE-yours! PREPARE-yours! [?]`
*(5x repetition — compare Amirdovlat's "And do this for six days!")*
**Line 26 (100%):**
→ `to.the oil DISSOLVE! to.the oil PREPARE-[?] ADD!`
### Biological "bathing" section (79.5% recognized)
**Line 10 (100%):**
→ `PREPARE they-give them-GIVE ℥ss PREPARE-yours! PREPARE also this-also ℥.with`
**Line 26 (100%):**
→ `oil PREPARE-meat! ℥ss STRAIN them-GIVE the-drop`
**Line 28 (100%):**
→ `GIVE! ℥ss PREPARE-yours! ℥ss to.the ℥.with`
### Biological "bathing" section (76.1% recognized)
**Line 13 (100%):**
→ `PREPARE-yours! oil food.POSS PREPARE-in-reed! this-also food.POSS ℥ ℥j PREPARE they-give ℥j`
**Line 18 (100%):**
→ `oil ℥j PREPARE-in-reed! medicine toward to-this`
### Pharmaceutical section (76.5% recognized)
**Line 22 (100% of decodable):**
→ `2 with-this STRAIN day oil`
### Recipe section (72.0% recognized)
**Line 17 (100%):**
→ `seed 1½ PREPARE! food.POSS PREPARE! |dz| PREPARE-warm! the-℥.GEN day with`
**Line 51 (100%):**
→ `ss.with food.GEN 1½ the-ss.with`
### f85r1 — Cosmological "9-rosette" page (67.0% recognized)
**Line 22 (83%):**
→ `[?] medicine ℥j the-r.with likewise this-also`
## D. Blind Test Details
**Method:** Generate 37,025 random "words" using same character frequency distribution as real Voynich. Apply identical decoding rules and dictionary. Measure % recognized. Repeat 100 times.
**Results:**
- Real Voynich: **44.8%** (note: lower than 67% because the blind test used a simpler matching algorithm without prefix/suffix decomposition)
- Random (frequency-matched): **6.0% ± 0.1%** (range: 5.8-6.4%)
- Random (uniform alphabet): **2.0% ± 0.1%**
- Bigram-preserving random: **49.7% ± 0.3%**
The bigram result (49.7%) deserves discussion, because it is slightly higher than the real text's 44.8%. It means that if you preserve which characters tend to follow which characters (2-gram statistics), you get recognition at least as high as for the real text. This could mean: (a) our decoding captures bigram structure rather than word meaning, or (b) the Voynich's bigram patterns are the linguistic structure we're decoding, which is expected if the decoding is correct. The 6% frequency-matched result shows only that the dictionary produces few false positives on text that lacks this bigram structure.
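For readers who want to critique the statistics, the frequency-matched baseline can be sketched as follows. This is an illustration under assumptions, not the original test script: the function names are hypothetical, random words are assumed to keep the real word lengths (the method description does not say how lengths were drawn), and `decode` stands in for whatever decoding function is used.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def recognition_rate(words, lexicon, decode=lambda w: w):
    """Fraction of words whose decoded form appears in the lexicon."""
    if not words:
        return 0.0
    return sum(decode(w) in lexicon for w in words) / len(words)


def frequency_matched(words, rng):
    """Random words drawn from the corpus character frequencies, same lengths as the input."""
    counts = Counter("".join(words))
    chars = list(counts)
    weights = [counts[c] for c in chars]
    return ["".join(rng.choices(chars, weights=weights, k=len(w))) for w in words]


def blind_test(words, lexicon, decode=lambda w: w, trials=100, seed=0):
    """Return (real rate, mean random rate, std of random rate) over `trials` runs."""
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    real = recognition_rate(words, lexicon, decode)
    rates = [
        recognition_rate(frequency_matched(words, rng), lexicon, decode)
        for _ in range(trials)
    ]
    return real, mean(rates), pstdev(rates)
```

A bigram-preserving generator would replace `frequency_matched` with a sampler that draws each character conditioned on the previous one. Comparing all three generators with the same `decode` and lexicon is what separates options (a) and (b).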
## E. What remains unidentified
The ~33% unidentified words fall into these categories:
1. **Short function words** (o, r, l, co — possibly scribal marks or separators)
2. **Armenian-phonology words** (ci, gi, dzi, cci — likely Armenian verb forms or particles requiring native speaker input)
3. **Compound measurements** (dsdi, dsd, lsd — combined dosage notations)
4. **Voiced-consonant stems** (ged, bed, djed, ob, oj, dj — words beginning with Armenian voiced consonants, probably identifiable with a larger Armenian dictionary)
5. **Long compound words** — likely multi-morpheme constructions we haven't decomposed yet
## F. Reproduction
All code is Python 3. The analysis requires:
- `voynich_data.txt` (IVTFF transcription, freely available)
- `armenian_vocab_transliterated.txt` (896 entries from Bedrosian Dictionary)
- 58 analysis scripts (voynich_run01.py through voynich_run59.py)