|
|
|
| Critique wanted |
|
Posted by: KJ now - 19-06-2025, 12:06 AM - Forum: Theories & Solutions
- Replies (21)
|
 |
Title: POLER-D Method: Transparent Translation Framework from a Curious Independent
Hello everyone,
I’d like to take a moment to share the methodology I’ve been developing and refining over time: a phonetic (oral) framework I call POLER-D, short for Phonetic-Oral Linguistic Encoding Reconstruction, Diachronic variant. Recently I began posting translation snippets and findings from this process on X (Twitter) under @VoynichUnleash for critique, but I wanted to offer an overview here on Voynich Ninja, both for transparency and to welcome any feedback from those more experienced in historical linguistics, cryptography, or manuscript studies.
First and foremost, this isn’t a commercial venture, nor a gimmick. I’m not selling anything. I’m driven by deep curiosity and respect for the mystery of the Voynich Manuscript. I believe it's good to explore any method (however unconventional) that helps unlock the possible history within it. I understand that skepticism is healthy and necessary, and I welcome it. But I also want to be clear that this is not a “solved it” post. This method is a work in progress, and one that has already challenged many of my own original assumptions. In fact, most of my early theories have been overturned through application and cross-testing. I’m not afraid of being wrong, but it would be a shame not to try.
What Is POLER-D?
POLER-D is built around the idea that the Voynich Manuscript uses a disguised phonetic encoding system, drawing from spoken (and often unwritten or drifted) dialects of Northern and Eastern Europe and nearby regions as they existed after 1300 AD.
Here’s how it works:
We begin with EVA (European Voynich Alphabet) transcriptions of the manuscript text, then carefully align them with standardized formats available in public datasets.
Each word is then broken down phonetically, not visually, using oral-sounding reconstructions, as if the scribe were encoding sound, not meaning.
For example:
shedy > skaidan / skēd (Germanic root: to speak)
kai > kaido / kaijan (Baltic: keeper/guardian)
chol > chwal / sol (Slavic/Hebrew: soul)
From there, we test drifted linguistic matches against known post-1300 forms in a selected group of languages (listed below), adjusting for diachronic shifts.
We also match results against imagery on the same folios (plants, figures, cosmological diagrams, etc.). This is not cherry-picking; it's about whether what we decode reflects what we see.
Each translation is checked against a stable pool of phonetic and written languages, with no additions unless justified. Results that don’t fit are discarded or flagged for review.
Languages We Use (Post-1300 Parameter Tightening)
To avoid the common pitfall of “too many possible matches,” we’ve restricted POLER-D to languages and dialects that were actively spoken or transitioning after 1300, including:
Proto-Germanic derivatives: Middle High German, Old Norse, Old Saxon
Celtic: Gaulish remnants, Middle Welsh, Irish phonetics (oral influence)
Slavic: Old Church Slavonic, Old Czech, Old Polish
Romance: Vulgar Latin, Middle French, Occitan, Italian dialects
Greek: Koine and transitional forms
Hebrew: Classical/Biblical Hebrew (as a known scholarly base)
Finnic/Uralic: Karelian, Hungarian
Latin alphabet: All reconstruction uses era-consistent Latin script, no anachronisms
As a baseline study, we tested historical languages from around the world with POLER-D, and none worked beyond the northern and eastern European tribal areas.
Note: Etruscan and other unrelated ancient scripts were excluded deliberately. This isn’t about guessing or using every ancient language; it’s about limiting our pool to reasonable cultural and temporal candidates. We continue to shrink the parameters where possible.
Current Findings
We’re seeing internal consistency across a number of tested folios, including those from the herbal, cosmological, and so-called "recipes" sections. Some patterns are emerging:
Ritual herbalism, not just plant identification, but instructions, applications, and contexts
Ritualistic bathing, including symbolic language around purification and preparation
Astronomical cycles, likely ritual calendar components tied to lunar or planetary phases
Hermetic influence, recurring themes of balance, fire/water opposites, ascension, and transformation
Interpretations match imagery surprisingly well so far. We continue to apply this method across random folios (e.g., f1r, f67r, f54r, etc.) to test for stability.
I’m documenting the process in real time, including changes as they happen, at @VoynichUnleash on X. You’ll find raw phonetic samples, before/after comparisons, and posts noting where translations are being adjusted. This is about open research, not a closed theory.
Sample Entry
Here’s a real breakdown using POLER-D from f1r:
EVA: otedy shedy laram ychor
POLER-D Phonetic: ot-ed-ē / skēd / lar-an / i-khor
Possible Translation: “To speak the flowing essence (divine fluid)”
Matched Imagery: A root-based plant drawn as if exuding fluid, possibly used in ritual speech or invocation.
POLER-D is evolving. We expect some variation in future translations as our parameters improve and mistakes are corrected, but the method is holding strong so far under repeated tests. If you’re skeptical, that’s good: this project is self-funded and critique is free. I welcome engagement, suggestions, and yes, critique. I’m not a credentialed linguist or cryptographer, just a persistent researcher who has stumbled into a tool that appears to be uncovering something meaningful.
We now strongly suspect that the Voynich Manuscript was authored by multiple classically trained scribes, not a single individual. This theory is supported by earlier work from paleographers like Lisa Fagin Davis, who identified at least five distinct scribal hands based on letter formation, writing angles, and stylistic patterns across the manuscript. Our findings align with hers: subtle shifts in phrasing, vocabulary range, and glyph formation all suggest collaboration. These scribes likely shared training in Latin, Greek, and Hebrew, but also picked up oral phonetic dialects through travel, allowing them to encode their knowledge in ways that reflected both elite training and folk wisdom.
We believe the Voynich Manuscript represents a collection of ritual practices, likely from pre-Christian or pagan traditions, encoded not merely to protect “secrets,” but possibly to shield the content from persecution or prejudice. The use of a disguised phonetic script, rather than a true cipher, supports this idea. About six months ago, AI was brought in to speed up the process, and it is now showing amazing potential. It took time to enter all the parameters used, but once the AI had learned everything, it began returning word-by-word and phrase-by-phrase output to be assessed. I now feel the methodology is ready to share on multiple platforms, and I began doing so several days ago. Countless hours, spanning years of work, went into this before AI's introduction; I want that to be clear. I welcome all critique.
Thanks for reading,
KJ
(@VoynichUnleash on X)
|
|
|
| Month names collection / metastudy |
|
Posted by: Koen G - 13-06-2025, 07:11 PM - Forum: Marginalia
- Replies (107)
|
 |
I split this from the Aberil thread. Many people have found interesting month series over the years that match in various ways to the VM Zodiac inscriptions. Since these are fragmented, shared in different formats and on old websites and blogs, it might be interesting to collect them all here in a unified way.
Please post/link any old or new examples in this thread. When I have time, I will add them to the spreadsheet.
The color codes are:
Green: complete match or spelling variation of the same word (letters like v-u and i-j were interchangeable).
Yellow: has one or more salient features of the VM version.
Red: far off.
|
|
|
| The 'Chinese' Theory: For and Against |
|
Posted by: dashstofsk - 09-06-2025, 08:53 AM - Forum: Theories & Solutions
- Replies (220)
|
 |
A number of people have suggested that the VMS might be in some Chinese language.
I think it is unlikely that this is so, but I am still curious to know what people might think.
The manuscript was written by at least three people, possibly five. And who were they writing for? For themselves, or for other Chinese? Could there really have been that many of them in the whole of Europe, in the time of the manuscript, when trade routes were not that well established, to justify writing in that language? But also why invent a new alphabet when they could have just written in the Chinese script which hardly any European would have been able to read?
There is also a problem with any language that uses tones for meaning, a point that has been nicely highlighted elsewhere on this forum.
|
|
|
| Wherefore art thou, aberil? |
|
Posted by: R. Sale - 08-06-2025, 08:05 PM - Forum: Imagery
- Replies (47)
|
 |
The question is two-fold. Where is the word 'aberil' found and wherefore, for what reason, was it used to name the month of April, twice in the VMs Zodiac sequence?
Where is it found? Nothing relevant on Google.
'Aberil' is apparently one of several variant words that are found in various languages. April is usually found on a calendar and the "ebooks" reference contains a number of liturgical calendars, which can be sorted by language groups. In the German group, the overwhelming preference is for "Aprilis" [Latin] or an abbreviation. In the French language group, the preference is for "Avril".
The only other viable alternative so far is the Germanic-group use of "Abrell" in 1540 Appenzell.
In a ninja search, back in 2019, Anton posted a reference that connects "Aberil" with the Swiss canton of Glarus - with no further info.
Is there more on this?
|
|
|
| What Lies Beneath: Statistical Structure in Voynichese Revealed by Transformers |
|
Posted by: quimqu - 08-06-2025, 01:07 PM - Forum: Analysis of the text
- Replies (15)
|
 |
This work approaches the Voynich manuscript from a fresh angle—by applying small-scale character-level GPT models to its transliterated text. Instead of attempting direct decipherment or semantic interpretation, the focus is on investigating the internal structure and statistical patterns embedded in the glyph sequences.
By training models on different sections of the manuscript using a simplified and consistent transliteration system, this study probes how well a transformer-based language model can learn and predict the character sequences. The results provide compelling evidence that the text is far from random, showing meaningful structural regularities that a machine can capture with relatively low uncertainty.
This computational perspective offers a complementary lens to traditional Voynich research, suggesting that the manuscript’s mysterious text may follow underlying syntactic or generative rules—even if their semantic content remains unknown. It is an invitation to consider the manuscript as a linguistic system in its own right, accessible to modern machine learning tools, and to explore new paths for understanding its secrets.
Objective
The aim of this project is to explore the internal structure of the Voynich manuscript by training a small GPT model on its transliterated text. Using the Currier transliteration, which offers a simplified and consistent representation of Voynichese glyphs, the goal is to test how well a transformer model can learn and predict character sequences within this mysterious and undeciphered corpus.
Methodology
I trained four different character-level GPT models (≈0.8M parameters), each on a different subset of the manuscript:
| Notebook | Text Scope | Validation Loss | Perplexity |
| Voynich_char_tokenizer | Full manuscript | 1.2166 | 3.38 |
| Biological_Voynich_char_tokenizer | Only biological section | 1.2845 | 3.61 |
| Herbal_Voynich_char_tokenizer | Only herbal section | 1.5337 | 4.64 |
| Herbal_and_pharmaceutical_Voynich_char_tokenizer | Herbal + pharmaceutical | 1.5337 | 4.64 |
Each dataset was carefully filtered to remove uncertain tokens (?), header lines, and other non-linguistic symbols. Paragraphs were reconstructed using markers from the transcription file.
Why character-level tokenization?
Early attempts at word-level tokenization (based on dot-separated EVA words) yielded poor results, primarily due to:
- A large vocabulary size (~15,000+ unique tokens).
- Very sparse and repetitive training data per token.
- Increased perplexity and unstable loss curves.
In contrast, character-level models:
- Have a much smaller and denser vocabulary.
- Perform well with limited data.
- Naturally capture the morphological regularities of Voynichese.
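For readers unfamiliar with the setup, a minimal character-level tokenizer can be sketched as follows (my own illustration, not the notebook code; the tiny corpus is made up):

```python
# Minimal character-level tokenizer sketch (illustrative only; the real
# notebooks may differ). The vocabulary is just the set of distinct
# characters, which is why it stays tiny compared to ~15,000+ word tokens.
def build_char_vocab(text):
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
    itos = {i: ch for ch, i in stoi.items()}      # integer id -> char
    return stoi, itos

def encode(text, stoi):
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    return "".join(itos[i] for i in ids)

# A few Courier-style words: only 12 distinct symbols in this toy corpus.
corpus = "4OFAECC89 SCCX9 ZCCC9"
stoi, itos = build_char_vocab(corpus)
assert decode(encode(corpus, stoi), itos) == corpus
print(len(stoi))  # 12
```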
Perplexity Results & Interpretation
Perplexity measures how well a model predicts the next token — lower values mean better predictability.
| Dataset | Perplexity |
| Full Voynich | 3.38 |
| Biological | 3.61 |
| Herbal | 4.64 |
| Herbal + Pharmaceutical | 4.64 |
The relatively low perplexity values (3–4.6) show that the model can learn strong internal structure from the Voynich text, particularly in the full corpus and biological section. These numbers are comparable to what we observe in natural languages at the character level, and far from what we would expect from purely random or meaningless sequences.
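As a side note, the perplexities in the table follow directly from the validation losses, assuming (as is standard for cross-entropy) that the loss is reported in nats:

```python
import math

def perplexity(avg_loss_nats):
    # Perplexity is the exponential of the average cross-entropy loss (nats).
    return math.exp(avg_loss_nats)

# Recovering the table's perplexities from the reported validation losses:
print(round(perplexity(1.2166), 2))  # 3.38  (full manuscript)
print(round(perplexity(1.2845), 2))  # 3.61  (biological)
print(round(perplexity(1.5337), 2))  # 4.64  (herbal / herbal+pharma)
```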
Why this matters
These results support the long-standing hypothesis — prominently discussed by René Zandbergen and others — that the Voynich manuscript, while undeciphered, exhibits non-random, rule-governed linguistic patterns.
Even though the GPT model has no access to semantics, its ability to predict Voynichese characters with such low uncertainty suggests that the manuscript likely follows an underlying syntax or generation process — artificial or natural.
In essence, the model behaves like a human listener hearing a foreign language repeatedly: it can’t understand the meaning, but learns to anticipate the next syllables based on structure.
Future Work
This approach opens up further directions:
- Train section-specific models (e.g., cosmological, recipes).
- Cluster generated tokens morphologically.
- Compare synthetic Voynichese to natural languages.
- Test statistical properties against controlled glossolalia or cipher texts.
To facilitate further exploration and replication, I’m sharing my GitHub repository, where you can find the Jupyter notebooks used in this study.
Feel free to download, review, and experiment with the code and data. Your feedback and insights are very welcome!
(08-06-2025, 01:44 PM)oshfdk Wrote: Well, that's the main question for me, because we already know about a lot of regularity from curve-line systems, slot grammars, etc. These approaches have explicit, simple rules that are easy to analyze and compare, as opposed to black-box GPT models.
Without some metric that shows that a GPT based approach identifies structures beyond already identified with previous methods, it's hard for me to see if the GPT based approach is of any use at all.
I ran a validation test on a recent output of 891 generated words using a simplified slot grammar system adapted to EVA transliteration conventions.
Basic slot grammar rules:
valid_prefixes = ("qo", "ch", "sh")
valid_suffixes = ("y", "dy", "aiin", "ain", "daiin")
invalid_double_letters = ("tt", "pp", "ff")
invalid_final_glyphs = ("q", "e", "ch")
Adapted for EVA transcription in Courier-like glyphs:
valid_prefixes = ("4O", "S", "Z")
valid_suffixes = ("9", "89", "AM", "AN", "8AM")
invalid_final_glyphs = ("4", "C", "S")
invalid_double_letters = ("PP", "BB", "FF")
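For clarity, here is one way such a checker could be implemented (a sketch of my own, restating the Courier-style rules above; not the original validation script):

```python
# Sketch of the slot-grammar check, restating the Courier-style rules from
# the post (my own re-implementation, not the original validation script).
VALID_PREFIXES = ("4O", "S", "Z")
VALID_SUFFIXES = ("9", "89", "AM", "AN", "8AM")
INVALID_FINAL_GLYPHS = ("4", "C", "S")
INVALID_DOUBLE_LETTERS = ("PP", "BB", "FF")

def check_word(word):
    """Per-rule results, mirroring the columns of the results table."""
    return {
        "Prefix": word.startswith(VALID_PREFIXES),  # startswith accepts a tuple
        "Suffix": word.endswith(VALID_SUFFIXES),
        "EndOK": not word.endswith(INVALID_FINAL_GLYPHS),
        "NoBadDbl": not any(d in word for d in INVALID_DOUBLE_LETTERS),
    }

def all_ok(word):
    return all(check_word(word).values())

print(all_ok("4OFAECC89"))  # True: valid prefix and suffix, clean ending
print(all_ok("ROPAJ"))      # False: no valid prefix or suffix
```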
Summary of results:
- ✅ 96.74% of the generated words match the slot grammar rules
- ✅ 95.5% of the total set correspond to real EVA words from the corpus
- ⚠️ 100% of the words have valid final glyphs and no invalid double letters (so these two conditions have been learned 100%)
Below is a list of the invented words that fulfill 100% the slot grammar restrictions:
Correct invented words: ['SCCX9', 'ZCCC9', '4OFAECC89', '4OFAEZC89', '4OFAEFCCC9', '4OFAEO9', 'SCCC89', '4OFAEZC89', 'SEFAN', '4ORAR9', '4OFCC889']
Below is a table of the invented words that do not fulfill 100% the slot grammar restrictions, and why:
| Word | Prefix | Suffix | EndOK | NoBadDbl | ✅ AllOK |
| ROPAJ | False | False | True | True | False |
| OFAEZE | False | False | True | True | False |
| AEOR | False | False | True | True | False |
| EZCC89R | False | False | True | True | False |
| 4OESCC9R | True | False | True | True | False |
| ESCO8 | False | False | True | True | False |
| 4CFAR | False | False | True | True | False |
| 8AROE | False | False | True | True | False |
| OEFCCC89 | False | True | True | True | False |
| FAEOE9 | False | True | True | True | False |
| POEZC89 | False | True | True | True | False |
| EFS9 | False | True | True | True | False |
| OZCC9 | False | True | True | True | False |
| AEFM | False | False | True | True | False |
| 2OEZCC9 | False | True | True | True | False |
| OEFAROR | False | False | True | True | False |
| 2OEZCC89 | False | True | True | True | False |
| E8AN | False | True | True | True | False |
| Z2AE | True | False | True | True | False |
| AEAR | False | False | True | True | False |
| 8EAM | False | True | True | True | False |
| RSCC89 | False | True | True | True | False |
| 8AEZC9 | False | True | True | True | False |
| 2AROE | False | False | True | True | False |
| EOEZC9 | False | True | True | True | False |
| BOEFAN | False | True | True | True | False |
| EOEFCC89 | False | True | True | True | False |
| 4OFOEOE | True | False | True | True | False |
| 4OFCCOE | True | False | True | True | False |
The high percentage of conformity suggests that the generation process is strongly guided by structural constraints similar to those observed in actual Voynichese. While not all words match real entries from the manuscript, most invented forms remain within plausible morpho-phonological boundaries defined by the slot grammar. This supports the idea that the model is not producing random noise, but instead approximates a coherent internal system—whether artificial or natural.
Update 06/09/2025: I have updated the post with a heatmap of the loss per folio according to my trained GPT. This gives an insight into the strangest folios according to the model.
(08-06-2025, 05:44 PM)davidma Wrote (quoting quimqu's conformity summary above):
Could you in theory test for non-conforming words already in the VM? Or, I guess, measure how much they break the internal rules? I wonder if it could be interesting to see where these pop up in the VM: are they predominant in "labelese", or maybe in certain sections? Or certain scribes? To my knowledge this hasn't been done yet, but I am probably wrong.
Hi!
Thanks for the thoughtful suggestion — I think you're absolutely right that identifying non-conforming words and tracking their locations in the manuscript could reveal meaningful patterns.
What I’ve done so far is train a character-level GPT model on Voynichese text (using Currier transliteration). Then, I used this model to estimate the average word-level loss for each token in the manuscript — essentially measuring how well the model “understands” each word given its context.
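Conceptually, the word-level score is just the mean of the per-character losses within the word. A toy sketch (with invented, context-free character losses, purely for illustration; a real model computes each character's loss from its context):

```python
def word_losses(words, char_loss):
    """Average per-character loss for each word.
    char_loss is a toy lookup: character -> loss in nats (a real model
    would derive this from context, not a fixed table)."""
    return {w: sum(char_loss.get(ch, 10.0) for ch in w) / len(w) for w in words}

# Invented losses: common glyphs are cheap, the rare 'H' is expensive.
char_loss = {"4": 0.9, "O": 0.7, "F": 1.2, "9": 0.5, "R": 1.1, "H": 9.0}
losses = word_losses(["4OF9", "RH"], char_loss)

# A short rare word like "RH" averages far higher than a well-formed
# word, which is exactly what flags it as anomalous.
print(losses["RH"] > losses["4OF9"])  # True
```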
I attach here a PNG file of the interesting resulting heatmap of loss, so you can easily see which folios are the strangest according to the GPT. Surprisingly, the last folios have the lowest loss.
![[Image: jKk1ZAA.png]](https://i.imgur.com/jKk1ZAA.png)
From this, I have been able to:
- Compute the average loss per folio, showing how predictable the text is in different sections.
- Visualize this data as a heatmap, coloring folios by their average word loss.
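The per-folio averaging step can be sketched like this (with made-up (folio, word-loss) pairs, not real data):

```python
from collections import defaultdict

def folio_avg_loss(records):
    """records: iterable of (folio_id, word_loss) pairs; returns the mean
    word loss per folio, the quantity colored in the heatmap."""
    totals = defaultdict(lambda: [0.0, 0])
    for folio, loss in records:
        totals[folio][0] += loss
        totals[folio][1] += 1
    return {folio: s / n for folio, (s, n) in totals.items()}

# Made-up example pairs, just to show the aggregation:
records = [("f1r", 2.0), ("f1r", 4.0), ("f67r", 1.0)]
print(folio_avg_loss(records))  # {'f1r': 3.0, 'f67r': 1.0}
```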
This framework opens up possibilities exactly along the lines you suggested:
- Highlighting words or regions where the model struggles most.
- Investigating whether these “high-loss” zones correlate with labelese, specific sections, or particular scribes (if metadata is available).
- Zooming in on the individual words the model finds most anomalous, and seeing their frequency and distribution.
I haven’t done the full detailed analysis yet, but the infrastructure is ready and the heatmap helps guide further exploration.
Below is a BBCode table showing the top 30 most anomalous words in Currier transcription ranked by loss (i.e. the words the model found hardest to predict):
Top 30 most anomalous words (by loss):
| Word | Loss | Freq | Len |
| RH | 10.1558 | 1 | 2 |
| J | 9.8573 | 3 | 1 |
| E | 9.8573 | 15 | 1 |
| 6 | 9.8573 | 15 | 1 |
| B | 9.8573 | 2 | 1 |
| Q | 9.8573 | 2 | 1 |
| F | 9.8573 | 4 | 1 |
| 9 | 9.8573 | 53 | 1 |
| 3 | 9.8573 | 1 | 1 |
| R | 9.8573 | 35 | 1 |
| 4 | 9.8573 | 4 | 1 |
| 8 | 9.8573 | 32 | 1 |
| Z | 9.8573 | 9 | 1 |
| A | 9.8573 | 1 | 1 |
| 2 | 9.8573 | 121 | 1 |
| C | 9.8573 | 1 | 1 |
| D | 9.8573 | 3 | 1 |
| O | 9.8573 | 18 | 1 |
| 9J | 9.4344 | 1 | 2 |
| FT | 8.8774 | 1 | 2 |
| FU | 8.5493 | 1 | 2 |
| OT | 8.3373 | 1 | 2 |
| 8DE | 8.2584 | 1 | 3 |
| OU | 8.0167 | 1 | 2 |
| O3 | 7.5268 | 3 | 2 |
| 9EJE | 7.5207 | 1 | 4 |
| P3 | 7.4110 | 1 | 2 |
| 8R | 7.3208 | 1 | 2 |
| ON | 7.2464 | 2 | 2 |
| QE | 7.0748 | 1 | 2 |
These words often consist of very short tokens, sometimes single characters or rare combinations, which the model struggles to predict confidently. Investigating where these words cluster — whether in labels, particular manuscript sections, or scribes — could provide insights into the structure or anomalies within the text.
To my knowledge, a comprehensive analysis of “non-conforming” words in the Voynich Manuscript has not yet been performed at this level of detail, so this approach offers a promising direction for further research.
If you or anyone else is interested, I’d be happy to collaborate or share the tools I’ve developed so far.
|
|
|
|