| Bifolio as a functional unit? |
|
Posted by: Bernd - 13-12-2025, 12:32 PM - Forum: Analysis of the text
- Replies (20)
|
 |
Assuming the VM originally existed as a stack of loose bifolia, has it ever been tested if there are textual similarities within a bifolio (4 pages on the same vellum sheet)? I'm aware the sample size is probably too small for proper statistics but it would be interesting to see if there are patterns that link the text on a bifolio compared to single pages.
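One way such a test could be set up, as a minimal sketch only: compare the vocabulary overlap (Jaccard similarity) of page pairs that share a bifolio with that of page pairs that do not. The page names, tokens and page-to-bifolio map below are invented toy data, and `compare_bifolio_pairs` is a hypothetical helper, not an existing tool.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two collections of word types."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def mean_similarity(pages, pairs):
    """Average Jaccard similarity over the given page pairs."""
    scores = [jaccard(pages[p], pages[q]) for p, q in pairs]
    return sum(scores) / len(scores) if scores else 0.0

def compare_bifolio_pairs(pages, bifolio_of):
    """Return (mean within-bifolio similarity, mean cross-bifolio similarity)."""
    within, across = [], []
    for p, q in combinations(sorted(pages), 2):
        (within if bifolio_of[p] == bifolio_of[q] else across).append((p, q))
    return mean_similarity(pages, within), mean_similarity(pages, across)

# Toy data: invented page contents and an invented page-to-bifolio map.
pages = {
    "p1": "daiin chol chor".split(),
    "p2": "daiin chol shol".split(),
    "p3": "qokeedy shedy qokain".split(),
    "p4": "qokeedy shedy chedy".split(),
}
bifolio_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
within, across = compare_bifolio_pairs(pages, bifolio_of)
print(within, across)
```

On real data the raw difference would not mean much by itself: pages on one bifolio usually also share a section, a scribe and a Currier language, so the comparison would have to be restricted to pairs matched on those properties, and the difference checked against shuffled bifolio assignments.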
|
|
|
Hidden animals in the roots |
|
Posted by: Rafal - 13-12-2025, 12:16 PM - Forum: Imagery
- Replies (26)
|
 |
Forgive me if it was discussed before but I haven't seen a global thread about it, just some discussions about individual pictures.
Several times it was suggested that there are hidden animals and other creatures in the plant roots. Not in all roots but in several ones.
I browsed the manuscript and thought about it myself and came to this:
Of course there is such a thing as pareidolia. Let's quote a classic philosopher:
“There is an universal tendency among mankind to conceive all beings like themselves, and to transfer to every object, those qualities, with which they are familiarly acquainted, and of which they are intimately conscious. We find human faces in the moon, armies in the clouds; and by a natural propensity, if not corrected by experience and reflection, ascribe malice or good-will to every thing, that hurts or pleases us.”
David Hume
So are these animals there or are these just quirks of our brains?
I will tell you my opinion. For me it's too much to be a random coincidence. These animals are real and intentional.
I am trying to make a poll to see your opinions.
And if there are animals indeed, what are the implications?
One quite obvious one to me is that the artist wasn't copying some herbal faithfully but rather freestyling and improvising.
Another one is that at least some plants are imaginary.
And there is a question - why was he doing it? Just for fun (entirely possible for me) or could there be something deeper behind it?
|
|
|
| Working my way to a semantic word analysis |
|
Posted by: mxv456 - 12-12-2025, 01:51 PM - Forum: Analysis of the text
- Replies (12)
|
 |
Hey folks,
I've been binging Koen's videos on YouTube over the last week, great stuff!
Obviously that means I'm a Voynich novice, but I did use computational linguistics during my PhD at the MPI in Nijmegen, Netherlands, so I couldn't help but dig my fingers into the data :)
I'm not claiming any novelty but I haven't seen the different analysis steps put together in one place so I figured I might as well publish it here. (However, I do think in the end, I have some interesting results that I didn't see anywhere else... but more about this below and in the next post.)
I put the data, scripts and a small analysis report on a dedicated GitHub repository, if somebody wants to have a deeper look: [link hidden]
My main idea for this round is to perform a TF-IDF analysis. This is a statistical tool where you compare how often a word occurs within an individual text segment (here, a page) with how many segments of the whole text it occurs in. With this method, one can (approximately) distinguish "content words" from "function words". Content words are those that are specific to a particular topic, like "Voynich" or "Quantum", while function words are words that show up everywhere, like "the", "of", "and", etc. I think it would be tantalizing to produce a list of Voynich words where we can guess, from section and illustration cues, what they might mean, given where they show up. (Although I don't think it would bring us closer to deciphering the text, it would be fun.)
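To make the idea concrete, here is a minimal sketch of TF-IDF over pages, assuming one common weighting (term frequency as count divided by page length, inverse document frequency as log(N / df)). The page names and tokens are toy data, and this is not the code from the repository.

```python
import math
from collections import Counter

def tf_idf(pages):
    """Return {page: {word: score}} with tf = count / page length and
    idf = log(N / df), where df is the number of pages containing the word."""
    n_pages = len(pages)
    doc_freq = Counter()
    for tokens in pages.values():
        doc_freq.update(set(tokens))  # count each word once per page
    scores = {}
    for page, tokens in pages.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[page] = {
            word: (count / total) * math.log(n_pages / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

# Toy pages (invented token lists, not real transcription lines).
pages = {
    "page1": ["daiin", "chol", "daiin", "otol"],
    "page2": ["daiin", "shedy", "qokeedy", "shedy"],
    "page3": ["daiin", "chol", "okal", "okal"],
}
scores = tf_idf(pages)
top_page2 = max(scores["page2"], key=scores["page2"].get)
print(top_page2, scores["page2"][top_page2])
```

A word that occurs on every page gets an idf of log(1) = 0 and therefore a score of 0 regardless of how frequent it is, which is what pushes "function word" candidates to the bottom of each page's ranking.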
Down the line, maybe I have time to produce a visual tool where people can explore how words cluster in certain portions of the text. Not quite as fancy as the amazing tool on voynichese.com but in the same spirit.
I'm currently getting into working with LLMs (building them, not talking to ChatGPT) and I am very curious if one can use these tools to identify semantic clusters of Voynich words. Tbd.
I obviously haven't read everything there is on Voynich, but I did my best to go through voynich.nu and Bowern and Lindemann (2020) as well as the latest and pinned posts on this forum, to get a basic understanding of what's commonly known and what's currently under discussion. I'm looking forward to learning more from you.
I want to start by stating my base preconceptions/assumptions when I went into the analysis, as well as some questions that maybe you can help me with.
Assumptions:
1. The text is real in the sense that somebody in the 15th century wrote something down to communicate information to somebody else.
2. The transcription is reasonably good and conveys the textual content of the VMS to an overwhelming degree, so we can base an analysis on it.
3. The words are words in the sense that they can, through translation, combination, compression, augmentation by auxiliary information, or some other process, be rendered into a language that someone at some time spoke. If there is a cipher, it did not jumble words by moving word boundaries or similar shenanigans.
4. Letters are only meaningful with regard to the manuscript itself. They cannot be identified in a one-to-one manner with any language.
5. The manuscript was written by several scribes/authors, possibly at different times, possibly without knowing each other. The known separations are Currier A and B as well as the 5 Hands (Davis, Lisa Fagin. 2020).
6. There is no hope of me ever decrypting the text by myself since I have none of the necessary skills to actually understand any language that the authors spoke, even less the manuscript itself.
Questions:
1. The whole analysis is based on IT2a-n.txt from [link hidden]. Is this the correct choice? As far as I understand, it's a version of the TT transcription, but I don't know what the state of the art is. I noticed that the transcription used on voynichese.com is different and in some places more complete, but I don't know.
2. How is the interaction between "fan groups" like this one and the academic community? From what I saw in the videos, there is a fairly good collaboration, but I still wonder. I know that for some topics that garner so much public interest, there can be a lot of tension.
3. I had trouble finding a "definitive" division of pages into Currier A and B. I don't know if that's because it is not fully defined, if there is disagreement, or if I just didn't look in the right places.
Base results:
Before we get to the good stuff, I want to post the base analysis as a sanity check but also so they are all in one spot. As I said, these are all well known but it's good for me to see the data myself, so maybe also for others.
I split up all the analysis steps by Currier A and B. Going in, I did not have any idea how close both languages are. My initial assumption was actually that they are as different as German and Latin. These stats helped me understand it better.
[EDIT: I made a mistake in my Currier A/B separation for these plots. Corrected plots in my reply on page 2.]
1. The word length plots for Currier A and B with the distributions for 4 reference languages. (I just chose 4 languages that were easily accessible to me.)
![[Image: word_length_stats.png?raw=true]](https://github.com/Marvel4U/Voynich_semantic_exploration/blob/master/plots/word_length_stats.png?raw=true)
2. The known Zipf distribution of word frequencies with reference languages
![[Image: Zipf_stats.png?raw=true]](https://github.com/Marvel4U/Voynich_semantic_exploration/blob/master/plots/Zipf_stats.png?raw=true)
3. Bigram heatmap.
This one was interesting to me because it shows a very close correspondence between Currier A and B. I expected a much bigger variation.
![[Image: bigram_heatmap_a_b.png]](https://github.com/Marvel4U/Voynich_semantic_exploration/raw/master/plots/bigram_heatmap_a_b.png)
As a reference, I looked at the bigram statistics of the reference languages, and one can see that they vary much more from each other than Currier A and B do.
![[Image: bigram_heatmap_ref_shared.png]](https://github.com/Marvel4U/Voynich_semantic_exploration/raw/master/plots/bigram_heatmap_ref_shared.png)
4. Word start and end bigrams/trigrams
The bigrams and word-initial trigrams did not show that much irregularity, but the word-end trigrams clearly show the famous -edy ending for Currier B.
![[Image: word_end_trigrams_a_b.png]](https://github.com/Marvel4U/Voynich_semantic_exploration/raw/master/plots/word_end_trigrams_a_b.png)
What did surprise me is that the -edy ending is also among the most common endings in Currier A. From what I read and saw, I assumed that it is almost exclusive to Currier B. Does that mean that I (a) simply misunderstood or (b) chose the wrong page split between Currier A and B?
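One way to look at this question is to compare the share of tokens ending in -edy in each language rather than only the rank of the ending, since an ending can rank highly in both while its share differs a lot. A minimal sketch, with invented token lists and a hypothetical helper `ending_shares` (not taken from the repository):

```python
from collections import Counter

def ending_shares(tokens, n=3):
    """Share of each word-final n-gram among tokens with at least n characters."""
    endings = Counter(t[-n:] for t in tokens if len(t) >= n)
    total = sum(endings.values())
    return {ending: count / total for ending, count in endings.items()} if total else {}

# Toy token lists standing in for the Currier A and B word streams.
tokens_a = ["daiin", "chol", "chedy", "chor", "daiin", "shol"]
tokens_b = ["qokeedy", "shedy", "chedy", "qokain", "otedy", "daiin"]

share_a = ending_shares(tokens_a).get("edy", 0.0)
share_b = ending_shares(tokens_b).get("edy", 0.0)
print(f"-edy share in A: {share_a:.3f}, in B: {share_b:.3f}")
```

Running the same count per page instead of per language would also show whether the -edy tokens in the A set are spread evenly or concentrated on a few pages, which would point to misassigned pages rather than a misunderstanding.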
My current split is defined by the code below. Input is very welcome.
Code:
# Folio ranges (first, last) and single pages treated as Currier A
CURRIER_A_RANGES = [("f1r", "f24v"), ("f31r", "f31v"), ("f88r", "f90v1"), ("f100r1", "f116r")]
CURRIER_A_SINGLES = ["f25r", "f25v", "f32r", "f32v", "f33r", "f34r", "f34v", "f67r2", "f67v1", "f67v2", "f91v"]
# Folio ranges (first, last) and single pages treated as Currier B
CURRIER_B_RANGES = [("f26r", "f30v"), ("f35r", "f39v"), ("f75r", "f84v"), ("f93r", "f96v")]
CURRIER_B_SINGLES = ["f68r1", "f68r2", "f68v1", "f68v2"]
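A split defined this way can be sanity-checked by classifying individual folio ids and looking for pages that land in both sets or in neither. The sketch below is an assumption-laden illustration: it assumes ids of the form `f<number><r|v><optional panel>`, orders them by leaf number, then recto before verso, then panel, and treats ranges as inclusive. `folio_key`, `in_split` and `classify` are hypothetical helpers, and the constants are copied from the post.

```python
import re

CURRIER_A_RANGES = [("f1r", "f24v"), ("f31r", "f31v"), ("f88r", "f90v1"), ("f100r1", "f116r")]
CURRIER_A_SINGLES = ["f25r", "f25v", "f32r", "f32v", "f33r", "f34r", "f34v",
                     "f67r2", "f67v1", "f67v2", "f91v"]
CURRIER_B_RANGES = [("f26r", "f30v"), ("f35r", "f39v"), ("f75r", "f84v"), ("f93r", "f96v")]
CURRIER_B_SINGLES = ["f68r1", "f68r2", "f68v1", "f68v2"]

# Assumed id format: "f", leaf number, side (r or v), optional panel number.
FOLIO_RE = re.compile(r"^f(\d+)([rv])(\d*)$")

def folio_key(folio):
    """Sort key (leaf number, side, panel) for ids such as 'f67r2'."""
    match = FOLIO_RE.match(folio)
    if match is None:
        raise ValueError(f"unrecognised folio id: {folio!r}")
    number, side, panel = match.groups()
    return (int(number), 0 if side == "r" else 1, int(panel or 0))

def in_split(folio, ranges, singles):
    """True if the folio is listed singly or falls inside an inclusive range."""
    key = folio_key(folio)
    if folio in singles:
        return True
    return any(folio_key(lo) <= key <= folio_key(hi) for lo, hi in ranges)

def classify(folio):
    """Return 'A', 'B', 'both' or 'unassigned' for one folio id."""
    is_a = in_split(folio, CURRIER_A_RANGES, CURRIER_A_SINGLES)
    is_b = in_split(folio, CURRIER_B_RANGES, CURRIER_B_SINGLES)
    if is_a and is_b:
        return "both"
    return "A" if is_a else "B" if is_b else "unassigned"

for folio in ["f1v", "f26r", "f33v", "f57r", "f68r1"]:
    print(folio, classify(folio))
```

Under these assumptions, ids such as f33v and f57r come back as "unassigned", which only means they appear in none of the lists as posted. Ids that do not match the assumed pattern raise a `ValueError` instead of being silently skipped.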
I'll leave it at that for now. I'm curious how the interaction in the community here works and if I'll hear from anybody. I'm still preparing the plots for the TF-IDF analysis; as I said, I actually think they are quite interesting. I will add them as a reply when I'm done.
Until then, cheerio,
Marvin.
|
|
|
| Testing AI on a Context‑Free Excerpt from My Voynich Translation |
|
Posted by: Malatin_1 - 11-12-2025, 09:51 PM - Forum: Theories & Solutions
- Replies (14)
|
 |
Last February, I made a breakthrough in my research on the Voynich manuscript and was able to identify its original language. It has been a long journey toward complete translations, but now I can translate entire sentences from the work. I base my translations directly on grammar, using meanings drawn from a dictionary.
Today I wanted to test how well an AI can analyze a text without any background information. To do this, I took a short passage from my own translation and gave it to the AI in complete isolation. I did not reveal which text it came from, what language it was originally written in, or what era it belonged to. The AI was only given the words and their basic meanings.
The purpose of the experiment was to see whether an AI could draw conclusions based solely on the text itself – without any clues about its origin.
The result was surprisingly successful. The AI was able to determine which language group the text belonged to, even though it knew nothing about its background. It didn’t need the original manuscript, the context, or even the knowledge that it was a translation. A single text fragment was enough.
The experiment showed that AI can recognize linguistic and cultural features even when the text is completely disconnected from its original environment. At the same time, it confirmed that my translation is not arbitrary, but based on a correct interpretation of the source text.
|
|
|
| About the binding(s?) and missing folios |
|
Posted by: Cuagga - 10-12-2025, 04:48 PM - Forum: Physical material
- Replies (33)
|
 |
Hello Voynich ninja,
Reading Rene Zandbergen's blog, a few things jumped to my mind. In [link hidden], he retraces the most probable evolution of the manuscript in these terms:
Quote: The points that have been presented in relation to the order of production of the MS may now be summarised.
- The MS was produced in a bifolio-per-bifolio manner, with the drawing outlines inked first, followed by the inking of the text;
- The quire numbers were added before the folio numbers;
- The page order has been disturbed, and this happened before both sets of numbers were added;
- The painting was done before the present binding;
- The quire and folio numbers were added before the present binding;
- Some of the painting appears to have been done after the folio numbers were added;
- Twelve of the fourteen missing folios were lost after the folio numbers were added, but before the present binding.
This leads to the following tentative reconstruction:
- All bifolios of the MS were prepared: the drawing outlines and the text were added in ink;
- Sometime after this, the planned order of the bifolios was disturbed. The bifolios were stacked anew in an incorrect order (implying that the person who did this was not the original author) but the set was still complete. (The interesting task of identifying the original page order has not been completed, and has mainly been driven by Nick Pelling);
- The quires were numbered first, the MS may have been bound, and the folios were numbered after that. (This initial binding is not necessary but would explain the inconsistency of the quire and folio numbers of quire 9);
- At this point, the book had all folios including the now missing ones, and was not painted, or only partially painted. Folio 42 would not have been painted yet;
- The MS was disassembled and painted (or the partial painting completed). Six bifolios were lost or removed at this point;
- Shortly after the painting, the MS was rebound in the same order, but with the six bifolios missing. Folios 12 and 74 would have still been there. Especially the blue paint transferred on opposite pages;
- Folios 12 and 74 were cut out sometime later
We know that the quire numbers were added before the folio numbers, and that this indicates the presence of a first binding, or at least that the manuscript was prepared for binding (the same page mentions earlier that the marks on q9 only show a preparation for binding but no trace of finishing it at this point).
Is there ANY reason that this first preparation for binding might produce quires consisting of only one bifolio, and that it would put those single-bifolio quires anywhere other than at the edges of the finished product? If there is not, it indicates that q16 and q18 were composed of 4 bifolia each, like most of the others, but that 3 of those had disappeared by the time the folio numbers were added. (Looking further, I suppose it is likely that those pages were foldouts, which q14 through q19 have in abundance, but this doesn't prevent the existence of more missing unnumbered bifolia.)
Then, for what reason would the first preparer make uneven quires? I can see two possible reasons, but neither applies to q8: either a clear semantic/stylistic link, which explains q20 (but q8's remaining bifolia aren't clearly semantically tied), or the physical unwieldiness of long quires with foldouts, which explains q14 through q19 but can't explain LONGER than usual quires. q8 is longer than all other quires (except q20, whose text layout is very different from the rest), and as long as q13, which is stylistically coherent, but f57, f58, f65 and f66 are quite different from each other, and they aren't even consistent recto to verso (f57r and [link hidden] can be in the same section, but they are clearly different from the pair f57v-f66r). Currier finds both Language A and Language B in this quire, and the images look like they belong in different sections, which indicates one of the following:
- There is a hidden semantic connection that justifies joining bifolia together like that, and the quirer understood the language (very unlikely, as Lisa Fagin Davis' work tends to suggest that quiring itself was a misunderstanding of the book, which should have stayed as a collection of loose leaves, or should have been quired as a thick pile of singulions)
- The missing bifoliae contain drawings and text bridging the gap (possible, but unhelpful)
- Q8 was from the start a patchwork quire, gathering everything that doesn't fit (this indicates that q16 and q18 were each bigger than a singular bifolio without foldouts, as otherwise they could have been joined into q8 and the resulting quire would still not have been thicker than q20, which by its existence shows that quires this big are practical; it doesn't explain, though, why it would have been numbered this low, rather than being put at the end)
- All quires were initially this big and we shouldn't read into q8's length (not really realistic, as it means 7 bifolia are missing, one in each of the first 7 quires; the most probable outcome would have been to have unequal quires at the start)
The most probable outcome, for me and for now, is proposition 3: q16 and q18 were longer than one standard bifolio each, but all the unnumbered ones were lost between quiring and foliating. I still don't have a good idea of why the quirer would create quires of different lengths in the middle of the book rather than counting the extra leaves at the end of the quiring process, but that might be tied to the process itself, in which case I'd love an idea.
|
|
|
| Segmented Steganography |
|
Posted by: R. Sale - 09-12-2025, 07:06 PM - Forum: Theories & Solutions
- Replies (1)
|
 |
Rather than the usual type of steganography, based on the placement of words on a page, the VMs has been equipped with a system based on the placement of extended text segments. Rather than "hiding" words on a page, the VMs "hides" text segments in circular diagrams. Rather than a grille to find the hidden words, the VMs used patterned markers to designate selected text segments. A grille is too obvious and potentially lost. Patterned markers are more subtle, they might be irrelevant, and they stay in place. A functional structure was created, but was it activated?
Examples of marked text segments occur in the cosmos, zodiac, rosettes, etc. Markers occur in a number of variations. While some are elaborate and obvious, do single lines or blank spaces also constitute 'markers'?
The problem is that designated text is still Voynichese writing that cannot be read. However, the advantage is that this provides specific segments of text to compare and contrast with statistical investigations.
As to whether the artist actually recognized this steganographic technique and chose to provide verification in the VMs illustrations, the answer is provided in VMs White Aries where the text marker joins the blue-striped tub.
|
|
|
|