The Voynich Ninja

Full Version: Elephant in the Room Solution Considerations
@Jorge_Stolfi, thanks for your considerations.

Let’s begin from your last paragraph, when we do know the language. We can see ‘The’ and ‘the’ as being the same word because we know the language. Otherwise, we do not have that privilege, which is the case in the VM. So, we must treat the transliteration in a way that would distinguish between The and the, unfortunately. There is no other way to do it, in my view.

T, h, and e are separate letters in both versions, because we see that in our supposed original. If they had been connected (not knowing what they mean), we have no choice but respect the original and treat them as connected. If c-c or a c-^-c (a c-c with a diacritic on top) is all one connected symbol, we need to treat it as that and respect the original, rather than assuming it is comprised of the three parts. Of course it is helpful to see if a k between a double c is used separately on its own as well, and v101 does that too (which is why I like it more than others). We will surely note that in that one symbol, the author/scribe is using a c or a gallow to construct it; but still, there is a reason they are visually rendered as one symbol. So, yes, also if we see different looking diacritics for the c-c, even when we are not sure, we should take that into account, unfortunately, since we are dealing with an unknown language.

It is only when we respect that visual, that we can then compare it to a c-c and say, “it seems this one is the same as the other with this or that diacritic on top.” The same holds for “bench” symbols. By splitting it into the three parts, we have already allowed our own assumptions to dictate the visual original’s features. We don’t know if the author and the scribe were agreed on that, but that is something we cannot consider as an assumption to prefer this or that transliteration system.

For this reason, following the visuals of the original is essential. Here we arrive at your listed assumptions and conditions.

I do understand your points regarding the assumption variations. But the problem is that they are all assumptions, and we do not know whether any is truer than the others, even though you and I may prefer some over others. To be empirically strict, yes, we should treat noticeably different symbols (even if we suspect it is a different handwriting style) as being different, though see them as possible variations of the same symbol.

Yes, there are challenges the VM poses: 1-The original can be illegible; 2-The retracing may be in error; 3-Author intentions may have been distorted in the scribing; 4-Handwriting styles may add indeterminable noise to transliteration choices, and so on.

If it is hard to decide whether a letter is one of two similarly looking ones, that can also affect the word in which it is found. So, the ambiguity would exist no matter what transliteration systems we devise.

But all of the above have less to do with the main point I am trying to convey.

The final goal is to be able to read the VM text, not any transliterations of it, no matter how good or bad the transliteration systems are.

Yes, their helpfulness depends on whether they can most immediately convey the visual characteristics of the original. A double c is a double c, connected on top, not two separate letters. The author and/or scribe wanted to say c-c as one symbol, since it was possibly a contraction for another word or set of words. A bench is all one symbol, not split in three parts as assumed. V101 at least acknowledges the double-c with diacritic as being one symbol, transliterated as 2.

The MAIN problem is when any of these transliteration systems are used to draw linguistic conclusions about the language, i.e., whether it is a natural, constructed, hoax, gibberish, etc., language. You can’t make such judgments based on transliterations, no matter how good or bad they are, whether v101 or EVA.

I have no problem with transliteration systems being used to know statistically how many of this or that symbol appears, alone or in combination in the VM, assuming all the imperfections involved. This of course assumes we can read the text (legibility), scribal handwriting, scribal or other retracing, how faithful or aware the scribes were about a (living or dead) author’s intentions. These complicate constructing such transliteration systems.

But ultimately the criterion should be the study of the actual original as visually displayed in the only manuscript copy we have.

I think preoccupation with transliterations has prevented us from paying more attention to the actual visual information the original text is offering. The more we spend time on the former, the less we spend time on the latter.

The problem I see (and this can address @Renez as well) is that analyses of transliterations, good or bad, are being used for making linguistic judgments about the original. That is where transliteration systems can prevent the study of the text itself.
(17-01-2026, 01:57 AM)MHTamdgidi_(Behrooz) Wrote: following the visuals of the original are essential.

There are two distinct "flaws" of EVA that you and others seem bothered about:
(1) It uses more than one letter to record a single glyph (a "maximal set of connected pen strokes")
(2) It does not record certain details of the glyphs, like the shape and position of the plume on the Sh.

I addressed (2) in the previous message. 

Yes, whether (2) is good or bad depends on the assumptions one makes about the way the book was written and the intentions of the Author.   As I said in my Voynich Day talk, it would be nice if we could study the VMS "agnostically" -- without making any such assumptions beforehand; but that is impossible.  One must make prior assumptions in order to do any research; in any field, not just on the VMS.  One cannot get out of bed in the morning without making the assumption that the floor is still there where it was the night before, and did not turn into a big pool of ketchup.  

Even if you use v101, you are making many such assumptions, like "the size of the glyphs does not matter, only their shape", "there are only two kinds of space, inter-glyph and inter-word", "the spacing between the lines does not matter", "it does not matter if a line ends a few mm before or after the right rail", etc.  And "the plume on a bench glyph can have only one of these six shapes".

As for (1): indeed, using two or more Latin letters to encode what we assume  is a single "letter" of the Voynichese script is an annoyance for some types of computer processing, like computing the frequency of "letters".  

But almost every alphabetic script in the world uses two or more letters of its alphabet to denote certain single sounds: "sh" in English, "gn" in Italian, "ch" and "oe" in German, "ll" in Welsh, ...  People who are native speakers of those languages are used to that kind of encoding trick since kindergarten, and thus are hardly bothered by EVA using it too.

EVA does record whether glyphs are connected or not, through certain simple rules. Thus "ch" or "Ch" is always a single connected glyph (Ch), and so is "CTh" (CTh); whereas "ee" is always two separate glyphs (ee), and "ete" is three (ete).  Besides, Ch is two separate pen strokes, and CTh is four, which makes their EVA encoding seem almost logical.
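
The connectivity rules above can be sketched in code. This is only a minimal Python sketch, not part of any existing EVA tool: it assumes lowercase basic EVA and an illustrative, non-exhaustive list of multi-letter sequences, and the names `CONNECTED`, `split_glyphs` and `glyph_counts` are made up for the example. It groups letters into glyph units before counting, so that a frequency count treats "ch" as one unit and "ee" as two.

```python
from collections import Counter

# Assumption: an illustrative (not exhaustive) set of multi-letter EVA
# sequences that each stand for one connected glyph.
CONNECTED = ("cth", "ckh", "cph", "cfh", "ch", "sh")


def split_glyphs(word: str) -> list[str]:
    """Greedy longest-match split of a lowercase EVA word into glyph units."""
    units = []
    i = 0
    while i < len(word):
        for seq in CONNECTED:  # longer sequences are listed first
            if word.startswith(seq, i):
                units.append(seq)
                i += len(seq)
                break
        else:
            # No multi-letter sequence starts here: one letter is one glyph.
            units.append(word[i])
            i += 1
    return units


def glyph_counts(words):
    """Count glyph units (not Latin letters) over an iterable of EVA words."""
    counts = Counter()
    for w in words:
        counts.update(split_glyphs(w))
    return counts


print(split_glyphs("chedy"))
print(glyph_counts(["shoshy", "daiin"]))
```

With this kind of pre-processing, the multi-letter encoding stops being an obstacle for "letter" frequency counts; whether the chosen list of connected sequences is the right one is, again, an assumption.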

On the other hand, that feature of EVA makes it possible to write all the common glyphs using only the 26 lowercase Latin letters; which is a big plus for computer processing.  Moreover, it makes it possible to record many rare glyphs, like CTy ("CTy"), Cky ("Cky")  or CTHh ("CTHh") without using additional computer codes or rules.  Additional codes are needed only for rare glyphs, that occur only once or twice in the whole book; and those could all be encoded as "?" without significant harm for most analyses.

And, finally, the EVA design decision to use only Latin letters made it possible to "pronounce" the words. This genial idea of Jacques Guy is very helpful when transcribing. One can mentally "read" a whole word from the scanned image as a string of sounds, and then type that into a file, as one would do when transcribing any other alphabetical language.  That is normally much faster and more reliable than trying to memorize the shapes of the glyphs.  That is why the people who study Sumerian or Ancient Egyptian have made up sounds for all the glyphs of those scripts, even though in many cases we have no idea of what those glyphs sounded like (or even if they were "letters" at all).

And, of course, everybody who has been in this corner of the internet for more than a week knows that we have absolutely no idea of how the Author pronounced daiin, but it surely was not even remotely close to "die in".

In conclusion, which encoding is best for you depends on the assumptions you make and what sort of analysis you want to make.  Fortunately Rene's tools let you choose between any of the historical or current encodings; and they could easily accommodate more. "The great thing about standards is that there are so many to choose from"...

All the best, --stolfi
(17-01-2026, 07:46 AM)Jorge_Stolfi Wrote: This genial idea of Jacques Guy

actually that was Gabriel Landini, who, I may add, sadly passed away in October last year (as I mentioned only to a few people so far).
I remember Gabriel Landini joining the forum during a discussion I reopened about Zipf's law. It was last summer, just shortly before his death. I'm sorry for your loss, René. He must have been a good friend of yours.
@ ReneZ, Antonio García Jiménez, Jorge_Stolfi, and colleagues, please accept my condolences for losing a good friend in Gabriel Landini. I was also sad to learn (at the beginning of my research in the VM last year) that Stephen Bax also passed away in 2017.

@ Jorge_Stolfi, thanks again for your rejoinder. I understand your point overall is there are many options to choose from among transliteration systems depending on what we wish to do in research and assumptions we have (ones that in your opinion are inevitable) or preferences we may have for having a “pronunciation ability” value in one or another system.

Of course all can do what they wish, but we may also have differing views as to which system can be more or less helpful. But two points I have been making are somewhat getting lost in the cycle, and these are the more important points I was trying to make.

First, my original concern was about not just transliterating, but errors in doing so. In “ydaraishy” on f1r, ‘sh’ is assuming that the diacritic and the first c are part of the same letter.

[attachment=13515]

In my examination of the text, that is simply an “error”. That is not an ‘s’; it is a c with a diacritic that most likely belongs to the double-c. There is clearly a space between the diacritic and the first c, and I think the parchment defects have made the diacritic look longer than it is in this case (but there are plenty of other examples throughout the manuscript that are undeniable), and even then it is not touching the c. Whoever is transliterating is making a huge mistake in giving an impression in the transliteration about the ‘s’, and it is misguiding whoever is using that transliteration.

Just a couple of lines below ydaraishy, you have another word, and even when the diacritics are undeniably separate, the c with diacritic is given the 's' assignment, as in "shoshy"! Why!? Just because it sounds better?

[attachment=13517]

Of course this is not intentional, and done with good intentions I am sure, but this is not just one case. According to the VM browser, when you search for ‘sh’ 4464 cases are counted when the combination is used as part of a longer string (or word); see the image at the bottom of this post. The error is everywhere. That means an error has been repeated systemically in constructing that transliteration system. That is what makes it a bad system in my view, sorry to say. I would not recommend it, though all of course have a choice to make and in my view that will inevitably lead to slop.
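
How such a figure is arrived at matters when comparing counts. As a rough illustration only (the VM browser's actual counting rule is not stated here, and the word list is a tiny sample built from words quoted in this thread, not a real transliteration file), this Python sketch distinguishes the number of occurrences of a sequence from the number of longer words that contain it. The function name `sequence_counts` is made up for the example.

```python
def sequence_counts(words, seq):
    """Return (total occurrences of seq, number of longer words containing seq)."""
    # str.count counts non-overlapping occurrences inside each word.
    occurrences = sum(w.count(seq) for w in words)
    # Words that contain seq as part of a longer string, not seq on its own.
    containing = sum(1 for w in words if seq in w and w != seq)
    return occurrences, containing


# Tiny illustrative sample; real counts require a full transliteration file.
sample = ["ydaraishy", "shoshy", "daiin", "sh"]
print(sequence_counts(sample, "sh"))
```

The two numbers differ as soon as a word contains the sequence more than once, so any quoted total is only meaningful together with the rule that produced it.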

I don’t understand really how anyone, let alone experts, can read a ‘c’ with a diacritic that is clearly a part of a double-c with tops connected as one letter and give it a ‘s’ transliteration. This is not just about an assumption preference. This is about a scientific observation with naked eye. As a result, many who just use that transliteration system without checking the visual origins, will end up studying a Voynich text that does NOT exist as such, because they are studying a particular, in my view verifiably mistaken, transliteration of it.

So, I respectfully disagree with you regarding the benign nature of the choices to be made. I would rather stay with a more cumbersome but more visually accurate transliteration than do something simply because it sounds and pronounces better.

I think you are avoiding this key point I was raising, and my not reminding you of it in my last post may have caused it to be lost in the conversation.

But, then you are avoiding another point I have raised many times, and that is, no matter what transliteration system you use, pronunciation friendly or not, it would be a fundamental mistake to use that system as a basis for making linguistic judgments and drawing such conclusions. From what I have read some have used such statistical studies to claim the language is or not natural, is or not gibberish, is or not this or that language, etc. I think that is a big mistake.

Transliteration systems can be helpful, if they are not obviously mistaken and misleading, to give us some statistics of what exists or not in the text, if visually faithful. However, using them to pass linguistic judgments is a huge mistake and can explain why after decades not much progress has been made in certain studies. 

I think what Koen G.’s study of the last page marginalia did, or what JustAnotherTheory found yesterday about the other marginalia go a long way more to help in understanding the Voynich manuscript text. These are good efforts and always bear good results. What JustAnotherTheory found again proves that someone writing in Latin is able to read the text, tentatively speaking. 

These seemingly little insights can be path-breaking, but preoccupation with transliteration preferences may take away our attention from them. No one is denying their value (obvious mistakes notwithstanding) in giving us a sense of what exists or not in the Voynich manuscript, but they cannot be used to draw linguistic judgments in a statistical way about the VM text, and their verifiable mistakes should be taken into consideration and red-flagged to readers, no matter how much we appreciate the efforts that have gone into creating them.

Visual studies of the text should be preferred over any convenience transliterations can offer, pronunciation friendly or not. 

[attachment=13516]
(17-01-2026, 07:03 PM)MHTamdgidi_(Behrooz) Wrote: In my examination of the text, that is simply an “error”.

But that is where personal assumptions come in.  In my view, there is indeed only one Sh glyph, and all the variations of plume size, shape, position, and connectivity are just normal variations in the way it is rendered by the Scribe.  Thus transcribing each variant with a different code is worse than wasted work: it means contaminating the transcription with meaningless noise, and forcing the users of the transcription to waste time removing that noise (by mapping all the variants to the same code).
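
The noise-removal step described above would look roughly like this in Python. The variant codes `sh1`, `sh2` and `sh3` are hypothetical, invented for the example; they do not correspond to any real transliteration alphabet, and `merge_variants` is likewise a made-up name.

```python
from collections import Counter

# Hypothetical variant codes for differently drawn plumes (invented for
# illustration; not taken from v101, EVA, or any other real system).
VARIANT_TO_BASE = {"sh1": "sh", "sh2": "sh", "sh3": "sh"}


def merge_variants(units):
    """Map every variant code to its base glyph; leave other units unchanged."""
    return [VARIANT_TO_BASE.get(u, u) for u in units]


fine = ["sh1", "o", "sh2", "y"]   # transcription that records each variant
coarse = merge_variants(fine)     # the same word with variants merged

print(Counter(fine))
print(Counter(coarse))
```

The mapping only works in one direction: a fine transcription can always be collapsed into a coarse one, but the variants cannot be recovered from the coarse form. Which direction of loss is acceptable depends on the assumptions discussed in this exchange.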

All the best, --stolfi
@ Jorge_Stolfi, thanks again. I’m not convinced, and still some issues I raised are not addressed, but that’s ok. We are entitled to our opinions, and nothing wrong with having disagreements. I wish you well in pursuing your way achieving good results. There is no need for going in circles.
The next point I wish to make is about the botanical/herbal part of the Voynich manuscript.

VM scholars have over the decades confronted a dilemma. Most, if not all, of the plants are not straightforward depictions of actual plants. Some scholars have leaned on arguing and showing that some demonstrate features of actual plants, others not, while the VM plant images themselves display features that are obviously fantastic and creatively spirited. Partial resemblances, which could be at times incidental, have led some scholars to propose the plants depicted may have been from world regions not “discovered” at the time, fueling their own and others’ senses of the enigma about the manuscript.

Given the VM text has been unreadable, therefore, scholars have wondered whether any practical information is being conveyed in the codex for medicinal or pharmaceutical purposes, since the plant images are not realistic. This, then, has led some to propose the VM may just be a hoax.

In this post I will try to show that indeed the manuscript can be conveying in its text practical medicinal information (based on the knowledge of those times, of course) while the plant images are creatively done and not necessarily factually realistic. For this purpose, I invite you to do a thought experiment here.

Let us say person A knew herbal medicine well, knew the plants well from the field, could actually fetch and prepare them medicinally, and wrote a manuscript depicting them graphically in a realistic way, offering for those times practical medical information and instruction.

Years later, person B comes along and obtains that manuscript. Impressed, he or she does not have access to the fields for actual identification of the plants and preparing them. So, she just buys them from people who do, and follows the instructions in the A manuscript, and decides to prepare a copy of the same for her friends in her own language. Now, she is drawing from the A drawings, not as exact as what the plants look like in the field, adding her own creative touches, depicting the roots in a way that gives a hint for the plants’ medicinal use, and so on. Why not, it is much easier to remember a plant visually (even if not realistic) than by strange names.

Years later, person C comes along and does the same as B, now not even knowing A wrote the original. Now C is again inspired to prepare for his own community in another area speaking a different language, his own manuscript. So, a C manuscript is prepared that in graphics even less resembles A or B, but the text is for all practical purposes the same or even improved from experience, because of a wider research or even one’s own practical results, with more valuable information added.

So, this goes on and on until we reach person V, the author of the Voynich manuscript. The plant pictures are now mostly fantastic, some retaining remains of this feature or that feature. The plants are even available now in the market, dried or as essences, or imported to the area by merchants, dried or as essences in containers. 

But because she has seen them in the person U (preceding her) manuscript as such, she may even assume some features of the plants are realistic, aside from the elements she herself adds for the purpose of healing herself and loved ones. The illustrations also function as a way of making tangible the differences among plants. She says to her sister, for example, “you should use the plant with the eagle-looking root, for your eyes.”

The key point, though, is that the text may have even improved, based on a diversity of practical information read in books accumulated from A to U (of course practical for those times), while the plant illustrations went ever more unrealistic, even though they serve to graphically show the value of the plant. 

The roots are now magnified to show their medicinal value. This root is like an eagle, it’s great for the eyes. This one looks like a scorpion, best for healing bites, or avoid, it is poisonous. If you are bitten by a snake, use this one with the snake-like root. This one improves your muscles like a beast, as you can see in the root image. This one is good for the wound, hence a wound like image in the root. This one is great for hair growth, hence face marginalia on its roots, or is perfect for headaches. And so on.

So, this in my view explains the paradox we find in the Voynich manuscript, and it has been exacerbated because we can’t yet read its text. Were we able to read the text, we would not have been surprised at all, since in fact in medieval times you can find depictions of herbal plants that are even more fantastic than those found in the Voynich manuscript.

For the above reason, I think it would be a major mistake to assume that the illustrations being unrealistic, artistic, or ambiguously un/traceable, necessarily suggest the texts about them are also the same and thereby the manuscript was not of actual (for that time) medicinal value in practical terms for its author.

We should always keep in mind the fact that the plants don’t comprise only a major section of the manuscript. They are ubiquitous to the entire manuscript, and therefore must be treated as such. Without them, you would not have the pharmaceutical section, the astrological planners for their use as intakes or topical use, the balneological section for the same, and the largest foldout summing them all up (in my view) amid a spiritual, cosmological, environmental, or legacy depicting context. 

And the missing pages could have included more personal identifying information about who was authoring the manuscript and how it was being customized for that person’s astrological chart(s). In traditional medicine, the same medicine, or even drinking wine, may not be suitable for all, given people were regarded as having hot or cold natures, inclined to be this way or that way, and astrological sciences of the times had minute information about a person’s nature based on when and where he or she was born. 

Therefore, I don’t think the plant illustrations being unrealistic or ambiguous should lead us to deny the practicality of the information being offered for them in a handbook. The two are separate things. 

In fact, if the above thought experiment is considered, to solely rely on illustrations to find a way to their description (something that we have had no other choice to do, because of not being able to read the text yet) may be misleading and a waste of time generally speaking (beyond reliable information that may still be gleaned from them, of course, and many scholars have contributed a lot to the information and they are to be appreciated, even when they find that a plant is not what others claim it to be).

I think the text could be providing specific information about specific and known plants (to the extent known then), with instructions about their medicinal value and how they can be extracted, prepared and used for what ailment, without the plant illustrations being necessarily accurate. 

From what I have read, by the 1300s-1400s so many herbal books were being copied and recopied from one another, that each time the scribes added their own touches to the pictures, to the point where the depicted plants became unrealistic and unlike the actual original plants. Their texts must have remained reliable, but not necessarily their illustrations. 

So, very practical text about the plants ended up being accompanied by unrealistic pictures. The illustrations may have served another, marketing, function, when being sold to rich merchants who had little idea about the accuracy of plant depictions, while being given useful (for their time) practical medicinal information. 

For this reason, I am inclined to make a distinction between the practical herbal and medicinal information about plants on one hand and their fantastic illustrations on the other. 

I am also inclined to allow for the possibility, based on my previous posts, that the plants’ depictions at the hands of the 1400s scribes may have altered the original depictions the author had made and kept in her complete parchment 1300s original. It is even possible that while the author was creating this manuscript, she invited an ill sister or child to help with its illustrations, which later provided a source for the 1400s scribes to do their work.

An excellent dissertation by Shirley Kinney titled “The Origins of the Herbarium of Pseudo-Apuleius” may be helpful in finding what kind of descriptions could have accompanied even the unrealistic plant illustrations in the book she studied, whose author and origin also seem to be obscure (hence, “Pseudo” in the name). 

I think MacroP at some point drew on her dissertation in this forum, and I invite you to consider what Kinney is sharing, giving us a sense of what the text of the VM herbal section could have been. The Herbarium’s actual author was, according to her, very skeptical of the “professional doctors” and his book intended to provide people a self-help handbook to practice their own medicine. That is what the author of the Voynich manuscript was likely doing. Her own (and her loved ones’) life was at stake, so she wanted to make sure to have not only best expert advice, but in a way that she could learn and practice it herself.

For example this is one description Kinney cites and translates (you can do a search in her pdf yourself for more):

“… For a woman’s excessive flux. You give the drink as above while saying “Little herb Proserpinaca, the daughter of the ruler of the Underworld, just as you stopped the birth of the mule, may you also restrict the wave of this blood”.” 

Or, “… For dyeing the hair. The herb callitricum, crushed in oil and applied to the head, dyes the hair.”

Or, “ … For colon pain. The leaf of the herb politricum, which has twigs like the bristles of a pig, crushed with 9 grains of pepper and 9 grains of coriander seed, crushed together in the best wine, give it to drink to someone about to bathe. One also makes this for nourishing the hair of women.”

Or, “… For weak eyesight. It is said that when an eagle wants to fly high so that it can look out over all of nature, it plucks a leaf of the herb lactuca silvatica and moistens it eyes with the juice of the herb and achieves the clearest sight. Therefore, the juice of the herb lactuca silvatica is mixed with old wine and acapnus honey (which means that the honey is obtained without driving bees away by smoke). Mix together the best juice of the herb, the wine, and the honey and crush it and store it in a glass ampulla, and when you use some of it, you will experience the greatest medicine.” (notice, the name of the plant is repeated twice in the above, hence, a reduplication).

I think the marginalia on the last page of VM may have been a reflection on such magical incantations being given when taking medicine, advising folks not to use God’s name in vain or something like that. 

The following images (at the bottom of this post) from other herbal book illustrators show how people could have played with depictions of roots as vases, or human heads in roots. The large root trunks may be a simplistic way at the time of “magnifying” the root part to show its details, when the artist could not do so if the root was in smaller size. 

In fact, rather than being unrealistic, the author is realistically exaggerating the features to say, “hey, don’t take this connection like a tree trunk for the root seriously, I am just magnifying the root to tell you what the plant is good for or looks like.” The roots are important for her pharmaceutical applications, as the latter images show, given their focus on the roots (for most, though not all, container essences).

This can explain why some VM plants seem to be placed on wider root trunks. The tactic signified a magnification effort to convey not just realistic but also creative medicinal value of a plant.

This thread is about trying to connect dots and see the wider picture of details, and that cannot be done in brief lines, unfortunately.

[attachment=13542]

[attachment=13541]
In the Voynich manuscript’s plants section, there are two that stand out from others, because to them a listed text is associated. 

The first is F49v, where the list of items is on its own page, and the second is F65v, whose text continues (in my view) to the following page F66r, with a long list as well, on the bottom left corner of which the apparently ill person is drawn with a text next to it. I strongly believe that 66r is a continuation of text for [link not shown] (and if it is, it may challenge the view that the bifolia were separately usable, suggesting instead that a pagination was intended from the start, but this is a different question).

For now, I was wondering what is the latest consensus (or close to one) on the identity of those two plants (images on the bottom of this post):

For F49v, Edith Sherwood identifies it as Water Lily, and ReneZ is listing it (following their researcher name abbreviations) as "ELV: Lunaria. ThP: lunaria, cyclamen (Holm), alpenveilchen." Koen G., based on others' views and likely consensus, has identified another plant [link not shown] as Water Lily, but [link not shown] can also be a blue Water Lily, it seems (large leaves, smaller flowers, which were blue in European context, and long thin stem, suggesting aquatic nature).

For f65v, Koen G. lists it (based on others' views generally) as being Chamomile, which seems acceptable to me. ReneZ also gives the following (ELV: anthemis, some kind of chamomila. ThP: chamomile. st.jacobskraut). But JKP disputed it on that thread, and I am not sure about that myself. Edith Sherwood identified it as Cornflower (Centaurea cyanus).

I realize there is no absolute consensus on any of the above, and there are many experts not named above that have expressed their views about the plants' identities. But if anyone (especially the plant experts) can update and share their latest identifications of the two plants, I would greatly appreciate it. Thanks.

This is [link not shown] [attachment=13570]

This is [link not shown] [attachment=13571]