17-01-2026, 01:57 AM
@Jorge_Stolfi, thanks for your considerations.
Let’s begin with your last paragraph, the case where we do know the language. We can see ‘The’ and ‘the’ as being the same word because we know the language. Otherwise, we do not have that privilege, and that is the situation with the VM. So we must treat the transliteration in a way that distinguishes between The and the, unfortunately. There is no other way to do it, in my view.
T, h, and e are separate letters in both versions, because we see that in our supposed original. If they had been connected (not knowing what they mean), we would have no choice but to respect the original and treat them as connected. If a c-c or a c-^-c (a c-c with a diacritic on top) is all one connected symbol, we need to treat it as such and respect the original, rather than assuming it is composed of three parts. Of course it is helpful to check whether the k between a double c is also used separately on its own, and v101 does that too (which is why I like it more than others). We will surely note that, in that one symbol, the author/scribe is using a c or a gallows to construct it; but still, there is a reason they are visually rendered as one symbol. So, yes, if we see different-looking diacritics on the c-c, even when we are not sure, we should take that into account too, unfortunately, since we are dealing with an unknown language.
It is only when we respect that visual that we can then compare it to a c-c and say, “it seems this one is the same as the other, with this or that diacritic on top.” The same holds for “bench” symbols. By splitting it into three parts, we have already allowed our own assumptions to dictate the visual features of the original. We don’t know whether the author and the scribe agreed on that, and it is not something we can take as an assumption for preferring this or that transliteration system.
For this reason, following the visuals of the original is essential. Here we arrive at your listed assumptions and conditions.
I do understand your points regarding the assumption variations. But the problem is that they are all assumptions, and we do not know whether any is truer than the others, even though you and I may prefer some over others. To be empirically strict, yes, we should treat noticeably different symbols as being different (even if we suspect the difference is just handwriting style), while still noting them as possible variations of the same symbol.
Yes, there are challenges the VM poses: 1- the original can be illegible; 2- the retracing may be in error; 3- the author’s intentions may have been distorted in the scribing; 4- handwriting styles may add undeterminable noise to transliteration choices; and so on.
If it is hard to decide whether a letter is one of two similar-looking ones, that can also affect the word in which it is found. So the ambiguity would exist no matter what transliteration system we devise.
But all of the above have less to do with the main point I am trying to convey.
The final goal is to be able to read the VM text, not any transliterations of it, no matter how good or bad the transliteration systems are.
Yes, their helpfulness depends on whether they can most immediately convey the visual characteristics of the original. A double c is a double c, connected on top, not two separate letters. The author and/or scribe meant c-c as one symbol, since it was possibly a contraction of another word or set of words. A bench is all one symbol, not split in three parts as assumed. V101 at least acknowledges the double-c with diacritic as being one symbol, transliterated as 2.
The MAIN problem is when any of these transliteration systems is used to draw linguistic conclusions about the language, i.e., whether it is natural, constructed, a hoax, gibberish, etc. You can’t make such judgments based on transliterations, no matter how good or bad they are, whether v101 or EVA.
I have no problem with transliteration systems being used to establish statistically how often this or that symbol appears, alone or in combination, in the VM, granting all the imperfections involved. This of course depends on legibility, on scribal handwriting, on scribal or later retracing, and on how faithful or aware the scribes were regarding a (living or dead) author’s intentions. All of these complicate constructing such transliteration systems.
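To make the counting point concrete, here is a minimal sketch. The sample string and the symbol groupings are my own invented illustration, not real VM text or an actual transliteration: it only shows how counting the same line under two segmentation assumptions, one that splits a bench into separate letters and one that treats it as a single symbol (roughly as v101 does), yields different statistics.

```python
from collections import Counter
import re

# A made-up EVA-like sample line (NOT real VM text), used only to
# illustrate how the segmentation assumption changes the counts.
sample = "chol chor shol daiin chey"

# Segmentation A: every character is a separate symbol, so a bench
# 'ch' is counted as a c followed by an h.
tokens_a = [c for c in sample if c != " "]

# Segmentation B: treat 'ch' and 'sh' as single symbols, the way a
# system that respects the connected glyph would count them.
tokens_b = re.findall(r"ch|sh|\S", sample)

counts_a = Counter(tokens_a)
counts_b = Counter(tokens_b)

print(counts_a["c"], counts_a["h"])    # prints: 3 4
print(counts_b["ch"], counts_b["sh"])  # prints: 3 1
```

Under segmentation A the benches inflate the c and h counts; under segmentation B they disappear as separate letters entirely. Any frequency or combination statistic drawn from a transliteration inherits whichever assumption was baked in.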
But ultimately the criterion should be the study of the actual original as visually displayed in the only manuscript copy we have.
I think preoccupation with transliterations has prevented us from paying more attention to the actual visual information the original text is offering. The more time we spend on the former, the less we spend on the latter.
The problem I see (and this may address @Renez as well) is the use of analyses of transliterations, good or bad, to make linguistic judgments about the original. That is where transliteration systems can get in the way of studying the text itself.