The Voynich Ninja
Some contrarian views on transcription - Printable Version

+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: Analysis of the text (https://www.voynich.ninja/forum-41.html)
+--- Thread: Some contrarian views on transcription (/thread-4252.html)



Some contrarian views on transcription - kckluge - 28-04-2024

* Contrarian view #1: Don't bother sweating the "weirdos".

Having built scripts to convert both the L-Z EVA-based transcription and the v101 transcription to the Currier alphabet, I recall the fraction of glyphs for which there is no unambiguous Currier equivalent (at least with regard to the running text in the initial herbal quires and the bio section) being roughly half a percent. I'll double-check, but I'm pretty sure that's right (which is not to say that the transcriptions agree with each other at that level). That means "basic EVA"/Currier/just the vanilla ASCII bits of v101 captures roughly 199 out of every 200 glyphs. That should be good enough to read the text (if there is a text to read) -- and if it's not, then I would argue that there's no point in worrying about it.
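(If anyone wants to redo that check, it's a few lines of script. A rough sketch in Python -- the filename, the comment-line convention, and the mapping table below are placeholders rather than the actual EVA-to-Currier correspondences, and treating each ASCII character as one glyph is itself a simplification:)

[code]
# Rough sketch: estimate what fraction of glyphs has no unambiguous Currier
# equivalent. EVA_TO_CURRIER is a placeholder -- the real correspondences
# have to be filled in by hand, and anything without a clean equivalent
# gets mapped to Currier's '*' ("here be a dragon") character.
EVA_TO_CURRIER = {
    # "eva_glyph": "currier_glyph",
}

def unmapped_fraction(path):
    total = unmapped = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):            # skip comment/metadata lines
                continue
            for ch in line.strip():
                if ch in ". ,-":                # skip word/space markers
                    continue
                total += 1
                if EVA_TO_CURRIER.get(ch, "*") == "*":
                    unmapped += 1
    return unmapped / total if total else 0.0

# A result around 0.005 would mean plain Currier is capturing
# roughly 199 of every 200 glyphs.
print(unmapped_fraction("transcription.txt"))   # filename is a placeholder
[/code]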

To be clear, this is a pragmatic claim, not a theoretical one. If the question is "is it possible that reading the text requires capturing every nuance of every 'weirdo' in the text?", then I have to agree that yes, abstractly it is possible. The text could have been generated in some way that carries state such that, unless we capture all the weirdos, we will fail in trying to read it. I don't think I've ever seen anyone make a compelling case that the bulk statistics of the text make that likely, but it is possible.

Pragmatically, if that's the case then I think that without some additional side channel of information -- finding a "bilingual" document enabling a known plaintext attack, for instance -- we might as well throw in the towel. Which makes investing large amounts of effort in encoding "weirdos" (as opposed to just marking them with something like the Currier alphabet's '*' "here be a dragon" character) an unproductive use of time. Which means "basic EVA"/Currier/just the vanilla ASCII bits of v101 should be good enough.

That's not the same thing as saying that there isn't room for argument over whether "basic EVA" (for instance) is capturing the right equivalence classes of groups of ink strokes. I've seen people claim that whether an 'a' is closed at the top or not matters, for example -- but that's a different issue.

* Contrarian view #2: For the sake of all that's good and bright and beautiful in the universe, can we please, please, please stop using EVA?

While I have never loathed EVA with the blazing white-hot passionate hatred that Glen Claston did (and anyone who thinks I'm exaggerating can go read his Voynich mailing list remarks on the subject), I just don't see the argument for "why EVA?". Granting the premise that there is value in an "analytic" transcription that is neutral about how to read the ligatured gallows or word-final i*<x> sequences, I fail to see why EVA is that transcription -- and in particular, I see no reason to prefer it to Frogguy:

1) I have never understood the virtue of prioritizing making the transcription pronounceable over visual resemblance to the script. I mean, sure, a 'd' kind of looks like an '8' with the upper loop squished, and a 'y' kind of looks like a '9' without a closed top loop, and a 'q' kind of looks like a '4' written by someone who hates corners, but...why? According to the page describing EVA, it's to help make common words easy to recognize and remember. I suppose this is one of those "your mileage may vary" things.

2) In fact, the pronounceability of EVA has had the unfortunate effect of leading a non-trivial number of naive newcomers to MS 408 to think there is actual significance to the phonetic values in the EVA transcription scheme. I realize that the people behind EVA didn't intend that to be the case, and are explicit in various places in making clear it isn't, but if someone just grabs a transcription file without "reading the manual" that doesn't help.

3) The clear advantage of Frogguy is that the learning curve is truly minimal. The gallows, for example, are 'lp', 'qp', 'lj', and 'qj' -- and anyone who has seen the actual text should immediately grok which is which...

4) As Rene says on the page referenced above, "It is very important to point out that Eva is not attempting to identify semantic units in the text. It simply represents in an electronic form the shapes that are seen in the MS. It is left to a later step by analysts to decide which combinations should be seen as units." If you're going to have to transform the transcription to do meaningful analysis anyway, why not do it from something that maximizes the fluency of transcription with a lower learning curve (and probably a lower transcription error rate)? The transform itself is nothing more than a table-driven substitution pass (see the sketch below).
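A minimal sketch of that kind of substitution pass -- the only entries filled in are the digit-shaped resemblances from point 1 (d/8, y/9, q/4, o/O), which I believe match Currier's usage; everything else, gallows included, is left as a placeholder to be completed for whatever target alphabet you actually want:

[code]
import re

# Table-driven conversion from one transliteration alphabet to another.
# Only the entries that follow from the visual resemblances in point 1 are
# filled in; the rest (gallows, benches, etc.) are placeholders that would
# need to be completed for the target alphabet of your choice.
CONVERSION = {
    "d": "8",
    "y": "9",
    "q": "4",
    "o": "O",
    # "ch": "...", "k": "...", ...
}

# Match the longest source token first, so multi-character tokens
# (e.g. "ch") are not split into their single-character prefixes.
_token_re = re.compile(
    "|".join(sorted(map(re.escape, CONVERSION), key=len, reverse=True))
)

def convert(line):
    """Rewrite one line of transcription into the target alphabet."""
    return _token_re.sub(lambda m: CONVERSION[m.group(0)], line)
[/code]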

I think that's probably enough of me being a curmudgeon for the evening...

Karl

(PS, coming soon -- the Midsomer Murders MS 408-themed fanfic you never realized you needed. When a visitor researching a possible connection between Midsomer and the mysterious Voynich manuscript is found murdered at a Voynich-inspired spa & herbal treatment center, Winter and Barnaby have to decode the killer's motive before there are more deaths. How many more victims will die before they succeed in...Deciphering Murder?)


RE: Some contrarian views on transcription - Koen G - 28-04-2024

Agreed with the first part. The text is so large and consistent that weirdoes can be safely ignored for the most part. Unless, of course, an analysis is focused on the weirdoes specifically. They can be valid objects of study, but matter little when it comes to the text as a whole. 

Sometimes I think it might help to take this idea even further. For example, 98% of occurrences of EVA-a are accounted for by [ai, ar, al, am, an, as]. If you leave off [an] and [as], you still get 96%. Something like [ao] accounts for 0.07%. My point is that fretting over the existence of [ao] probably distracts us from what is really going on: a system that predominantly produces [ai, ar, al, am].
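(These percentages are easy to reproduce. A rough sketch of the tally, assuming a plain-text EVA transcription with '.' as the word separator and '#' for comment lines -- the filename is a placeholder:)

[code]
from collections import Counter

# Tally which glyph follows EVA 'a', to reproduce the percentages above.
counts = Counter()
with open("eva_transcription.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):               # skip comment/metadata lines
            continue
        for word in line.strip().replace(".", " ").split():
            for i, ch in enumerate(word):
                if ch == "a":
                    counts[word[i + 1] if i + 1 < len(word) else "<final>"] += 1

total = sum(counts.values())
for glyph, n in counts.most_common():
    print(f"a + {glyph}: {n} ({100 * n / total:.2f}%)")
[/code]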

Regarding the use of EVA, I think this is a matter of choices made. This will be the issue with all transcription systems: the user has to be aware of the decisions underlying the system.


RE: Some contrarian views on transcription - kckluge - 28-04-2024

(28-04-2024, 11:23 AM)Koen G Wrote: [...]



Sometimes I think it might help to take this idea even further. For example, 98% of occurrences of EVA-a are accounted for by [ai, ar, al, am, an, as]. If you leave off [an] and [as], you still get 96%. Something like [ao] accounts for 0.07%. My point is that fretting over the existence of [ao] probably distracts us from what is really going on: a system that predominantly produces [ai, ar, al, am].


[...]

This is what I refer to as the question of what "always" means when "always" never quite happens (AKA how to avoid special pleading). So, is '4' (or 'q' if you prefer, he said grinding his teeth) "always" supposed to be preceded by a space? I think it is, but empirically it isn't. In Bio B in the D'Imperio transcription, '4' is word-initial 1456 times, line-initial 198 times, and word-medial 22 times (or 1.3% of the time). Are 'M' and 'N' "always" supposed to be word-final? Well, in the same sample they are word- or line-final 879 times and word-medial 26 times (or 2.9% of the time). I think the way to avoid special pleading in those cases is to ask "how does the frequency with which <x> occurs compare to the known frequency of scribal errors in handwritten manuscripts?" Maybe with an added dash of "how often do transcribers disagree about the location of spaces?" thrown in for good measure.
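Those counts are easy enough to redo on any transcription you like. A rough sketch of the positional tally -- the filename and the '.'-as-word-divider convention are placeholders for whatever format your transcription file actually uses:

[code]
from collections import Counter

def positions(path, glyph):
    """Tally where `glyph` falls: line-initial, word-initial, word-medial,
    word-final. Rough sketch; assumes '.'-separated words, one MS line per
    file line, and '#' for comment lines."""
    tally = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            words = line.strip().replace(".", " ").split()
            for w_idx, word in enumerate(words):
                for i, ch in enumerate(word):
                    if ch != glyph:
                        continue
                    if i == 0:
                        tally["line-initial" if w_idx == 0 else "word-initial"] += 1
                    elif i == len(word) - 1:
                        tally["word-final"] += 1
                    else:
                        tally["word-medial"] += 1
    return tally

# e.g. positions("bio_b.txt", "4")    # filename is a placeholder
[/code]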

Karl


RE: Some contrarian views on transcription - ReneZ - 28-04-2024

As far as I am concerned, everyone should use the transliteration alphabet they like.
For communication, having everyone understand a single 'lingua franca' is a great advantage.

We have the luxury (which Currier et al. did not have) of being able to replace the alphabet with simple
commands, scripts, or home-brewed tools. This makes the whole question far less critical.

When I do stats, I always replace 'weirdoes' by their nearest equivalent.

The point is: having them in the file gives one the choice. Had they not been recorded, who knows what we might be missing.
The 'ligature' type of weirdoes, like ckhhh etc., is a different situation, because these appear intentional, and neither Currier nor v101 can capture them.
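The replacement itself is the trivial part -- a few lines of script. A sketch, with the substitution table left as a placeholder since the 'nearest equivalent' choices are up to the analyst:

[code]
# Pre-processing pass: replace rare 'weirdo' glyph codes with a chosen
# nearest equivalent before running statistics. The table is a placeholder;
# which equivalents to pick is the analyst's decision.
NEAREST_EQUIVALENT = {
    # "weirdo_code": "standard_glyph",
}

def normalise(text):
    for weirdo, standard in NEAREST_EQUIVALENT.items():
        text = text.replace(weirdo, standard)
    return text
[/code]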