RE: ARTIFACTS IN THE TEXT.
-JKP- > 10-06-2017, 12:07 AM
I feel quite strongly (as mentioned in my blog) that flexibility in searching a transcript is crucial.
Putting different shapes into different font slots complicates this process even though it's more intuitively natural to construct a font-set that way. I lean away from doing it this way. There are other ways, within font sets, to express glyph variants that don't require a lot of slots and different keystrokes (and a good memory) to type all the variants.
Variations can be marked in a number of ways, so this is just an example, not necessarily the best way, but it's the easiest way to explain it.
If one suspects that there are two versions of a character but is not sure, then expressing them as different letters in a font table makes it necessary to set up two searches in order to catch them all when treating them as a single character. This isn't terribly difficult with grep, but not everyone uses it; some use word-processor apps to search, and not every app has a flexible search function.
If they are instead distinguished by a number, symbol or other marker (in much the same way as the Z was added to the benched gallows in the Takahashi transcription), for example y1 to represent a curved EVA-y and y2 to represent a straight EVA-y, then there is no difficulty in searching for them separately or together. If one is doing character counts and word-length counts, one does have to compensate for the extra markers in the text, but there are a number of possible solutions.
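To make the y1/y2 idea concrete, here is a minimal Python sketch. The sample line is invented for illustration and is not taken from any actual transcript, and the helper names (find_words, strip_markers) are hypothetical. It only assumes that the marker is a digit typed directly after the y.

```python
import re

# Invented sample line: y1 = curved EVA-y, y2 = straight EVA-y (assumed convention).
line = "qoky1 daiin chedy2 oty1 shey2"

def find_words(pattern, text):
    """Return the words in text that contain a match for pattern."""
    return [w for w in text.split() if re.search(pattern, w)]

def strip_markers(text):
    """Remove a 1 or 2 that directly follows a y, leaving the base letter."""
    return re.sub(r"(?<=y)[12]", "", text)

curved = find_words(r"y1", line)       # only the curved variant
straight = find_words(r"y2", line)     # only the straight variant
either = find_words(r"y[12]", line)    # both variants in a single search

# For character and word-length counts, drop the markers first.
plain = strip_markers(line)
word_lengths = [len(w) for w in plain.split()]
```

The same single pattern, y[12], should also work in grep or in any editor with basic regular-expression search, which is the main appeal of marking variants this way.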
I actually have three versions of my transcript because I was trying to work out some of these issues and find an optimum solution. In one of them, I have constructed the bench characters out of two separate glyphs; in another, I used the character "n" to represent one and "ñ" to represent the one with a cap (characters like c and ç can also be constructed this way and can alternatively be set up as requiring one keystroke or two). Thus, a two-step character, in the way one constructs a font that has characters with accents, is another way to approach it and, for some kinds of searches and statistical examinations, it works very well. Once you have a base transcription, creating these variants is not difficult, and there will always be certain kinds of searches that work better in one system than another.
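As a rough illustration of the accent-style approach, the sketch below uses Python's standard unicodedata module to switch between the one-character and two-character forms. The function names and the sample text are hypothetical, and this is not necessarily how my own transcript files are set up. It assumes the capped variant is stored as a base letter plus a combining accent, as with ñ and ç.

```python
import unicodedata

def base_form(text):
    """Decompose accented letters and drop the combining marks,
    so that a capped variant collapses to its base letter."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def two_step(text):
    """Store each accented letter as base letter + combining mark."""
    return unicodedata.normalize("NFD", text)

def one_step(text):
    """Store each accented letter as a single precomposed character."""
    return unicodedata.normalize("NFC", text)

# Invented sample: \u00f1 is n with tilde, \u00e7 is c with cedilla.
sample = "da\u00f1 dan \u00e7oc"

merged = base_form(sample)                      # variants treated as one letter
capped_count = one_step(sample).count("\u00f1") # count only the capped n
```

Searching the output of base_form catches both variants at once, while searching the precomposed text for the accented letter picks out only the capped one.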