(26-04-2022, 02:48 PM) Koen G wrote: As Rene also noticed, however, the "rewriting n-grams" method has a significant drawback: it makes words really short. As you can see in the "Voynich- Vvovyvnvivcvhv" example above, verbose encoding has the effect of lengthening words, and Voynichese words aren't excessively long to begin with.
I've been beating this drum for the better part of two decades, but for a variety of reasons (probably including the lack of formal publication) I'm not sure I've gotten much traction: the existence of glyph pairs that frequently straddle spaces but occur within "words" rarely and/or only in very limited contexts is pretty compelling evidence that a huge chunk of spaces were inserted mechanically according to some set of rules. (More precisely, it is evidence consistent with that theory, although I would argue that any theory about the text needs to address the general issue [and the specific behavior of Currier '9' I discuss below].) "Rarely" may sound like a weasel word, but both scribal and transcriber error need to be kept in mind, so a handful of exceptions doesn't refute the pattern.
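For anyone who wants to run the basic test on their own transcription (glyph pairs that are common across a space but rare inside a word), here is a minimal sketch. It assumes a hypothetical plain-text input: one string per line, one character per glyph, words separated by single spaces. The function names and the `min_across`/`max_ratio` thresholds are arbitrary illustrations, not a published method.

```python
from collections import Counter

def straddle_counts(lines):
    """Count each glyph pair (a, b) straddling a space vs. adjacent inside a word.

    Assumes one character per glyph and words separated by single spaces
    (a simplified, hypothetical input format).
    """
    across = Counter()  # "a b": a ends one word, b starts the next
    within = Counter()  # "ab": a and b adjacent inside the same word
    for line in lines:
        words = line.split()
        for word in words:
            for a, b in zip(word, word[1:]):
                within[(a, b)] += 1
        for left, right in zip(words, words[1:]):
            across[(left[-1], right[0])] += 1
    return across, within

def suspicious_pairs(across, within, min_across=20, max_ratio=0.05):
    """Pairs frequent across spaces but rare within words (arbitrary thresholds)."""
    result = []
    for pair, n_across in across.items():
        n_within = within.get(pair, 0)
        if n_across >= min_across and n_within <= max_ratio * n_across:
            result.append((pair, n_across, n_within))
    # Most frequent straddling pairs first
    return sorted(result, key=lambda t: -t[1])
```

On toy input such as `["8AM 4OF9 4OE 89", "SC89 4OE9 4OF9"]` (made-up strings in Currier-style glyphs, not real transcription data), `suspicious_pairs(across, within, min_across=2)` singles out `('9', '4')`: three occurrences across a space and none inside a word.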
As an example (useful because I shouldn't have to translate Currier '9' and '4' into other transcription alphabets, plus it covers a huge fraction of spaces in both Herbal A and Bio B), look at the one piece of seriousness in my Apr. 1 posting:
"Looking at the Herbal A 'language' pages in the D’Imperio transcription, 67% of the time a ‘9’ is followed by a space (1097 occurrences); 11% of the time it is followed by the end of a line (203 occurrences). It is *never* directly followed by ‘4’ without an intervening space -- the only glyphs that follow it within a "word" more than a single-digit number of times are ‘F’ (103), ‘P’ (92), ‘8’ (52), ‘S’ (51), and ‘B’ (16) (‘Z’ just misses the cutoff at 8 occurrences)."
*22%* of spaces in Bio B are straddled by '9' and '4' (1266 occurrences), with just 10 occurrences of "94" inside a word. While "9 4" specifically is less common in Herbal A (just 4% of spaces), '9' is still the character before a space 32% of the time and the line/paragraph-final character 34% of the time...
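Percentages like the ones quoted above (how often a '9' is followed by a space, by the end of a line, or by another glyph) come from a simple follower tally. A sketch under the same assumed plain-text format (single spaces between words, one character per glyph); `followers` and `as_percentages` are hypothetical helper names, and nothing here reproduces the D'Imperio counts themselves.

```python
from collections import Counter

def followers(lines, glyph):
    """Tally what immediately follows each occurrence of `glyph`.

    Keys are the following glyph, "<SPACE>", or "<EOL>" (end of line).
    Assumes words are separated by single spaces.
    """
    counts = Counter()
    for line in lines:
        for i, ch in enumerate(line):
            if ch != glyph:
                continue
            if i + 1 == len(line):
                counts["<EOL>"] += 1
            elif line[i + 1] == " ":
                counts["<SPACE>"] += 1
            else:
                counts[line[i + 1]] += 1
    return counts

def as_percentages(counts):
    """Convert raw tallies to percentages of all occurrences."""
    total = sum(counts.values())
    return {key: 100 * n / total for key, n in counts.items()} if total else {}
```

With the toy lines `["8AM 4OF9 4OE 89", "SC89 4OE9 4OF9"]`, every '9' is followed by either a space (3 times, 60%) or a line end (2 times, 40%); a within-word follower such as the 'F' in `"O9F"` would show up under its own key.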
I cringe every time I read a paper that takes for granted that spaces are word separators without explicitly foregrounding that assumption. I cringe even more when I read a paper that throws Herbal A and Bio B folios into the same statistical meat grinder without recognizing/acknowledging the magnitude of the difference in their statistics.
And, yeah, I know, I should be more up to date in the transcription I use for casual analysis, especially since I have scripts to convert both EVA and v101 into Currier, although in this specific case the bigger issue is how much the transcriptions disagree. Here is a comparison of v101 and EVA (put into the common transcription alphabet of your choice) on f1r-f57r & f99r-f102v1, chosen purely because that happened to be a chunk of folios I was looking at for another reason, and recognizing there may be differences between sections and/or scribes:
Looking at the 9753 places where either transcriber saw a full space:
* 94% of the time, both saw a full space
* 4.5% of the time, one saw a half space
* 1.5% of the time, one saw no space
Looking at the 1178 places where either transcriber saw a half space:
* 17% of the time, both saw a half space
* 38% of the time, one saw a full space
* 45% of the time, one saw no space
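The agreement figures above amount to a cross-tabulation of per-gap spacing judgements. A sketch, assuming the two transcriptions have already been aligned glyph-for-glyph (the genuinely hard part, not attempted here) and using a hypothetical markup in which "." marks a full space and "," a half space; the labels and function names are illustrative only.

```python
from collections import Counter

# Hypothetical markup: "." = full space, "," = half space, no mark = no space.
FULL, HALF, NONE = "full", "half", "none"

def gap_labels(text):
    """Return one spacing label per gap between consecutive glyphs."""
    labels = []
    pending = None      # spacing mark seen since the last glyph, if any
    seen_glyph = False
    for ch in text:
        if ch == ".":
            pending = FULL
        elif ch == ",":
            pending = HALF
        else:
            if seen_glyph:
                labels.append(pending or NONE)
            pending = None
            seen_glyph = True
    return labels

def cross_tab(text_a, text_b):
    """Count (label_a, label_b) pairs over the aligned gaps of two transcriptions."""
    gaps_a, gaps_b = gap_labels(text_a), gap_labels(text_b)
    if len(gaps_a) != len(gaps_b):
        raise ValueError("transcriptions are not aligned glyph-for-glyph")
    return Counter(zip(gaps_a, gaps_b))

def agreement(pairs, kind):
    """Among gaps where either side saw `kind`, tally what the other side saw."""
    total = 0
    other = Counter()
    for (a, b), n in pairs.items():
        if kind not in (a, b):
            continue
        total += n
        other[b if a == kind else a] += n
    return total, other
```

Dividing each tally in `other` by `total` gives percentages in the same shape as the two lists above. The raw `pairs` counter also shows the direction of disagreement: comparing `pairs[("full", "half")]` with `pairs[("half", "full")]` tells you which transcription more often records the fuller space.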
In addition, the disagreements are lopsided: when one transcription has a full space and the other a half space, the full space is more often v101's, and when one has a space (full or half) and the other has none, the space is more often v101's. So...consistency of transcriber judgements about where spaces are and aren't isn't necessarily great. To paraphrase an old saying about clocks, a Voynich researcher who has one transcription knows what it says; a Voynich researcher who has multiple transcriptions is never sure.
By the way, I am *very* disappointed my Blackadder theory didn't get a higher profile :-(. Where were *my* 15 minutes of fame? Where were the initially credulous articles in Wired UK and Ars Technica, followed by walk-backs where they quote Lisa Fagin Davis or Kevin Knight saying something along the lines of, "What do I think? Um...Let me put it this way -- have you ever driven through farm country during a heat wave on a muggy, sunny day..."?