Mapping between Currier A and B
The Voynich Ninja (https://www.voynich.ninja), Forum: Voynich Research, Analysis of the text, Thread: /thread-3016.html
RE: Mapping between Currier A and B - Koen G - 09-12-2019

I was thinking about this in the shower (as one does), and I wondered: both Currier languages generally correspond to two hands, right? It doesn't really matter whether you think these are two scribes or one scribe at different times; there is a clear divide between the two hands. But the divide between the two "languages" is less clear. It is clear that one hand does different things than the other, but how do they behave on the in-between pages? Marco pointed to some pages that appear 50/50: which hand wrote those? Do both hands stray, or is one "pure" and the other variable? If we can understand this, we may be able to decide which pages can best be compared to each other for the purpose of this thread.

RE: Mapping between Currier A and B - Torsten - 10-12-2019

(09-12-2019, 02:26 PM)MarcoP Wrote: Assuming I have not made major errors, one can see at least four different patterns:
If you built lists for common initial glyphs like <d->, <s->, <o->, or <qo->, you would get similar results. The same is true for typical word-final glyphs like <-in>, <-l>, <-r>, or <-y>. The following table lists the five most frequent <-in>-words for different sections:

Herbal A        daiin    dain     chaiin   aiin     otaiin
Pharma A        daiin    aiin     dain     olaiin   saiin
Astro           daiin    aiin     dain     odaiin   oteodaiin
Cosmo           aiin     daiin    qokaiin  ytaiin   ykaiin
Herbal B        aiin     daiin    okaiin   qokaiin  saiin
Stars B         aiin     daiin    qokaiin  okaiin   otaiin
Biological B    qokain   qokaiin  daiin    dain     okain

The top words occur with the following frequencies:

                 daiin  dain  aiin  odaiin  okaiin  otaiin  qokaiin  qokain  word count
                 -----  ----  ----  ------  ------  ------  -------  ------  ----------
Herbal (A)         403    80    33      20      28      28       15       1       8,087
Pharma (A)          99    13    30       4       6       3        2       1       2,529
Astro            12, 4, 11, 3, 1 (three cells are blank in this copy; column positions not recoverable)  2,136
Cosmo               36     3    56       7       9      14       18       6       2,691
Herbal (B)          72    11    72       4      31       8       20       5       3,233
Stars (B)          122    53   193      17      94      74      114     105      10,673
Biological (B)      84    47    32       1      34      12       88     159       6,911

The two tables illustrate the usage of word pairs like <daiin/dain>, <daiin/aiin>, <daiin/odaiin>, <okaiin/otaiin> or <qokaiin/qokain>. The reason for this observation is that "all pages containing at least some lines of text do have in common that pairs of frequently used words with high mutual similarity appear. The exact co-occurrences may vary: there are pages where <daiin> is paired with <dain>, but also pages where it is frequently used together with <aiin> (f41v, f46r, f55v, f89v2, f105v, and f114r) or <saiin> (f2r, f16r, and f90r2)" ([link], p. 3). Or, to put it in René's words: "A given word pattern is just about as likely to start with <o-> as with <qo->, or with <ch-> vs. <sh->, or contain <k> vs. <t>, or <p> vs <f>, or end with <-y> vs. <-dy> or <-in> vs. <-iin>" ([link]).

RE: Mapping between Currier A and B - MarcoP - 10-12-2019

Thanks everybody!
Enough ideas have been mentioned to keep me busy for months.

(09-12-2019, 03:12 PM)ReneZ Wrote: Another point I had been wondering about is whether the B language could be seen as A language with additional words. The fact that B-language pages tend to have much more text than A-language pages could be just an effect of this 'adding words'.

Hi Rene, the fact that Currier B adds new word types to Currier A is certainly central to the whole phenomenon. I don't think that it can explain everything, since words that appear everywhere also fluctuate in frequency, and that must have another explanation. For instance, the ratio of chol to cheey occurrences varies from 192/15=12.8 in Herbal_A to 12/36=0.3 in Bio. The frequency of chol in Herbal_A is 2.5%: comparable with the frequency of the most frequent word in English. Even if tokens belonging to the new B-types are frequent, I don't think they are enough to explain how a word drops from 2.5% to 0.2%.

(09-12-2019, 06:52 PM)ReneZ Wrote: An alternative to the verbose cipher would be a number theory. If the Voynich MS words are like a numbering or enumeration system, a similar progression could be expected. Just compare it with Roman numerals. D only starts appearing after 500 words and M only after 1000.

What you proposed ([link]) is (together with Timm & Schinner) one of the very rare solid explanations for quasi-reduplication. But (as pointed out by Torsten) one does not expect new function words to appear after several thousands of words. Can 'chedy' be something like MI (1001)? With a progressive numeric system, MI would correspond to a word (or character, syllable, anything) that never appears in the first several pages of the text: certainly not a frequent item.
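The Roman-numeral comparison quoted above can be made concrete with a small sketch. It enumerates 1, 2, 3, ... and records the position at which each symbol first appears. The function names are illustrative, and purely additive notation (CCCC rather than CD) is an assumption chosen to match the "D after 500, M after 1000" statement; with subtractive notation D and M would first show up at 400 (CD) and 900 (CM).

```python
def to_roman_additive(n):
    """Write n with purely additive Roman numerals (4 -> IIII, 400 -> CCCC)."""
    symbols = [(1000, "M"), (500, "D"), (100, "C"), (50, "L"),
               (10, "X"), (5, "V"), (1, "I")]
    parts = []
    for value, symbol in symbols:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def first_appearance(limit):
    """Map each symbol to the first number in 1..limit whose numeral contains it."""
    first = {}
    for n in range(1, limit + 1):
        for symbol in to_roman_additive(n):
            first.setdefault(symbol, n)
    return first


if __name__ == "__main__":
    for symbol, position in first_appearance(1200).items():
        print(symbol, position)
```

In such an enumeration a late symbol like M is absent from everything before its first appearance, which is the property the argument about 'chedy' turns on.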
The frequency of 'chedy' in Bio is close to that of the English conjunction 'and' or the article 'the', or of an averagely frequent character like 'm': these items are everywhere; they do not first appear after several pages. A similar argument applies to other frequent B-words.

Here I compare the top 30 most frequent words in different sections of the VMS and in Latin texts spanning different subjects and a long time period. In Latin, there is a considerable intersection between the most frequent words (though there are also considerable differences).

I only see these two options:
RE: Mapping between Currier A and B - ReneZ - 10-12-2019

Hi Marco, yes, the observation that new words all of a sudden become very frequent is something that requires a better explanation. This is one of those cases where it is not difficult to come up with quite a number of possibilities, especially taking into account combinations, but it is of course extremely difficult to judge how 'likely' any of these is. Some examples would be:
- Change of dialect by itself
- Change of dialect in combination with some encoding
- Change of scribe
- Change of handwriting in a source document in combination with some encoding
- Minor change of a rule in combination with some encoding
- They are null words

A numbering system in combination with a change of dialect could result in the observed effect. The strong impression I have is that the evolution in the direction A -> B seems easier to explain than in the other direction.

RE: Mapping between Currier A and B - -JKP- - 10-12-2019

(10-12-2019, 04:18 PM)ReneZ Wrote: ... - They are null words ...

If they are nulls, I'm not sure we can call them words. I can see this one, "- Minor change of a rule in combination with some encoding", possibly being a major factor.

RE: Mapping between Currier A and B - ReneZ - 10-12-2019

No problem JKP, but I just used that to distinguish them from the more standard use of the term null. One could imagine that the sequence ed is a null. This would normally mean that only these two characters are meaningless and one should read qokeedy as qokey. Alternatively, this character pair could indicate a null word, in which case the entire word qokeedy should be ignored. Either one of the two options could bring B language closer to A language. I never tried. I think there is more to it, and the number of possibilities that one could test is rather large.
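The two readings of "ed as a null" described above can be sketched in a few lines. This is only an illustration: the function names and the sample word list are made up for the example, and plain substring matching on EVA strings is an assumption about how such a test might be set up.

```python
def strip_null_bigram(words, null="ed"):
    """Option 1: the character pair is meaningless, so delete it inside each word."""
    return [w.replace(null, "") for w in words]


def drop_null_words(words, marker="ed"):
    """Option 2: the pair marks a null word, so skip every word that contains it."""
    return [w for w in words if marker not in w]


if __name__ == "__main__":
    sample = ["qokeedy", "daiin", "chedy", "chol"]  # hypothetical B-like word sequence
    print(strip_null_bigram(sample))
    print(drop_null_words(sample))
```

Either output could then be compared against A-language word statistics to see whether the transformed text moves closer to A.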
RE: Mapping between Currier A and B - Torsten - 11-12-2019

(10-12-2019, 03:32 PM)MarcoP Wrote: the fact that Currier B adds new word types to Currier A is certainly central to the whole phenomenon. I don't think that it can explain everything, since there are fluctuations in the frequencies of words that appear everywhere that must have another explanation.

Indeed, a token dominating a paragraph, page or section might be rare or missing on the next one. An interesting example of this pattern is the usage of <[link]> (see [link]). Jorge Stolfi has described the usage of <qokeey> within the stars section in terms of three distributions: "If a paragraph contains the word <qokeey>, there is a 38% chance that the next paragraph will contain the word <qokeey>. If a paragraph contains the word <qokeey>, there is a 40% chance that <qokeey> occurs more than once. If the current word is <qokeey>, there is a 6% chance that the next word will be <qokeey>" ([link]). The general rule behind this pattern is that "high-frequency tokens also tend to have high numbers of similar words." (Timm & Schinner 2019, p. 6). In other words, <[link]> occurs preferably in close vicinity of tokens with high structural similarity (see Timm & Schinner 2019, p. 3):

                 okey  okeey  qokeey  qokeedy  word count
                 ----  -----  ------  -------  ----------
Herbal (A)       4, 7, 12 (one cell is blank in this copy; column position not recoverable)    8,087
Pharma (A)       11, 24, 21 (one cell is blank in this copy; column position not recoverable)  2,529
Astro            4, 6, 4 (one cell is blank in this copy; column position not recoverable)     2,136
Cosmo               1      6       6        4       2,691
Herbal (B)          6      8       9        9       3,233
Stars (B)          16     96     159      137      10,673
Biological (B)     12     19      89      153       6,911

RE: Mapping between Currier A and B - nablator - 11-12-2019

(09-12-2019, 03:12 PM)ReneZ Wrote: Another point I had been wondering about is whether the B language could be seen as A language with additional words.

I have been wondering if EVA-lk, the second-best discriminator bigram for A/B after EVA-ed, should be split with a space. It looks like omitting the space becomes increasingly popular in Currier-B, which explains the increasing frequency of lk... maybe.

RE: Mapping between Currier A and B - MarcoP - 13-12-2019

I have run a first set of experiments, according to the simple approach described in the first post. In particular:
These are the top 20 results for the three systems. There clearly is a lot of redundancy, and most results convert various A suffixes into B -edy. The reason for this is quite obvious. It is possibly less obvious that "or -> edy." appears to be more effective than "or. -> edy." (the dot is the word separator). The difference between the two is that the first one can break an A word, e.g. dorchaiin -> dedy.chaiin, where (in this case) both resulting words appear in B. The second rule only applies to the subset of the scope where -or is word-final.

Another substitution I did not expect is "y.d -> dy."; this results in rewrites like:
shey.dair -> shedy.air
chy.dam -> chdy.am
which indeed transform A word sequences into valid B words.

There could be other substitutions worth commenting upon, but my main interest now is understanding how to proceed. A possibility could be re-formulating this search into something like "simulated annealing", rather than this simple brute-force approach. Also, a distance measure that considers the overlap between word sequences (instead of word frequencies alone) is something I am curious about.

RE: Mapping between Currier A and B - nablator - 13-12-2019

(13-12-2019, 11:38 AM)MarcoP Wrote: There could be other substitutions worth commenting upon, but my main interest now is understanding how to proceed. A possibility could be re-formulating this search into something like "simulated annealing", rather than this simple brute-force approach.

You need to investigate multiple substitutions at the same time. One substitution at a time may not be enough to detect an improvement in your metric. This of course makes the search space huge. I would try to "hill climb" first. Some problems are well suited to the hill climbing algorithm (just swap the target part of two randomly selected rules, or enable a rule and disable the other, and retest: if the result is worse, backtrack).
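The hill-climbing idea above can be sketched as follows. This is a toy illustration under stated assumptions, not the method used in the thread: the distance is assumed to be an L1 difference between word-frequency distributions, the move is simplified to toggling one candidate rule on or off, each candidate is assumed to have a distinct source pattern, and all function names are hypothetical. Texts use a dot as word separator, as in the rewrites quoted above.

```python
import random
import re
from collections import Counter


def apply_rules(text, rules):
    """Apply all substitutions in one pass, so no rule sees another rule's output."""
    if not rules:
        return text
    # Longer sources first, so "or." is tried before "or" when both are enabled.
    sources = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in sources))
    return pattern.sub(lambda m: rules[m.group(0)], text)


def word_freqs(text):
    """Relative frequency of each dot-separated word."""
    words = [w for w in text.split(".") if w]
    return {w: c / len(words) for w, c in Counter(words).items()}


def distance(text_a, text_b):
    """L1 distance between the two word-frequency distributions (assumed metric)."""
    fa, fb = word_freqs(text_a), word_freqs(text_b)
    return sum(abs(fa.get(w, 0.0) - fb.get(w, 0.0)) for w in set(fa) | set(fb))


def hill_climb(text_a, text_b, candidates, steps=200, seed=0):
    """Toggle one random candidate per step; keep it unless the distance gets worse."""
    rng = random.Random(seed)
    enabled = {}
    best = distance(text_a, text_b)
    for _ in range(steps):
        src, dst = rng.choice(candidates)
        trial = dict(enabled)
        if src in trial:
            del trial[src]      # disable an enabled rule
        else:
            trial[src] = dst    # enable a disabled rule
        score = distance(apply_rules(text_a, trial), text_b)
        if score <= best:       # otherwise backtrack by discarding the trial
            enabled, best = trial, score
    return enabled, best
```

With a single-pass substitution, a rule set that contains both "or -> edy" and "edy -> or" does not undo itself, which matters once several rules are tested together.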
If you are out of luck you get a different sub-optimal "solution" each time, and you need a better algorithm.

To reduce the number of rule sets that could be applied together, the following can be taken into consideration: a rule like "or -> edy" cannot possibly be right without another rule something -> "or", because "or" must not be eliminated from Currier-B. So you need a set of rules in cycles: pattern 1 -> pattern 2 -> pattern 3 -> ... -> pattern n -> pattern 1.
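The cycle constraint can be checked mechanically before a rule set is scored. The sketch below is a simplification with hypothetical function names: it treats each pattern as an opaque label and a rule set as a source -> target mapping, and it ignores the fact that a pattern could also arise as a substring of some other rule's output.

```python
def unreplenished_sources(rules):
    """Patterns that some rule consumes but no rule produces."""
    return set(rules) - set(rules.values())


def forms_cycles(rules):
    """True if the rules permute their patterns, i.e. form a union of closed cycles."""
    targets = list(rules.values())
    no_shared_targets = len(set(targets)) == len(targets)
    every_source_is_a_target = set(rules) == set(targets)
    return no_shared_targets and every_source_is_a_target


if __name__ == "__main__":
    lone_rule = {"or": "edy"}
    closed_set = {"or": "edy", "edy": "ol", "ol": "or"}  # illustrative patterns only
    print(unreplenished_sources(lone_rule), forms_cycles(lone_rule))
    print(unreplenished_sources(closed_set), forms_cycles(closed_set))
```

Used as a filter, this discards candidate rule sets such as the lone "or -> edy" without having to evaluate the metric on them.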