I focused on mapping ain/aiin between Stars/Q20 and Bio/Q13, and here are the results. As always, it is possible I made errors somewhere.
I will start with some general thoughts about the mapping problem. These were partially inspired by what Nablator wrote at the end of a linked post.
Let's say we are trying to map patterns X and W between two dialects D1 and D2. Let 'c' and 'g' represent some kind of context.
A simple direct mapping is possible if you have something like this:
D1: cX cX cX cX cX gX gX gX gX gX
D2: cX cX cX cX cX gW gW gW gW gW
'W' only occurs in D2, in context 'g'. You can map D1 into D2 with this replacement:
gX -> gW
Or you can map D2 into D1:
gW -> gX
I think that in principle the first option is preferable, since it preserves W, whereas the second option replaces W entirely with X.
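The clean case above can be checked mechanically. This is only a minimal sketch on the toy sequences from this post; the helper name apply_rules is my own and each token is treated as an indivisible context+pattern unit.

```python
def apply_rules(tokens, rules):
    """Replace whole tokens according to a dict such as {"gX": "gW"}."""
    return [rules.get(tok, tok) for tok in tokens]

# Toy dialects from the example above (not real Voynich data).
d1 = "cX cX cX cX cX gX gX gX gX gX".split()
d2 = "cX cX cX cX cX gW gW gW gW gW".split()

d1_to_d2 = apply_rules(d1, {"gX": "gW"})  # first option: introduces W into D1
d2_to_d1 = apply_rules(d2, {"gW": "gX"})  # second option: erases W from D2

print(d1_to_d2 == d2)
print(d2_to_d1 == d1)
print(sorted(set(d1_to_d2)), sorted(set(d2_to_d1)))
```

Both directions give an exact match here, but only the first keeps both patterns in the mapped text.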
In the Voynich manuscript, there are very few "hard rules" and almost everything seems to happen by "preferences". The situation you tend to have is more like:
D1: cX cX cX cX cW gX gX gX gW gW
D2: cX cX cX cW cW gX gW gW gW gW
Since there is no context that clearly correlates with only one of the two patterns, all replacements are quite far from perfect. Since there are more gW than gX, we might still decide to apply:
gX -> gW
But since both gX and gW appear in both D1 and D2, how do we know that the difference is not significant?
Treating X and W as equivalent will perfectly match D1 and D2:
X -> W
or
W -> X
But then even more information is lost. Again, we cannot tell when X or W should be used, yet we cannot be sure that the difference between X and W is not meaningful.
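The trade-off between matching the dialects and losing distinctions can be put in numbers for the toy case. The sketch below is only an illustration: the distance measure (sum of absolute differences between token counts) is my own choice, not something established for the VMS, and the sequences are the toy ones above.

```python
from collections import Counter

def apply_rules(tokens, rules):
    """Replace whole tokens according to a dict such as {"gX": "gW"}."""
    return [rules.get(tok, tok) for tok in tokens]

def count_distance(a, b):
    """Sum of absolute differences between token counts (an assumed, simple measure)."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[k] - cb[k]) for k in set(ca) | set(cb))

d1 = "cX cX cX cX cW gX gX gX gW gW".split()
d2 = "cX cX cX cW cW gX gW gW gW gW".split()

candidates = [
    ("none", {}),
    ("gX->gW", {"gX": "gW"}),
    ("X->W", {"cX": "cW", "gX": "gW"}),
]

results = {}
for name, rules in candidates:
    m1, m2 = apply_rules(d1, rules), apply_rules(d2, rules)
    # (mismatch between dialects, number of distinct tokens that survive)
    results[name] = (count_distance(m1, m2), len(set(m1) | set(m2)))
    print(name, results[name])
```

On these toy data the mismatch shrinks from 6 to 2 to 0, while the number of distinct tokens drops from 4 to 3 to 2: the better the match, the less there is left to distinguish.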
As you can guess, the above describes what I found about ain/aiin.
The following histograms show the most frequent pairs of ain/aiin words in the two sections Stars/Q20 and Bio/Q13. I used the Zandbergen-Landini EVA transliteration (ZL_ivtff_1c), ignoring uncertain spaces. Benches were replaced by C, S and benched gallows by K, T, P, F. As can be seen, there are differences between the two sections, but there is also a great deal of overlap between words. In both sections, both variants of each pair are well represented: there is no word that can be written as Xain but cannot be written as Xaiin, or vice versa.
A first observation is that section and word stem do not define a clear context for choosing 'ain' over 'aiin'.
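For anyone who wants to reproduce counts of this kind, here is a rough sketch of the preprocessing and pairing. It rests on several assumptions of mine: that '.' marks a word break and ',' an uncertain space in the transliteration, that "ignoring uncertain spaces" means joining the words on either side, and that the replacements are ch->C, sh->S, ckh->K, cth->T, cph->P, cfh->F. The input line is invented for illustration, and real IVTFF files also contain locus headers and comments that this does not handle.

```python
from collections import Counter, defaultdict

# Benched gallows first, so that their "c...h" strokes are not caught by the bench rules.
SUBS = [("ckh", "K"), ("cth", "T"), ("cph", "P"), ("cfh", "F"), ("ch", "C"), ("sh", "S")]

def normalize(line):
    """Drop uncertain spaces (assumed ','), apply substitutions, split on '.'."""
    line = line.replace(",", "")
    for old, new in SUBS:
        line = line.replace(old, new)
    return [w for w in line.split(".") if w]

def split_suffix(word):
    """Return (stem, suffix) for words ending in aiin or ain, else None."""
    if word.endswith("aiin"):
        return word[:-4], "aiin"
    if word.endswith("ain"):
        return word[:-3], "ain"
    return None

def pair_counts(words):
    """Map each stem to a Counter of how often it takes 'ain' and 'aiin'."""
    pairs = defaultdict(Counter)
    for w in words:
        parts = split_suffix(w)
        if parts:
            stem, suffix = parts
            pairs[stem][suffix] += 1
    return pairs

# Invented sample line, not a quotation from the manuscript.
words = normalize("qokain.shedy.qokaiin.chedy.daiin.dain.okaiin.cthain")
pairs = pair_counts(words)
for stem, c in sorted(pairs.items()):
    print(stem, c["ain"], c["aiin"])
```

Running this per section and keeping only the stems with the highest totals should give tables comparable to the histograms.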
Something I found interesting in Bio/Q13 is that:
- All the frequent words that include a gallows (t,k) prefer 'ain' over 'aiin'
- All the frequent words that prefer 'ain' over 'aiin' include a gallows (t,k)
Since we are talking about seven different word pairs, this does not seem likely to be random. But it is very clearly just a preference, and I don't know how to use it to create a mapping.
I then looked at the immediately preceding and following words for each pair Xain/Xaiin. Since we are now considering two consecutive words, the numbers become much smaller (repeating word sequences are not very frequent in the VMS).
These are the data I collected for dain / daiin.
Here it seems possible to state a "hard rule".
- dain never occurs after a word ending in -chedy
If I check the whole transliteration file, I find 14 occurrences of '-chedy.daiin.'
But there appears to be an exception, chedy.dain, in Herbal B (f43r.8), and another case was excluded from my analysis because of a dubious space after dain (f108v.36).
Anyway, even if this were a true hard rule, it would not help in matching the Stars and Bio sections, since the rule already applies to both.
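A candidate rule like this one can be tested by collecting the word that precedes each target. The sketch below is only an outline: the word list is invented, the function name is mine, and it ignores line and paragraph boundaries, which a real check should probably respect.

```python
from collections import Counter

def preceding_contexts(words, targets):
    """For each target word, count the words that immediately precede it."""
    ctx = {t: Counter() for t in targets}
    for prev, cur in zip(words, words[1:]):
        if cur in ctx:
            ctx[cur][prev] += 1
    return ctx

# Invented word sequence in plain EVA, not a quotation from the manuscript.
words = "qokeedy.chedy.daiin.shedy.dain.olchedy.daiin.chedy.qokain".split(".")
ctx = preceding_contexts(words, ["dain", "daiin"])

# Words ending in -chedy found before 'dain' would be counterexamples to the rule.
violations = [prev for prev in ctx["dain"] if prev.endswith("chedy")]
print(dict(ctx["daiin"]), dict(ctx["dain"]), violations)
```

Reversing the word list before calling the same function gives the following-word contexts.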
The graphs for qokain/qokaiin are even less informative. In most cases, both combinations are possible, e.g. shedy.qokain shedy.qokaiin.
Being unable to find a specific context that tells me when 'ain' should replace 'aiin', or vice versa, the only solution I can think of is to assume that 'ain' and 'aiin' are equivalent and apply something like:
ain->aiin
I guess this would further reduce the entropy of the text (since one can always guess that the first 'i' will be followed by a second 'i').
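The entropy guess can be illustrated on a toy word list. This is a sketch under assumptions of my own: I measure the entropy of the next character given the previous two characters (so that the "first i" is recognisable as the i after an a), and the six words are chosen by hand, so the result says nothing quantitative about the real text. With a single-character context the outcome can differ.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(text, order=2):
    """Entropy in bits of the next character given the previous `order` characters."""
    following = defaultdict(Counter)
    for i in range(len(text) - order):
        following[text[i:i + order]][text[i + order]] += 1
    total = sum(sum(c.values()) for c in following.values())
    h = 0.0
    for counts in following.values():
        n = sum(counts.values())
        for k in counts.values():
            # p(context, next) * -log2 p(next | context)
            h -= (k / total) * math.log2(k / n)
    return h

def flatten(words):
    """Apply ain->aiin to word endings; words already ending in aiin are unchanged."""
    return [w[:-3] + "aiin" if w.endswith("ain") else w for w in words]

# Hand-picked toy words, not a sample of the manuscript.
words = "daiin dain qokaiin qokain okaiin okain".split()
before = ".".join(words)
after = ".".join(flatten(words))

h_before = conditional_entropy(before)
h_after = conditional_entropy(after)
print(after)
print(h_before > h_after)
```

In this toy case the entropy drops because 'ai' is always followed by 'i' once the replacement has been applied.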
More importantly, I am afraid that the search for specific hard rules that allow mapping one dialect into another will rarely or never succeed, and that one has to resort to 'flattening' the text as I described above. So the mapping would basically result in a massive loss of potentially significant information.
But maybe I am being too pessimistic. Suggestions for specific features that look mappable are welcome.