The Voynich Ninja

I am still working on the mapping problem, or at least thinking about it, without any significant success.
Here is something I noticed that might make some progress possible. This may have been discussed before, and it is also possible that I made errors and that these observations are not correct.

[attachment=4700]

This histogram shows the different variants of ai*n as a percentage of words in the different sections (based on the Zandbergen-Landini transliteration ZL_ivtff_1c.txt, ignoring uncertain spaces). Sections are sorted from strongly Currier-A to strongly Currier-B, as discussed by Rene.
Observations:

an and aiiin are marginal and will not be discussed.
ain shows a rather clear ascending trend
aiin shows a less clear descending trend; the most extremely B section (Bio aka Q13) is the only one in which ain is more frequent than aiin
the cumulative ai*n shows that the different sections are relatively consistent with respect to these totals, with the exception of the Stars/Q20 section which has more occurrences of these patterns.
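
Percentages of this kind can be computed along the following lines. This is only a minimal sketch: it assumes the words have already been extracted from the transliteration and grouped by section, and that "ai*n" means a word ending in a, any number of i, then n. The `words_by_section` dictionary and its contents are hypothetical toy data, not real counts; parsing of the IVTFF file is not shown.

```python
import re
from collections import Counter

# Matches an, ain, aiin, aiiin, ... at the end of a word (assumed meaning of "ai*n").
AI_N = re.compile(r"a(i*)n$")

def variant_percentages(words):
    """Return {variant: percentage of all words in the list} for ai*n endings."""
    counts = Counter()
    for w in words:
        m = AI_N.search(w)
        if m:
            counts["a" + m.group(1) + "n"] += 1
    return {v: 100.0 * n / len(words) for v, n in counts.items()}

# Hypothetical toy data, NOT real counts from the manuscript.
words_by_section = {
    "ToyA": ["daiin", "chol", "aiin", "dain", "shol", "chor", "daiin", "dy"],
    "ToyB": ["qokain", "shedy", "qokaiin", "chedy", "dain", "ol", "okain", "qokedy"],
}

for section, words in words_by_section.items():
    print(section, variant_percentages(words))
```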

Since Herbal-A and Herbal-B have similar totals, one could speculate that the variability in the overall ai*n depends on the subject of the text. Anyway, the totals do not seem correlated with A vs B.

The % of 'aiin' with respect to the total of ai*n-words has a different behaviour. It is very consistent for Herbal-A, Pharma (aka Small Plants) and Astro-Cosmo-Zodiac. Then it progressively drops through the other sections, forming a smooth curve.
[attachment=4699]
This curve appears to depend on the degree of Currier A-ness or B-ness of each section. It could be that, whatever the function of 'aiin', in Currier-B it is sometimes fulfilled by 'ain' instead; the more a dialect is skewed towards B, the more this replacement takes place. This is of course very speculative and uncertain.
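
The share of 'aiin' among all ai*n words is a different statistic from the percentage over all words. A minimal sketch, assuming the same suffix-based definition of ai*n words as above and a hypothetical helper name `aiin_share`:

```python
import re

def aiin_share(words):
    """Percentage of words ending in 'aiin' among all words ending in a, i*, n.

    Returns None when the list contains no ai*n word at all.
    """
    family = [w for w in words if re.search(r"ai*n$", w)]
    if not family:
        return None
    # A word ending in 'aiiin' does not end in 'aiin', so endswith is sufficient.
    aiin = sum(1 for w in family if w.endswith("aiin"))
    return 100.0 * aiin / len(family)

# Hypothetical toy list, not real data.
print(aiin_share(["daiin", "dain", "chol", "aiin"]))
```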

Anyway Q20 and Q13 are both large and the ratio between 'ain' and 'aiin' in the two is about 4 to 6, with a total of almost 3000 tokens. Maybe there are enough cases to compare the occurrences of the two sequences and see if there are specific situations where one rather than the other occurs. I am not optimistic about the possibility of finding anything, but it could be worth trying.

Of course, a similar analysis could be attempted on Herbal-A+Pharma+AstroCZ, but here the tokens are fewer (about 2000) and 'ain' is relatively rare, so I expect that the task would be even more challenging.

I focused on mapping ain/aiin between Stars/Q20 and Bio/Q13, and here are the results. As always, it is possible I made errors somewhere.

I will start with some general thoughts about the mapping problem. These were partially inspired by something Nablator wrote.

Let's say we are trying to map patterns X and W between two dialects D1 and D2. Let 'c' and 'g' represent some kind of context.
A simple direct mapping is possible if you have something like this:

D1: cX cX cX cX cX gX gX gX gX gX
D2: cX cX cX cX cX gW gW gW gW gW

'W' only occurs in D2 in context 'g'. You can map D1 into D2 with this replacement:
gX -> gW
Or you can map D2 into D1:
gW -> gX
I think that in principle the first option is preferable, since it preserves W, which the second option replaces entirely with X.
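
The ideal case can be written as a token-level replacement. In this sketch the tokens are the abstract cX/gX/gW strings from the example and `map_tokens` is a hypothetical helper, not anything applied to real Voynichese:

```python
D1 = "cX cX cX cX cX gX gX gX gX gX".split()
D2 = "cX cX cX cX cX gW gW gW gW gW".split()

def map_tokens(tokens, rules):
    """Replace each token that has a rule; leave all other tokens unchanged."""
    return [rules.get(t, t) for t in tokens]

# Mapping D1 into D2 keeps W in the result.
print(map_tokens(D1, {"gX": "gW"}) == D2)  # True
# Mapping D2 into D1 removes every W.
print(map_tokens(D2, {"gW": "gX"}) == D1)  # True
```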

In the Voynich manuscript, there are very few "hard rules" and almost everything seems to happen by "preferences". The situation you tend to have is more like:

D1: cX cX cX cX cW gX gX gX gW gW
D2: cX cX cX cW cW gX gW gW gW gW

Since there is no context that clearly correlates with only one of the two patterns, all replacements are quite far from perfect. Since there are more gW than gX, we might still decide to apply:
gX -> gW
But since both gX and gW appear in both D1 and D2, how do we know that the difference is not significant?
Treating X and W as equivalent will perfectly match D1 and D2:
X -> W
or
W -> X
But even more information is lost. Again, we are unable to tell when X or W should be used, but we are not sure that the difference between X and W is not meaningful.
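
The "preferences" case can be checked the same way. This sketch uses the abstract D1 and D2 from the example; `map_tokens` and `flatten` are hypothetical helpers. It shows that the contextual rule leaves the two token counts different, while flattening makes the sequences identical at the cost of halving the number of distinct tokens:

```python
from collections import Counter

D1 = "cX cX cX cX cW gX gX gX gW gW".split()
D2 = "cX cX cX cW cW gX gW gW gW gW".split()

def map_tokens(tokens, rules):
    """Replace each token that has a rule; leave all other tokens unchanged."""
    return [rules.get(t, t) for t in tokens]

def flatten(tokens):
    """Treat X and W as equivalent by rewriting every X as W."""
    return [t.replace("X", "W") for t in tokens]

# The contextual rule gX -> gW does not reproduce the counts of D2.
print(Counter(map_tokens(D1, {"gX": "gW"})) == Counter(D2))  # False

# Flattening matches the two sequences exactly...
print(flatten(D1) == flatten(D2))  # True
# ...but only two distinct tokens survive out of the original four.
print(len(set(D1)), len(set(flatten(D1))))  # 4 2
```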

As you can guess, the above describes what I found about ain/aiin.
The following histograms show the most frequent pairs of ain/aiin words in the two sections Stars/Q20 and Bio/Q13. I used the Zandbergen-Landini EVA transliteration (ZL_ivtff_1c), ignoring uncertain spaces. Benches were replaced by C,S and benched gallows by K,T,P,F. As can be seen, there are differences between the two sections, but there is also a great overlap between words. In both sections, both variants of each pair are significantly present: there is no word that can be written as Xain but cannot be written as Xaiin, and vice versa.

[attachment=4739]
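
A sketch of the preprocessing and pair counting described above. The exact substitution table is an assumption (EVA ch, sh for the benches and ckh, cth, cph, cfh for the benched gallows); `normalize` and `pair_counts` are hypothetical names, and the sample words are toy input rather than real counts.

```python
import re
from collections import Counter, defaultdict

# Assumed substitutions. Benched gallows come first so that the plain
# bench 'ch' is only replaced where it is a bench in its own right.
SUBS = [("ckh", "K"), ("cth", "T"), ("cph", "P"), ("cfh", "F"),
        ("ch", "C"), ("sh", "S")]

def normalize(word):
    """Replace benches and benched gallows with single capital letters."""
    for old, new in SUBS:
        word = word.replace(old, new)
    return word

def pair_counts(words):
    """Return {stem: Counter({'ain': n, 'aiin': m})} for words of the form Xain / Xaiin."""
    pairs = defaultdict(Counter)
    for w in map(normalize, words):
        m = re.fullmatch(r"(.*?)(aiin|ain)", w)
        if m:
            pairs[m.group(1)][m.group(2)] += 1
    return pairs

# Toy input, not real data.
sample = ["daiin", "dain", "daiin", "qokain", "qokaiin", "chedy", "ckhaiin"]
for stem, counts in pair_counts(sample).items():
    print(repr(stem), dict(counts))
```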

A first finding is that section and word stem do not define a clear context for choosing 'ain' over 'aiin'.
Something I found interesting in Bio/Q13 is that:

All the frequent words that include a gallows (t,k) prefer 'ain' over 'aiin'
All the frequent words that prefer 'ain' over 'aiin' include a gallows (t,k)

Since we are talking about seven different word couples, this does not seem likely to be random. But very clearly it is just a preference and I don't know how to use it to create a mapping.

I then looked at the immediately preceding and following words for each couple Xain Xaiin. Since we are now considering two consecutive words, numbers become much smaller (repeating word sequences are not very frequent in the VMS).

These are the data I collected for dain / daiin.

[attachment=4737]

Here it seems possible to state a "hard rule".

dain never occurs after a word ending in -chedy

If I check the whole transliteration file, I find 14 occurrences of '-chedy.daiin.'
But there appears to be an exception chedy.dain in Herbal B (f43r.8) and another case was excluded from my analysis for a dubious space after dain (f108v.36).
Anyway, even if this were a true hard rule, it would not help in matching the Stars and Bio sections, since the rule already applies to both sections.
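
A candidate rule of this kind can be tested by counting which target word follows a given ending. The sketch below assumes each line is a string of words separated by '.', as in the '-chedy.daiin.' notation above; `following_counts` is a hypothetical helper and `toy_lines` is invented input, not text from the manuscript.

```python
from collections import Counter

def following_counts(lines, prev_suffix="chedy", targets=("dain", "daiin")):
    """Count target words that immediately follow a word ending in prev_suffix."""
    counts = Counter()
    for line in lines:
        words = line.split(".")
        for prev, cur in zip(words, words[1:]):
            if prev.endswith(prev_suffix) and cur in targets:
                counts[cur] += 1
    return counts

# Invented lines for illustration only.
toy_lines = [
    "qokeedy.chedy.daiin.shedy",
    "shedy.dain.ol.chedy.daiin",
    "okchedy.daiin.dain",
]
counts = following_counts(toy_lines)
print(counts["daiin"], counts["dain"])  # 3 0
```

Word pairs that straddle a line break are not counted here, and uncertain spaces would need to be handled before splitting.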

The graphs for qokain/qokaiin are even less informative. In most cases, both combinations are possible, e.g. shedy.qokain shedy.qokaiin.

[attachment=4738]

Being unable to find a specific context that tells me when 'ain' should replace 'aiin', or vice-versa, the only solution I can think of is assuming that 'ain' and 'aiin' are equivalent and apply something like:
ain->aiin
I guess this would further reduce the entropy of the text (since one can always guess that the first 'i' will be followed by a second 'i').
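
The entropy guess can be illustrated on a toy example. The sketch below estimates the entropy of the next character given the two preceding characters, since the argument is about what follows 'ai'. The function names and the four-word toy text are assumptions for illustration; the result says nothing about the real manuscript, and it depends on the chosen context length.

```python
import math
from collections import Counter

def conditional_entropy(text, order=2):
    """Entropy in bits of the next character given the previous `order` characters."""
    ctx_counts = Counter()
    joint_counts = Counter()
    for i in range(order, len(text)):
        ctx = text[i - order:i]
        ctx_counts[ctx] += 1
        joint_counts[(ctx, text[i])] += 1
    total = sum(joint_counts.values())
    h = 0.0
    for (ctx, _), n in joint_counts.items():
        h -= (n / total) * math.log2(n / ctx_counts[ctx])
    return h

def flatten(words):
    """Apply ain -> aiin to word endings; words ending in aiin are left unchanged."""
    return [w[:-3] + "aiin" if w.endswith("ain") else w for w in words]

# Toy text, not real data.
toy = ["dain", "daiin", "dain", "daiin"]
before = conditional_entropy(".".join(toy))
after = conditional_entropy(".".join(flatten(toy)))
print(before > after)  # True: after flattening, 'ai' is always followed by 'i'
```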
More importantly, I am afraid that finding specific hard-rules that can allow mapping a dialect into another will rarely or never succeed and one has to resort to 'flattening' the text as I described above. So the mapping would basically result in a massive loss of potentially significant information.
But maybe I am being too pessimistic. Suggestions for specific features that look mappable are welcome.

(04-09-2020, 03:30 PM) MarcoP wrote: But maybe I am being too pessimistic. Suggestions for specific features that look mappable are welcome.

The inescapable conclusion for me, even without trying, is that mapping from Currier-A to Currier-B is impossible, because of the fluid nature of Voynichese, continuously (and sometimes discontinuously) drifting. This evolution cannot be ascribed to a set of rigid rules. As you said: no hard rules only preferences.

This is an interesting analysis, thanks Marco. When trying to map between A and B, I often feel as though there's some kind of big trick we're missing, not necessarily a small trick.

MarcoP

nablator

nickpelling