The Voynich Ninja
Mapping between Currier A and B - Printable Version



RE: Mapping between Currier A and B - ReneZ - 14-12-2019

Interesting Marco,

I am not aware that this has been tried in any serious manner before.

I wonder about your distance measure. I don't know what is "right" of course, but one can do several things.
My inclination would be to look at bigram distribution of the result.

One advantage is statistical: there are many more samples and far fewer distinct items than with word counts, so the result would be numerically more significant.

The other advantage is a bit more hypothetical.
The difference between A and B can be more about 'rules' on the 'dialect' or it can be more about a different vocabulary (different subject matter).
Looking at bigrams will concentrate more on the former and, assuming that such rules exist, will be less affected by changing vocabulary or subject matter.

From my experimentation with alternative HMM I found that measuring the distance between bigram distributions works better with the Bhattacharyya distance than just RSS-ing the frequencies, but this may just be fine-tuning.
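
A minimal sketch of the two measures being compared here, not code from anyone in this thread. It assumes that bigrams are adjacent character pairs and that "RSS" means root-sum-square of the frequency differences; the function names are made up for illustration.

```python
import math
from collections import Counter

def bigram_distribution(text):
    """Relative frequencies of adjacent character pairs in text."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()}

def bhattacharyya_distance(p, q):
    """Minus the log of the Bhattacharyya coefficient sum(sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())
    # No shared bigram at all: the distance is infinite.
    # max() guards against a tiny negative value from rounding when bc is ~1.
    return float("inf") if bc == 0 else max(0.0, -math.log(bc))

def rss_distance(p, q):
    """Root of the summed squared frequency differences (assumed meaning of RSS)."""
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))

# Toy strings only, not real transliteration data.
pa = bigram_distribution("daiin.chol.chor.daiin")
pb = bigram_distribution("qokedy.chedy.qokeedy.shedy")
print(bhattacharyya_distance(pa, pb), rss_distance(pa, pb))
```

The Bhattacharyya distance works on square roots of the frequencies, so rare bigrams weigh relatively more than in the root-sum-square, which is dominated by the most frequent ones.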


RE: Mapping between Currier A and B - MarcoP - 15-12-2019

Thank you for your comments, Nablator and Rene!
I will follow your suggestions and, as a next experiment, I will try multi-rule hill-climbing minimizing Bhattacharyya distance on bigram distributions.
This will likely take a while both to write the code and to run it (evaluating each step is computationally expensive and, as Nablator said, the search-space is huge).


RE: Mapping between Currier A and B - ReneZ - 15-12-2019

Hi Marco,

I am wondering. Is the choice of the substitution to be tested something that is a user input, or is this decided (in your setup) by an algorithm?

In a one-pass approach, as you did before, this can be user input, but if it has to be iterative or progressive, this will not really be possible anymore. It significantly changes the entire approach...


RE: Mapping between Currier A and B - MarcoP - 15-12-2019

(15-12-2019, 01:23 PM)ReneZ Wrote: Hi Marco,

I am wondering. Is the choice of the substitution to be tested something that is a user input, or is this decided (in your setup) by an algorithm?

In a one-pass approach, as you did before, this can be user input, but if it has to be iterative or progressive, this will not really be possible anymore. It significantly changes the entire approach...

Hi Rene,
my first experiments already used an algorithm to propose candidate substitutions: this was extremely simple - all the combinations of the most common 100 sequences from each text were tried, selecting the most effective ones.
Of course, here I will have to think of something both more extensive (the top 100 sequences might not include all relevant possibilities) and more flexible (so that iterations can alter the current best set, as described by Nablator). I think that possibly transforming B into A might be somehow easier, since it is more tolerable for B words to disappear in a mapping to A than vice-versa (the problem with or->edy mentioned by Nablator).
As I said, I understand how difficult the task is and I don't expect to obtain any miracle, but maybe we can spot something of interest even with relatively simple experiments.
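
A rough sketch of how such a candidate list could be enumerated, for readers who want to try it. This is a guess at the general idea, not the code used for the experiments: the maximum sequence length, the function names and the treatment of the word separator as an ordinary character are all assumptions.

```python
from collections import Counter
from itertools import product

def top_sequences(text, top_n=100, max_len=3):
    """Most frequent character sequences of 1..max_len characters (max_len is assumed)."""
    counts = Counter(
        text[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(text) - n + 1)
    )
    return [seq for seq, _ in counts.most_common(top_n)]

def candidate_substitutions(source_text, target_text, top_n=100, max_len=3):
    """Every (source sequence -> target sequence) pair, skipping identity rules."""
    sources = top_sequences(source_text, top_n, max_len)
    targets = top_sequences(target_text, top_n, max_len)
    return [(s, t) for s, t in product(sources, targets) if s != t]

# Toy strings only; with top_n=100 there are at most 100 * 100 candidates.
pairs = candidate_substitutions("daiin.chol.chor", "qokedy.chedy.shedy", top_n=5)
print(len(pairs), pairs[:3])
```

Each candidate would then be scored with whatever difference measure is in use, keeping the most effective ones.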


RE: Mapping between Currier A and B - MarcoP - 19-12-2019

I have run a few hill-climbing experiments, searching for B->A replacements resulting in a minimal Bhattacharyya distance between bigram frequencies.
It seems that each run arrives at a different solution: this could be because the problem is not fit for hill-climbing, or because my code is not adequate. My impression is that both things are true.

These are the results for searches constrained to strings of three characters or fewer and to solutions with at most three replacement rules (I list two experiments for each transliteration system considered). There is a certain agreement about the ed->od replacement (Cuva: ED->OD, Currier: C8->O8): it occurs in all results except the second EVA experiment.

eva     [['oka'->'o'] ['hdy'->'hor'] ['ed'->'od']]
eva     [['oka'->'oda'] ['.l'->'.do'] ['edy'->'or']]
cuva    [['UDY'->'UO'] ['ED'->'OD'] ['OKU'->'OTS']]
cuva    [['OD'->'OTS'] ['UD'->'SOR'] ['ED'->'OD']]
currier [['.E'->'.8O'] ['CC8'->'COR'] ['C8'->'O8']]
currier [['OR'->'SO'] ['C8'->'O8'] ['O89'->'OR']]



All experiments were run for more than 10,000 iterations and, when I stopped them, had been unable to improve the current solution for a few thousand iterations. Typically, a solution reduces the difference between A and B to about half the difference between the two unmodified subsets of the text.
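
For anyone wanting to experiment, here is a minimal sketch of a search of this general kind. It is not the code that produced the results above: the mutation scheme, the sequential application of rules, the default parameters and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def bigram_distribution(text):
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()}

def bhattacharyya_distance(p, q):
    bc = sum(math.sqrt(p[k] * q[k]) for k in p.keys() & q.keys())
    return float("inf") if bc == 0 else max(0.0, -math.log(bc))

def apply_rules(text, rules):
    """Apply (source, target) replacements one after another (an assumption)."""
    for src, dst in rules:
        text = text.replace(src, dst)
    return text

def substrings(text, max_len):
    """All distinct substrings of 1..max_len characters, sorted for reproducibility."""
    return sorted({text[i:i + n]
                   for n in range(1, max_len + 1)
                   for i in range(len(text) - n + 1)})

def hill_climb(text_b, text_a, max_rules=3, max_len=3, iterations=2000, seed=0):
    """Search for B->A rules that lower the bigram distance to text_a."""
    rng = random.Random(seed)
    sources = substrings(text_b, max_len)
    targets = substrings(text_a, max_len)
    target_dist = bigram_distribution(text_a)

    def score(rules):
        mapped = apply_rules(text_b, rules)
        return bhattacharyya_distance(bigram_distribution(mapped), target_dist)

    best, best_score = [], score([])
    for _ in range(iterations):
        candidate = list(best)
        new_rule = (rng.choice(sources), rng.choice(targets))
        if not candidate or (len(candidate) < max_rules and rng.random() < 0.5):
            candidate.append(new_rule)                            # add a rule
        else:
            candidate[rng.randrange(len(candidate))] = new_rule   # swap one rule
        s = score(candidate)
        if s < best_score:  # keep only strict improvements
            best, best_score = candidate, s
    return best, best_score

# Toy strings only, not real transliteration data.
rules, dist = hill_climb("qokedy.chedy.qokeedy.shedy.qokain",
                         "daiin.chol.chor.daiin.shol.chody", iterations=500, seed=1)
print(rules, dist)
```

Because only strict improvements are accepted, a search like this can stop in a local minimum, so different seeds may well end at different rule sets.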

Here is an example (f104r.43) of the substitutions in the first EVA solution listed above:

ORIGINAL: tsheodl.qokaiin.qokchedy.ykchdy.pchedy.qokeedy.oteey.qokain.oteoldal
MODIFIED: tsheodl.qoiin.qokchody.ykchor.pchody.qokeody.oteey.qoin.oteoldal

Two of the newly generated words (qokchody and ykchor) appear in A and not in B: fine.
pchody and qokeody appear both in A and B: also OK.

But qoiin and qoin only appear in B and are extremely rare: they cannot be an acceptable A equivalent for the very common qokaiin and qokain.
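
The example can be checked mechanically. The snippet below applies the three rules of the first EVA solution to the f104r.43 line, assuming the rules are applied one after another in the listed order (the post does not say whether they were applied sequentially or simultaneously; for this line it makes no difference).

```python
def apply_rules(text, rules):
    """Apply (source, target) replacements sequentially, in the given order."""
    for src, dst in rules:
        text = text.replace(src, dst)
    return text

rules = [("oka", "o"), ("hdy", "hor"), ("ed", "od")]
original = "tsheodl.qokaiin.qokchedy.ykchdy.pchedy.qokeedy.oteey.qokain.oteoldal"
modified = apply_rules(original, rules)

# Pair up the words that changed; the word count is unchanged here because
# none of these rules touches the '.' separator.
changed = [(o, m) for o, m in zip(original.split("."), modified.split(".")) if o != m]

print(modified)
for before, after in changed:
    print(f"{before} -> {after}")
```

Six of the nine words change, and the output matches the MODIFIED line above.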


RE: Mapping between Currier A and B - ReneZ - 20-12-2019

Hi Marco,

what you have achieved in such a short time is impressive.
I don't yet know how to interpret the results...


RE: Mapping between Currier A and B - MarcoP - 21-03-2020

Recently [link] confirmed the presence of different hands in the VMS and pointed out that some correlation exists between hands and "dialects" i.e. different sub-groups of the two larger classes known as Currier A and B.

Also, the joint efforts of [link] showed that variation in the distribution of the two benches ch sh appears to be due to the differences between the various image-identified sections, rather than to differences between single pages.

These recent findings suggest that it might be better to try and match single dialects, rather than the bulk of Currier A and Currier B. Until we have a complete classification of the ms pages according to the different hands, I had a look at differences based on the image-defined sections. I used paragraph text from the Zandbergen-Landini EVA transliteration, ignoring dubious spaces. The difference measure is the same I used in the experiments described above: Bhattacharyya distance between digraph histograms (including the space between words and line-break as two distinct characters).
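
A minimal sketch of how a digraph histogram with the two extra symbols could be built. The details are assumptions, not a description of the actual code: the input is taken to be a list of lines with "." as the word break, each line is terminated by "\n" as the line-break symbol, and digraphs are counted across both kinds of break.

```python
from collections import Counter

WORD_BREAK = "."   # word separator as it appears in the (assumed) input lines
LINE_BREAK = "\n"  # separate symbol appended at the end of every line

def digraph_histogram(lines):
    """Relative digraph frequencies, counting word and line breaks as characters."""
    stream = "".join(line + LINE_BREAK for line in lines)
    counts = Counter(stream[i:i + 2] for i in range(len(stream) - 1))
    total = sum(counts.values())
    return {dg: n / total for dg, n in counts.items()}

# Toy input only, not real transliteration data.
hist = digraph_histogram(["daiin.chol", "chor"])
print(hist["ch"])                # digraph inside words
print(hist["n" + WORD_BREAK])    # word-final n followed by a word break
print(hist["l" + LINE_BREAK])    # line-final l followed by a line break
```

Keeping the two breaks distinct lets line-final and line-initial behaviour show up separately from ordinary word boundaries.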

This is what I found (as always, I may have made errors in the process):

__        HerbalA  Pharma  AstroCZ  HerbalB  StarsQ20  BioQ13
chars       45505   13769    10909    20197     67078   40333

HerbalA         0  0.0493   0.0646   0.0790    0.0955  0.1308
Pharma     0.0493       0   0.0585   0.0675    0.0664  0.0936
AstroCZ    0.0646  0.0585        0   0.0247    0.0264  0.0626
HerbalB    0.0790  0.0675   0.0247        0    0.0277  0.0481
StarsQ20   0.0955  0.0664   0.0264   0.0277         0  0.0317
BioQ13     0.1308  0.0936   0.0626   0.0481    0.0317       0

average    0.0699  0.0559   0.0395   0.0412    0.0413  0.0611
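
A small check of the "average" row. The post does not say how it was computed, but the figures are reproduced if each column is averaged over all six entries, including the zero self-distance on the diagonal; the snippet below just redoes that arithmetic on the values from the table.

```python
labels = ["HerbalA", "Pharma", "AstroCZ", "HerbalB", "StarsQ20", "BioQ13"]
matrix = [
    [0,      0.0493, 0.0646, 0.0790, 0.0955, 0.1308],
    [0.0493, 0,      0.0585, 0.0675, 0.0664, 0.0936],
    [0.0646, 0.0585, 0,      0.0247, 0.0264, 0.0626],
    [0.0790, 0.0675, 0.0247, 0,      0.0277, 0.0481],
    [0.0955, 0.0664, 0.0264, 0.0277, 0,      0.0317],
    [0.1308, 0.0936, 0.0626, 0.0481, 0.0317, 0],
]

# Column means over all six rows, with the diagonal zero included.
averages = {
    name: round(sum(column) / len(column), 4)
    for name, column in zip(labels, zip(*matrix))
}

for name, value in averages.items():
    print(f"{name:<9}{value:.4f}")
```

Averaging over the five other sections only would give larger values, but the ordering of the sections would stay the same since every column is scaled by the same factor.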


AstroCZ groups together all pages classified as Astrological, Cosmological or Zodiac (but I think that the Zodiac actually does not contribute any paragraph). The sections are sorted according to what Rene discusses at the end of [link], in which he showed the significant correlation between dialects and subject of the illustrations.

The table above does not add much to what can be deduced from Rene's graphs: the HerbalA and BioQ13 sections are at the opposite extremes of the range. The difference between the two (0.13) is as large as the difference between Latin (Virgil's Aeneid) and Italian (Dante's Commedia) - well distinct but related languages.
On the other hand, the three intermediate sections AstroCZ, HerbalB, StarsQ20 are all similar, with differences in the 0.025-0.028 range, just slightly more than the difference between Virgil's classical Latin poetry and Mattioli's early-modern Latin prose (0.022).

Currently, I think that a possible approach to map the whole VMS text to a uniform language could be choosing one of the central sections (e.g. Q20), or maybe the concatenation of the three, as "canonical" and mapping the other sections into that structure. In this way, HerbalA and Q13 would converge towards the canonical language from opposite directions and would meet mid-way, without the need of too many changes in any subset.


RE: Mapping between Currier A and B - Emma May Smith - 21-03-2020

When you say that certain sections are closer than others in "dialect",  is this due to the strength of certain features or their presence/absence? Can we map individual features across the five dialects in a spectrum-like way?


RE: Mapping between Currier A and B - MarcoP - 21-03-2020

(21-03-2020, 05:29 PM)Emma May Smith Wrote: When you say that certain sections are closer than others in "dialect", is this due to the strength of certain features or their presence/absence?
 

Hi Emma,
my impression is that differences are mostly due to strength, rather than some features being totally absent in some sections. This can be seen in the graphs at the bottom of Rene's page: 'ed' (ed) appears to be the only feature, among those analysed, that is (almost) totally absent in some sections; 'so' (Cho) is also very rare in some sections, and 'ho' (qo) less clearly so. But this looks like something we could look into in more detail.

(21-03-2020, 05:29 PM)Emma May Smith Wrote: Can we map individual features across the five dialects in a spectrum-like way?

Could you please provide an example of the kind of spectrum-like system you are thinking of? I will do my best to extract the needed data and show them in a similar format.


RE: Mapping between Currier A and B - MarcoP - 24-03-2020

Here are the links to Julian Bunn's 2013 posts about the subject of this thread (thanks to Emma for bringing them to my attention).

[four links, not shown in this printable version]

I think this set of posts is one of the most extensive tackling of the problem: they are certainly worth reading for people interested in this line of research. Julian makes use of the "101" Voynich transliteration system, that he "tweaked" in order to fit it to his approach to language mapping.

If I understand correctly, his approach is similar to what I did in [link]: searching for a mapping that minimizes word-frequency differences (I later switched to minimizing bigram-frequency differences). But apparently Julian only/mostly considered mapping a single character to another single character (where single characters in his modified-101 system sometimes correspond to more than one EVA character). I considered mapping between slightly longer character sequences. Also, I included the word-break character among the possibilities, while Julian's mapping only maps each single source word into exactly one target word. I believe that allowing for variability in word-spacing is important, for instance to account for some of Currier's observations: -‘Unattached’ finals scattered throughout Language ‘B’ texts in considerable profusion; generally much less noticeable in Language ‘A.’-

For those who (like myself) are not familiar with 101, this is the conversion table from the March 5 post, rendered with the EVA system and EVA font:
   

Julian Wrote:In all the tests I ran, there were some common features in the results:
Mixing between “e” and “y” [i.e. l and r]– when writing Language A, the use of “e” appears to be equivalent to the use of  “y” in Language B, and vice versa
Mixing between  8,f,F,k,K,g,G,r,R,?  and so on – the Gallows glyphs swap amongst themselves, and “8” [EVA:d]

These results point out one of the many difficulties of the task: some glyphs (like l and r, or the whole gallows set) are highly interchangeable.