25-08-2020, 11:29 AM
I am still working on the mapping problem, or at least thinking about it, without any significant success.
Here is something I noticed that might make some progress possible. It may have been discussed before, and it is also possible that I made errors and these observations are not correct.
[attachment=4700]
This histogram shows the different variants of ai*n as a percentage of words in the different sections (based on the Zandbergen-Landini transliteration ZL_ivtff_1c.txt, ignoring uncertain spaces). Sections are sorted from strongly Currier-A to strongly Currier-B, as discussed by Rene.
Observations:
- 'an' and 'aiiin' are marginal and will not be discussed.
- 'ain' shows a rather clear ascending trend.
- 'aiin' shows a less clear descending trend; the most strongly B section (Bio aka Q13) is the only one in which 'ain' is more frequent than 'aiin'.
- the cumulative ai*n shows that the different sections are relatively consistent with respect to these totals, with the exception of the Stars/Q20 section, which has more occurrences of these patterns.
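For anyone who wants to check these numbers, here is a minimal sketch of how the per-section percentages could be computed. It assumes the words of a section have already been extracted from the transliteration into a plain list, and that a word counts as an ai*n word when it ends in 'a', any number of 'i', then 'n'. That matching rule, the function name and the sample words are my assumptions for illustration, not necessarily what was used for the histogram.

```python
import re
from collections import Counter

# Assumption: an ai*n word ends in 'a', zero or more 'i', then 'n'.
AI_N = re.compile(r"a(i*)n$")
VARIANTS = ("an", "ain", "aiin", "aiiin")

def variant_percentages(words):
    """Return {variant: percentage of all words} for one section."""
    counts = Counter()
    for word in words:
        match = AI_N.search(word)
        if match:
            # Rebuild the variant from the captured run of 'i'.
            counts["a" + match.group(1) + "n"] += 1
    total = len(words)
    if total == 0:
        return {}
    # Variants with more than three 'i' are counted but not reported.
    return {v: 100.0 * counts[v] / total for v in VARIANTS}

# Made-up sample, not real section data.
sample = ["daiin", "chol", "dain", "aiin", "qokedy",
          "okaiin", "dan", "shedy", "otaiin", "kain"]
percentages = variant_percentages(sample)
print(percentages)
```

Running this once per section, in the A-to-B order, gives the values that a histogram like the one above would plot.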
The percentage of 'aiin' among all ai*n words behaves differently. It is very consistent for Herbal-A, Pharma (aka Small Plants) and Astro-Cosmo-Zodiac. Then it progressively drops through the other sections, forming a smooth curve.
[attachment=4699]
This curve appears to depend on the degree of Currier A-ness or B-ness of each section. It could be that, whatever the function of 'aiin', in Currier-B it is sometimes fulfilled by 'ain' instead; the more a dialect is skewed towards B, the more this replacement takes place. This is of course very speculative and uncertain.
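The share of 'aiin' among ai*n words can be sketched in the same way. Again this assumes that each section is available as a list of words and that ai*n words are those ending in 'a', any number of 'i', then 'n'; the section contents below are invented just to show the calculation.

```python
import re

# Assumption: an ai*n word ends in 'a', zero or more 'i', then 'n'.
AI_N = re.compile(r"a(i*)n$")

def aiin_share(words):
    """Percentage of ai*n words that are the 'aiin' variant (None if there are none)."""
    i_runs = [m.group(1) for m in map(AI_N.search, words) if m]
    if not i_runs:
        return None
    aiin_count = sum(1 for run in i_runs if run == "ii")
    return 100.0 * aiin_count / len(i_runs)

# Invented examples: one A-like and one B-like list of words.
a_like = ["daiin", "aiin", "okaiin", "dain", "chol"]
b_like = ["dain", "kain", "daiin", "qokain", "shedy"]
share_a = aiin_share(a_like)
share_b = aiin_share(b_like)
print(share_a, share_b)
```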
Anyway, Q20 and Q13 are both large, and the ratio between 'ain' and 'aiin' in the two is about 4 to 6, with a total of almost 3000 tokens. Maybe there are enough cases to compare the occurrences of the two sequences and see if there are specific situations where one rather than the other occurs. I am not optimistic about the possibility of finding anything, but it could be worth trying.
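One possible starting point for such a comparison, purely as a sketch: tabulate what comes before 'ain' and 'aiin' inside each word and see whether some prefixes strongly prefer one variant. The choice of the prefix as the feature is only my assumption (the following word or the position in the line could be used instead), and the words below are made up.

```python
import re
from collections import Counter, defaultdict

# Assumption: only words ending in 'ain' or 'aiin' are compared,
# and the feature of interest is whatever precedes that ending.
ENDING = re.compile(r"(.*?)(ai{1,2}n)")

def prefix_table(words):
    """Return {prefix: Counter({'ain': n, 'aiin': m})}."""
    table = defaultdict(Counter)
    for word in words:
        match = ENDING.fullmatch(word)
        if match:
            prefix, variant = match.groups()
            table[prefix][variant] += 1
    return table

# Made-up sample, not real Q13/Q20 data.
sample = ["daiin", "dain", "daiin", "okaiin", "okain",
          "aiin", "qokain", "qokain"]
table = prefix_table(sample)
for prefix, counts in sorted(table.items()):
    print(repr(prefix), counts["ain"], counts["aiin"])
```

A prefix for which the counts depart clearly from the overall 4 to 6 ratio would be a candidate "specific situation"; with real data a significance test would be needed before drawing any conclusion.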
Of course, a similar analysis could be attempted on Herbal-A+Pharma+AstroCZ, but here the tokens are fewer (about 2000) and 'ain' is relatively rare, so I expect that the task would be even more challenging.