09-12-2019, 02:26 PM
This is a subject that has been put forward by Nick Pelling, for instance [link].
NickPelling Wrote:
Koen: I’ve been saying for some time that I think the next big “step up” in Voynichese study will come when some clever person finds a way to map between A patterns and B patterns, i.e. to normalize the two (errrm… actually several) parts into a single thing.
But to do this properly, you need to parse A and B, build letter contact tables for them, and then build state machine ‘grammars’ that capture how the two behave – the stuff that’s the same is probably the same, but the stuff that’s different probably involves something that was written as XXX in A being written as YYY in B. Normalizing A/B would involve being able to say “XXX == YYY”. However, this rests on the back of parsing, letter contact tables, and state machines, which (I think) steganographic tricks are disrupting. So I’m still not at all sure how we get over all the technical hurdles to get to a state where we can approach this in a rigorous enough way.
But perhaps some of these XXX == YYY equivalences can be worked out even without all that machinery. For example, I have long strongly wondered whether daiin daiin patterns in A reappear (in some way) as qotedy qokedy patterns in B. Clearly, both involve repetitive “bla-bla-bla” word sequences that are hard to reconcile with either linguistic readings or crypto theories. And given that I’ve previously speculated whether daiin daiin might be enciphering Arab numerals, it would be logical for me to speculate whether qotedy qokedy might be doing the same (but in a different way). Just a thought.
I understand that the subject is extremely complex and I doubt I can contribute much. But I think that Nick has described a promising area for further research and it could be interesting to discuss ideas and possible approaches, even if there is not much hope that we can make serious progress.
My admittedly superficial take on the problem would be to see it as some kind of optimization: find the set of N rewrite rules converting A into B (or vice-versa) so that some measure of the difference between A and B is minimized.
Even this simplistic approach raises a few questions, e.g.:
* how to represent Voynichese? (as a first step, I would just experiment with a few different transliteration systems, e.g. EVA, Cuva, Currier)
* how many rewrite rules should be defined? (this is another area where one can experiment with different values for N)
* should one map A into B or vice-versa?
* is it better to compare the whole of A vs the whole of B, or to just consider the more "extreme" sections, e.g. mapping HerbalA into Bio? what to do with the intermediate Astro / Cosmo / Zodiac sections?
* how to measure the difference to be minimized? bigram histograms? word histograms? frequency of repeating word-combinations (which could address the daiin/qokedy issue mentioned by Nick)?
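To make the idea a bit more concrete, here is a minimal sketch of one possible setup. Everything in it is an assumption for illustration only: rules are plain substring replacements, the difference measure is the L1 distance between character-bigram histograms, and the search is a naive greedy one. The function names (`apply_rules`, `bigram_hist`, `l1_distance`, `score`, `greedy_search`) are hypothetical, and the word lists at the bottom are toy input, not real counts.

```python
from collections import Counter

def apply_rules(words, rules):
    """Apply each (old, new) substring rewrite, in order, to every word."""
    out = []
    for w in words:
        for old, new in rules:
            w = w.replace(old, new)
        out.append(w)
    return out

def bigram_hist(words):
    """Relative frequencies of character bigrams, counted inside words only."""
    counts = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}

def l1_distance(p, q):
    """Sum of absolute differences between two histograms (0 means identical)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def score(words_a, words_b, rules):
    """Difference between rewritten A and B: the quantity to be minimized."""
    return l1_distance(bigram_hist(apply_rules(words_a, rules)),
                       bigram_hist(words_b))

def greedy_search(words_a, words_b, candidates, n_rules):
    """Add, one at a time, the candidate rule that lowers the score most."""
    chosen = []
    best = score(words_a, words_b, chosen)
    for _ in range(n_rules):
        trials = [(score(words_a, words_b, chosen + [c]), c)
                  for c in candidates if c not in chosen]
        if not trials:
            break
        s, c = min(trials, key=lambda t: t[0])
        if s >= best:  # no remaining candidate improves the score
            break
        chosen.append(c)
        best = s
    return chosen, best

# Toy input (assumption: NOT real transcription data).
toy_a = ["chol", "chor"]
toy_b = ["chedy", "chedy"]
toy_candidates = [("ol", "edy"), ("or", "edy"), ("ch", "sh")]
rules, final_score = greedy_search(toy_a, toy_b, toy_candidates, n_rules=2)
print(rules, final_score)
```

Swapping `bigram_hist` for a word histogram, or reversing the roles of A and B, only changes `score`; the search loop stays the same. A greedy search can easily get stuck, so this is only meant as a starting point for experiments.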
Torsten recently posted [link] a table of words that seems to me a way to get some "feel" for what is going on. His table "lists the four most frequent 'ch/sh'-words for different sections". He describes the phenomenon as "the shift from 'chol/chor' via 'cheol/cheor', 'cheo/sheo', 'chey/shey' to 'chedy/shedy'".
I expanded on the idea, focussing on ch-words only and extracting the 30 most frequent word types in each section. I used the Zandbergen-Landini transcription, ignoring uncertain spaces and text-only pages; I joined Astro / Cosmo / Zodiac pages into a single section. Sections are sorted from "strongly-A" to "strongly-B", as discussed by Rene at the end of [link]. For each word, I include the % of occurrences in each section.
[attachment=3774]
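For anyone who wants to reproduce a table of this kind, the counting step can be sketched as below. This is only a rough outline under stated assumptions: the input is already split into word tokens per section (parsing the transcription file is not covered), the percentage is taken relative to all word tokens of the section, and `top_words_by_section` is a hypothetical helper name. The tokens in the example are toy input, not real data.

```python
from collections import Counter

def top_words_by_section(tokens_by_section, prefix="ch", top_n=30):
    """For each section, list the top_n word types starting with `prefix`,
    each with its share (in %) of all word tokens in that section."""
    table = {}
    for section, tokens in tokens_by_section.items():
        counts = Counter(t for t in tokens if t.startswith(prefix))
        total = len(tokens)
        table[section] = [
            (word, 100.0 * n / total) for word, n in counts.most_common(top_n)
        ] if total else []
    return table

# Toy input (assumption: NOT real transcription data).
toy_tokens = {
    "HerbalA": ["chol", "chor", "chol", "daiin"],
    "Bio": ["chedy", "shedy", "chedy", "chedy"],
}
for section, rows in top_words_by_section(toy_tokens).items():
    print(section, rows)
```

Dictionaries keep insertion order in Python, so if the sections are inserted from "strongly-A" to "strongly-B" the output follows the same order.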
Assuming I have not made major errors, one can see at least four different patterns:
- the two ch-words that are most frequent in HerbalA (chol, chor) have smaller and smaller frequencies as you move towards B;
- symmetrically, there are words that are rare in A and progressively more frequent in B (cheey, chckhy);
- there are words that do not appear in A and are frequent in B (chedy, chdy); this asymmetry could be useful in choosing the direction of the mapping A->B or B->A;
- chey is more or less constant across sections.
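The four patterns could also be tagged automatically, which might help when looking at more than 30 words. The sketch below is deliberately crude and rests on assumptions of mine: it takes a word's percentages in sections ordered from strongly-A to strongly-B, compares only the two ends of the list, and uses an arbitrary tolerance (`flat_tolerance`) to decide what counts as constant. `classify_trend` is a hypothetical name and the numbers in the example are invented, not measured.

```python
def classify_trend(percents, flat_tolerance=0.25):
    """Label a list of per-section percentages, ordered from A to B.

    Only the first and last values and the overall spread are used,
    so intermediate bumps are ignored (a known limitation).
    """
    first, last = percents[0], percents[-1]
    peak = max(percents)
    if peak == 0:
        return "absent"
    if first == 0 and last > 0:
        return "absent in A"
    if (peak - min(percents)) / peak <= flat_tolerance:
        return "constant"
    if last < first:
        return "decreasing"
    if last > first:
        return "increasing"
    return "mixed"

# Invented values, for illustration only.
print(classify_trend([2.0, 1.0, 0.5, 0.1]))
print(classify_trend([0.0, 0.0, 1.5]))
```

Words tagged "absent in A" are the ones relevant to the asymmetry argument above, since a rule set would have to create them from something else when mapping A into B.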