The Voynich Ninja

Pages: 1 2 3 4

This is a subject that has been put forward by Nick Pelling, for instance [link unavailable].

NickPelling Wrote: Koen: I’ve been saying for some time that I think the next big “step up” in Voynichese study will come when some clever person finds a way to map between A patterns and B patterns, i.e. to normalize the two (errrm… actually several) parts into a single thing.

But to do this properly, you need to parse A and B, build letter contact tables for them, and then build state machine ‘grammars’ that capture how the two behave; the stuff that’s the same is probably the same, but the stuff that’s different probably involves something that was written as XXX in A being written as YYY in B. Normalizing A/B would involve being able to say “XXX == YYY”. However, this rests on the back of parsing, letter contact tables, and state machines, which (I think) steganographic tricks are disrupting. So I’m still not at all sure how we get over all the technical hurdles to get to a state where we can approach this in a rigorous enough way.

But perhaps some of these XXX == YYY equivalences can be worked out even without all that machinery. For example, I have long strongly wondered whether daiin daiin patterns in A reappear (in some way) as qotedy qokedy patterns in B. Clearly, both involve repetitive “bla-bla-bla” word sequences that are hard to reconcile with either linguistic readings or crypto theories. And given that I’ve previously speculated whether daiin daiin might be enciphering Arab numerals, it would be logical for me to speculate whether qotedy qokedy might be doing the same (but in a different way). Just a thought.

I understand that the subject is extremely complex and I doubt I can contribute much. But I think that Nick has described a promising area for further research and it could be interesting to discuss ideas and possible approaches, even if there is not much hope that we can make serious progress.

My admittedly superficial take on the problem would be to see it as some kind of optimization: find the set of N rewrite rules converting A into B (or vice-versa) so that some measure of the difference between A and B is minimized.

Even this simplistic approach poses a few questions e.g.:
* how to represent Voynichese? (as a first step, I would just experiment with a few different transliteration systems, e.g. EVA, Cuva, Currier)
* how many rewrite rules should be defined? (this is another area where one can experiment with different values for N)
* should one map A into B or vice-versa?
* is it better to compare the whole of A vs the whole of B, or to just consider the more "extreme" sections, e.g. mapping HerbalA into Bio? what to do with the intermediate Astro / Cosmo / Zodiac sections?
* how to measure the difference to be minimized? bigram histograms? word histograms? frequency of repeating word-combinations (which could address the daiin/qokedy issue mentioned by Nick)?
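To make the optimization idea concrete, here is a minimal sketch of one way it could be set up. Everything in it is an assumption chosen for illustration, not a tested method: the difference measure is an L1 distance between character-bigram histograms, the rewrite rules are plain substring replacements, the search is greedy (one rule at a time), and the word lists and candidate rules are made-up toy data rather than real transliteration counts.

```python
from collections import Counter


def bigram_hist(words):
    """Relative frequencies of character bigrams; ^ and $ mark word edges."""
    counts = Counter()
    for w in words:
        padded = "^" + w + "$"
        counts.update(padded[i:i + 2] for i in range(len(padded) - 1))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def distance(h1, h2):
    """L1 distance between two histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in set(h1) | set(h2))


def apply_rules(words, rules):
    """Apply each (source, target) substring replacement in order."""
    out = []
    for w in words:
        for src, dst in rules:
            w = w.replace(src, dst)
        out.append(w)
    return out


def greedy_rules(a_words, b_words, candidates, n_rules):
    """Greedily pick up to n_rules rules that bring A closest to B."""
    target = bigram_hist(b_words)
    rules = []
    best = distance(bigram_hist(a_words), target)
    for _ in range(n_rules):
        best_rule = None
        for rule in candidates:
            if rule in rules:
                continue
            d = distance(bigram_hist(apply_rules(a_words, rules + [rule])), target)
            if d < best - 1e-12:
                best, best_rule = d, rule
        if best_rule is None:  # no remaining rule improves the fit
            break
        rules.append(best_rule)
    return rules, best


# Toy data, invented for illustration only.
toy_a = ["chol", "chor", "daiin", "chol"]
toy_b = ["chedy", "chedy", "daiin", "chedy"]
toy_candidates = [("ol", "edy"), ("or", "edy"), ("ch", "sh")]
print(greedy_rules(toy_a, toy_b, toy_candidates, n_rules=3))
```

A greedy search is only one possible choice; it can miss rule sets that work well together but not individually. Swapping `bigram_hist` for a word histogram or a count of repeated word pairs would be a way to try the other difference measures listed above.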

Torsten recently posted [link unavailable] a table of words that seems to me a way to get some "feel" for what is going on. His table "lists the four most frequent 'ch/sh'-words for different sections". He describes the phenomenon as "the shift from 'chol/chor' via 'cheol/cheor', 'cheo/sheo', 'chey/shey' to 'chedy/shedy'".

I expanded on the idea, focussing on ch-words only and extracting the 30 most frequent word types in each section. I used the Zandbergen-Ladini transcription, ignoring uncertain spaces and text-only pages; I joined Astro / Cosmo / Zodiac pages into a single section. Sections are sorted from "strongly-A" to "strongly-B", as discussed by Rene at the end of [link unavailable]. For each word, I include the % of occurrences in each section.

[attachment=3774]
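For anyone wanting to reproduce a table of this kind, a rough sketch of the counting step follows. It is not the script used for the attachment: the function name, the input format (a dict of section name to word tokens) and the choice to express each count as a percentage of all word tokens in the section are assumptions, and the token lists are invented toy data.

```python
from collections import Counter


def top_prefixed_words(sections, prefix="ch", top_n=30):
    """Most frequent word types starting with `prefix` in each section.

    sections: dict mapping section name -> list of word tokens, inserted
              in the desired order (e.g. from strongly-A to strongly-B).
    Returns {section: [(word, percent_of_section_tokens), ...]}.
    """
    table = {}
    for name, tokens in sections.items():
        counts = Counter(t for t in tokens if t.startswith(prefix))
        total = len(tokens)
        table[name] = [
            (word, 100.0 * c / total) for word, c in counts.most_common(top_n)
        ] if total else []
    return table


# Toy tokens, invented for illustration only.
toy_sections = {
    "HerbalA": ["chol", "chor", "chol", "daiin"],
    "Bio": ["chedy", "shedy", "chedy", "chey", "qokedy"],
}
for section, rows in top_prefixed_words(toy_sections, top_n=3).items():
    print(section, rows)
```

Real input would need the transcription to be parsed first (dropping uncertain spaces and text-only pages, merging Astro / Cosmo / Zodiac), which this sketch leaves out.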

Assuming I have not made major errors, one can see at least four different patterns:

* the two ch-words that are most frequent in HerbalA (chol, chor) have smaller and smaller frequencies as you move towards B;
* symmetrically, there are words that are rare in A and progressively more frequent in B (cheey, chckhy);
* there are words that do not appear in A and are frequent in B (chedy, chdy); this asymmetry could be useful in choosing the direction of the mapping A->B or B->A;
* chey is more or less constant across sections.

Thanks Marco,

I have been wondering about these various points since the 90's. Indeed, the potential mapping of A to B or v.v. (or in fact as offspring of a common origin), and also the apparent transition chol -> cheol -> cheody -> chedy and related words.
This transition does not have to mean that words are actually being modified. It could be a shift of frequency.

Another point I had been wondering about is whether the B language could be seen as A language with additional words. The fact that B-language pages tend to have much more text than A-language pages could be just an effect of this 'adding words'.
This was suggested (probably in quite a cryptic manner) by the last bullet above 'Suggestions for further study' on [link unavailable].

My concordance deals with this to some extent, but I've never had enough time to arrange it in a chart or even to go through all the data to see where it leads.

I like the way you chose to represent it, Marco. Very helpful.

(09-12-2019, 03:12 PM)ReneZ Wrote: This transition does not have to mean that words are actually being modified. It could be a shift of frequency.

True. Also the shift in frequencies shows that the A to B phenomenon is not unique, it is only the most visible. There are other shifts, at various scales, that need an explanation as well. It is not as if the settings or parameters, whatever they are, only get changed once, or evolve gradually between A and B.

It is a big leap of faith to expect a straightforward mapping between A and B. Perhaps we should question the assumptions behind it, i.e. the verbose cipher hypothesis.

I agree that the 'verbose cipher' hypothesis is one of the more straightforward possible explanations why there could be a mapping between A and B. However, it is possible to study the details without, or at least before, worrying about the possible explanation.

I have considerable doubt that such a mapping can be found, but I would be happy to be taken by surprise here.

Also, I strongly support the view that such a discovery would give us great new insight into the 'system' that is behind the Voynich MS text.

An alternative to the verbose cipher would be a number theory. If the Voynich MS words are like a numbering or enumeration system, a similar progression could be expected. Just compare it with Roman numerals. D only starts appearing after 500 words and M only after 1000.

In such a case, there is no mapping, but there is a 'generating algorithm' that explains the dialects.
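The Roman numeral comparison is easy to check with a few lines of code. The sketch below simply writes out the numbers 1, 2, 3, ... and records the first number at which each symbol shows up. It assumes the standard subtractive notation (IV, IX, XL, XC, CD, CM), in which D first appears at 400 (CD) and M at 900 (CM); with a purely additive notation the thresholds would be 500 and 1000. The function names are made up for the example.

```python
def to_roman(n):
    """Standard subtractive Roman numeral for a positive integer n."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def first_appearance(limit):
    """Map each symbol to the first number in 1..limit whose numeral uses it."""
    first = {}
    for n in range(1, limit + 1):
        for symbol in to_roman(n):
            first.setdefault(symbol, n)
    return first


print(first_appearance(1000))
```

The point of the analogy is that a text which enumerates things in order would show new symbols entering gradually, without any mapping between an "early dialect" and a "late dialect".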

(09-12-2019, 05:58 PM)nablator Wrote:
(09-12-2019, 03:12 PM)ReneZ Wrote: This transition does not have to mean that words are actually being modified. It could be a shift of frequency.

True. Also the shift in frequencies shows that the A to B phenomenon is not unique, it is only the most visible. There are other shifts, at various scales, that need an explanation as well. It is not as if the settings or parameters, whatever they are, only get changed once, or evolve gradually between A and B.

It is a big leap of faith to expect a straightforward mapping between A and B. Perhaps we should question the assumptions behind it, i.e. the verbose cipher hypothesis.

Hi nablator,
it seems we all agree that the phenomenon is complex and I believe that considering specific observations can be a good way to start exploring this complexity. Could you please tell us more about the "other shifts" you mentioned? Currier pointed out that the influence of the end of a word on the start of the following one is stronger in B than in A: while this could be part of the A and B phenomenon (being integral to the original description of the whole thing) it is different from a mere shift in frequencies.
But I think you have something still different in mind.
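One possible way to put a number on the word-boundary effect Currier described is the mutual information between the last glyph of a word and the first glyph of the next word. This is only a sketch under assumptions of my own: it treats each transliteration character as one glyph (which depends on the transliteration system), it only pairs words within the same line, and the function name and toy input are made up.

```python
import math
from collections import Counter


def boundary_mutual_information(lines):
    """Mutual information (bits) between a word's last character and the
    first character of the following word on the same line.

    lines: list of lines, each a list of word tokens.
    """
    pairs = Counter()
    for words in lines:
        words = [w for w in words if w]  # skip empty tokens
        for left, right in zip(words, words[1:]):
            pairs[(left[-1], right[0])] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    ends, starts = Counter(), Counter()
    for (end, start), c in pairs.items():
        ends[end] += c
        starts[start] += c
    mi = 0.0
    for (end, start), c in pairs.items():
        # p(end, start) * log2( p(end, start) / (p(end) * p(start)) )
        mi += (c / total) * math.log2(c * total / (ends[end] * starts[start]))
    return mi


# Toy lines, invented for illustration only.
print(boundary_mutual_information([["ka", "xk"], ["kb", "yk"]]))
```

A higher value for B pages than for A pages would be in line with Currier's remark, but this kind of estimate tends to be inflated on small samples, so the two sets of pages would need to be of comparable size for the comparison to mean much.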

(09-12-2019, 06:52 PM)ReneZ Wrote: ...
An alternative to the verbose cipher would be a number theory. ...

...

These are pretty much my two favorite ways of thinking about it, at least for now... verbose cipher or number theory (with steganography or symbolic system being a close third).

This is not because of any personal preference (I don't care how the VMS is constructed, I just want to be able to understand it better) but because of the way the glyphs line up and the way in which certain patterns repeat.

This is a subject I find very interesting and important, though I have not studied it yet. I hope it might result in something concrete that can be understood by the general public; this would mean a lot for people's understanding of what Voynichese is like.

My preliminary thoughts are the following:

We might do this by comparing frequencies, which is the direction most people seem to be thinking. But is this enough to derive any conclusions from? Is it possible that some purely vocabulary level "conversion method" exists which has eluded us so far?

Ideally, there would be something like multi-word structures, something like syntax... which appears to be missing from the VM, unless we miss it precisely because of A/B variations?

Yes, something like "syntax" (I'm sure you meant that in a broad way) does seem to be missing. It's one of the main things I was looking for when I created my concordance of every "word" in the VMS.

There should have been certain kinds of patterns related to the drawings OR related between sections of text even if there were no drawings, and they just didn't seem to be there.

Now, there is a limitation in my concordance... I've only done one pass through the whole manuscript to actually map it (it takes not just months, but years to do it) AND, I respected the spaces. It is possible I didn't find what I expected BECAUSE the spaces are something else (in other words, not necessarily word breaks). I knew this was a possibility, but you have to start somewhere.

So, that might account for the lack of expected linguistic patterns, but the task of doing it again (it is arduous, difficult, time-consuming work) in such a way as to re-interpret the spaces (which means a certain amount of guessing which are "real" and which ones might not be) is even bigger and I'm not sure even this would yield anything useful. Even when you take out some of the spaces, you are still left with patterns that suggest something other than words and natural grammar.

(09-12-2019, 08:39 PM)MarcoP Wrote: Hi nablator,

it seems we all agree that the phenomenon is complex and I believe that considering specific observations can be a good way to start exploring this complexity. Could you please tell us more about the "other shifts" you mentioned?

Hi MarcoP,

In mono-, bi-, and trigram statistics there are many jumps in many directions, too many to list: overused common patterns and missing or almost missing common patterns, mostly at the page scale or larger, but also sometimes at the paragraph scale, as RenéZ noticed. They show how versatile the system can be.
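As a rough illustration of how such page-scale jumps could be listed automatically, the sketch below flags n-grams whose relative frequency on a page is several times higher than in the corpus as a whole. It is not the procedure behind the observations above; the function names, the thresholds (`min_ratio`, `min_count`) and the toy pages are all assumptions made for the example.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count character n-grams inside words (no cross-word n-grams)."""
    counts = Counter()
    for w in words:
        counts.update(w[i:i + n] for i in range(len(w) - n + 1))
    return counts


def overused_ngrams(pages, n=2, min_ratio=3.0, min_count=5):
    """Flag n-grams that are overused on a page relative to the whole corpus.

    pages: dict mapping page name -> list of word tokens.
    Returns {page: [(ngram, ratio), ...]} sorted by decreasing ratio, where
    ratio = (relative frequency on page) / (relative frequency in corpus).
    """
    per_page = {p: ngram_counts(ws, n) for p, ws in pages.items()}
    corpus = Counter()
    for counts in per_page.values():
        corpus.update(counts)
    corpus_total = sum(corpus.values())
    flagged = {}
    for page, counts in per_page.items():
        page_total = sum(counts.values())
        hits = []
        for gram, k in counts.items():
            if k < min_count:  # ignore n-grams too rare to judge
                continue
            ratio = (k / page_total) / (corpus[gram] / corpus_total)
            if ratio >= min_ratio:
                hits.append((gram, ratio))
        flagged[page] = sorted(hits, key=lambda item: -item[1])
    return flagged


# Toy pages, invented for illustration only.
toy_pages = {"p1": ["ab"] * 10, "p2": ["cd"] * 30}
print(overused_ngrams(toy_pages))
```

Missing or almost missing patterns would need the reverse test (common in the corpus, rare or absent on the page), and the same counting could be run per paragraph instead of per page.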


MarcoP

ReneZ

-JKP-

nablator

ReneZ

MarcoP

-JKP-

Koen G

-JKP-

nablator