The Voynich Ninja

Full Version: [split] Why the VMS text is meaningful or meaningless?
Very true, although then it would go into the artificial language category.
A number of people have come up with this suggestion but I'm not aware of any attempts to actually put it to music - are there any?
I think there's a whole grey zone in between 'highly abbreviated text' and 'meaningful but not a language'. Steganography is one example. The difference between these and an encoded message, i.e. a standard cipher, is that they are meant to facilitate communication rather than hinder it. Anything like music, coordinates, cooking instructions...

What these systems have in common is that they rely on knowledge that is expected to be present already with the audience. Knowledge about the notation system, how to play the notes, how to read the abbreviations etc.

If the text has a meaning and it resides within this zone, it will be the hardest to figure out because we simply don't have the knowledge the text expects from its readers. Perhaps the VM script has been designed to be read very quickly and easily, but we just have not been trained to do so.
(15-04-2017, 02:53 PM)DonaldFisk Wrote:
(15-04-2017, 09:13 AM)nickpelling Wrote: Modelling Voynichese solely as a (large) set of prefixes and suffixes is arguably even more reductive than Gordon Rugg's table (if more empirical). But I don't honestly think { {pick a prefix} x {pick a suffix} } really counts as a valid state transition model in any useful sense of the phrase.
Because I wrote the blog pages in the order I did the analysis, only correcting things if I subsequently discovered them to be wrong (i.e. mistakes), it's easy to misinterpret what I wrote. I could have avoided that if I had simply written a paper after doing the research, leaving out any ideas which I pursued along the way that were eventually dropped, such as that words are composed from prefixes and suffixes. But I also wanted to show the path the research took, not just the final result. Perhaps I should have advised people to read my blog backwards.

My theory is that words are composed of individual glyphs, not prefixes and suffixes.   At each state, a glyph is output and then a transition is followed.

I should have been a little clearer. I have read all your webpages, and I do completely understand that you model the set of prefixes and suffixes as two separate state transition models (where the letters o and e appear in three different places and the letters y, d and l all appear twice).

However, it's very hard not to see the way that d, o, l and y appear in both halves as a tidying-up model hack to make the division between prefix model and suffix model seem simpler than it actually is in practice.

The bigger problem I have with each of your two individual state transition models is that I don't believe that the state transitions in either half are independent of the preceding context. That is, the point about practical state machines is that they aim to model not only the connectedness of the transitions but also the exit probabilities, and I really don't believe that this is the case here.

But for me, the biggest problem of all (and this is what I was actually trying to get at before) is that turning the division between prefix and suffix into something like a state transition boundary but mediated by a huge empirical table individually tweaked for each section of the text just isn't a credible explanation. It's a mechanism that can only ever "explain" after the fact (and even then with tons of special case tweaking).

If you instead want to find out something genuinely interesting about Voynichese, you would need to aim at finding the best model: and for me, that would involve determining the single-model state machine table that has the most context-independent outbound state transitions.

One way to do this would be as follows:
1) Form a large list of candidate groups - qo, ckh, cth, cph, cfh, ee, eee, ch, sh, ok, ot, yk, yt, dy, ii, iii, iv, iiv, ir, iir, av, aiv, aiiv, aiiiv, air, aiir, aiiir, am, ar, or, al, ol, etc - that form the potential individual nodes of the state machine.
2) Evaluate all permutations (or hillclimb, I don't honestly care) of these with a metric along the following lines: that the outbound state transitions from each node should be as independent of all preceding contexts as possible. Note that because of qo, a few of these are dependent on the order in which they are reduced into tokens (e.g. should qok reduce to qo + k or to q + ok?)

At the end of all this, (a) there should be a single model, not two or more; and (b) the outbound transition probabilities from each node in the best-fit model should be determined as strongly as possible by the node itself, not by the preceding context.
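
To make that metric concrete, step 2 might be sketched in Python along the lines below. Everything in it is an illustrative assumption rather than a specification: the greedy longest-match tokenizer is just one possible answer to the qok question above, and Jensen-Shannon divergence is just one reasonable way to compare each node's contextual exit distribution against its marginal one.

Code:
from collections import Counter, defaultdict
from math import log2

# Candidate groups from step 1 (abridged); single glyphs are the fallback.
CANDIDATES = ["qo", "ckh", "cth", "cph", "cfh", "eee", "ee", "ch", "sh",
              "ok", "ot", "yk", "yt", "dy", "aiiv", "aiv", "av", "aiir",
              "air", "ar", "or", "al", "ol"]

def tokenize(word, groups):
    # Greedy longest match: qok becomes qo + k rather than q + ok.
    ordered = sorted(groups, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(word):
        match = next((g for g in ordered if word.startswith(g, i)), word[i])
        tokens.append(match)
        i += len(match)
    return tokens

def normalize(counter):
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()}

def js_divergence(p, q):
    # Jensen-Shannon divergence between two probability dicts.
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def context_independence_score(words, groups):
    # Lower score = exit transitions depend less on the preceding token.
    marginal = defaultdict(Counter)    # node -> counts of next node
    contextual = defaultdict(Counter)  # (previous, node) -> counts of next node
    for word in words:
        toks = ["<start>"] + tokenize(word, groups) + ["<end>"]
        for prev, node, nxt in zip(toks, toks[1:], toks[2:]):
            marginal[node][nxt] += 1
            contextual[(prev, node)][nxt] += 1
    scores = [js_divergence(normalize(ctx), normalize(marginal[node]))
              for (prev, node), ctx in contextual.items()]
    return sum(scores) / len(scores)

A lower score means the exit transitions depend less on the preceding token, which is exactly the property the proposal asks to maximise.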

I don't believe anyone has yet attempted this: Voynichese state machine model generation to date has been an almost entirely manual process, which is almost certainly where the overall methodological flaw lies.
(16-04-2017, 01:31 AM)nickpelling Wrote: [...] The bigger problem I have with each of your two individual state transition models is that I don't believe that the state transitions in either half are independent of the preceding context. [...] turning the division between prefix and suffix into something like a state transition boundary but mediated by a huge empirical table individually tweaked for each section of the text just isn't a credible explanation.

Despite Figure 1 and Figure 2 on my blog being separate, I later settled on a single state transition table for each page cluster in the manuscript, so if there's a transition from state A to * in Figure 1 and a transition from * to state B in Figure 2, it stands for a direct transition from state A to state B. When generating a word, you start in the start state and end in the finish state in the tables from Table 2 onwards, passing seamlessly through any prefix-suffix boundary. I'm sorry if I didn't make this sufficiently clear.

There are probably better choices of states, but the aim was to find a set of states good enough to show that the manuscript can be generated from state transition tables.   I think that as some glyphs appear in more than one context within words, they will need more than one state, and as words have different frequencies in different parts of the manuscript, several variants of a template state transition table will be needed.
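
For readers who want to see the mechanism, here is a toy sketch in Python of that kind of generator. The table is an invented stand-in with made-up states and weights, not the actual tables from the blog; it only illustrates the start-to-finish walk that outputs one glyph per state.

Code:
import random

# state -> (glyph emitted on entering the state, {next state: weight})
# All states and weights here are invented for illustration only.
TABLE = {
    "START": ("",   {"q": 3, "o1": 5, "ch": 2}),
    "q":     ("q",  {"o1": 1}),
    "o1":    ("o",  {"k": 4, "t": 3, "l": 2}),
    "ch":    ("ch", {"e": 5, "o1": 1}),
    "k":     ("k",  {"e": 3, "y": 2}),
    "t":     ("t",  {"e": 2, "y": 3}),
    "l":     ("l",  {"END": 1}),
    "e":     ("e",  {"e": 2, "y": 4}),
    "y":     ("y",  {"END": 1}),
}

def generate_word():
    # Start in START, emit each state's glyph, follow a weighted
    # transition, and stop on reaching END.
    state, glyphs = "START", []
    while state != "END":
        glyph, transitions = TABLE[state]
        glyphs.append(glyph)
        state = random.choices(list(transitions), list(transitions.values()))[0]
    return "".join(glyphs)

print([generate_word() for _ in range(5)])

Passing "seamlessly through any prefix-suffix boundary" then just means there is no special state at the join: the walk treats the whole table as one machine.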
@koen - there is of course the very real possibility that the encoding mechanism is flawed. If the highly abbreviated text is consistent, then we may be able to crack it.
If it is in a constant state of flux, with the orthography being phonetic and personal - and even, God forbid, a mixture of vernaculars - then any meaning is probably lost.
(16-04-2017, 03:42 AM)DonaldFisk Wrote: There are probably better choices of states, but the aim was to find a set of states good enough to show that the manuscript can be generated from state transition tables. I think that as some glyphs appear in more than one context within words, they will need more than one state, and as words have different frequencies in different parts of the manuscript, several variants of a template state transition table will be needed.

I don't believe that for a minute. Your original aim was surely to build a clean abstract state machine for Voynichese, but somehow you convinced yourself along the way that interposing a gigantic state transition table in the middle was a good idea, because it let you mimic the statistics post facto.

But the main reason this was necessary was that your prefix and suffix state machine halves were wrong, not because Voynichese is meaningless.
@Nick, the candidate groups that you are proposing are not necessarily wrong, but the difficulty lies in the contents of that group list.
I made a comparison of the behaviour of letters, 2-grams and 3-grams, and the outcome is that we can probably form groups, but they are of unequal length and form.
I now have very nice frequencies for the members of these groups. I know from earlier attempts that comparing them with all the frequencies I have for other languages (new and old) will be fruitless. Let's assume I will not even try that.

>>At the end of all this, (a) there should be a single model, not two or more; and (b) the outbound transition probabilities from each node in the best-fit model should be determined as strongly as possible by the node itself, not by the preceding context.

What do you suggest to do with the "model" now?
(16-04-2017, 02:07 PM)Davidsch Wrote: [...] What do you suggest to do with the "model" now?

The point of listing so many *candidates* is that they could possibly (but not necessarily) be nodes within a larger state machine, but it would take someone writing a clever programme to assess which particular permutation of candidates to "vote up" to give the most uniform single state machine (i.e. the state machine most free of context-sensitive exit statistics).

I don't believe you have (or anyone else has) tried this because I've only just formulated in words the metric I think will work. :-)

I honestly believe that getting to a really effective state machine for Voynichese will open our eyes to what's going on there. This should particularly be true for different sections of the text, because it may well help us to see how to map A pages onto B pages in a very fine-grained way. It would also help us build an understanding of how Voynichese is genuinely built up, which would be a nice position to get to after all these years of trying. :-)
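
One possible skeleton for that "clever programme", in Python, reusing the CANDIDATES list and context_independence_score sketch given earlier in the thread; the toggle-one-group move and the fixed iteration count are guesses at a workable hillclimb, nothing more.

Code:
import random

def hillclimb(words, candidates, iterations=2000, seed=0):
    # Start from single glyphs only, then repeatedly toggle one candidate
    # group in or out of the node set, keeping any change that makes the
    # exit transitions more context-independent (lower score is better).
    rng = random.Random(seed)
    active = set()
    best = context_independence_score(words, active)
    for _ in range(iterations):
        trial = active ^ {rng.choice(candidates)}   # toggle one group
        score = context_independence_score(words, trial)
        if score < best:
            active, best = trial, score
    return active, best

# Usage (voynich_words would be a list of transcribed word tokens):
# nodes, score = hillclimb(voynich_words, CANDIDATES)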
Your invention is probably more complex than I can imagine, so perhaps I can humbly ask again ;-) since I am really curious:
what to do with any outcome of such a model?
(17-04-2017, 09:56 AM)Davidsch Wrote: Your invention is probably more complex than I can imagine, so perhaps I can humbly ask again ;-) since I am really curious:
what to do with any outcome of such a model?

It's not actually complex; it's just hard to explain (there's a big difference between the two). I'll write it up as a blog post rather than trying to jam it into comments here.

The point of such a model is to understand how Voynichese genuinely works (i.e. from the inside out), rather than just capturing lots of statistics after the event (i.e. from the outside in). Why would anyone not want to know that?