(15-04-2017, 02:53 PM) DonaldFisk Wrote:
(15-04-2017, 09:13 AM) nickpelling Wrote: Modelling Voynichese solely as a (large) set of prefixes and suffixes is arguably even more reductive than Gordon Rugg's table (if more empirical). But I don't honestly think { {pick a prefix} x {pick a suffix} } really counts as a valid state transition model in any useful sense of the phrase.
Because I wrote the blog pages in the order I did the analysis, only correcting things if I subsequently discovered them to be wrong (i.e. mistakes), it's easy to misinterpret what I wrote. I could have avoided that if I had simply written a paper after doing the research, leaving out any ideas I pursued along the way that were eventually dropped, such as the idea that words are composed from prefixes and suffixes. But I also wanted to show the path the research took, not just the final result. Perhaps I should have advised people to read my blog backwards.
My theory is that words are composed of individual glyphs, not prefixes and suffixes. At each state, a glyph is output and then a transition is followed.
I should have been a little clearer. I have read all your webpages, and I completely understand that you model the set of prefixes and suffixes as two separate state transition models (where the letters o and e appear in three different places, and the letters y, d and l all appear twice).
However, it's very hard not to see the way that d, o, l and y appear in both halves as a tidying-up hack that makes the division between the prefix model and the suffix model seem simpler than it actually is in practice.
The bigger problem I have with each of your two individual state transition models is that I don't believe that the state transitions in either half are independent of the preceding context. That is, the point about practical state machines is that they aim to model not only the connectedness of the transitions but also the exit probabilities: and I really don't believe that the exit probabilities here stay stable regardless of what came before.
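To make that concrete, here's a minimal sketch (in Python, and the framing and function names are mine, not Donald's) of the kind of test I have in mind: take a tokenized transcription and measure how much extra information the preceding token carries about the next token, over and above the current token. For a genuinely first-order state machine, that gap should be close to zero.

```python
from collections import Counter
import math

def conditional_entropy(pairs):
    """Estimate H(next | context) in bits from a list of (context, next) pairs."""
    ctx_counts = Counter(ctx for ctx, _ in pairs)
    pair_counts = Counter(pairs)
    total = len(pairs)
    return -sum((n / total) * math.log2(n / ctx_counts[ctx])
                for (ctx, _), n in pair_counts.items())

def context_dependence(tokens):
    """Gap between H(next | current) and H(next | previous, current).
    Near zero: transitions look first-order. Large: the preceding
    context is doing real work that the state machine isn't capturing."""
    order1 = list(zip(tokens, tokens[1:]))
    order2 = list(zip(zip(tokens, tokens[1:]), tokens[2:]))
    return conditional_entropy(order1) - conditional_entropy(order2)
```

(Estimation bias will make the gap slightly positive even on random data, so in practice you'd want to compare against a shuffled baseline; but the shape of the test is the point here.)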
But for me, the biggest problem of all (and this is what I was actually trying to get at before) is that turning the division between prefix and suffix into something like a state transition boundary, but mediated by a huge empirical table individually tweaked for each section of the text, just isn't a credible explanation. It's a mechanism that can only ever "explain" after the fact (and even then only with tons of special-case tweaking).
If you instead want to find out something genuinely interesting about Voynichese, you would need to aim to find the best model: and for me, that would involve determining the single-model state machine table that has the most context-independent outbound state transitions.
One way to do this would be as follows:
1) Form a large list of candidate groups - qo, ckh, cth, cph, cfh, ee, eee, ch, sh, ok, ot, yk, yt, dy, ii, iii, iv, iiv, ir, iir, av, aiv, aiiv, aiiiv, air, aiir, aiiir, am, ar, or, al, ol, etc. - that would form the potential individual nodes of the state machine.
2) Evaluate all permutations (or hillclimb, I don't honestly care) of these with a metric along the following lines: the outbound state transitions from each node should be as independent of all preceding contexts as possible. Note that because of qo, a few of these depend on the order in which they are reduced into tokens (e.g. should qok reduce to qo + k or to q + ok?). A sketch of what I mean by this search appears just after this list.
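To be explicit about what I mean by 1) and 2), here's a rough sketch (again Python, again purely illustrative names of my own): a greedy longest-match tokenizer over the candidate group list, plus a simple hillclimb that toggles groups in and out of the inventory and keeps whatever change lowers the context-dependence score. The left-to-right matching in the tokenizer is exactly where the qo + k vs q + ok ambiguity gets decided.

```python
import random

CANDIDATE_GROUPS = ["qo", "ckh", "cth", "cph", "cfh", "eee", "ee", "ch", "sh",
                    "ok", "ot", "yk", "yt", "dy", "iii", "ii", "iiv", "iv",
                    "iir", "ir", "aiiiv", "aiiv", "aiv", "av", "aiiir", "aiir",
                    "air", "am", "ar", "or", "al", "ol"]

def tokenize(word, groups):
    """Greedy longest-match tokenization; single glyphs are the fallback.
    Because matching runs left to right, 'qok' reduces here to qo + k,
    never q + ok; a different reduction order gives a different model."""
    ordered = sorted(groups, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(word):
        for g in ordered:
            if word.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            tokens.append(word[i])
            i += 1
    return tokens

def hillclimb(words, pool, score, steps=10000):
    """Toggle one candidate group in or out of the inventory per step,
    keeping the change whenever it lowers the score (e.g. the entropy
    gap sketched above: lower = more context-independent transitions)."""
    current = set(pool)
    def evaluate(groups):
        stream = [t for w in words for t in tokenize(w, groups)]
        return score(stream)
    best = evaluate(current)
    for _ in range(steps):
        trial = current ^ {random.choice(pool)}
        s = evaluate(trial)
        if s < best:
            current, best = trial, s
    return current, best
```

You'd then call something like hillclimb(words, CANDIDATE_GROUPS, context_dependence) with words read from one of the transcriptions, and repeat from several different starting inventories, because a naive hillclimb like this will happily get stuck in local minima. The point isn't this particular search strategy: it's that the model comes out of an objective function rather than out of manual curation.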
At the end of all this, (a) there should be a single model, not two or more; and (b) the outbound transition probabilities from each node in the best-fit model should be determined as strongly as possible by the node itself, not by the preceding context.
I don't believe anyone has yet attempted this: Voynichese state machine model generation to date has been an almost entirely manual process, which is almost certainly where the overall methodological flaw lies.