(27-11-2024, 05:05 PM)nablator Wrote: Welcome to the forum!
I am not sure what you are comparing and how. Massimiliano Zattera claims an F1 score of 0.27 for his "Simply the Best" grammar "SLOT MACHINE", described by a state machine.
The state machine restricts considerably the possibilities that "SLOT" (slot model, F1 = 0.001) offers.
I guess your "grammars" are defined by a slot sequence only, without a state machine, like MZ's "SLOT".
OK, I think I can now respond to your point.
First, contrary to what I said in my previous answer, I did already know about "SLOT MACHINE": it's in the comparison table in Zattera's article. I paid no attention to it because of its low coverage (only 21.6% according to Zattera's data) and it slipped my mind completely (it would have been better if I had remembered it!).
What Zattera did was to add on top of "SLOT" a set of rules (implemented by the state machine) that exploit the many regularities of the manuscript (e.g. an 'i' is (almost) always followed by 'n', 'm', 'l', 'r' or 's') to restrict the possibilities of "SLOT". This is logically equivalent to what I did with my grammars: I added to BASIC-11 the rule "a final 'y' is (almost) never preceded by 'n' or 'm'" to get COMPACT-7, increasing its efficiency. I could have added more rules (like the "i rule" mentioned above) to restrict the possibilities further. I actually experimented with a COMPACT-6 version that implemented that rule, but I felt this way of proceeding was not very useful because, obviously, by adding more and more rules one can reach any arbitrarily high efficiency.
Put another way: after coverage, efficiency is an important measure, because it's trivial to create a grammar with 100% coverage but ~0% efficiency (just use as many slots as the maximum word length and put every letter in each slot). But efficiency is not an absolute criterion for comparing grammars with similar coverage, because it's also trivially easy to create a grammar with 100% coverage and 100% efficiency (just use a different symbol for each Voynich word and put them all in the first slot).
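The two extremes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-word vocabulary is a toy list, not the real transliteration, and "efficiency" is assumed here to mean the share of generated strings that are actual words, while "coverage" is the share of distinct words the grammar generates. The helper names (`generate`, `coverage`, `efficiency`) are hypothetical.

```python
from itertools import product


def generate(slots):
    """Return every non-empty string a slot grammar can produce.

    Each slot is a list of alternatives; "" means the slot may stay empty.
    """
    strings = {"".join(choice) for choice in product(*slots)}
    strings.discard("")
    return strings


def coverage(generated, words):
    """Share of distinct words that the grammar generates."""
    vocab = set(words)
    return len(vocab & generated) / len(vocab)


def efficiency(generated, words):
    """Share of generated strings that are real words (assumed definition)."""
    return len(set(words) & generated) / len(generated)


# Toy vocabulary for illustration only.
words = ["ol", "or", "dal", "dar"]

# Extreme 1: as many slots as the longest word, every letter in each slot.
alphabet = sorted(set("".join(words)))
longest = max(len(w) for w in words)
everything = generate([alphabet + [""]] * longest)

# Extreme 2: one slot that holds each whole word as a single symbol.
lookup = generate([words])

print(coverage(everything, words), efficiency(everything, words))
print(coverage(lookup, words), efficiency(lookup, words))
```

With this toy list the first grammar produces 155 strings, of which only 4 are words, while the second produces exactly the 4 words. Both have full coverage, which is why efficiency alone cannot rank grammars.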
So my opinion is that "SLOT MACHINE" is clever, but it reduces the coverage too much (to a meager 21.6%) and adds too much algorithmic complexity in order to chase a high efficiency target. And reducing coverage and adding complexity to reach an arbitrarily high efficiency can, in fact, be done with any grammar.
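The trade-off can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the slot layout and the seven-word vocabulary are invented toys (not BASIC-11, COMPACT-7 or real Voynich data), and efficiency is again taken as the share of generated strings that are actual words. The rule is the "final 'y'" rule mentioned above, applied as a filter on top of the slot grammar.

```python
from itertools import product


def generate(slots):
    """Return every non-empty string a slot grammar can produce."""
    strings = {"".join(choice) for choice in product(*slots)}
    strings.discard("")
    return strings


def obeys_final_y_rule(word):
    """Reject words where a final 'y' is preceded by 'n' or 'm'."""
    return not word.endswith(("ny", "my"))


def coverage(generated, vocab):
    """Share of distinct words that the grammar generates."""
    return len(vocab & generated) / len(vocab)


def efficiency(generated, vocab):
    """Share of generated strings that are real words (assumed definition)."""
    return len(vocab & generated) / len(generated)


# Hypothetical four-slot grammar; "" lets a slot stay empty.
slots = [["d", "k", ""], ["a", "e", ""], ["n", "m", "r", ""], ["y", ""]]

# Toy vocabulary; "dany" plays the part of a rare exception to the rule.
vocab = {"dar", "dan", "dam", "key", "dy", "kary", "dany"}

plain = generate(slots)
ruled = {w for w in plain if obeys_final_y_rule(w)}

for name, grammar in (("plain", plain), ("with rule", ruled)):
    print(name, len(grammar), coverage(grammar, vocab), efficiency(grammar, vocab))
```

In this toy case the rule shrinks the generated set from 71 to 53 strings, so efficiency rises, but the exception "dany" is lost, so coverage drops. Stacking more rules repeats the same exchange.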
I'd also like to add a TL;DR comparison of my grammars vs. SLOT and SLOT MACHINE; see my presentation for all the hard data:
Comparing grammars with similar coverage, BASIC-13 improves on SLOT in both efficiency and coverage. Moreover, BASIC-13 seems to do a better job than SLOT of capturing the structure of Voynich words, going much deeper into the word ranking before failing to generate a word (and I think this is important).
A comparison with SLOT MACHINE is rather moot, because its coverage (the primary metric of any grammar) is too low compared with BASIC-13 (or even SLOT). Moreover, every grammar (including mine) can be made arbitrarily efficient by adding more complications (and reducing coverage), but beyond some threshold of complexity this becomes a rather pointless exercise.
I have also demonstrated a grammar, EXTENDED-12, with an outstanding coverage (88-92% depending on how words with 'rare' characters are counted) which far surpasses SLOT. It's surely possible to expand SLOT to increase its coverage, but at the moment no "SLOT-HIGHCOVERAGE" exists to make such a comparison, and it's yet to be seen what the efficiency of this hypothetical grammar could be and how it would compare to EXTENDED-12 (but with regard to the only comparable grammars we have, Stolfi's, EXTENDED-12 seems to have an even better coverage and orders of magnitude better efficiency).
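For the "how deep in the word ranking before the first failure" measure mentioned in the TL;DR, a minimal Python sketch follows. It assumes the measure is the frequency rank of the most frequent word the grammar cannot generate. The token counts and both slot grammars are invented toys, and `first_failure_rank` is a hypothetical helper, not something taken from the presentation.

```python
from collections import Counter
from itertools import product


def generate(slots):
    """Return every non-empty string a slot grammar can produce."""
    strings = {"".join(choice) for choice in product(*slots)}
    strings.discard("")
    return strings


def first_failure_rank(tokens, generated):
    """1-based frequency rank of the first word the grammar misses.

    Returns None when every word in `tokens` is generated.
    """
    ranked = [word for word, _ in Counter(tokens).most_common()]
    for rank, word in enumerate(ranked, start=1):
        if word not in generated:
            return rank
    return None


# Toy token stream with invented counts (most frequent first).
tokens = ["ol"] * 4 + ["dar"] * 3 + ["dy"] * 2 + ["okal"]

# Two hypothetical grammars; "" lets a slot stay empty.
shallow = generate([["o"], ["l", "r"]])
deeper = generate([["o", "d"], ["a", "y", ""], ["l", "r", ""]])

print(first_failure_rank(tokens, shallow))  # misses "dar"
print(first_failure_rank(tokens, deeper))   # misses "okal"
```

Here the first grammar already fails on the second most frequent word, while the second one only fails at rank 4. A grammar that fails later is handling more of the common vocabulary before it breaks down.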