[Article] update to Zattera's slot machine - Printable Version
+- The Voynich Ninja (https://www.voynich.ninja)
+-- Forum: Voynich Research (https://www.voynich.ninja/forum-27.html)
+--- Forum: News (https://www.voynich.ninja/forum-25.html)
+--- Thread: [Article] update to Zattera's slot machine (/thread-5715.html)
RE: update to Zattera's slot machine - Labyrinthinesecurity - 13-05-2026

(11-05-2026, 06:12 PM)Mauro Wrote: I'm not much into Python, so I cannot check the code, I'm sorry.

All 3 models share the same structure:

Slot1 = ['q', 'ch', 'sh', 'cth', 'ckh', 'cph', 'cfh', '']
Slot2 = ['e', '']
Slot3 = ['e', '']
Slot4 = ['o', 'a', '']
Slot5 = ['i', 'ii', 'iii', 'iiii', '']
Slot6 = ['l', 'r', 'd', 'n', 'm', 's', 't', 'k', 'p', 'f', '']
Slot7 = ['y', '']

Here are the main differences:

RE: update to Zattera's slot machine - Jorge_Stolfi - 13-05-2026

(13-05-2026, 03:36 PM)Labyrinthinesecurity Wrote: All 3 models share the same structure:

My old "crust-mantle-core" model, discussed starting at [link], seems to be more demanding than that one, except perhaps for the placement of [aoy].

Here is my current preferred version. It is described as a sequence of four parsing steps/levels for convenience, but can be encoded as a single loop-free finite automaton, although some of the "total count" constraints in the last level are more compactly described by an algorithm.

CLEANUP pass:
The character m is considered an abbreviation for in.
The combination ir is assumed to be a scribal error for iin.
The characters b g u are malformed versions of other characters, possibly n m an.
The glyphs Ih ITh IKh etc. are malformed versions of Ch CTh CKh etc.
A doubled hh is a malformed version of he.
The glyphs Cs and sh are just variant forms of Sh.
The abbreviations should be expanded, and the erroneous characters and combinations should be mapped to the most likely correct ones, before applying the following levels of parsing.

ELEM level: A cleaned word of the VMS (in the EVA encoding) passes this level if it can be parsed into a string of elements drawn from the following sets:

Thus, for instance,
ockhechdy (oCKheChdy) is parsed as {o}{ckhe}{ch}{d}{y}
qokaiin is parsed as {q}{o}{k}{iin}
chedy is parsed as {che}{d}{y}
cheedy is parsed as {ch}{ee}{d}{y}
cheeedy is parsed as {che}{ee}{d}{y}
cheeeedy is parsed as {ch}{ee}{ee}{d}{y}

Note that the parsing is ambiguous if a word has three or more e in a row. So cheeedy could also be {ch}{eee}{d}{y}, and cheeeeedy could also be parsed as {che}{eee}{dy}. The choices above are arbitrary, and have limited implications.

OKOKO level: Let K be the set of all elements that are not in the set O, namely K = Q ∪ D ∪ X ∪ G ∪ H ∪ N. A cleaned word that passed the ELEM level also passes this level if it consists of zero or more K elements with at most two O elements inserted before the first K, between every two consecutive Ks, and after the last K. Thus, for example, {o}{y} passes this level, {o}{a}{ch}{sh}{o}{r}{o}{a}{d}{o}{y} passes (with pattern OOKKOKOOKOO), whereas {ch}{o}{a}{y}{d}{y} does not (three Os in a row).

CMC level: A cleaned word that passes the ELEM and OKOKO levels will pass the crust-mantle-core (CMC) level if, after deleting all the O elements, it has the form Q^q D^d X^x G^g H^h X^y D^e N^n where

Without these sum constraints, the three parsing levels can be realized as a compact finite automaton that can be drawn on a single page. With the sum constraints, the automaton is about 3x bigger because each state must be unfolded into three states in order to record the three counts in the part already parsed.

The numbers vary depending on the section and transcription version used, but it seems that, after the CLEAN step, about 95% of the tokens pass the other three levels.

There probably are further rules relating the insertion of the "O"s in the CMC pattern. For instance, maybe we can require that a "Q" is always followed by at least one "O", and an "N" is almost always preceded by at least one "O". Said another way, maybe we can combine the OKOKO and CMC models in a single formula with rules that tie the number of "O"s in each slot to the presence or number of the other "Q", "D" etc. elements. This and other refinements remain to be explored.

But the main question is how this word model compares to those you are using.

All the best, --stolfi

RE: update to Zattera's slot machine - Labyrinthinesecurity - 13-05-2026

Jorge_Stolfi Wrote:

That's VERY interesting indeed, and well worth exploring when I have time. Thanks for sharing!

RE: update to Zattera's slot machine - Jorge_Stolfi - 13-05-2026

Oops. I wrote:

(13-05-2026, 09:58 PM)Jorge_Stolfi Wrote: Note that e and i are valid elements per se.

I meant "e and i are not valid elements per se". Sorry.

All the best, --stolfi

RE: update to Zattera's slot machine - ReneZ - 14-05-2026

(13-05-2026, 03:36 PM)Labyrinthinesecurity Wrote: All 3 models share the same structure:

This would not be able to generate very common words like "qokeey" or "qokedy". Or am I misunderstanding something?
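
ReneZ's question about "qokeey" and "qokedy" can be checked mechanically. What follows is a minimal Python sketch, not code from the thread. It assumes that a word is accepted when it can be split into one glyph group per slot, in slot order, with the empty string meaning "skip this slot", and that a "repetition" is simply another pass through the same seven slots. The names SLOTS and accepts are hypothetical, and the sketch covers only the shared slot lists quoted above, not the differences between the three models.

```python
from functools import lru_cache

# Slot lists as quoted in the thread; '' means the slot may be left empty.
SLOTS = [
    ['q', 'ch', 'sh', 'cth', 'ckh', 'cph', 'cfh', ''],
    ['e', ''],
    ['e', ''],
    ['o', 'a', ''],
    ['i', 'ii', 'iii', 'iiii', ''],
    ['l', 'r', 'd', 'n', 'm', 's', 't', 'k', 'p', 'f', ''],
    ['y', ''],
]


def accepts(word, max_repeats=1):
    """Return True if `word` fits max_repeats passes through SLOTS.

    Assumption: each pass visits the slots in order and takes exactly one
    option per slot. Since every slot offers '', unused passes stay empty.
    """
    total_steps = len(SLOTS) * max_repeats

    @lru_cache(maxsize=None)
    def walk(pos, step):
        # All slots consumed: the whole word must have been matched.
        if step == total_steps:
            return pos == len(word)
        # Try every option of the current slot at the current position.
        return any(
            word.startswith(option, pos) and walk(pos + len(option), step + 1)
            for option in SLOTS[step % len(SLOTS)]
        )

    return walk(0, 0)


if __name__ == "__main__":
    for w in ("chedy", "qokeey", "qokedy", "daiin"):
        print(w, accepts(w, 1), accepts(w, 2))
```

Under these assumptions a single pass accepts chedy but rejects qokeey, qokedy and daiin, because nothing can follow Slot6 except y. Two passes accept all three (for example q-o-k followed by e-e-y).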

RE: update to Zattera's slot machine - Jorge_Stolfi - 14-05-2026

(13-05-2026, 09:58 PM)Jorge_Stolfi Wrote: Without these sum constraints, the three parsing levels can be realized as compact finite automaton that can be drawn on a single page. With the sum constraints, the automaton is about 3x bigger because each state must be unfolded into three states in order to record the three counts in the part already parsed.

Oops. Actually it is more than 3x, because each state must keep track of the x+y count (0,1,2) and the q+d+e count (0,1,2); so each state near the end of the word may become 9 states. (The g+h count is checked immediately, so it does not require duplicating states.)

All the best, --stolfi

RE: update to Zattera's slot machine - Mauro - 14-05-2026

(13-05-2026, 03:36 PM)Labyrinthinesecurity Wrote: (11-05-2026, 06:12 PM)Mauro Wrote: I'm not much into Python, so I cannot check the code, I'm sorry.

I'm a little confused.

Model A: I'm sorry, but I don't understand what you mean by "all slot combinations treated as chunks".

Model B: why do you call it 'Zattera'? It's completely different from Zattera's.

My tests:

1 repetition along the slot grammar

Quote: Transliteration file used: RF1a-n-x7 Full cleaned

It has an extremely low coverage (i.e. it cannot find 'daiin').

With 2 repetitions (Model C)

Quote: Transliteration file used: RF1a-n-x7 Full cleaned

Coverage stays low, ~46%.

So I tested with 5 repetitions:

Quote: Transliteration file used: RF1a-n-x7 Full cleaned

Coverage gets quite good: ~100%. Nbits_tokens is 488975, and Nbits_tokens/Coverage is 490014, quite good values. Loop-Lay was slightly better, at Nbits = 486946 and Nbits/Coverage = 487859.

The best grammar I ever found (unpublished in detail, just posted in a thread) improves a little more:

Quote: Transliteration file used: RF1a-n-x7 Full cleaned

RE: update to Zattera's slot machine - oshfdk - 14-05-2026

(14-05-2026, 09:19 PM)Mauro Wrote: Coverage gets quite good: ~100%. Nbits_tokens is 488975, and Nbits_tokens/Coverage is 490014, quite good values. Loop-Lay was slightly better, at Nbits = 486946 and Nbits/Coverage = 487859. The best grammar I ever found (unpublished in detail, just posted in a thread) improves a little more:

Does the number of allowed loops affect Nbits in any way? Assuming the same slot assignments achieve 95% coverage for 4 loops and 100% coverage for 5 loops, is it possible for the Nbits score to be worse for the second grammar?

RE: update to Zattera's slot machine - Mauro - 15-05-2026

(14-05-2026, 11:32 PM)oshfdk Wrote: (14-05-2026, 09:19 PM)Mauro Wrote: Coverage gets quite good: ~100%. Nbits_tokens is 488975, and Nbits_tokens/Coverage is 490014, quite good values. Loop-Lay was slightly better, at Nbits = 486946 and Nbits/Coverage = 487859. The best grammar I ever found (unpublished in detail, just posted in a thread) improves a little more:

The lower the coverage, the lower Nbits will be, because fewer tokens are encoded. That's why I no longer use the raw Nbits as a metric, but rather Nbits/Coverage. For example, look at one of the tables in my post above (2 repetitions, Model C):

Quote: Grammar name: PROVA Voynich Ninja Labyrinthinesecurity, max repeats = 2

This grammar has a low coverage, ~47%, so it encodes only a part of the text and, indeed, it has a very low Nbits (328908). But Nbits/Coverage is 735103, much higher than, for example, the 481947 of my 'best' grammar.

Or, in other words: the original Nbits metric is okay when comparing grammars with a similar coverage (as I did in my article, where Nbits/Coverage was just in a footnote), but one needs Nbits/Coverage (or some analogous formula) in the more general case, where the coverage changes. And, in effect, it is better to always use Nbits/Coverage. Sorry for the confusion!
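
The normalization described in this last post is a plain division, and the figures quoted in the thread can be plugged into it. The snippet below is a small sketch under the assumption that coverage is a fraction in (0, 1]; the function names nbits_per_coverage and implied_coverage are hypothetical and do not come from Mauro's code.

```python
def nbits_per_coverage(nbits, coverage):
    """Normalize Nbits by coverage, assumed to be a fraction in (0, 1]."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be a fraction in (0, 1]")
    return nbits / coverage


def implied_coverage(nbits, nbits_over_coverage):
    """Recover the coverage fraction from a reported pair of values."""
    return nbits / nbits_over_coverage


if __name__ == "__main__":
    # Pairs (Nbits, Nbits/Coverage) as quoted in the thread.
    reported = {
        "2 repetitions": (328908, 735103),
        "5 repetitions": (488975, 490014),
        "Loop-Lay": (486946, 487859),
    }
    for name, (nbits, ratio) in reported.items():
        cov = implied_coverage(nbits, ratio)
        print(f"{name}: coverage ~ {cov:.4f}, "
              f"Nbits/Coverage = {nbits_per_coverage(nbits, cov):.0f}")
```

With the quoted numbers the implied coverage comes out at about 0.45 for the 2-repetition grammar and about 0.998 for the 5-repetition one, which is why the raw Nbits of the former looks deceptively low.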