(09-05-2026, 09:11 PM)Mauro Wrote: (07-05-2026, 10:55 PM)Labyrinthinesecurity Wrote: (07-05-2026, 10:58 AM)oshfdk Wrote: (07-05-2026, 10:36 AM)Labyrinthinesecurity Wrote: The improved grammar (F1=0.242) outperforms Zattera's grammar (F1=0.214) when both are scored on our corpus. However, Zattera reports F1=0.270 on his own filtered corpus (~5,105 types), and our grammar was F1-trained on the test corpus while his was not.
What are the [link]? I don't think F1 is informative when comparing two different grammar mechanics. Whatever "switchable templates" are, they sound like they can capture some information too, leading to visibly better numbers for no actual improvement in the grammar.
Currently my best Zattera model, Zat+, beats Loop-Lay in terms of Nbits, but with a coverage of about 80% versus 100% for Loop-Lay. However, given the fat-tailed distribution of Voynich words and potential scribal errors, 100% coverage does not make much sense. I think we should settle for a much lower coverage and optimize Nbits from there.
Which target should be set for coverage is not clear. I experimented with different coverage targets, but with little result. I think 80% is a bit low, but this is just a personal opinion. On the other hand, going above 95% probably captures more noise than data.
Can you publish your improved grammar here? I'd be very interested to see it.
Here is the latest version (source code in attachment): the difference between my interpretation of your model (called model A below) and Zattera's model with 2 loops (model C below) is very small in terms of coverage and Nbits size.
Some important notes: we are not comparing Zattera's slot_machine versus Loop-Lay:
- I use neither your transliteration nor Zattera's, but a slightly modified version of RF1b-e (see CORPUS section in source code).
- Model C doesn't use Zattera's original 12-slot structure, but your 7-slot structure. It does, however, use Zattera's machine training.
A side note: removing the 'q' gallow from all chunks except the first one improves Nbits slightly. Further similar optimizations along this line may help.
========================================================================================
SIDE-BY-SIDE COMPARISON
────────────────────────────────────────────────────────────────────────────────────────
All models built from the same Manzini 7-column grid of Loop-Lay.
A = Manzini raw grid chunks (7,919), greedy DP (max_rep=5)
B = Zattera-trained 1-loop → 1,369 words as chunks → greedy DP
C = Zattera-trained 2-loop → 4,365 words as chunks → greedy DP
┌──────────────────┬────────────┬────────────┬────────────┐
│ Metric           │ A: Raw DP  │ B: Train×1 │ C: Train×2 │
├──────────────────┼────────────┼────────────┼────────────┤
│ Chunk vocab size │ 7,919      │ 1,369      │ 4,365      │
│ Types covered    │ 8,105      │ 7,709      │ 8,062      │
│ Type coverage    │ 99.8%      │ 94.9%      │ 99.2%      │
│ Token coverage   │ 100.0%     │ 98.1%      │ 99.8%      │
│ Chunks used      │ 734        │ 596        │ 4,004      │
│ Avg ch/word      │ 1.892      │ 1.894      │ 1.232      │
│ Nb_dict          │ 22,852     │ 18,193     │ 151,519    │
│ Nb_text          │ 456,287    │ 443,031    │ 422,655    │
│ Nb_total         │ 479,139    │ 461,224    │ 574,174    │
│ b/tok            │ 12.97      │ 12.72      │ 15.56      │
│ vs Flat          │ 0.661×     │ 0.636×     │ 0.792×     │
└──────────────────┴────────────┴────────────┴────────────┘
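As a quick sanity check on the table above, here is a minimal Python sketch that verifies Nb_total against Nb_dict + Nb_text and shows how much of each total is spent on the dictionary. It assumes Nb_total is simply the sum of the two parts (the reported figures are consistent with that); the names `models` and `dict_share` are made up for this illustration and are not taken from the attached source code.

```python
# Figures copied from the side-by-side table (all values in bits).
models = {
    "A": {"nb_dict": 22_852, "nb_text": 456_287, "nb_total": 479_139},
    "B": {"nb_dict": 18_193, "nb_text": 443_031, "nb_total": 461_224},
    "C": {"nb_dict": 151_519, "nb_text": 422_655, "nb_total": 574_174},
}


def dict_share(m):
    """Fraction of the total description length spent on the chunk dictionary."""
    return m["nb_dict"] / m["nb_total"]


shares = {}
for name, m in models.items():
    # ASSUMPTION: Nb_total = Nb_dict + Nb_text; fail loudly if a row disagrees.
    assert m["nb_dict"] + m["nb_text"] == m["nb_total"], name
    shares[name] = dict_share(m)
    print(f"{name}: dict share = {shares[name]:.1%}")
```

Read this way, model C has the smallest Nb_text of the three, but its dictionary accounts for roughly a quarter of its total, which is why its Nb_total comes out largest.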
Zattera's machine training stats:
┌──────────────────┬────────────┬────────────┐
│ Metric           │ B: Train×1 │ C: Train×2 │
├──────────────────┼────────────┼────────────┤
│ Precision        │ 0.331      │ 0.835      │
│ Recall           │ 0.056      │ 0.449      │
│ F1               │ 0.095      │ 0.584      │
│ True positives   │ 453        │ 3,646      │
│ False positives  │ 916        │ 719        │
└──────────────────┴────────────┴────────────┘
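The training stats can be cross-checked from the true and false positive counts alone. The sketch below uses the standard definitions of precision, recall and F1. The total number of word types (8,121) is not stated in the post: it is an assumption, inferred from "8,105 types covered = 99.8%" in the comparison table. `TOTAL_TYPES` and `prf` are names made up for this illustration.

```python
# ASSUMPTION: total distinct word types in the corpus, inferred from
# 8,105 types covered at 99.8% type coverage (8,105 / 0.998 is about 8,121).
TOTAL_TYPES = 8_121


def prf(tp, fp, total):
    """Return (precision, recall, F1) from true/false positive counts."""
    fn = total - tp                      # real types the machine failed to generate
    precision = tp / (tp + fp)           # generated words that are real types
    recall = tp / (tp + fn)              # real types that were generated
    f1 = 2 * tp / (2 * tp + fp + fn)     # harmonic mean of precision and recall
    return precision, recall, f1


# (true positives, false positives) copied from the training stats table.
stats = {"B": (453, 916), "C": (3_646, 719)}

results = {name: prf(tp, fp, TOTAL_TYPES) for name, (tp, fp) in stats.items()}
for name, (p, r, f1) in results.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under that assumed total, the recomputed values match the reported table to three decimals. Note also that true positives plus false positives equals the generated vocabulary in each case (453 + 916 = 1,369 and 3,646 + 719 = 4,365), in line with the chunk vocab sizes reported for B and C.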
Please let me know if you see any caveats or bugs.