RE: Templatic Voynich generator
petronio > 21 minutes ago
I'd like to know if anyone has tested something along these lines.
Prompted by this thread, I ran a quick cross-check: I applied BPE (byte-pair encoding) to the Voynich text and measured the average length of vocabulary morphemes (VMML). The result was 5.918, well above the upper limit observed for alphabetic scripts across 21 corpora. The discriminative signal comes almost entirely from the suffix system: about 50 three-character endings cover 80% of the corpus.
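For anyone who wants to try the same kind of measurement, here is a minimal sketch of a BPE trainer plus a mean vocabulary-token length. It is not necessarily the exact pipeline behind the 5.918 figure. The function names `train_bpe` and `mean_token_length`, the choice to count single characters as part of the vocabulary, and the assumption of one transcription character per glyph are all mine.

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Learn up to num_merges BPE merges on a word list; return the token vocabulary."""
    # Each distinct word becomes a tuple of symbols, weighted by its frequency.
    corpus = Counter(tuple(w) for w in words)
    # Assumption: the vocabulary starts with every single character seen.
    vocab = {ch for symbols in corpus for ch in symbols}
    for _ in range(num_merges):
        # Count adjacent symbol pairs inside words (merges never cross word boundaries).
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single token
        # Most frequent pair; ties are broken lexicographically so runs are repeatable.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merged = best[0] + best[1]
        vocab.add(merged)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return vocab

def mean_token_length(vocab):
    """Average number of characters per vocabulary token."""
    return sum(len(tok) for tok in vocab) / len(vocab)
```

Usage would be something like `mean_token_length(train_bpe(words, 1000))`, where `words` is the tokenized transcription and the merge count of 1000 is only a placeholder. The number depends heavily on the merge budget and the transcription alphabet, so both need to be held fixed when comparing corpora.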
If your models encode this concentration of suffixes explicitly, the generator output should land in the same VMML range. If they operate through other constraints (positional, gallows, word-start rules), the VMML would likely fall back into the alphabetic group.
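A quick way to check the suffix side directly, on either the manuscript or generator output, is to count word-final trigrams and see how much of the text the most common ones cover. This is a rough sketch; the name `suffix_coverage` is hypothetical, and measuring coverage over word tokens (with words shorter than three characters counted as uncovered) is my assumption about what "cover 80% of the corpus" means.

```python
from collections import Counter

def suffix_coverage(words, n=3, top_k=50):
    """Fraction of word tokens whose last n characters are among the top_k most common endings."""
    if not words:
        return 0.0
    # Words shorter than n characters have no n-character ending and count as uncovered.
    endings = Counter(w[-n:] for w in words if len(w) >= n)
    covered = sum(count for _, count in endings.most_common(top_k))
    return covered / len(words)
```

Running it with the same `n` and `top_k` on the real text and on generated text gives two directly comparable numbers, which should show whether a generator reproduces the suffix concentration before looking at VMML at all.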
I would like to explore this case further.