04-08-2025, 10:37 AM
(04-08-2025, 08:57 AM)oshfdk Wrote: (04-08-2025, 02:24 AM)magnesium Wrote: The Naibbe cipher isn't perfect, but it's a place to start. I'd love to collaborate with folks and further investigate whether and how the Naibbe cipher can be extended/modified to accommodate the VMS's line-level properties. Part of this work, I suspect, will involve screening for plaintext properties that make those line-level statistics more or less likely.
Thank you for sharing your work! The Naibbe cipher is a bit at odds with what I would consider a good candidate for Voynichese (for the labels to make sense, I would expect the verbosity not to exceed something like ~1.5-2.5 glyphs per plaintext character on average), but overall I think this is the most thought-through attempt at replicating the statistics of Voynichese I've seen so far.
Thanks! I agree that through the lens of the Naibbe cipher, the labels in the VMS look weirdly short and uninformative (see Section 4.4 of the paper). One potential workaround is that at least some sets of labels are meant to be read as single interspersed messages. Consider, for example, the star chart on f68r2, whose 24 star labels can be theoretically read left-to-right as 8 rows of text:
![[Image: rREQ3rj.png]](https://i.imgur.com/rREQ3rj.png)
I freely admit that this is not a complete solution.
I should also note: If memory serves, most labels are uncommon word types. Within the Naibbe cipher, the overwhelming majority of the word types outside the 100 most common represent plaintext bigrams, with an average verbosity of ~2.5 glyphs/letter (though some are much more verbose), consistent with the upper bound of your suggested verbosity range. The cipher encrypts unigrams as entire words because doing so makes it much easier to achieve Voynich B's anomalously flat frequency-rank distribution of word types (see Bowern and Lindemann (2021)). And to most easily reconcile the entropy of Voynichese with a natural-language plaintext, the 1.5-2.5 glyphs/letter verbosity cannot be an upper bound but should instead be considered a median value.
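To make the mean-versus-median distinction concrete, here is a minimal Python sketch of how verbosity (glyphs per plaintext letter) could be tallied. The pairs below are invented placeholders, not entries from the actual Naibbe tables, and counting one character as one glyph is a simplifying assumption.

```python
from statistics import mean, median

def verbosity(pairs):
    """Glyphs per plaintext letter for each (plaintext, ciphertext word) pair."""
    return [len(cipher) / len(plain) for plain, cipher in pairs]

# Hypothetical toy pairs: placeholder strings stand in for ciphertext words.
toy_pairs = [
    ("a", "xxxx"),     # unigram -> 4 glyphs, verbosity 4.0
    ("e", "xxxxx"),    # unigram -> 5 glyphs, verbosity 5.0
    ("th", "xxxxx"),   # bigram  -> 5 glyphs, verbosity 2.5
    ("er", "xxxxxx"),  # bigram  -> 6 glyphs, verbosity 3.0
    ("in", "xxxx"),    # bigram  -> 4 glyphs, verbosity 2.0
]

ratios = verbosity(toy_pairs)
print("mean:", mean(ratios), "median:", median(ratios))
```

In this toy set the unigram-as-whole-word entries pull the mean above the median, which is the kind of skew to expect whenever a few encodings are much more verbose than the rest.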
I don't know whether it's been done, but if it hasn't, it would be interesting to study the word-level statistics of the labels specifically and see how much they differ from the rest of the VMS. Either way, the labels pose challenges for the ciphertext hypothesis: assuming for the moment that the token and type length distributions of labels are consistent with the rest of the manuscript, well over half of labels would have to encode fewer than 5 plaintext letters given your suggested verbosity ranges, which in many cases would still imply a weirdly short label.
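The arithmetic behind that estimate can be sketched as follows: a label of n glyphs at a verbosity of v glyphs/letter encodes roughly n / v plaintext letters. The glyph lengths below are made-up illustrative values, not measurements from the VMS, and the function names are hypothetical.

```python
def implied_letters(glyph_count, verbosity):
    """Approximate plaintext length of a label at a given glyphs/letter verbosity."""
    return glyph_count / verbosity

def share_below(glyph_lengths, verbosity, max_letters=5):
    """Fraction of labels whose implied plaintext is shorter than max_letters."""
    short = [n for n in glyph_lengths if implied_letters(n, verbosity) < max_letters]
    return len(short) / len(glyph_lengths)

# Hypothetical label lengths in glyphs (placeholder data, not from a transliteration).
toy_lengths = [4, 5, 5, 6, 6, 7, 8, 9]

for v in (1.5, 2.0, 2.5):
    print(f"verbosity {v}: {share_below(toy_lengths, v):.2f} of labels imply <5 letters")
```

Running the same tally on real label lengths from a transliteration, and again on the running text, would be one simple way to start the label-versus-body comparison suggested above.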