magnesium > 04-08-2025, 10:37 AM
(04-08-2025, 08:57 AM)oshfdk Wrote: (04-08-2025, 02:24 AM)magnesium Wrote: The Naibbe cipher isn't perfect, but it's a place to start. I'd love to collaborate with folks and further investigate whether and how the Naibbe cipher can be extended/modified to accommodate the VMS's line-level properties. Part of this work, I suspect, will involve screening for plaintext properties that make those line-level statistics more or less likely.
Thank you for sharing your work! The Naibbe cipher is a bit at odds with what I would consider a good candidate for Voynichese (for the labels to make sense, I would expect the verbosity not exceeding something like ~1.5-2.5 glyphs per plaintext character on average), but overall I think this is the most thought-through attempt at replicating the statistics of Voynichese I've seen so far.
oshfdk > 04-08-2025, 10:56 AM
(04-08-2025, 10:37 AM)magnesium Wrote: I should also note: If memory serves, most labels are uncommon word types. Within the Naibbe cipher, the overwhelming majority of the word types outside the 100 most common word types represent plaintext bigrams, with an average verbosity of ~2.5 glyphs/letter (though with some being much more verbose), consistent with the upper bound of your suggested verbosity range. The ultimate reason why the cipher encrypts unigrams as entire words is because if this is in place, it becomes much easier to achieve Voynich B's anomalously flat frequency-rank distribution of word types (see Bowern and Lindemann (2021)).
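For readers who want to see what a "flat" frequency-rank distribution means in practice, here is a minimal sketch. It is not taken from the paper: the function names `rank_frequency` and `top_share` are made up for illustration, and `toy_tokens` is a placeholder list, not a transcription of any folio. The flatter the distribution, the smaller the share of all tokens taken by the few most common word types.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, count) pairs, with rank 1 being the most common word type."""
    counts = Counter(tokens)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def top_share(tokens, k):
    """Fraction of all tokens covered by the k most common word types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(k)) / total


# Placeholder tokens for illustration only; substitute a real transcription.
toy_tokens = "daiin ol daiin chedy ol daiin qokeedy".split()

for rank, count in rank_frequency(toy_tokens):
    print(rank, count)
print(round(top_share(toy_tokens, 2), 3))
```

Running the same two functions on a candidate ciphertext and on a Voynich B transcription would give a rough side-by-side comparison of how steeply each curve falls.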
I don't know whether it's been done, but if it hasn't, it would be interesting to study the word-level statistics of the labels specifically and see how much they differ from the rest of the VMS. Any which way, the labels pose challenges for the ciphertext hypothesis: Assuming for the moment that the token and type length distributions of labels are consistent with the rest of the manuscript, well more than half of labels would have to be <5 letters long given your suggested verbosity ranges, which in many cases would still imply a weirdly short label.
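The arithmetic behind the "<5 letters" remark can be sketched as follows. This is only an illustration: the 1.5 and 2.5 glyphs-per-letter bounds come from the thread, but the label lengths in `example_label_lengths` are hypothetical values, not measurements from the manuscript, and the function names are invented here.

```python
VERBOSITY_BOUNDS = (1.5, 2.5)  # glyphs per plaintext letter, as discussed in the thread


def implied_plaintext_letters(glyph_count, verbosity):
    """Plaintext letters implied by a label of glyph_count glyphs at a given verbosity."""
    if verbosity <= 0:
        raise ValueError("verbosity must be positive")
    return glyph_count / verbosity


def implied_ranges(lengths, bounds=VERBOSITY_BOUNDS):
    """Map each glyph length to (fewest, most) implied plaintext letters."""
    low_verbosity, high_verbosity = bounds
    return {
        n: (
            implied_plaintext_letters(n, high_verbosity),
            implied_plaintext_letters(n, low_verbosity),
        )
        for n in lengths
    }


# Hypothetical label lengths in glyphs, chosen only to show the calculation.
example_label_lengths = [4, 5, 6, 7, 8]

for glyphs, (fewest, most) in implied_ranges(example_label_lengths).items():
    print(f"{glyphs} glyphs -> {fewest:.1f} to {most:.1f} plaintext letters")
```

Under these assumptions, a 7-glyph label would stand for fewer than 5 plaintext letters even at the low end of the verbosity range, which is the sense in which the labels come out short.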
(04-08-2025, 10:37 AM)magnesium Wrote: I agree that through the lens of the Naibbe cipher, the labels in the VMS look weirdly short and uninformative (see Section 4.4 of the paper). One potential workaround is that at least some sets of labels are meant to be read as single interspersed messages. Consider, for example, the star chart on f68r2, whose 24 star labels can be theoretically read left-to-right as 8 rows of text...
(04-08-2025, 10:37 AM)magnesium Wrote: I freely admit that this is not a complete solution.
Yavernoxia > 04-08-2025, 11:03 AM
magnesium > 04-08-2025, 11:28 AM
(04-08-2025, 10:56 AM)oshfdk Wrote: (04-08-2025, 10:37 AM)magnesium Wrote: I should also note: If memory serves, most labels are uncommon word types. Within the Naibbe cipher, the overwhelming majority of the word types outside the 100 most common word types represent plaintext bigrams, with an average verbosity of ~2.5 glyphs/letter (though with some being much more verbose), consistent with the upper bound of your suggested verbosity range. The ultimate reason why the cipher encrypts unigrams as entire words is because if this is in place, it becomes much easier to achieve Voynich B's anomalously flat frequency-rank distribution of word types (see Bowern and Lindemann (2021)).
I don't know whether it's been done, but if it hasn't, it would be interesting to study the word-level statistics of the labels specifically and see how much they differ from the rest of the VMS. Any which way, the labels pose challenges for the ciphertext hypothesis: Assuming for the moment that the token and type length distributions of labels are consistent with the rest of the manuscript, well more than half of labels would have to be <5 letters long given your suggested verbosity ranges, which in many cases would still imply a weirdly short label.
Yes, this is true. But I said "verbosity not exceeding", so I'm not suggesting a range, more like a fuzzy upper bound. I agree that for the labels to have comfortable lengths, the correspondence should be closer to 1:1. There is of course a chance that labels are only indices (fig. A, fig. B), referenced in the texts, in which case they can be as short as needed.
(04-08-2025, 10:37 AM)magnesium Wrote: I agree that through the lens of the Naibbe cipher, the labels in the VMS look weirdly short and uninformative (see Section 4.4 of the paper). One potential workaround is that at least some sets of labels are meant to be read as single interspersed messages. Consider, for example, the star chart on f68r2, whose 24 star labels can be theoretically read left-to-right as 8 rows of text...
There are cases where it's not obvious how to parse the labels sequentially, for example:
(04-08-2025, 10:37 AM)magnesium Wrote: I freely admit that this is not a complete solution.
However, do you consider it actually possible that some similar scheme was used for the Voynich Manuscript? If so, what would you call the strongest hints pointing in this direction?
magnesium > 04-08-2025, 11:47 AM
(04-08-2025, 11:03 AM)Yavernoxia Wrote: Hello @magnesium, I found the presentation really interesting and started reading the paper as soon as the meeting ended. Honestly, this is one of the best theories I've ever read about a possible encryption for natural language plaintext. As far as I know, nobody has ever been able to replicate so many properties (entropy, clustering, average length of words, positioning, differences in suffixes and prefixes, etc.) of the VMS together like this.
There are still some problems, such as the labels, and the fact that the predominance of rare glyphs in the top line is not at all explained by the Naibbe cipher. Could those lines/parts of the VMS text have been encrypted using a completely different type of cipher that still uses the same encoded glyphs? Who knows.
Fact is, I think you're surely onto something with the bigram/unigram plaintext thing.
Yavernoxia > 04-08-2025, 12:12 PM
(04-08-2025, 11:47 AM)magnesium Wrote: As I mentioned elsewhere, one could also imagine treating the paragraph-opener gallows glyph (e.g., p) or prefix (e.g., pch) as a null, simply meant to denote the start of a paragraph.
But shouldn't we then find the p glyph at the start of every paragraph? Why would p be a null that denotes the start of a paragraph, yet be used only sometimes? What could be the rule for choosing when to use a null paragraph opener and when not to?
pfeaster > 04-08-2025, 01:31 PM
(03-08-2025, 10:24 PM)oshfdk Wrote: Can this approach explain line-as-a-functional-unit properties, such as the tendency of certain characters and combinations to appear near/at the beginning or end of lines?
oshfdk > 04-08-2025, 01:55 PM
(04-08-2025, 01:31 PM)pfeaster Wrote: Not as it stands, but maybe a variant on it could. The playing-card mechanism is designed to impose a frequency ratio among choices from the different tables, and it's good at accomplishing that -- but, applied strictly as described, it would also create flat ciphertexts with none of the regional variation we know and love from our holidays on Tavie's island.
There could be arbitrary rules such as (1) when encoding the first line of a paragraph, draw from this table; (2) when encoding the first vord of a line, draw from that table; (3) when encoding the last vord of a line, draw from that other table. But that doesn't strike me as a very satisfying solution, since it doesn't offer any real explanation for such a practice.
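The three positional rules above could be sketched roughly like this. Everything named here is hypothetical: the table labels, the 3:2:1 weights standing in for the playing-card draw, and the choice to let rule (1) take precedence over rules (2) and (3) are assumptions made for illustration, not details of the Naibbe cipher as published.

```python
import random

# Hypothetical stand-ins for the cipher's tables and their card-draw ratio.
DEFAULT_TABLES = ["A", "B", "C"]
DEFAULT_WEIGHTS = [3, 2, 1]


def choose_table(line_in_paragraph, word_in_line, words_in_line, rng):
    """Pick a table name from position, falling back to a weighted random draw.

    Rule (1): the first line of a paragraph uses its own table.
    Rule (2): the first vord of a line uses another table.
    Rule (3): the last vord of a line uses a third table.
    Otherwise: weighted draw, imitating the frequency ratio among tables.
    """
    if line_in_paragraph == 0:
        return "FIRST_LINE"
    if word_in_line == 0:
        return "LINE_START"
    if word_in_line == words_in_line - 1:
        return "LINE_END"
    return rng.choices(DEFAULT_TABLES, weights=DEFAULT_WEIGHTS, k=1)[0]


rng = random.Random(0)  # seeded so that repeated runs give the same draws
for word_index in range(5):
    print(word_index, choose_table(2, word_index, 5, rng))
```

A selector like this would reproduce positional effects by construction, which is exactly why it offers no explanation for them: the rules are put in by hand rather than falling out of the encoding process.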
Yavernoxia > 04-08-2025, 02:02 PM
(04-08-2025, 01:55 PM)oshfdk Wrote: (04-08-2025, 01:31 PM)pfeaster Wrote: Not as it stands, but maybe a variant on it could. The playing-card mechanism is designed to impose a frequency ratio among choices from the different tables, and it's good at accomplishing that -- but, applied strictly as described, it would also create flat ciphertexts with none of the regional variation we know and love from our holidays on Tavie's island.
There could be arbitrary rules such as (1) when encoding the first line of a paragraph, draw from this table; (2) when encoding the first vord of a line, draw from that table; (3) when encoding the last vord of a line, draw from that other table. But that doesn't strike me as a very satisfying solution, since it doesn't offer any real explanation for such a practice.
I think there is a problem with adding more and more rules. But first I have to say that what follows is in no way an attempt to devalue the work on the Naibbe cipher, just my perspective.
The general methodological (?) problem I sense in the whole approach: if one sets out to replicate particular features, one will likely end up replicating these features, nothing less, nothing more. To give a simple analogy, if I get an F1 car and set myself on a mission to replicate its appearance as closely as possible, using modeling clay and scrap metal, then if I'm careful and accurate, I will end up with a very good replica, quite suitable for photo shoots, but I won't expect to learn a lot about what makes the F1 car a racetrack marvel.
It would be, for me personally, much more interesting to find that the features of Voynichese emerge from some internal logic and simple constraints of an efficient encoding system. magnesium's cipher is a very good approximation and excellent work at that, but it comes at the cost of quite high verbosity and a still quite complicated encoding/decoding process. Totally achievable with the tools available in the XV century, yes, but what would be the motivation to use this scheme?
So, yes, it's possible to add LAAFU rules and nulls, etc., and in the end it is quite possible to achieve a perfect replica of Voynichese. But as long as it is done by arbitrarily adding rules, I'm not sure one will learn much about the actual Voynichese.
Mauro > 04-08-2025, 02:21 PM