pfeaster > 03-09-2025, 12:42 PM
(12-08-2025, 10:01 PM)magnesium Wrote: One of the things I want to explore is the extent to which the structure of the plaintext can create these biases within Naibbe ciphertext. For example, if the Naibbe cipher were used to encrypt a poem such as Dante's Divina Commedia, the poem's line-by-line structure (rhymes, repeated phrases, etc.) would theoretically impose greater line-by-line positional biases on the frequencies of plaintext unigrams and bigrams than prose such as Pliny's Natural History. Is that sufficient to explain the full extent of the VMS's "line as a functional unit" properties? Maybe, maybe not. But it may become much easier to achieve "line as a functional unit" properties within a Naibbe-like ciphertext if the plaintext is a poem or poem-like in its structure.
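One quick way to probe this empirically for any candidate plaintext is to compare letter frequencies at line-initial position against overall letter frequencies. A minimal sketch (the sample lines and the total-variation measure are my illustrative choices, not part of magnesium's proposal):

```python
from collections import Counter

def letter_freqs(counter):
    """Normalize a Counter of letters into a frequency distribution."""
    total = sum(counter.values())
    return {ch: n / total for ch, n in counter.items()}

def line_start_bias(lines):
    """Compare unigram frequencies at line-initial position against the
    whole text; a larger total variation distance means stronger bias."""
    first, overall = Counter(), Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue
        first[words[0][0].lower()] += 1
        for ch in line.lower():
            if ch.isalpha():
                overall[ch] += 1
    f, o = letter_freqs(first), letter_freqs(overall)
    keys = set(f) | set(o)
    # Total variation distance between the two distributions.
    return 0.5 * sum(abs(f.get(k, 0) - o.get(k, 0)) for k in keys)

# A text whose lines all start the same way (poem-like anaphora) shows a
# strong line-start bias; a prose text with varied line starts shows less.
poem = ["nel mezzo del cammin", "nostra vita mi ritrovai", "nel bosco oscuro"]
print(round(line_start_bias(poem), 3))
```

Running the same measure over a verse text and a prose text of comparable length would give a first estimate of how much extra positional bias line structure alone contributes.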
nablator > 03-09-2025, 02:33 PM
(03-09-2025, 12:42 PM)pfeaster Wrote: (1) Lines always break at word boundaries.
magnesium > 03-09-2025, 05:12 PM
(03-09-2025, 12:42 PM)pfeaster Wrote: There's certainly no harm in exploring that. But since one of his goals is to "(b) consistently replicate these properties [ = 'well-known VMS statistical properties' ] when encrypting a wide range of plaintexts in a well-characterized natural language," I assume he'd prefer to model a system that would reliably produce LAAFU effects when applied to any source text.
Just wondering: how difficult would it be to adapt the Naibbe approach from a unigram/bigram system to a syllabic "chunk" system? Might the frequencies of different "chunk" types result naturally in something like the frequency distributions simulated through playing cards?
Jorge_Stolfi > 03-09-2025, 05:24 PM
magnesium > 03-09-2025, 06:24 PM
(03-09-2025, 05:12 PM)magnesium Wrote: (03-09-2025, 12:42 PM)pfeaster Wrote: There's certainly no harm in exploring that. But since one of his goals is to "(b) consistently replicate these properties [ = 'well-known VMS statistical properties' ] when encrypting a wide range of plaintexts in a well-characterized natural language," I assume he'd prefer to model a system that would reliably produce LAAFU effects when applied to any source text.
Just wondering: how difficult would it be to adapt the Naibbe approach from a unigram/bigram system to a syllabic "chunk" system? Might the frequencies of different "chunk" types result naturally in something like the frequency distributions simulated through playing cards?
That is absolutely my preference. My first goal was to achieve something that could replicate word-level properties reliably and then build from there. I don't think it would be difficult at all to adapt the Naibbe approach to a syllabic "chunk" system, as the lengths of the chunks as you have outlined them—1 or 2 letters—are ultimately the single most important thing within the Naibbe scheme. The VMS's word length distributions and entropy strongly suggest that to reliably map something like Latin to Voynichese, a substitution cipher would have to be verbose, such that most tokens in the VMS stand for 1 or 2 plaintext letters. Within the Naibbe cipher, those tokens are randomly formed on a letter-by-letter basis—but by no means do they have to be. A "chunk" adaptation of the Naibbe cipher would essentially equate to a deterministic, nonrandom approach to plaintext respacing.
Even within a syllabic "chunk" system, though, I'm of the opinion that the cipher would need to be homophonic, with multiple valid Voynichese word types mapping to a given plaintext chunk. Otherwise it's hard to get the VMS's observed diversity of word types and frequency-rank distribution.
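To make the "chunk" idea concrete, here is a minimal sketch of a deterministic greedy chunker combined with a homophonic table. The table entries below are invented placeholders, not the actual Naibbe tables and not real Voynichese mappings:

```python
import random

# Invented homophone table: each 1- or 2-letter plaintext chunk maps to
# several made-up Voynichese-like word types (placeholders only).
TABLE = {
    "qu": ["qokedy", "qokeey"],
    "st": ["chedy", "shedy"],
    "e":  ["ol", "or"],
    "s":  ["dar", "dal"],
    "t":  ["aiin", "ain"],
    "q":  ["qok"],
    "u":  ["okal"],
}

def chunk(plaintext):
    """Deterministic greedy chunking: prefer a 2-letter chunk when the
    table has one, otherwise fall back to a single letter."""
    out, i = [], 0
    while i < len(plaintext):
        if plaintext[i:i + 2] in TABLE:
            out.append(plaintext[i:i + 2])
            i += 2
        else:
            out.append(plaintext[i])
            i += 1
    return out

def encrypt(plaintext, rng):
    """Homophonic step: pick one of the valid word types per chunk."""
    return " ".join(rng.choice(TABLE[c]) for c in chunk(plaintext))

print(chunk("quest"))  # -> ['qu', 'e', 'st']
print(encrypt("quest", random.Random(0)))
```

The chunking here is deterministic (magnesium's "nonrandom approach to plaintext respacing"), while the homophone choice stays random, which is what preserves the diversity of ciphertext word types.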
magnesium > 03-09-2025, 06:37 PM
(03-09-2025, 05:24 PM)Jorge_Stolfi Wrote: I can't reply to specific points now. But here are my general feelings:
I assume that the Scribe was copying from a draft on paper, provided by the real Author, and treated running text as he was used to doing in Latin or whatever other language he normally used. Namely, he generally disregarded line breaks in the draft, and inserted his own breaks where needed to produce a block of left- and right-justified text, except for the last line.
His "algorithm" presumably was like this: when he got to within 3-4 cm of the right text rail, he looked ahead in the text and decided how many words would still fit. Then he stretched or squeezed those words so that they would end at the right rail.
This algorithm alone would produce anomalous statistics at the start and end of lines. Namely, the last word of a line would tend to be shorter than average, while the first word would tend to be longer than average. We could test it easily by applying it to any running text file.
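The test suggested above could be sketched as follows: a simplified greedy justifier (no look-ahead stretching, just breaking at word boundaries near a fixed width, measured in characters rather than centimetres) applied to any running text, followed by a comparison of first-word, last-word, and overall mean word lengths. This is only an approximation of the scribe's algorithm as described:

```python
def break_lines(words, width):
    """Greedy fill: pack words into lines of at most `width` characters,
    breaking only at word boundaries (a crude stand-in for the scribe)."""
    lines, cur, cur_len = [], [], 0
    for w in words:
        extra = len(w) + (1 if cur else 0)   # +1 for the separating space
        if cur and cur_len + extra > width:
            lines.append(cur)
            cur, cur_len = [w], len(w)
        else:
            cur.append(w)
            cur_len += extra
    if cur:
        lines.append(cur)
    return lines

def word_length_stats(lines):
    """Mean length of line-initial, line-final, and all words."""
    firsts = [len(l[0]) for l in lines]
    lasts = [len(l[-1]) for l in lines]
    every = [len(w) for l in lines for w in l]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(firsts), mean(lasts), mean(every)

# Any running text will do; a real test would read a whole text file.
text = ("it is a truth universally acknowledged that a single man in "
        "possession of a good fortune must be in want of a wife").split()
first, last, overall = word_length_stats(break_lines(text, 24))
print(f"first {first:.2f}  last {last:.2f}  overall {overall:.2f}")
```

Run over a long enough text, the comparison of `first` and `last` against `overall` shows whether greedy breaking alone pushes long words to line starts and short words to line ends.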
This algorithm can also explain differences in glyph and glyph-pair statistics, since these are dominated by the few most common words, and the most common short words will have different character stats than longer ones. (If this algorithm is applied to English text, the most common short words probably have more "i" letters and "th" digraphs than long words, or than words in general.)
Other possible factors that could explain the anomalous statistics at the start and end of lines are:
- The Author may have told the Scribe that he could abbreviate "aiiin" as "am" if he really needed to.
- The Author may have told the Scribe to avoid breaks after or before certain words.
- The Author may have told the Scribe that he could break certain long words in certain places, say between gallows if the word has two gallows, but that he should add a y or m or whatever, before or after the break, to indicate a split word.
- While the Scribe was supposed to ignore the line breaks in the draft, he may have (consciously or unconsciously) favored breaks where the draft had breaks. E.g., in the same situation, if the draft had "Chody ol Shedy" on the same line he would break it as "Chody ol | Shedy", but if the draft happened to have "Chedy | ol Shedy" he would tend to choose the latter.

All the best, --jorge
synapsomorphy > 03-09-2025, 07:27 PM
(03-09-2025, 05:24 PM)Jorge_Stolfi Wrote: This algorithm alone would produce anomalous statistics at the start and end of lines. Namely, the last word of a line would tend to be shorter than average, while the first word would tend to be longer than average. We could test it easily by applying it to any running text file.
ReneZ > 04-09-2025, 12:33 AM
RobGea > 04-09-2025, 02:06 AM
Jorge_Stolfi > 04-09-2025, 06:43 AM
(04-09-2025, 02:06 AM)RobGea Wrote: It may be about time for a reassessment of LAAFU. [...] For example, if my understanding is correct, the long first word and short last word of VMS lines can, according to Elmar Vogt, also be found in 'Tomsawyer'.