The Voynich Ninja

Full Version: Lipogrammatic text
(18-04-2021, 10:54 AM)Anton Wrote: Well, if I understand correctly, with the lipogrammatic technique the author would avoid, and hence omit, all words containing a certain letter or letters. It is not that he would simply omit undesirable letters in words. For example, avoiding the letter "a" would not make "pple" of "apple", it would just lead to omission of the whole word "apple".

In principle, yes, this is how a lipogram is supposed to work. However, I can tell you from recent experience that if one attempts to write an "extreme" lipogram with multiple letters prohibited, by the point one has written several hundred words, one's commitment to the "purity" of the lipogrammatic principle breaks down, and one resorts to certain "impure" ways to simply avoid the physical appearance of the prohibited letter (or letters!) in one's text. For example, one may realize in the middle of writing a certain word that it contains the prohibited letter, and one simply "abbreviates" the word by stopping at the place where the prohibited letter would appear. Then at a certain point one descends to the next level of impurity and simply omits the prohibited letter and continues writing the rest of the word. At this point one is simply desperate to find some way, any way, to express what one wants to say without using the prohibited letter or letters.
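
For anyone who wants to experiment with the three levels described above, here is a minimal Python sketch. The function names and the sample sentence are made up for illustration, and the sketch only mimics the mechanical side of each strategy; real lipogram writing also involves rephrasing, which no filter like this captures.

```python
def pure_lipogram(words, banned):
    """Strict lipogram: drop every word that contains a banned letter."""
    banned = set(banned.lower())
    return [w for w in words if not banned & set(w.lower())]


def abbreviate(words, banned):
    """First 'impure' level: stop each word where a banned letter would appear."""
    banned = set(banned.lower())
    out = []
    for w in words:
        # Index of the first banned letter, or the full length if there is none.
        cut = next((i for i, ch in enumerate(w) if ch.lower() in banned), len(w))
        if cut:  # skip words that would be cut down to nothing
            out.append(w[:cut])
    return out


def drop_letters(words, banned):
    """Second 'impure' level: delete the banned letters and keep the rest."""
    banned = set(banned.lower())
    out = []
    for w in words:
        kept = "".join(ch for ch in w if ch.lower() not in banned)
        if kept:
            out.append(kept)
    return out


# Hypothetical sample sentence, with A, B and C prohibited.
words = "the weather today is pleasant".split()
print(pure_lipogram(words, "abc"))  # ['the', 'is']
print(abbreviate(words, "abc"))     # ['the', 'we', 'tod', 'is', 'ple']
print(drop_letters(words, "abc"))   # ['the', 'wether', 'tody', 'is', 'plesnt']
```

The same functions reproduce the "apple" example from the quote: the strict version drops the word entirely, while the letter-dropping version yields "pple".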

A few days ago, I tested the idea of a "lipogrammatic English text without the letters ABC" by attempting to write such a text myself. I have attached a file of this text to this message, for those who are curious to see what such a text might look like. I managed to get as far as 953 words. It was extremely exhausting to write, and I expect it will probably be rather painful to read as well. No, I do not think it exhibits the statistical properties of the Voynich ms text. But if one had to write such an extreme lipogrammatic text for not just 1,000 words but for as many as 38,000 words, I can imagine that the quality and perhaps the entropy of the text would continue to deteriorate further and further.

One difference between my text and the Voynich ms text: I found myself repeating the same "stock phrases" over and over again, simply in order to express a certain thought without using words with the prohibited letters. But I understand that the Voynich ms text lacks the frequency of repetition of multi-word phrases that one expects to find at least in normal text of European natural languages. (It is different for a language like Inuktitut, but its word structure is completely different, nothing like the Voynich ms text either.) So this is one particular feature of the Voynich ms text that still needs to be explained, because in my experience the lipogrammatic requirement led to more repetition of the same multi-word phrases than one finds in normal text, not to less frequent appearance of multi-word phrases as we find in the Voynich ms text.
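
One rough way to put a number on this kind of phrase repetition is to count how many n-word sequences occur more than once. The Python sketch below is only an assumption about how one might measure it (the function name is hypothetical, and it splits on whitespace without stripping punctuation); it is not a measure that has been applied to either text in this thread.

```python
from collections import Counter


def repeated_phrase_share(text, n=3):
    """Share of n-word phrases in the text that occur more than once."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of a phrase that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Toy example: two of the three-word phrases are each used twice.
sample = "one more time i see one more time i go"
print(repeated_phrase_share(sample))  # 0.5
```

A text built from stock phrases should score higher on this than ordinary prose of the same length, while a text with few repeated multi-word phrases should score lower.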
I don't think a lipogrammatic text which redacts entire words makes any sense.
I would suggest that the medieval mindset would have led to a substitution of the prohibited word, rather than an attempt to remove it altogether - what would be the point in a long text?
It would end up like those old narrative fiction novels that blank out names... as in The Earl of - who led the Regiment of - went to Canterbury in 17-....
You end up saying "what? Who? When?" to yourself and have no idea what's going on. (They did it, it seems, to prevent any mistakes creeping in which might lead to a lawsuit from a real Earl for having been defamed, or to avoid the attention of the political censors).
So instead, in a personal diary, you'd write the Earl of xyz, and know that, actually, you're talking about Ronald McDonald or whoever. Which is the way medieval ciphers worked - they tended to just encipher nouns or pre-arranged phrases.

(18-04-2021, 02:00 PM)geoffreycaveney Wrote: It was extremely exhausting to write, and I expect it will probably be rather painful to read as well.
You should have just written it normally, then used search & replace :D
(18-04-2021, 04:28 PM)davidjackson Wrote: what would be the point in a long text

I can imagine reasons of magic or political-symbolic origin, e.g. Guelphs vs Ghibellines (both would have to exclude the letter G, anyway). As soon as there are examples, like the said Petrus, this is an interesting possibility which should not be excluded.

The foremost question is whether letters would be excluded from the plain text or from the cipher text. The latter is easily checked - one just needs to check whether there are any folios or paragraphs missing any of the Voynichese glyphs (rare stuff like x excluded, of course), and if so, if there is any system in arrangement of such folios or paragraphs. From the plots in the VQP I can quickly see that there hardly is a paragraph without a, o or i, and even q is quite, quite widespread.
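
The check on the cipher text side could be sketched along these lines in Python. This assumes a transliteration in which every glyph is a single character and the text is already split into paragraphs; the function name and the two sample strings are toy inputs, not actual folio text.

```python
def missing_glyphs(paragraphs, glyphs):
    """Map each paragraph index to the set of glyphs that never occur in it."""
    report = {}
    for idx, para in enumerate(paragraphs):
        absent = {g for g in glyphs if g not in para}
        if absent:
            report[idx] = absent
    return report


# Toy paragraphs in an EVA-like spelling (assumed, for illustration only).
paragraphs = ["daiin shol qokedy", "chedy shedy lkeey"]
print(missing_glyphs(paragraphs, "aoiq"))
```

Paragraphs that do not appear in the result contain every glyph on the list; any that do appear can then be examined for a pattern in where they occur.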
(18-04-2021, 12:34 PM)Koen G Wrote: Certainly, especially with a frequent vowel like [e], English would take a serious hit. What I mean is that if we'd look at the result as a new language (so not specifically spot the differences with normal English), any stat you can think of would be well within the range of normal linguistic behavior.

Hi Koen,
I agree. The text in the actual examples is perfectly grammatical and of course it behaves like language. The lipogram exercise is only interesting if you produce a correct text; anything else would be trivial and doesn't even qualify as a lipogram.

One kind of statistic that is obviously affected is the list of the most frequent words. I compared [link] with an extract by Dumas of a similar length. Four of the 10 top words by Dumas are in the top 20 by Perec. The exceptions are:
  • 5 words that contain 'e'
  • 'vous'
The difference between Currier A and B is much more dramatic (only daiin is consistently present among the high-ranking words).
[attachment=5462]
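
A comparison of top-ranked words like the one above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, tokenisation is a plain whitespace split, and the two short strings stand in for the real extracts, which are not reproduced here.

```python
from collections import Counter


def top_words(text, k):
    """The k most frequent words of a text, most frequent first."""
    words = text.lower().split()
    return [w for w, _ in Counter(words).most_common(k)]


def top_overlap(text_a, text_b, k_a=10, k_b=20):
    """Words in the top k_a of text_a that also rank in the top k_b of text_b."""
    top_b = set(top_words(text_b, k_b))
    return [w for w in top_words(text_a, k_a) if w in top_b]


# Toy stand-ins for the two extracts.
text_a = "le le le roi roi et"
text_b = "la la la nuit nuit roi roi du"
print(top_overlap(text_a, text_b, k_a=2, k_b=3))  # ['roi']
```

With the defaults (top 10 against top 20) this gives the same kind of count as the "four of the 10" figure quoted above.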

Here I have plotted conditional entropy vs MATTR200 for the files in Brian Cham's corpus. I also included Currier A and B from the VMS (ZL transliteration) and the Perec and Dumas files. This shows that lipograms have no significant effect on these two measures.

[attachment=5463]
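
For readers unfamiliar with the two measures, here is a minimal Python sketch of how they are commonly computed. It assumes that "conditional entropy" means the character-level entropy of a character given the preceding one, and that MATTR200 is the moving-average type-token ratio over windows of 200 words; the function names are hypothetical and this is not the code behind the plot.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_char_entropy(text):
    """H(next char | current char), estimated as H(bigrams) - H(first chars)."""
    bigrams = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(bigrams) - entropy(firsts)


def mattr(words, window=200):
    """Moving-average type-token ratio: mean share of distinct words per window."""
    if len(words) < window:
        raise ValueError("text is shorter than the window")
    ratios = [len(set(words[i:i + window])) / window
              for i in range(len(words) - window + 1)]
    return sum(ratios) / len(ratios)
```

A perfectly predictable string such as "abababab" has a conditional entropy of zero, and a text in which no word repeats within any window has an MATTR of 1.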
(18-04-2021, 04:28 PM)davidjackson Wrote: I don't think a lipogrammatic text which redacts entire words makes any sense.

Composing a true lipogrammatic text in its purest form is rather more complicated than simply redacting, removing, or even substituting for any words that contain the prohibited letter or letters. For certain names and nouns, yes, you can just use a code name or a synonym or a near synonym. For some other words, yes, you can also try to find a synonym or a near synonym as a substitute word or phrase or circumlocution. But I can now tell you from recent personal experience, it is not always so easy to do this! And for certain grammatical structures, a simple substitution method can be well nigh impossible.

For example, yes, you could just decide to always replace "and" with the word "with". I wish I had thought of that before I began writing my lipogrammatic text :D It is not precisely the same thing, but yes, I accept that it is close enough for most purposes in a lipogrammatic text. However, other grammatical and function words are harder to handle in such a simple manner. What would you do with the word "a" or "an" in such a text? If you just leave them out, that really changes the grammatical structure of English. I do recall that in some places I tried to simply use "some" in place of "a". I felt that was slightly "less bad English" than using "one", for example. But that was a stylistic judgment, so to speak, in each instance, not a blanket "search & replace" type of decision.

And how would you search & replace for "are", "was", "has", "have", "had", "am", "be", "been", "can", "able", any word ending in "-able"/"-ible", etc.? English does not simply have ready-made substitute synonyms for each and every such grammatical or function word. One has to think through carefully how to express the same or a similar thought or idea using a different phrasing that does not use any of these words that contain a prohibited letter or letters. For example, maybe in some cases you can replace "have" with "possess". But then when you apply this "search & replace" formula to the phrase "I have gone", the result becomes "I possess gone"! Is that a "word salad", a search & replace lipogram gone wrong, or some kind of actual cipher? It may not always be so easy to tell the difference between these possibilities.
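
The "I possess gone" problem is easy to reproduce. The Python sketch below applies a blind whole-word substitution table; the function name and the table itself are hypothetical, chosen only to show that such a rule cannot tell the main verb "have" from the auxiliary.

```python
import re


def naive_replace(text, substitutions):
    """Whole-word search & replace with no regard for grammar."""
    for old, new in substitutions.items():
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    return text


subs = {"have": "possess"}
print(naive_replace("I have two dogs", subs))  # I possess two dogs
print(naive_replace("I have gone", subs))      # I possess gone
```

The first result is acceptable English and the second is not, yet the rule treats both sentences identically, which is why each case needs its own judgment.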

Here is another example from my own personal extreme lipogram-writing experience: In some instances it may be simple to replace the word "say" with "express", for example. But I felt it did not work in the case of the phrase "I should say". This is an idiomatic phrase with a very particular meaning and particular contexts in which it is used. "I should express" simply makes no sense as a substitute phrase in many instances. I finally settled on the long circumlocution "is how I should express it" as a replacement phrase, in the sentence, "One more time I see I must delete it, well, edit it, I should say," which thus then became "One more time I see I must delete it, well, edit it, is how I should express it." This is why I say (ahem, I express) it was extremely exhausting to write.
(18-04-2021, 05:25 PM)MarcoP Wrote: I agree. The text in the actual examples is perfectly grammatical and of course it behaves like language. The lipogram exercise is only interesting if you produce a correct text; anything else would be trivial and doesn't even qualify as a lipogram.

Marco, thank you for this analysis. I would be curious to see what effect my "extreme lipogram" English text of 953 words with no use of the letters A, B, and C has on such statistics as the most frequent words, conditional entropy, etc. I know it will not have the statistical structure of the Voynich ms text. But I am just wondering how different it will be from normal English, and in which ways, and how such things show up in various statistical measures. As far as I know, it may be the only example of an "extreme lipogram" text with multiple letters excluded from the same text in a lipogrammatic style.
OK, I admit the "search and replace" suggestion was a tongue-in-cheek joke :D
To do a proper lipogrammatical text is exhausting, but that is the whole point - it's an intellectual exercise. Think of that French bloke who wrote a novel without using the letter "e", or Poncela, who IIRC wrote five novels, each omitting one vowel in sequence (bet the "u" was the easiest one to write!)
But I don't see why our scribe(s) would have used this literary form in his (their) book. The whole point, really, is to show off to the world (unless there is a religious reason, I suppose, but I don't know of any).
(18-04-2021, 05:48 PM)davidjackson Wrote: OK, I admit the "search and replace" suggestion was a tongue-in-cheek joke :D

Well, I have suggested earlier in this thread that the point could possibly be, for example, Yorkist English people, perhaps in northern France during the period of English control of that region in 1415-1429, communicating with each other in a cipher which also omitted entirely certain letters in the word "Lancaster", perhaps first of all omitting "L" and "A", for example.
One more thought about lipograms and the Voynich ms text: Given the difficulties of writing lipograms and finding the right words to express oneself without using any words containing the prohibited letter or letters, a person or group of people who were serious about writing this way, for whatever reason -- I have suggested the example of a group of Yorkists communicating in a cipher that excludes "L" and "A" entirely -- might compile word lists of permissible words to aid them in composing their actual lipogrammatic texts.

In this case, certain pages or sections of the ms might simply be more or less random collections of such word lists, for them to use as an aid in composing actual texts in the lipogrammatic cipher. The compilation of such word lists might also serve as a form of handwriting practice for those writing texts in the cipher. 

Then other pages or sections of the ms may contain the actual meaningful texts composed in this lipogrammatic cipher.
(18-04-2021, 05:57 PM)geoffreycaveney Wrote: Well, I have suggested earlier in this thread that the point could possibly be, for example, Yorkist English people, perhaps in northern France during the period of English control of that region in 1415-1429, communicating with each other in a cipher which also omitted entirely certain letters in the word "Lancaster", perhaps first of all omitting "L" and "A", for example.
We have two threads going on here:
  • Arguing for a lipogrammatic text that excludes words
  • Arguing for a lipogrammatic text that excludes letters
The first would read "the House of ..."
The second would read "the House of ncster". Which, I suggest, is hardly a difficult code to break.
(A real medieval encrypted text would have read "the House of DoolyDally", where DoolyDally is known by sender and receiver to be Lancaster).

The first quarter of the 15th century is exactly when the nobles transitioned from French to English. It is impossible to say whether anybody in any position to be worried about who was reading their diaries would have been writing in Anglo-French or Norman French at that time.

Anyway, back to my basic question that I always ask - if the whole thing is written in an unknown alphabet, why bother obfuscating the plain text?