The Voynich Ninja

Full Version: Lipogrammatic text
Pages: 1 2 3 4
(18-04-2021, 05:46 PM)geoffreycaveney Wrote: I would be curious to see what effect my "extreme lipogram" English text of 953 words with no use of the letters A, B, and C has on such statistics as the most frequent words, conditional entropy, etc. I know it will not have the statistical structure of the Voynich ms text. But I am just wondering how different it will be from normal English, and in which ways, and how such things show up in various statistical measures. As far as I know, it may be the only example of an "extreme lipogram" text with multiple letters excluded from the same text in a lipogrammatic style.

Hi Geoffrey,
these are the 20 most frequent words in your text: [link]
You can compare them with [link].

I have added to the plot an English sample (Shakespeare) of similar length (~1000 words). The main difference in your text is that it is extremely repetitive (low MATTR).
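
For readers who want to try these measures on their own samples, here is a minimal sketch of a top-20 word count and of MATTR (moving-average type-token ratio: the share of distinct words in a sliding window, averaged over all windows). The tokenizer, the function names, and the window size of 50 are assumptions made for illustration; the thread does not say which settings were used for the plot.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(tokens, n=20):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)

def mattr(tokens, window=50):
    """Mean type-token ratio over every sliding window of `window` tokens.

    The window size is an assumed value, not one taken from the thread.
    Texts shorter than the window fall back to the plain type-token ratio.
    """
    if not tokens:
        raise ValueError("cannot compute MATTR of an empty text")
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = []
    for start in range(len(tokens) - window + 1):
        chunk = tokens[start:start + window]
        ratios.append(len(set(chunk)) / window)
    return sum(ratios) / len(ratios)
```

A lower MATTR means that the same words recur more often within any short stretch of text, which is the sense of "extremely repetitive" above. Both samples should be scored with the same window for the comparison to be meaningful.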
(18-04-2021, 06:14 PM)davidjackson Wrote:
(18-04-2021, 05:57 PM)geoffreycaveney Wrote: Well, I have suggested earlier in this thread that the point could possibly be, for example, Yorkist English people, perhaps in northern France during the period of English control of that region in 1415-1429, communicating with each other in a cipher which also omitted entirely certain letters in the word "Lancaster", perhaps first of all omitting "L" and "A", for example.
We have two threads going on here:
  • Arguing for a lipogrammatic text that excludes words
  • Arguing for a lipogrammatic text that excludes letters
The first would read "the House of ..."
The second would read "the House of ncster". Which, I suggest, is hardly a difficult code to break.
(A real medieval encrypted text would have read "the House of DoolyDally", where DoolyDally is known by sender and receiver to be Lancaster).


What I have in mind is possibly a text that would read, for example, "the House of Irour Thee".


davidjackson Wrote: The first quarter of the 15th century is exactly when the nobles transitioned from French to English... it is impossible to say whether anybody in a position to worry about who was reading their diaries would have been writing in Anglo-French or Norman French at that time.

I note that Edward, 2nd Duke of York, between 1406 and 1413 translated The Master of Game from French into English, and he dedicated the book to Henry, Prince of Wales (the future Henry V). So even in these high circles of the early 15th century English nobility, there was a need and a demand for translations of French texts into English.

davidjackson Wrote: Anyway, back to my basic question that I always ask: if the whole thing is written in an unknown alphabet, why bother obfuscating the plain text?

The unknown alphabet is for the purpose of encryption and secrecy. Obfuscating the plain text in the form of a lipogram would just be an added level of expressing extreme loyalty to the Yorkist cause by showing such contempt for the "Lancastrian" letters L and A that one refuses to write them at all, even in cipher.
MarcoP - would it be possible to arbitrarily add a couple of characters (representing, I dunno, two vowels that would have been removed in the lipogrammatic text - we would have to calculate the normal appearance ratio of the character) to the VMS text and re-run the test?
Just to simulate the text stats if we "returned" the missing characters.
(18-04-2021, 06:34 PM)davidjackson Wrote: MarcoP - would it be possible to arbitrarily add a couple of characters (representing, I dunno, two vowels that would have been removed in the lipogrammatic text - we would have to calculate the normal appearance ratio of the character) to the VMS text and re-run the test?

I don't know. When looking at Perec's text, it seems to me that Koen is right and that it is statistically indistinguishable from "normal" French. I have no idea how it would be possible to re-introduce 'e' without going into the details of the language: I think it's something that could only be done manually by a native French speaker or maybe via software by an advanced AI.
Well, I was just thinking that if we knew the appearance profile of a particular character, and we pretended that it had been taken out, we could randomly put it back in and the stats would change to reflect its appearance, even if it wasn't put back in the correct places.
Just a thought, really.
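
The random re-insertion idea can be sketched roughly as follows. The code inserts a placeholder character at random positions until it makes up about a chosen share of the non-space characters, then measures character-level conditional entropy (how unpredictable the next character is given the current one) before and after. Everything specific here is an assumption for illustration: the function names, the placeholder character, the rate, and the choice of a first-order bigram estimate. Applying it to the VMS would also need a transliteration of the text as input.

```python
import math
import random
from collections import Counter

def conditional_entropy(text):
    """Estimate H(next char | current char) in bits from bigram counts."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), n in pairs.items():
        # P(a, b) * log2 P(b | a), summed over all observed bigrams
        h -= (n / total) * math.log2(n / firsts[a])
    return h

def reinsert(text, char, rate, seed=0):
    """Randomly insert `char` after non-space characters.

    `rate` is the target share of `char` among non-space characters in the
    result (assumes `char` is absent from `text` and 0 <= rate < 1).
    Positions are random, so this ignores where the character would really
    belong in the language.
    """
    rng = random.Random(seed)
    p = rate / (1.0 - rate)  # per-position probability that yields `rate`
    out = []
    for ch in text:
        out.append(ch)
        if not ch.isspace() and rng.random() < p:
            out.append(char)
    return "".join(out)
```

Comparing `conditional_entropy(text)` with `conditional_entropy(reinsert(text, "x", 0.1))` shows how much the measure moves under purely random placement. Since real letters are not placed at random, the result is at best a rough bound and not a reconstruction of the original text.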
As I said, and Marco confirmed, taking out words with one specific character won't affect your stats much, so there's not much to "restore" in the first place.

By contrast, imagine a text where you are only allowed to use words with one vowel, let's say [i]. 

EDIT: I made a childish attempt earlier, but actually it is possible to still write decently using only one vowel; see [link]

You are still bound to get a lot of repeated words though.
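
To illustrate the contrast between the two constraints, here is a small sketch that checks which words survive a one-letter lipogram and which survive a one-vowel rule. The helper names and the sample word list are made up for illustration, and treating only a, e, i, o, u as vowels (so "y" counts as a consonant) is an assumption.

```python
VOWELS = set("aeiou")  # assumption: "y" is treated as a consonant

def is_lipogram_ok(word, banned):
    """True if the word contains none of the banned letters."""
    return not (set(word.lower()) & set(banned.lower()))

def is_univocalic(word, vowel):
    """True if the only vowel the word uses (if any) is `vowel`."""
    return (VOWELS & set(word.lower())) <= {vowel}

# Hypothetical sample vocabulary, not taken from any text in the thread.
sample = ["this", "is", "a", "simple", "text", "with", "limited",
          "vowels", "in", "it", "and", "other", "words"]

no_e = [w for w in sample if is_lipogram_ok(w, "e")]
only_i = [w for w in sample if is_univocalic(w, "i")]
```

On this toy list, banning "e" leaves 8 of the 13 words, while allowing only "i" leaves 5. The smaller the surviving vocabulary, the more often the same words have to be reused.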