I'm not proposing a solution. Rather, I want to highlight some of the more exotic properties of natural languages that could bring a text closer to the level of regularity seen in the VM, and which I don't believe have been discussed here.
I don't think these alone are enough to explain the features of the VM text, but I believe no "natural language plaintext substitution" theory can be plausible unless it uses at least some of them. So I want to play devil's advocate a little.
1) Phrasal clitics
A clitic is something in between a bound morpheme and an independent word. Clitics were present in Proto-Indo-European, and many languages descended from PIE still retain some vestiges of them. The position of phrasal clitics obeys Wackernagel's law: roughly speaking, it requires phrasal clitics to occupy the second position in the sentence.
An example from contemporary Czech: "Já jsem si ho prohlížel" (roughly "I was examining it"). Here "jsem", "si", and "ho" are clitics. As you can see, they are gathered in the same position. Importantly, their order within this cluster is also strict (e.g. "li", if present, must come first).
This effect could explain some of the line-position and paragraph-position dependence. If we assume that each line or paragraph is a complete sentence, then clitics occurring near the beginning is to be expected. (I'm not sure whether repeating the same clitic within a cluster is ever allowed, though; it is ungrammatical in the languages I know.)
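To make the positional effect concrete, here is a toy sketch. It assumes a simplified, hypothetical rank table for a few Czech clitics (auxiliary, then reflexive, then dative pronoun, then accusative pronoun) and treats "second position" as "right after the first word", whereas real Czech allows a whole first phrase as the host. The names `CLITIC_RANK`, `wackernagel`, and `clitic_positions` are illustrative only.

```python
# Hypothetical, simplified rank table for part of the Czech clitic cluster:
# auxiliary (1) < reflexive (2) < dative pronoun (3) < accusative pronoun (4).
CLITIC_RANK = {
    "jsem": 1, "jsi": 1, "bych": 1,
    "se": 2, "si": 2,
    "mi": 3, "ti": 3, "mu": 3,
    "mě": 4, "tě": 4, "ho": 4,
}


def wackernagel(words):
    """Gather all clitics, sort them by rank, and place them after the first word."""
    hosts = [w for w in words if w not in CLITIC_RANK]
    clitics = sorted((w for w in words if w in CLITIC_RANK), key=CLITIC_RANK.__getitem__)
    if not hosts:
        return clitics
    return hosts[:1] + clitics + hosts[1:]


def clitic_positions(words):
    """Return the 0-based positions of clitics in a sentence."""
    return [i for i, w in enumerate(words) if w in CLITIC_RANK]


if __name__ == "__main__":
    # Start from a scrambled word list; the clitics end up in slots 1-3.
    sentence = wackernagel(["já", "prohlížel", "ho", "si", "jsem"])
    print(" ".join(sentence))
    print(clitic_positions(sentence))
```

Because the same handful of short words always lands in the same slots, in the same order, position statistics over many such lines would be strongly skewed toward the start of the line.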
2) Strong vowel harmony, imperfectly transcribed
Turkic and Mongolian languages have vowel harmony systems. For example, in Turkish all vowels in a given word are either front or back (nowadays it's more complicated because of loanwords, but let's ignore that). Modern Latin orthography divides the vowels into back a/ı/o/u and front e/i/ö/ü. You could almost describe Turkish vowel harmony as "either every vowel is dotted, or none is" (e is the exception: a front vowel without a dot). You could probably analyze Turkish as having 4 vowels plus a "frontness" suprasegmental feature instead of 8 vowels.
If the script left this feature unmarked, or marked it only once per word, this could plausibly cut the number of required vowel letters in half.
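A rough sketch of that 4-vowels-plus-frontness analysis, assuming lowercase input (Python's `str.lower()` is not Turkish-aware, so I/ı handling is left to the caller) and ignoring the rounding dimension of Turkish harmony. Each vowel maps to one of four placeholder letters (A, I, O, U, chosen arbitrarily here) and the front/back choice becomes a single tag per word; words mixing front and back vowels, like the loanword "kitap", get no tag.

```python
# Back vowel -> front partner in modern Turkish orthography.
VOWEL_PAIRS = {"a": "e", "ı": "i", "o": "ö", "u": "ü"}
BACK = set(VOWEL_PAIRS)
FRONT = set(VOWEL_PAIRS.values())

# Collapse each back/front pair onto one of four placeholder letters.
TO_ARCHI = {}
for archi, (back, front) in zip("AIOU", VOWEL_PAIRS.items()):
    TO_ARCHI[back] = archi
    TO_ARCHI[front] = archi


def harmony(word):
    """Return 'back' or 'front' for a harmonic word, None otherwise."""
    vowels = [c for c in word if c in TO_ARCHI]
    if vowels and all(v in BACK for v in vowels):
        return "back"
    if vowels and all(v in FRONT for v in vowels):
        return "front"
    return None  # disharmonic (e.g. many loanwords) or no vowels at all


def reduce_word(word):
    """Spell a lowercase word with 4 vowel letters plus one word-level tag."""
    reduced = "".join(TO_ARCHI.get(c, c) for c in word)
    return reduced, harmony(word)


if __name__ == "__main__":
    for w in ["evler", "kır", "kir", "kitap"]:
        print(w, reduce_word(w))
```

For instance, "kır" ("countryside") and "kir" ("dirt") both reduce to "kIr" and differ only in the word-level tag, which is exactly the information an "imperfect" transcription would drop.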
3) Tone indicators
Some languages of southern China and neighboring regions (Hmong, Zhuang, Unified Miao) use letters both for their sound value and as tone indicators.
Quote: "The Hmong alphabet uses its letters for a job traditionally assigned to diacritics. It's a rare case where a K isn't a K and a second O isn't like the first O: they're qualities of the vowels before them."
If we again assume that lines or paragraphs correspond to sentences, and that the words needing tone distinctions favor particular positions in the sentence (for example, if they are mostly verbs), then a non-flat, position-dependent letter distribution could arise as a result.
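As a concrete illustration, here is a small parser for the Hmong Romanized Popular Alphabet (RPA), in which a syllable-final consonant letter marks tone and a doubled vowel (ee, oo) marks nasalization. The tone labels are approximate descriptions and the function names are hypothetical; this is a sketch, not a full RPA parser (it ignores capitalization, punctuation, and multi-syllable input).

```python
# RPA: a syllable-final consonant letter marks tone rather than a sound.
# Tone labels are approximate phonetic descriptions.
TONE_LETTERS = {
    "b": "high",
    "j": "high falling",
    "v": "mid rising",
    "s": "low",
    "g": "low breathy",
    "m": "low creaky",
    "d": "low rising",
}
NASAL_DOUBLES = ("ee", "oo")  # a doubled vowel marks nasalization


def split_tone(syllable):
    """Split one RPA syllable into (segmental spelling, tone)."""
    last = syllable[-1]
    if last in TONE_LETTERS:
        return syllable[:-1], TONE_LETTERS[last]
    return syllable, "mid"  # no tone letter means mid tone


def parse_rpa(syllable):
    """Return (base spelling, nasalized?, tone) for one RPA syllable."""
    body, tone = split_tone(syllable)
    nasal = body.endswith(NASAL_DOUBLES)
    if nasal:
        body = body[:-1]  # the second vowel letter only signals nasality
    return body, nasal, tone


if __name__ == "__main__":
    print(parse_rpa("Hmoob"))
    for s in "pob poj pov po pos pog pom".split():
        print(s, parse_rpa(s))
```

Across pob, poj, pov, po, pos, pog, pom the segmental part stays "po" and only the final letter changes, so in running text the letters b/j/v/s/g/m/d would pile up at word ends regardless of which consonants the language actually has there.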
4) Non-rhyming poetry
Some forms of poetry are unrhymed (most notably Japanese haiku) and rely on other devices instead, such as fixed syllable (mora) counts or verbal and grammatical parallelism.
5) Vowels affecting the consonants
A member suggested that iotated and non-iotated vowels in the Cyrillic alphabet could create a digraph system in which a digraph doesn't look like a ligature. She asks:
Quote: "The question is whether such relationships between any related series of sounds (such as palatized and non-palatized) is realistic."
Well, aside from palatalization in Slavic languages, there are also [link].
Abkhaz, which has been suggested here before as a plausible candidate language, has a three-way voiced/voiceless/ejective distinction and a three-way palatalized/labialized/plain distinction in its consonant system, but only two distinct vowels. John Colarusso conjectures that the ancestor of Abkhaz (Proto-Northwest Caucasian) originally had an ordinary five-vowel system that collapsed as /a e i o u/ > /a ʲa ʲə ʷa ʷə/, with each vowel's front or rounded quality transferred onto the preceding consonant.
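A toy version of the correspondence above: each proto-vowel becomes /a/ or /ə/, and its front (ʲ) or rounded (ʷ) quality is attached to the preceding consonant. The input syllables are invented for illustration, not reconstructed Proto-Northwest Caucasian forms, and the function names `collapse` and `inventories` are hypothetical. It also assumes, for simplicity, that every consonant can take every mark, which real Abkhaz does not.

```python
# Correspondences as stated above: /a e i o u/ > /a ʲa ʲə ʷa ʷə/.
# Each entry: proto-vowel -> (mark added to the preceding consonant, new vowel).
SHIFT = {
    "a": ("", "a"),
    "e": ("ʲ", "a"),
    "i": ("ʲ", "ə"),
    "o": ("ʷ", "a"),
    "u": ("ʷ", "ə"),
}


def collapse(syllables):
    """Apply the shift to a list of (consonant, vowel) syllables."""
    out = []
    for consonant, vowel in syllables:
        mark, new_vowel = SHIFT[vowel]
        out.append(consonant + mark + new_vowel)
    return "".join(out)


def inventories(consonants):
    """Consonant and vowel inventories after the shift, for given plain consonants."""
    marks = {mark for mark, _ in SHIFT.values()}
    new_consonants = {c + m for c in consonants for m in marks}
    new_vowels = {v for _, v in SHIFT.values()}
    return new_consonants, new_vowels


if __name__ == "__main__":
    print(collapse([("k", "u"), ("t", "e")]))
    cons, vows = inventories(["k", "t"])
    print(len(cons), sorted(vows))
```

Five vowels become two while every consonant splits three ways, so a script written for the later stage could carry the old vowel contrasts in its consonant letters (or vice versa), which is the kind of relationship the quoted question asks about.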