A possible inner structure of the VMS - vowels across the spaces?
I think I have finally found a solution to my old problem with the cores, one that could fit the 15th century.
Mandatory disclaimer: This is not yet a solution and not a reading. It is a structural hypothesis. But for me (!) it suddenly explains a lot of things in the VMS that until now seemed separate and strange.
One basic assumption stays: The visible word boundaries of the VMS are not real word boundaries.
Thesis: The visible spaces cut right through a vowel or linking layer.
A number of things follow from this.
1. Two different layers: inside and across the space
I distinguish two kinds of bigrams.
Inner bigrams: glyph pairs within a single visible VMS token.
Example:
chedy
-> (ch) e+d (y)
Cross bigrams: glyph pairs across visible spaces.
Example:
chedy qokeedy
-> y+qo
The interesting point is: several of the oddities of the VMS fit exactly into this split. Line Initial Markers (LIM), Line End Markers (LEM), short tokens and certain long tokens behave as if there really were an inner core layer and an outer joint or linking layer.
My current definition of this structure is:
Inner bigrams = consonant / cluster layer
Cross bigrams = vowel layer
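A minimal sketch of how the two bigram types could be extracted from a transliterated line. Everything in it is an assumption made for illustration: the `UNITS` list (only ch, sh, qo, cth and the aiin family are named in this post), the function names, and the reading that the first and last atom of a token are set aside as edge halves, as the example chedy -> (ch) e+d (y) suggests.

```python
# Hypothetical unit inventory: multi-character groups treated as single atoms.
# Longest entries come first so the greedy match prefers them.
UNITS = ["aiiin", "aiin", "ain", "cth", "ckh", "cph", "cfh", "ch", "sh", "qo"]

def atoms(token):
    """Greedy longest-match split of one visible token into atoms."""
    out, i = [], 0
    while i < len(token):
        unit = next((u for u in UNITS if token.startswith(u, i)), token[i])
        out.append(unit)
        i += len(unit)
    return out

def inner_bigrams(token):
    """Adjacent atom pairs of the core, i.e. without the first and last atom."""
    core = atoms(token)[1:-1]
    return list(zip(core, core[1:]))

def cross_bigrams(line):
    """Pairs (last atom of a token, first atom of the next token)."""
    toks = [atoms(t) for t in line.split()]
    return [(left[-1], right[0]) for left, right in zip(toks, toks[1:])]

print(inner_bigrams("chedy"))          # [('e', 'd')]
print(cross_bigrams("chedy qokeedy"))  # [('y', 'qo')]
```

If the edge atoms should count as part of the inner layer instead, replacing `atoms(token)[1:-1]` with `atoms(token)` gives that variant.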
2. The actual shift
In this theory the real sound or word structure does not look like this:
Classical reading:
Token | Token | Token
but rather like this:
consonant core + vowel bigram + consonant core ...
And the visible VMS cuts right through these vowel bigrams (!).
A visible token in the VMS would then be roughly something like:
half a vowel code + consonant core + half a vowel code
Or shown schematically (simplified representation):
V C C V
where VV is always a vowel bigram and CC is always a consonant bigram. A single V is one half of a vowel bigram.
But the two V's do not simply belong to the token itself. Together with the neighbouring tokens they form the actual vowel bigrams.
For two tokens in a normal VMS line:
V C C V V C C V
the real vowel lies across the visible boundary:
V CC V|V CC V
So: the vowel bigram sits between the original tokens, on top of the space, and the space cuts this vowel bigram in half.
That would be the surprising trick.
Take:
sheedy qokeedy
Roughly broken down:
sheedy = sh | e ed | y
qokeedy = qo | ke ed | y
The transition between the two tokens is:
y|qo
(Note: e, ee and the other "VMS vowels" would then of course no longer be vowels, but only glyph combinations that represent consonants.)
That would be the vowel or linking code in my theory.
So the real reading would not stop at sheedy and start again at qokeedy, but run across the visible boundary:
... ed + y|qo + ke ...
That is then not simply word end plus new word, but:
core + vowel + core
This way you suddenly get a plausible consonant / vowel structure in the VMS (simplified representation):
CC VV CC VV CC VV
Or, if you think of it as an actual sound sequence:
core - vowel - core - vowel - core
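The regrouping described in this section can be sketched roughly as follows. This is only an illustration under assumptions: the `UNITS` list and the names `atoms` and `regroup` are hypothetical, every token is assumed to have at least two atoms, and the first and last atom of each token are taken to be the two halves.

```python
# Hypothetical unit inventory (longest first); only ch, sh, qo, cth and the
# aiin family are named in the post, the rest is an assumption.
UNITS = ["aiiin", "aiin", "ain", "cth", "ckh", "cph", "cfh", "ch", "sh", "qo"]

def atoms(token):
    """Greedy longest-match split of one visible token into atoms."""
    out, i = [], 0
    while i < len(token):
        unit = next((u for u in UNITS if token.startswith(u, i)), token[i])
        out.append(unit)
        i += len(unit)
    return out

def regroup(line):
    """Regroup a line into open halves, cores and cross-space pairs."""
    toks = [atoms(t) for t in line.split()]
    if not toks:
        return []
    if any(len(t) < 2 for t in toks):
        raise ValueError("this sketch needs at least two atoms per token")
    stream = [("open", toks[0][0])]          # left half without a predecessor
    for i, t in enumerate(toks):
        if t[1:-1]:                          # 2-atom tokens have no core
            stream.append(("core", "".join(t[1:-1])))
        if i + 1 < len(toks):                # pair that straddles the space
            stream.append(("vowel", t[-1] + "|" + toks[i + 1][0]))
    stream.append(("open", toks[-1][-1]))    # right half without a successor
    return stream

for kind, value in regroup("sheedy qokeedy"):
    print(kind, value)
```

For sheedy qokeedy this yields the open half sh, the core eed, the cross pair y|qo, the core keed and the open half y. The two open halves at the line edges are exactly the positions where the line-initial and line-final markers would have to step in.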
3. Why this is attractive for language
The problem with many polyphonic substitution approaches to the VMS, at least for me, was: the cores get too small, the frequencies do not match vowels and consonants, and you do not get a good language structure in terms of vowels and consonants. That is exactly what happened with my older attempts.
In this new variant something different happens.
The actual words are not separated at the visible spaces, but shifted in between the consonant cores and the vowel bigrams.
This produces a much more natural structure (K = consonant core, V = vowel):
K V K
K V K
K V K
Interestingly, this short way of writing fits reduced Middle High German (MHG) / Bavarian quite well.
Many Bavarian forms are short and basically built as cluster-vowel-cluster:
Haus
Haut
Wein
Bein
Leim
neun
Not always exactly, of course. But as a basic pattern it is strong. And since reduced Bavarian forms are very often monosyllabic, it fits this structure well. Longer words can of course also be encrypted this way.
Overall the VMS problem becomes less severe, because the "real" vowels are just in the wrong place: not in the visible word, but across the space.
4. What happens to the cores then?
In this model the visible token cores would not be complete words, but consonant or cluster parts.
A token then contains roughly:
the second part of a vowel bigram from the previous transition,
a final consonant or cluster part,
an initial consonant or cluster part,
the first part of the next vowel bigram.
So roughly:
V C C V
The real word or syllable boundary then does not lie at the visible space, but in the consonant region of the token.
[...] VC | CVVC | CVVC
This turns the picture of word boundaries completely upside down.
The visible VMS words are then not words, but wrappers:
half a vowel + consonant material + half a vowel.
This also explains why so many VMS tokens look so strangely similar. They do not have to be normal words. They can be small, recurring consonant and vowel frames.
5. A small core inventory
If this idea is right, there should be a small list of frequent inner bigrams.
And that is exactly what you see.
In my current tokenisation, where I treat ch, sh, qo etc. as units (single glyphs) and where the aiin family is initially treated as a protected special block, the coverage in Currier B is roughly:
Top 20 inner bigrams: about 66 %
Top 30 inner bigrams: about 75 %
Top 50 inner bigrams: about 85 %
So one can speak of a small, reusable core inventory.
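A coverage figure of this kind can be computed along the following lines. This is a sketch, not the tokenisation used for the numbers above: it assumes the inner bigrams have already been extracted as a list of pairs, the function name `top_n_coverage` is hypothetical, and the toy list is invented purely to show the arithmetic.

```python
from collections import Counter

def top_n_coverage(counts, n):
    """Share of all bigram occurrences covered by the n most frequent types."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(c for _, c in counts.most_common(n))
    return covered / total

# Toy data, invented for illustration only (not real VMS counts).
toy_bigrams = [("e", "d")] * 6 + [("k", "e")] * 3 + [("e", "e")] * 1
counts = Counter(toy_bigrams)

for n in (1, 2, 3):
    print(n, round(top_n_coverage(counts, n), 2))
```

With the toy list, the top 1, 2 and 3 types cover 0.6, 0.9 and 1.0 of all occurrences. On real data the same function would be called with n = 20, 30 and 50.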
Frequent inner bigrams are for example:
e+d
e+e
k+e
ch+e
e+o
k+a
t+e
o+d
d+a
t+a
I am not yet saying:
e+d = n
That would be too early.
But structurally it looks as if these inner bigrams belong to the consonant or cluster layer. So to the parts that, in the real reading, stand before and after the vowel bigrams.
And here you can see why the two ideas belong together.
If the visible tokens in the middle are not real words, but consonant material around a shifted vowel layer, then many repetitions suddenly become less absurd. They are then not necessarily word repetitions, but recurring building blocks of a shifted writing system.
6. Short tokens: filler for special cases
A big problem, for me and for others, was always these very short 1- and 2-glyph tokens:
y
s
or
ar
ol
dy
etc., everybody knows them.
If the visible spaces were real word boundaries, these would all have to be tiny words. Some maybe. But the quantity is strange.
In my joint model they get a clear function.
Because a real language does not consist only of perfect K-V-K chains. There are words that begin with a vowel, and words that end in a vowel.
Example:
"eine andere" (English: "another", literally "an other")
"eine" ends in a vowel.
"andere" begins with a vowel.
If the cross bigrams are the vowel or linking layer, a problem arises here: two vowel values meet (simplified representation).
VV CC VV VV CC VV
A simple vowel transition is not enough for this (except for diphthongs, which are probably included in the cross bigrams).
This is exactly where the short tokens could help.
Many 2-atom tokens (tokens made of two glyph units) look like little joint pieces:
left vowel half + right vowel half
no consonant core
So not necessarily short words, but filler for special cases in the vowel stream.
I checked this against the neighbourhoods:
last atom of the predecessor + first atom of the 2-atom token
last atom of the 2-atom token + first atom of the successor
Many frequent 2-atom tokens have cross-typical couplings on both sides.
Examples:
ol
ar
or
al
dy
chy
qol
cthy
This fits the idea well: these tokens often carry no core, but connect two vowel or linking positions.
But not all short tokens are like this. Some are asymmetric. In such forms often only one side is cross-typical, while the other already shifts into the core layer. That is not a problem. On the contrary: it shows that these short tokens can take on different technical roles:
cross-cross
cross-core
core-cross
They are thus small, necessary switching pieces in the stream - which, if you encipher a normal language, would logically have to occur.
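The neighbourhood check and the three roles can be sketched like this. The decision rule is an assumption made for illustration: a pair is called cross-typical when it is seen more often across spaces than inside tokens, using raw counts. The names `side` and `role` are hypothetical and the counts are invented toy values, not VMS statistics.

```python
from collections import Counter

def side(pair, inner, cross):
    """Label a pair 'cross' if it occurs more often across spaces than inside tokens."""
    return "cross" if cross[pair] > inner[pair] else "core"

def role(prev_last, token_atoms, next_first, inner, cross):
    """Classify a short token by its coupling to the left and right neighbour."""
    left = side((prev_last, token_atoms[0]), inner, cross)
    right = side((token_atoms[-1], next_first), inner, cross)
    return f"{left}-{right}"

# Toy counts, invented for illustration only.
inner = Counter({("e", "d"): 50, ("d", "a"): 20, ("y", "o"): 1})
cross = Counter({("y", "o"): 30, ("l", "ch"): 25, ("d", "a"): 2})

print(role("y", ["o", "l"], "ch", inner, cross))  # cross-cross
print(role("y", ["o", "d"], "a", inner, cross))   # cross-core
```

On a real corpus, relative frequencies or an association measure would probably be a fairer test than raw counts, since inner and cross positions do not occur equally often.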
7. Vowel-initial words and the LIM (Line Initial Markers)
Another problem:
What happens at the start of a line? If the vowels are normally written as cross bigrams across boundaries, then at the start of the line the predecessor is missing. This is especially important for words that begin with a vowel:
in German:
anfangen (begin)
aufhören (stop)
essen (eat)
oben (above)
etc.
If the first plaintext word of the line begins with a vowel, the system needs an artificial left vowel half at the start of the line. This is exactly where the LIM could come in. Even more interesting: the LIM seem to have two roles.
Some form rather typical inner bigrams with the following atom:
o
qo
d
p
ch
These would be cases where the line starts directly with a consonant core. These signs would then not simply be part of the normal text, but start signs for a consonantal beginning. That is why they do not have to behave like normal beginnings.
Other LIM, in particular:
y
s
form rather cross bigrams with the following atom, with y of course being extremely productive. These would be cases where the line begins with a vowel or linking value.
And here it gets interesting: If I look only at normal text lines in the Herbal running text, that is, no label lines, then about 20.3 % of the lines begin with these possible vocalic LIM y or s.
In my MHG comparison texts, around 20-21 % of the lines begin with a vowel. This close match is of course partly coincidence. Different texts have different values. But the order of magnitude fits remarkably well.
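A comparison of this kind could be set up as below. This is a hedged sketch: `UNITS`, `first_atom` and `share` are hypothetical names, the vowel set for the plaintext side is an assumption, and both line lists are toy data invented for illustration. The one detail that matters is that a line starting with sh must not be counted as starting with s, which is why the first atom is taken rather than the first character.

```python
# Hypothetical unit inventory (longest first), assumed for illustration.
UNITS = ["aiiin", "aiin", "ain", "cth", "ckh", "cph", "cfh", "ch", "sh", "qo"]

def first_atom(line):
    """First atom of a line, so that 'sh...' is not mistaken for 's'."""
    return next((u for u in UNITS if line.startswith(u)), line[:1])

def share(lines, key, wanted):
    """Share of non-empty lines whose key(line) is in the wanted set."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    if not lines:
        return 0.0
    return sum(key(ln) in wanted for ln in lines) / len(lines)

# Toy lines, invented for illustration only.
toy_vms = ["ykal shedy", "shedy qokal", "sar ol", "qokeedy dy", "daiin chey"]
toy_plain = ["oben ist", "das haus", "ein wein", "wir gehen"]

vms_share = share(toy_vms, first_atom, {"y", "s"})
plain_share = share(toy_plain, lambda ln: ln[0].lower(), set("aeiouäöü"))
print(vms_share, plain_share)
```

Label lines would have to be filtered out before calling `share`, as described above.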
If y / s really are the vocalic start classes, then their frequency in the Herbal text looks roughly like what you would expect from genuinely vowel-initial lines in MHG / Bavarian.
Then the LIM would not be decoration. They would be start operators.
More precisely: y and s could be start forms of the vowel or linking layer. y in particular behaves almost completely cross-typically at the start of a line. s is somewhat more mixed, but also not normally inner-typical.
This could even be a first anchor for the underlying polyphony of the vowel bigrams: at the start of a line, y and s might show the base class, while the same vowel or linking values are polyphonically disguised inside the line by other left halves.
8. Line ends and the m
If this holds at the start, it should mirror at the end.
At the start of the line the predecessor is missing.
At the end of the line the successor is missing.
If a vowel or linking code normally runs across a joint, then the line cannot simply break off. The stream has to be closed.
This is where the LEM come in, the Line End Markers.
Particularly interesting here is of course the m, or the am / om phenomenon (as a bigram).
m is, as we all know, not particularly dominant in the normal text flow, but strongly overrepresented at the end of a line. In one test, m at the end of a line was about 15 times more frequent than at normal token ends.
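One way such a ratio could be measured, as a sketch only: the function name `final_glyph_ratio` is hypothetical, "normal token ends" is read here as the ends of all tokens that are not line-final (an assumption), and the lines are toy data invented for illustration.

```python
def final_glyph_ratio(lines, glyph="m"):
    """Rate of tokens ending in glyph at line end, divided by the rate elsewhere."""
    end_hits = end_total = mid_hits = mid_total = 0
    for line in lines:
        toks = line.split()
        if not toks:
            continue
        end_total += 1
        end_hits += toks[-1].endswith(glyph)
        mid_total += len(toks) - 1
        mid_hits += sum(t.endswith(glyph) for t in toks[:-1])
    if not end_total or not mid_total or not mid_hits:
        return None  # ratio undefined without data on both sides
    return (end_hits / end_total) / (mid_hits / mid_total)

# Toy lines, invented for illustration only.
toy_lines = [
    "daiin chedy dam",
    "qokeedy dam shedy otam",
    "chol chor dy",
    "okal dar ol",
]
print(final_glyph_ratio(toy_lines))
```

In the toy lines, 2 of 4 line-final tokens end in m against 1 of 9 other tokens, which gives a ratio of about 4.5.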
In this model m / am / om would not be a normal sound value, but a kind of closing operator, mostly as its own closing formula with a preceding vowel or linking part.
Xm = closing sign for the open stream at the end of the line
Then we have a nice symmetry:
LIM = start operator
LEM / m = closing operator
This would also explain the meaning, or rather the necessity, of these particular LAAFU (Line As A Functional Unit) effects.
9. The problems: AIIN remains the hard special case
The biggest open knot, for me, is still the aiin family.
My explanation, but only as an idea:
AIIN = internal ending / nasal / closing block, as opposed to the am end-closing block (the theory that am is just a differently written aiin already exists).
Maybe an -en / -n ending class. That would fit MHG / Bavarian well. But unfortunately it is not certain yet.
What I can say fairly confidently: aiin does not behave like normal core material - what a surprise.
10. Is all of this plausible for the 15th century?
The clever thing about this structure is that it has nothing to do with complicated mathematics.
It is a bigram notation with a small but highly effective layout shift.
You take vowel values, write them as bigrams across visible boundaries, and leave the consonant or cluster parts standing in the visible tokens. Already in the 15th century people knew that vowels are revealing - and that is why they were disguised.
But the other means fit the period too:
Spaces were not always set consistently, even in normal manuscripts.
In ciphers, spaces could be omitted or shifted.
Polyphonic encipherment is historically attested.
Tables of character values are historically attested.
Multi-character or bigram values would not be obviously anachronistic.
And the brilliant part, as I wrote before, would be:
The signs used immediately look Latin-medieval to a reader.
y recalls the 9-shaped abbreviation sign.
qo looks like a familiar medieval ligature or abbreviation form.
Everybody thinks: Latin, abbreviations, words.
Conclusion
But if these signs are in truth vowel halves across the spaces, then you are looking in exactly the wrong place. The eye is additionally led away from the real content.
Judging by the many proposed solutions, that misdirection would have worked perfectly to this day, if this approach is right.
Maybe this view is wrong. But once you look at the VMS through this lens, a lot of it suddenly becomes remarkably logical.
So now I'll put on a helmet too (someone else here on the forum wrote that after publishing his theory, and I found it funny) and wait to see what happens.
Yours, Jost