Probability distribution of EVA letters within a random VMS word
There is a really nice blog post here:
Letter Distributions in the English Language and Their Relations -- Tim Hargreaves
I haven't done that blog post justice, and a full description of the method can be found there, but I ended up with this:
a nice little visualization of where the EVA letters are likely to be found in a vord.
If we select a random word from the ZL-3a transcription of the VMS,
then for each EVA letter in that word, these graphs show where in the word that letter is likely to occur.
Preparation:
Transcription file: ZL3a-n.txt
ivtt.exe -x7 -a2 -@L ZL3a-n.txt ZL2023_Clean.txt
removed any words with:
apostrophes
? marks
rare chars
single letter words
added the word 'vw' at the end (necessary for the R code to run)
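The filtering steps above can be sketched roughly as follows. This is a Python illustration, not the R code actually used for the plots. The function name `clean_words` and the `min_char_count` threshold for what counts as a "rare" character are assumptions; the post does not say how rare characters were identified. The 'vw' padding word is specific to the R code and is left out here.

```python
from collections import Counter

def clean_words(words, min_char_count=2):
    """Filter a word list along the lines of the preparation steps.

    min_char_count is a hypothetical cut-off: characters seen fewer
    times than this across the kept words are treated as "rare".
    """
    # Drop words containing apostrophes or question marks, and single-letter words.
    kept = [w for w in words if "'" not in w and "?" not in w and len(w) > 1]
    # Count how often each character occurs in the surviving words.
    counts = Counter(ch for w in kept for ch in w)
    rare = {ch for ch, n in counts.items() if n < min_char_count}
    # Drop any word that contains a rare character.
    return [w for w in kept if not any(ch in rare for ch in w)]

sample = ["daiin", "daiin", "chedy", "chedy", "o'l", "ch?dy", "s", "xar"]
cleaned = clean_words(sample)
```

With this toy list, the words containing an apostrophe, a question mark, a single letter, or the once-only characters x and r are dropped.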
Description:
Glyphs have been ordered according to the similarity of their probability distributions.
X-axis: position of a letter within a word (leftmost = beginning of word, rightmost = end of word)
Y-axis: probability (bottom = 0, never occurs; top = 1, certain to occur)
Labels: EVA letter (black), VMS glyph (gray)
Low-frequency glyphs are not shown
The underlying grey plots show the exact (unsmoothed) data
The colored overlays are the LOESS-smoothed data (to reduce noise)
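To make the axes concrete, here is one simple way to estimate a per-letter position profile, as a Python sketch. It is not necessarily the method from the blog post or the R code behind the plots: it bins the relative position of each letter (0 = first letter, 1 = last letter) and normalizes so that each letter's bins sum to 1, so the vertical scaling may differ from the graphs. The name `position_profiles` and the `bins` parameter are assumptions, and no smoothing is applied.

```python
from collections import defaultdict

def position_profiles(words, bins=10):
    """Return {letter: [share of that letter's occurrences per position bin]}."""
    counts = defaultdict(lambda: [0] * bins)
    for w in words:
        n = len(w)
        for i, ch in enumerate(w):
            # Relative position in [0, 1]: 0 = first letter, 1 = last letter.
            rel = i / (n - 1) if n > 1 else 0.0
            # Map to a bin index; the last position falls in the final bin.
            b = min(int(rel * bins), bins - 1)
            counts[ch][b] += 1
    profiles = {}
    for ch, c in counts.items():
        total = sum(c)
        profiles[ch] = [x / total for x in c]
    return profiles

profiles = position_profiles(["qokeedy", "qokedy", "daiin"], bins=3)
```

In this toy input, q only ever occurs in the first bin and y only in the last, which is the shape described for word-beginners and word-enders.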
Explanation:
For the group < p, q, c, s >, the plot starts near the top left of the graph and descends quickly as it moves to the right,
showing that these glyphs have a high probability of appearing at the start of a word and a low probability at the end of a word.
The roughly opposite effect is observed in the groups < y, r, m, g > and < l > and < n >.
Their plot starts at the bottom left, indicating a low probability of these letters occurring at the beginning of the word.
The plots stay low, denoting their continued low chance of being found as we proceed further into the word.
Then their plots rise steeply showing that these letters are more likely to be found at the end of a word.
Letters < e > and < i > have a single peak in the middle of the X-axis, indicating they are most likely to be found in the middle of a word.
=====================================
A generalized grouping can be described like so:
EVA-letters    Most probable position
P Q C S        mostly word-beginners
A O F D        first two-thirds of a word
K T H          mostly mid-word
E I            mid-word
Y R M G        word-enders
L              mostly word-enders
N              word-enders
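A much cruder version of this grouping can be sketched in Python by taking each letter's mean relative position and applying cut-offs. This is only an illustration: the groups above were read off the plots, not produced by this rule, and the function names and the 1/3 and 2/3 thresholds are assumptions.

```python
def mean_relative_position(words):
    """Return {letter: mean relative position}, 0 = word start, 1 = word end."""
    totals, counts = {}, {}
    for w in words:
        n = len(w)
        if n < 2:
            continue  # a single-letter word has no meaningful relative position
        for i, ch in enumerate(w):
            totals[ch] = totals.get(ch, 0.0) + i / (n - 1)
            counts[ch] = counts.get(ch, 0) + 1
    return {ch: totals[ch] / counts[ch] for ch in totals}

def label(mean_pos, low=1/3, high=2/3):
    """Coarse label from hypothetical cut-offs on the mean relative position."""
    if mean_pos < low:
        return "word-beginner"
    if mean_pos > high:
        return "word-ender"
    return "mid-word"

means = mean_relative_position(["qokeedy", "daiin"])
labels = {ch: label(m) for ch, m in means.items()}
```

A single mean cannot distinguish a mid-word peak from a letter that is spread evenly across the word, which is one reason the full plots are more informative than a table like this.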
For a detailed study of glyph position, see < S. Palmer, Voynich MS glyph position stacks >
N.B.
Three statistical artefacts are noted:
< EVA-q > the LOESS smoothing goes awry here; it should follow the black line better.
< EVA-o > the graph shows a small second peak at about 2/3 of the X-axis; this peak is not in the source data.
< EVA-l >, < EVA-n > are colored differently because of the statistic used to generate the colored groups.