22-04-2026, 03:48 PM
22-04-2026, 04:05 PM
Is there a significant difference between ch-initial lines and ch lines that follow a paragraph-initial gallows, i.e. pch vs. ch-only lines? I assume there is, because paragraph-initial lines have p’s and f’s when most other lines don’t. But are there comparable stats for pch-initial versus plain ch-initial lines, etc.?
22-04-2026, 04:26 PM
That's a great question! So good, in fact, that I checked it right away...
[attachment=15247]
No, the "p" effect stays clear: pch-type tokens behave like p, not like ch.
22-04-2026, 04:35 PM
But the 1000-dollar question is: what is the function of those initial letters? Any ideas?
Once again: for a generated hoax (at least a 15th-century one) the system is too variable, and for a random hoax it is too structured, as far as one can tell without knowing the hoax mechanism.
22-04-2026, 06:12 PM
Quote:JoJo If we group all tokens from position 2 onwards (32,425 words), they have an average length of 4.49 and an anomaly rate of 28.2%. Even so, the first word of each line deviates significantly, with 4.91 and 41.5%.
Would that 41.5% be out of the 4,000-plus glyphs, words or tokens? Depending on where you are in MS-408, that is what is going on. Shouldn't you check an equal number of items for every position and then take the percentage? Maybe the percentage would go down with a larger pool. What type of cipher are you implying?
So you are suggesting a null for the first letter? I feel the glyphs are a straight substitution; however, the word translations are low for the language I found, due to the nature of what the author intended.
22-04-2026, 07:28 PM
I am a bit puzzled by your figures for o_rate. In the manuscript the frequency of character o is greater than the frequency for character y. Yet the o_rate values are less than the y_ending values. It appears that o_rate is the character frequency. But qo_rate seems to be the frequency of words starting qo. To make everything consistent would it not help to have o_rate as the frequency of words with that character?
Also what are the columns t and p in your table?
23-04-2026, 06:41 AM
(22-04-2026, 07:28 PM) dashstofsk wrote: I am a bit puzzled by your figures for o_rate. In the manuscript the frequency of character o is greater than the frequency for character y. Yet the o_rate values are less than the y_ending values. It appears that o_rate is the character frequency. But qo_rate seems to be the frequency of words starting qo. To make everything consistent would it not help to have o_rate as the frequency of words with that character?
Also what are the columns t and p in your table?
Okay, when I classify a line by its first-token marker (say, p-start), I measure properties over all the other tokens in that line - i.e. token 2, 3, ... That's the "rest of the line" after the marker. All rates in the table refer to those rest-tokens.
o_rate = share of rest-tokens in the line that begin with o (excluding qo)
y_ending = share of rest-tokens that end with y
qo_rate = share of rest-tokens that begin with qo
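The three definitions above can be sketched in a few lines of Python. This is only an illustration, not the script behind the table: it assumes each line is a list of EVA-style token strings, and the function name `rest_token_rates` is made up for the example. Whether the real figures were pooled over all rest-tokens or averaged per line is not stated, so the sketch works on a single line.

```python
def rest_token_rates(line_tokens):
    """Return o_rate, qo_rate and y_ending over tokens 2..n of one line."""
    rest = line_tokens[1:]  # drop the first-token marker
    if not rest:
        return None  # a one-token line has no "rest of the line"
    n = len(rest)
    return {
        # tokens starting with "qo" do not start with "o", so they are excluded here
        "o_rate": sum(t.startswith("o") for t in rest) / n,
        "qo_rate": sum(t.startswith("qo") for t in rest) / n,
        "y_ending": sum(t.endswith("y") for t in rest) / n,
    }

# Made-up example line; the first token is the p-start marker and is not counted.
line = ["pchedy", "qokeedy", "okaiin", "chedy", "otar", "daiin"]
rates = rest_token_rates(line)
print(rates)
```

Note that a token can count towards both a start rate and y_ending (here "qokeedy"), which is one reason the columns need not add up to anything in particular.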
The reason y_ending is higher than o_rate even though "o" is more common overall is simply that "y" is extremely concentrated at token-final position in the VMS (dy, ey, edy, y endings dominate), while o is spread across all positions within tokens.
The t and p columns are statistical symbols, not VMS glyphs. t is the t-statistic from Welch's t-test (how large the difference is relative to its sampling noise), p is the p-value (how likely a difference at least this large would be if there were no real effect). It is confusing because the first column also shows line-start markers that happen to be called t and p. Sorry for the name collision.
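For readers who want to see what goes into the t column, here is a minimal standard-library sketch of Welch's t-statistic and the Welch-Satterthwaite degrees of freedom. The two samples are invented numbers standing in for per-line rates of two line groups; they are not values from the table. The p-value needs the t-distribution on top of this, which is omitted here (with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and returns it).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # squared standard errors of the two sample means (unequal variances allowed)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical per-line rates for two groups of lines (illustration only).
p_lines = [0.10, 0.15, 0.05, 0.20, 0.10]
other_lines = [0.25, 0.30, 0.20, 0.35, 0.25, 0.30]
t, df = welch_t(p_lines, other_lines)
print(t, df)
```

The sign of t only says which group has the larger mean; the magnitude is what the "how strong" reading refers to.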
23-04-2026, 06:53 AM
@ oeesordy
The 41.5% is the share of tokens at position 1 in each line that contain at least one internal bigram with negative PMI. So of roughly 3,800 first-position tokens about 41.5% contain such an unusual combination.
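As a rough sketch of how such a share can be computed: the code below estimates PMI for adjacent character pairs inside tokens and then counts tokens that contain at least one pair with negative PMI. Several things here are assumptions rather than the method actually used: characters of the transliteration are treated as glyphs, probabilities are plain relative frequencies over the given tokens, bigrams never seen in the reference set are treated as not anomalous, and the names `bigram_pmi` and `anomaly_rate` are made up.

```python
from collections import Counter
from math import log2

def bigram_pmi(tokens):
    """PMI for every adjacent character pair seen inside the tokens."""
    unigrams, bigrams = Counter(), Counter()
    for tok in tokens:
        unigrams.update(tok)
        bigrams.update(zip(tok, tok[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (a, b): log2((c / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
    }

def anomaly_rate(tokens, pmi):
    """Share of tokens with at least one internal bigram of negative PMI."""
    flagged = sum(
        any(pmi.get(pair, 0.0) < 0 for pair in zip(tok, tok[1:]))
        for tok in tokens
    )
    return flagged / len(tokens)

# Toy data, not a transliteration: each inner list is one line of tokens.
lines = [["ab", "aa", "bb"], ["aa", "bb"]]
pmi = bigram_pmi([t for line in lines for t in line])
first = anomaly_rate([line[0] for line in lines], pmi)
rest = anomaly_rate([t for line in lines for t in line[1:]], pmi)
print(first, rest)
```

Comparing `first` against `rest` mirrors the position-1 versus position-2-onwards comparison quoted earlier in the thread.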
Honestly, I can't derive a specific cipher from this yet. The effect is real, but what it means for the underlying system is still open.
Not a null — a null would be content-free, but the first position clearly correlates with properties of the rest of the line (length, token structure, etc.), so it carries information of some kind.
Could you clarify your other points? I'm not sure I fully understand the questions.
23-04-2026, 07:26 AM
I'm sure this is already known, but I want to note it here anyway:
The pages f49v, [folio link] and [folio link] all have sequences of single characters written next to the text (vertical columns beside the lines). What I noticed:
All three sequences are built almost exclusively from glyphs that also function as line-start markers in the main text.
[folio link] (26 chars): f o r y e @140 k s p o @192 y e @140 @164 p o @192 y e @140 d y e k y
[folio link] col.2 (2x17): y o s sh y d o f @169 x air d sh y f f y | o d s f c @172 x t o @195 l r t o x p d
[folio link] (9 chars): s d q s o l k r s
Common core across all three: d, o, r, s. All line-start markers.
So the margin columns seem to be drawn from the same inventory as the line-start markers, plus a few page-specific special glyphs.
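The "common core" claim is easy to check mechanically. The snippet below copies the three sequences exactly as quoted above (the "|" column separator is left out) and intersects their glyph sets. The names `seq_a`, `seq_b` and `seq_c` are placeholders, since the folio labels are not visible in this copy of the post.

```python
# The three margin sequences as quoted in the post, split into glyph tokens.
seq_a = "f o r y e @140 k s p o @192 y e @140 @164 p o @192 y e @140 d y e k y".split()
seq_b = ("y o s sh y d o f @169 x air d sh y f f y "
         "o d s f c @172 x t o @195 l r t o x p d").split()
seq_c = "s d q s o l k r s".split()

# Glyphs that occur in all three columns.
common = set(seq_a) & set(seq_b) & set(seq_c)
print(sorted(common))  # ['d', 'o', 'r', 's']
```

The sequence lengths come out as 26, 34 (2x17) and 9, matching the counts given in the post.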
It seems as though the writer was trying to provide a specific clue here, so if someone is looking for a key to what the first letters mean, it’s probably right here.
(It’s interesting in this context that both [folio link] and [folio link] share some of the specific characters. Is this a direct link between the two pages? But we all know that as soon as [folio link] comes into play, things get confusing...)
Let’s note this: each of these three pages has a column in which the letters are arranged from top to bottom. It contains the same characters that also appear as markers for the first position of the VMS lines. Coincidence? Or just a consequence of there not being that many glyphs in the VMS anyway? I don’t know. But, believe me, I’ll find out...
(or not)
23-04-2026, 08:46 AM
Unfortunately I am not convinced by your conclusion. Let me try to explain the difference in your figures for 'ch o_rate' (14.6%, 20.3%).
If you take any section of the manuscript you will see that the distribution of gallows words is not uniform. In particular, the distribution in quire 13 is 8 standard deviations away from what would be expected if gallows words were distributed randomly. The swing is from 35% ([folio link]) to 60% ([folio link]). (Quire 13 is known to have the most consistent writing and is therefore the best section for analysis such as this.) The gallows distribution for quire 20 is 10 standard deviations away. (Together, quires 13 and 20 have 52% of the total words.)
So some pages have more gallows words and others fewer, and the deviation is statistically significant.
Moreover, look at the positioning of gallows words within lines in the manuscript. It seems to be generally uniform. Because the distribution across pages is uneven, it follows that a line starting with a gallows word will probably be followed by a higher-than-average frequency of gallows words. Likewise, a starting non-gallows word will probably be followed by a lower-than-average frequency.
[attachment=15265]
Now to 'ch o_rate'. Words that start with ch tend to be non-gallows words. Words that start with o tend to be gallows words. So this might explain some of the higher 'without' value for ch o_rate.
The same logic might explain the higher 'with' figure for your 't gallow_density'.
The uneven distribution of gallows words across the pages of the manuscript is biasing your figures.
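The confounding argument above can be illustrated with a small simulation. It is a sketch under loudly stated assumptions, not a model of the manuscript: every page gets its own gallows rate, drawn uniformly between the 35% and 60% quoted for quire 13; within a page each word is a gallows word independently of its neighbours and of its position in the line; page, line and word counts are arbitrary.

```python
import random

def simulate(n_pages=500, lines_per_page=30, words_per_line=9, seed=1):
    """Gallows words drawn independently per word, but with a page-specific rate."""
    rng = random.Random(seed)
    with_g, without_g = [], []  # rest-of-line gallows share, split by first word
    for _ in range(n_pages):
        rate = rng.uniform(0.35, 0.60)  # assumed page-level gallows rate
        for _ in range(lines_per_page):
            words = [rng.random() < rate for _ in range(words_per_line)]
            rest_share = sum(words[1:]) / (words_per_line - 1)
            (with_g if words[0] else without_g).append(rest_share)
    return sum(with_g) / len(with_g), sum(without_g) / len(without_g)

# Mean gallows share in the rest of the line, by type of first word.
after_gallows, after_plain = simulate()
print(after_gallows, after_plain)
```

Even though the first word has no influence on the rest of its line in this setup, lines that open with a gallows word come disproportionately from gallows-rich pages, so their rest-of-line share comes out higher once all pages are pooled. Comparing lines within the same page, or within pages of similar gallows density, would be one way to separate this effect from a genuine line-start effect.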