The Voynich Ninja

Full Version: Binomial distribution in VMS
In contrast to the herbal section, substituting the letter groups across the entire VMS improves the fit to the binomial distribution.

Distribution without substitution:
[attachment=9136]

Distribution with substitution:

ch -> C
sh -> S
cth -> T
ckh -> K
cph -> P
cfh -> F


[attachment=9137]
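A minimal Python sketch of the substitution listed above, assuming an EVA-style lowercase transliteration that has already been split into words. The function names and the sample words are illustrative only; this is not the code behind the attached plots.

```python
import re
from collections import Counter

# Substitution table from the post: each letter group becomes one symbol.
GROUPS = {"ch": "C", "sh": "S", "cth": "T", "ckh": "K", "cph": "P", "cfh": "F"}

# Try the three-letter groups first so the alternation prefers them.
PATTERN = re.compile("|".join(sorted(GROUPS, key=len, reverse=True)))


def substitute(word):
    """Replace every letter group in `word` with its single-symbol stand-in."""
    return PATTERN.sub(lambda m: GROUPS[m.group(0)], word)


def length_distribution(words):
    """Relative frequency of each word length in `words`."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


# Illustrative sample, not taken from a real transliteration file.
sample = ["chckhy", "shedy", "daiin", "qokeedy", "cthor"]
replaced = [substitute(w) for w in sample]
print(replaced)  # ['CKy', 'Sedy', 'daiin', 'qokeedy', 'Tor']
print(length_distribution(replaced))
```

Whether the lengths are counted over all word tokens or only over distinct words is left to the caller: pass either the full word list or `set(words)`.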
An addition to @RobGea's post #44:

Total occurrences of the specified substrings: 17722
Total number of words in the text: 39020
Ratio of total occurrences to total words: 45.42%

Example of words (or parts of words) containing substrings more than once:
chckhy: 149
chcthy: 89
shckhy: 62
checkhy: 52
sheckhy: 38
shcthy: 33
checthy: 31
chckhey: 30
chockhy: 26
chocthy: 25
shecthy: 21
chokchy: 21
chotchy: 14
chckhdy: 13
shocthy: 13
............
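Counts like the ones above could be produced roughly as follows. This is only a sketch: it assumes the six letter groups ch, sh, cth, ckh, cph, cfh are the "specified substrings" and that the text is available as a list of EVA words. The helper name `group_stats` is made up for illustration.

```python
import re
from collections import Counter

# The six letter groups; none of them can overlap another in a word.
GROUP_RE = re.compile(r"cth|ckh|cph|cfh|ch|sh")


def group_stats(words):
    """Return (total group occurrences, occurrences per word, multi-group words)."""
    total_hits = 0
    multi = Counter()  # words that contain more than one group
    for word in words:
        hits = GROUP_RE.findall(word)
        total_hits += len(hits)
        if len(hits) > 1:
            multi[word] += 1
    ratio = total_hits / len(words) if words else 0.0
    return total_hits, ratio, multi


# Illustrative input only.
total, ratio, multi = group_stats(["chckhy", "daiin", "shedy", "chckhy"])
print(total, ratio, multi.most_common())

# Check of the ratio quoted in the post.
print(f"{17722 / 39020:.2%}")  # 45.42%
```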

Substring combinations (sorted by frequency):
[link]
In the paper "Distinct word length frequencies: distributions and symbol entropies" by Reginald Smith
Link: [link]
Thanks to Scarecrow, ninja xref: [link]

Smith says,
Quote:The literature on [..] Word length frequencies typically investigate the frequency of words of different lengths in syllables. 
These distributions are common amongst texts and are typically interpreted as a type of negative binomial distribution [..] or  [..] Hyper-Poisson { Displaced Poisson distribution }.

Further on, in "Section 2. Distinct word length distributions", Figure 1 is a "Graph of the frequency of distinct words by word length" for various languages, where English shows a distribution similar to that of English in Stolfi's Figure 2 [1]. Curiously, the graphs for ES, PT, IT, FR, DE and RU (to my eye at least) show a visual similarity to a binomial distribution.

[1] Web archive link to Stolfi's page "On the VMS Word Length Distribution", because the unicamp link is currently not accessible: [link]
Can anyone please explain what might be a reasonable underlying mechanism for the observed word length distribution in the VMs? The author flipping 9 (or 10) coins simultaneously and counting numbers of heads. Numbers of heads will then dictate the length of the current word?
(06-09-2024, 09:47 PM)lelle Wrote: Can anyone please explain what might be a reasonable underlying mechanism for the observed word length distribution in the VMs? The author flipping 9 (or 10) coins simultaneously and counting numbers of heads. Numbers of heads will then dictate the length of the current word?

Coin toss as a model for the letters of a word (as I understood it):

The binomial distribution only knows two "states": success or failure.
With n = 9, a coin is therefore tossed 9 times in succession, with heads standing for success and tails for failure.

Each coin toss is seen as a decision as to whether a letter is added to the word.
Across the n tosses (here 9 or 10), the probability p (here a symmetrical 0.5) determines how long the word becomes.
"Heads" could mean, for example, that a letter is added.
"Tails" could mean that no letter is added on that toss.
The binomial distribution then gives the probability of each possible number of letters in a word at the end of such a process.

Stolfi's "shifted by 1" means that the actual word length ranges not from 0 to 9 but from 1 to 10: each value from the binomial distribution is increased by 1, so a word of length 0 cannot occur.
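The coin-toss model with the shift by 1 can be written out as a small Python sketch. It assumes n = 9 and p = 0.5 as in the description above; the function names are illustrative.

```python
import math
import random


def shifted_binomial_pmf(n=9, p=0.5):
    """P(word length = L) for L = 1..n+1, where L = 1 + Binomial(n, p)."""
    return {
        heads + 1: math.comb(n, heads) * p**heads * (1 - p) ** (n - heads)
        for heads in range(n + 1)
    }


def sample_length(n=9, p=0.5, rng=random):
    """Toss n coins; each head adds a letter; the +1 rules out length 0."""
    heads = sum(rng.random() < p for _ in range(n))
    return 1 + heads


pmf = shifted_binomial_pmf()
for length, prob in pmf.items():
    print(length, round(prob, 4))

# Simulated lengths should approach the table above as the sample grows.
rng = random.Random(1)
lengths = [sample_length(rng=rng) for _ in range(10_000)]
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

With these parameters the expected word length is 1 + n·p = 5.5, and lengths 5 and 6 are equally the most likely.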

In my code, as soon as the target word length has been determined, the words are shortened or lengthened accordingly.
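The code itself is not shown in the thread, so the following is only a guess at what "shortened or lengthened" might look like. Truncating from the right and padding by repeating the word's own letters are both assumptions made for illustration, as are the names `resize_word` and `binomialize`.

```python
import random


def resize_word(word, target):
    """Cut or extend `word` to exactly `target` letters (hypothetical rule)."""
    if not word or target < 1:
        raise ValueError("need a non-empty word and a target length >= 1")
    if len(word) >= target:
        # Assumption: shorten by dropping letters from the end.
        return word[:target]
    # Assumption: lengthen by cycling through the word's own letters.
    repeats = -(-target // len(word))  # ceiling division
    return (word * repeats)[:target]


def binomialize(words, n=9, p=0.5, seed=0):
    """Give each word a target length drawn from 1 + Binomial(n, p)."""
    rng = random.Random(seed)
    result = []
    for word in words:
        target = 1 + sum(rng.random() < p for _ in range(n))
        result.append(resize_word(word, target))
    return result


print(resize_word("hello", 3))  # hel
print(resize_word("ab", 5))     # ababa
print(binomialize(["the", "quick", "brown", "fox"]))
```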
Wouldn't the most likely explanation be that whatever system was used resulted in the same distribution as if coins had been tossed, rather than actually tossing 9 coins (or a coin 9 times) before writing a word?
(07-09-2024, 09:06 AM)Koen G Wrote: Wouldn't the most likely explanation be that whatever system was used resulted in the same distribution as if coins had been tossed, rather than actually tossing 9 coins (or a coin 9 times) before writing a word?

Yes, you are right. After all, we don't know how this distribution came about in the VMS. We can only speculate about that. I was thinking more about the creation of binomially distributed text from normal text.
(07-09-2024, 09:06 AM)Koen G Wrote: Wouldn't the most likely explanation be that whatever system was used resulted in the same distribution as if coins had been tossed, rather than actually tossing 9 coins (or a coin 9 times) before writing a word?

In modern times, yes, but in the Middle Ages?

There is some thought about the binomial distribution in this paper:
[link]