• Binomial distribution in VMS
  • RE: Binomial distribution in VMS

    bi3mw > 29-08-2024, 10:52 PM

    Unlike in the herbal section alone, substituting the letter groups below improves the fit to the binomial distribution when applied to the entire VMS.

    Distribution without substitution:
    [plot not preserved]

    Distribution with substitution, using these replacements:

    ch -> C
    sh -> S
    cth -> T
    ckh -> K
    cph -> P
    cfh -> F

    [plot not preserved]
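A minimal Python sketch of this kind of comparison. It assumes the transliteration has already been reduced to plain EVA words separated by whitespace (real transcription files carry extra markup that would need stripping first). It applies the replacements above and tabulates relative word lengths next to a binomial with n = 9, p = 0.5 shifted by 1, as in Stolfi's model discussed later in the thread. The function names and the sample words are illustrative only; this is not bi3mw's code.

```python
from collections import Counter
from math import comb

# Letter-group substitutions from the post. Longer groups are applied first
# as a general precaution (none of these groups actually overlaps another).
SUBSTITUTIONS = [
    ("cth", "T"), ("ckh", "K"), ("cph", "P"), ("cfh", "F"),
    ("ch", "C"), ("sh", "S"),
]


def substitute(word: str) -> str:
    """Replace each multi-letter EVA group by a single symbol."""
    for group, symbol in SUBSTITUTIONS:
        word = word.replace(group, symbol)
    return word


def length_distribution(words):
    """Relative frequency of each word length, sorted by length."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


def shifted_binomial(n: int = 9, p: float = 0.5):
    """Binomial(n, p) moved from lengths 0..n to lengths 1..n+1."""
    return {k + 1: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


if __name__ == "__main__":
    # Hypothetical sample; real statistics need the full transliteration.
    sample = "chckhy shcthy daiin qokeedy chol".split()
    observed = length_distribution([substitute(w) for w in sample])
    expected = shifted_binomial()
    for length in sorted(set(observed) | set(expected)):
        print(f"{length:2d}  observed {observed.get(length, 0.0):.3f}  "
              f"binomial {expected.get(length, 0.0):.3f}")
```

Substitution shortens words such as "chckhy" to "CKy", which is what shifts the observed length distribution.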
  • RE: Binomial distribution in VMS

    bi3mw > 30-08-2024, 10:36 PM

    An addition to @RobGea's post #44:

    Total occurrences of the specified substrings: 17722
    Total number of words in the text: 39020
    Ratio of total occurrences to total words: 45.42%

    Example of words (or parts of words) containing substrings more than once:
    chckhy: 149
    chcthy: 89
    shckhy: 62
    checkhy: 52
    sheckhy: 38
    shcthy: 33
    checthy: 31
    chckhey: 30
    chockhy: 26
    chocthy: 25
    shecthy: 21
    chokchy: 21
    chotchy: 14
    chckhdy: 13
    shocthy: 13
    ............

    Substring combinations (sorted by frequency):
    [link not preserved]
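A sketch of how figures like these could be produced. It assumes the "specified substrings" are the six letter groups from the substitution list in the first post and that the text is a list of whitespace-separated words. It counts whole words only, whereas the list above also includes parts of words, so its numbers are not expected to match bi3mw's exactly. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: the "specified substrings" are the six letter groups from the
# substitution list. Longer groups come first in the alternation, so the
# regex tries them before "ch"/"sh" at each position.
GROUP_PATTERN = re.compile(r"cth|ckh|cph|cfh|ch|sh")


def groups_in(word):
    """Letter groups found in a word, left to right, without overlaps."""
    return GROUP_PATTERN.findall(word)


def group_statistics(words):
    found = [groups_in(w) for w in words]
    total = sum(len(g) for g in found)
    ratio = 100 * total / len(words)  # occurrences per 100 words
    # Words containing more than one group, most frequent first.
    repeated = Counter(w for w, g in zip(words, found) if len(g) > 1)
    # Ordered group combinations within those words, most frequent first.
    combinations = Counter("+".join(g) for g in found if len(g) > 1)
    return total, ratio, repeated.most_common(), combinations.most_common()
```

With the totals from the post, 100 × 17722 / 39020 ≈ 45.42, which matches the reported ratio. Note that the ratio is occurrences per word, not the share of words containing a group, since one word can contain several groups.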
  • RE: Binomial distribution in VMS

    RobGea > 06-09-2024, 08:08 PM

    Reginald Smith, "Distinct word length frequencies: distributions and symbol entropies"
    Link: [link not preserved]
    Thanks to Scarecrow for the ninja xref: [link not preserved]

    In this paper Smith says:
    Quote: The literature on [..] Word length frequencies typically investigate the frequency of words of different lengths in syllables.
    These distributions are common amongst texts and are typically interpreted as a type of negative binomial distribution [..] or [..] Hyper-Poisson { Displaced Poisson distribution }.

    Further on, in "Section 2. Distinct word length distributions", Figure 1 is a "Graph of the frequency of distinct words by word length" for various languages.
    English shows a distribution similar to that of English in Stolfi's Figure 2 [1], and, curiously, the graphs for ES, PT, IT, FR, DE and RU show (to my eye at least) a visual similarity to a binomial distribution.
    [1] Web archive link to Stolfi's page "On the VMS Word Length Distribution", because the Unicamp link is currently not accessible: [link not preserved]
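Smith's Figure 1 counts distinct words (types), which is not the same as the usual count over running text (tokens). A minimal sketch of the difference, assuming whitespace-separated words and length measured in characters; the paper's own tokenization may differ, and the function names are illustrative.

```python
from collections import Counter


def token_length_frequencies(words):
    """How many running words (tokens) have each length."""
    return dict(sorted(Counter(len(w) for w in words).items()))


def distinct_length_frequencies(words):
    """How many distinct words (types) have each length; each type counts once."""
    return dict(sorted(Counter(len(w) for w in set(words)).items()))
```

For "the cat the dog a cat" the token counts are {1: 1, 3: 5}, while the distinct counts are {1: 1, 3: 3}: frequent short words weigh much less once repetitions are removed.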
  • RE: Binomial distribution in VMS

    lelle > 06-09-2024, 09:47 PM

    Can anyone please explain what might be a reasonable underlying mechanism for the observed word length distribution in the VMs? Would the author flip 9 (or 10) coins simultaneously and count the number of heads, with the number of heads then dictating the length of the current word?
  • RE: Binomial distribution in VMS

    bi3mw > 07-09-2024, 02:24 AM

    (06-09-2024, 09:47 PM) lelle Wrote: Can anyone please explain what might be a reasonable underlying mechanism for the observed word length distribution in the VMs? Would the author flip 9 (or 10) coins simultaneously and count the number of heads, with the number of heads then dictating the length of the current word?

    Coin toss as a model for the letters of a word (as I understand it):

    The binomial distribution counts trials that have only two outcomes: success or failure.
    With n = 9, a coin is therefore tossed 9 times in succession, with heads standing for success and tails for failure.

    Each coin toss is a decision about whether another letter is added to the word.
    On each of the n tosses (here 9 or 10), the coin, i.e. the probability p (here a symmetric 0.5), decides whether the word grows by one letter.
    "Heads" could mean, for example, that a letter is added.
    "Tails" then means that no letter is added on that toss; the remaining tosses still take place, so a single tails does not end the word.
    The binomial distribution then gives the probability of each possible number of letters in the word once all n tosses have been made.
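The coin-toss model can be checked by simulation. A minimal Python sketch (the function names are illustrative): every word gets exactly n tosses and one letter per head, and the simulated length frequencies are compared with the binomial probabilities.

```python
import random
from collections import Counter
from math import comb


def coin_toss_length(rng: random.Random, n: int = 9, p: float = 0.5) -> int:
    """Toss a coin n times and add one letter per head.
    Tails add nothing, but all n tosses are always made."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate(trials: int = 100_000, n: int = 9, p: float = 0.5, seed: int = 1):
    """Relative frequency of each length 0..n over many simulated words."""
    rng = random.Random(seed)
    counts = Counter(coin_toss_length(rng, n, p) for _ in range(trials))
    return {k: counts[k] / trials for k in range(n + 1)}


def binomial_pmf(k: int, n: int = 9, p: float = 0.5) -> float:
    """Probability of exactly k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    for k, freq in simulate().items():
        print(f"{k}: simulated {freq:.4f}, binomial {binomial_pmf(k):.4f}")
```

If a tails instead ended the word, the lengths would follow a geometric distribution rather than a binomial one; the binomial shape depends on all n tosses being made.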

    Stolfi's "shifted by 1" means that the word length ranges not from 0 to 9 but from 1 to 10: each value from the binomial distribution is increased by 1, so a word of length 0 cannot occur.
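Written out for n = 9 (so word lengths L run from 1 to 10), with p the probability of heads, the shifted distribution is:

```latex
P(L = k) = \binom{9}{k-1}\, p^{\,k-1} (1-p)^{\,10-k}, \qquad k = 1, \dots, 10

\text{with } p = \tfrac{1}{2}:\quad
P(L = k) = \frac{1}{512}\binom{9}{k-1}, \qquad
P(L = 5) = P(L = 6) = \frac{126}{512} \approx 0.246, \qquad
P(L = 1) = P(L = 10) = \frac{1}{512}, \qquad
E[L] = 1 + 9p = 5.5
```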

    In my code, as soon as the target word length has been determined, the words are shortened or lengthened accordingly.
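A sketch of that step, not bi3mw's actual code. The target length is drawn from the binomial shifted by 1, longer words are truncated, and shorter words are padded. The post does not say how words are lengthened, so padding with random letters from a supplied alphabet is purely an assumption, as are the function names.

```python
import random


def sample_target_length(rng: random.Random, n: int = 9, p: float = 0.5) -> int:
    """Draw a word length from Binomial(n, p) shifted by 1 (range 1..n+1)."""
    return 1 + sum(1 for _ in range(n) if rng.random() < p)


def resize(word: str, target: int, rng: random.Random, alphabet: str) -> str:
    """Truncate or pad `word` to exactly `target` characters.
    ASSUMPTION: padding uses random letters from `alphabet`; the post does
    not say how words are lengthened."""
    if len(word) >= target:
        return word[:target]
    return word + "".join(rng.choice(alphabet) for _ in range(target - len(word)))


def binomialize(text: str, seed: int = 0, alphabet: str = "abcdefghijklmnopqrstuvwxyz") -> str:
    """Hypothetical helper: give every word a binomially distributed length."""
    rng = random.Random(seed)
    return " ".join(resize(w, sample_target_length(rng), rng, alphabet) for w in text.split())
```

Truncating keeps the start of each word; keeping the end or dropping letters from the middle instead would change which letter statistics survive.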
  • RE: Binomial distribution in VMS

    Koen G > 07-09-2024, 09:06 AM

    Wouldn't the most likely explanation be that whatever system was used resulted in the same distribution as if coins had been tossed, rather than actually tossing 9 coins (or a coin 9 times) before writing a word?
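The point can be illustrated without any coins. Suppose, purely hypothetically, that each word is assembled from nine optional positions, each either filled with one glyph or left empty, and that every combination is used equally often. Enumerating the combinations then reproduces the binomial coefficients exactly, and one obligatory glyph gives the shift by 1. This toy model only demonstrates the arithmetic; it is not a claim about how the VMS text was made.

```python
from collections import Counter
from itertools import product


def slot_length_counts(n_slots: int = 9, mandatory: int = 0) -> dict:
    """Hypothetical slot model: a word has `mandatory` fixed glyphs plus n_slots
    optional positions, each either filled (one glyph) or empty. Every
    fill/empty combination is enumerated once; no randomness is involved."""
    counts = Counter(mandatory + sum(choice) for choice in product((0, 1), repeat=n_slots))
    return dict(sorted(counts.items()))
```

The 512 combinations split by length as 1, 9, 36, 84, 126, 126, 84, 36, 9, 1, the same proportions as the coin-toss model with p = 0.5.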
  • RE: Binomial distribution in VMS

    bi3mw > 07-09-2024, 09:22 AM

    (07-09-2024, 09:06 AM) Koen G Wrote: Wouldn't the most likely explanation be that whatever system was used resulted in the same distribution as if coins had been tossed, rather than actually tossing 9 coins (or a coin 9 times) before writing a word?

    Yes, you are right. After all, we don't know how this distribution came about in the VMS. We can only speculate about that. I was thinking more about the creation of binomially distributed text from normal text.
  • RE: Binomial distribution in VMS

    ReneZ > 08-09-2024, 01:53 AM

    (07-09-2024, 09:06 AM) Koen G Wrote: Wouldn't the most likely explanation be that whatever system was used resulted in the same distribution as if coins had been tossed, rather than actually tossing 9 coins (or a coin 9 times) before writing a word?

    In modern times, yes, but in the middle ages?

    There are some thoughts on the binomial distribution in this paper: [link not preserved]