• Relationship between word frequency and word length
  • Relationship between word frequency and word length

    Addsamuels > 01-10-2023, 02:01 AM

    I think that in European languages the most common words are short, whereas rarer words are longer on average. That is to say, word length tends to decrease as word frequency increases.


    Here are some examples:

    English:
    Code:
    you 28787591
    i 27086011
    the 22761659
    to 17099834
    a 14484562
    's 14291013
    it 13631703
    and 10572938
    that 10203742
    't 9628970
    of 8915110
    is 7400675
    in 7337058
    what 6900164
    we 6755687
    me 6444985
    this 5739788
    he 5516364
    for 5174060
    my 4938948
    on 4821861
    have 4764010
    your 4610945
    do 4419883
    was 4401531
    French:
    Code:
    de 8435682
    je 8308698
    est 6942248
    pas 5833676
    le 5305591
    que 5083052
    la 4825603
    vous 4500162
    tu 4434920
    un 4360896
    c' 4184576
    à 4119959
    et 4110855
    il 4025241
    a 3679682
    l' 3675406
    ne 3343288
    les 3046663
    j' 2981242
    en 2925411
    on 2756346
    ça 2742676
    une 2717481
    d' 2603041
    ce 2533544
    German:
    Code:
    ich 5890279
    sie 3806767
    das 3122198
    ist 3025610
    du 2947020
    nicht 2756783
    die 2484854
    es 2303025
    und 2289891
    der 1726001
    wir 1721620
    was 1696010
    zu 1424706
    er 1352161
    ein 1315301
    in 1231372
    ja 1054114
    mir 1019451
    mit 1014541
    wie 928920
    den 920646
    mich 893076
    auf 881418
    dass 879520
    aber 854492
    Italian:
    Code:
    e 7389373
    non 6257811
    che 6063914
    di 5880995
    la 3887197
    il 3726599
    un 3555300
    a 3451723
    per 2866558
    è 2559257
    in 2156931
    una 2153925
    mi 2013071
    sono 2005178
    ho 1823350
    l' 1819882
    si 1772472
    ha 1662783
    ma 1650248
    lo 1507340
    cosa 1462858
    con 1440540
    no 1433410
    le 1425870
    ti 1405833
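
    A rough way to check this on a listing like the ones above is to compare the plain mean word length (each listed word counted once) with the count-weighted mean (each word counted as often as it occurs). The Python sketch below uses the English listing from this post. The helper names are made up for illustration, apostrophes are counted as characters, and 25 words are far too few to establish the trend, so treat it as a sanity check only.

```python
# Sketch only: the data is the English sample quoted in the post.
# parse_listing, type_mean_length and token_mean_length are illustrative names.

ENGLISH = """\
you 28787591
i 27086011
the 22761659
to 17099834
a 14484562
's 14291013
it 13631703
and 10572938
that 10203742
't 9628970
of 8915110
is 7400675
in 7337058
what 6900164
we 6755687
me 6444985
this 5739788
he 5516364
for 5174060
my 4938948
on 4821861
have 4764010
your 4610945
do 4419883
was 4401531
"""


def parse_listing(text):
    """Parse 'word count' lines into a list of (word, count) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.split()
        pairs.append((word, int(count)))
    return pairs


def type_mean_length(pairs):
    """Mean length over the listed words, each counted once."""
    return sum(len(word) for word, _ in pairs) / len(pairs)


def token_mean_length(pairs):
    """Mean length over running text: each word weighted by its count."""
    total = sum(count for _, count in pairs)
    return sum(len(word) * count for word, count in pairs) / total


pairs = parse_listing(ENGLISH)
print("plain mean length:   ", type_mean_length(pairs))
print("weighted mean length:", token_mean_length(pairs))
```

    If the shorter words are the more frequent ones, the count-weighted mean comes out below the plain mean, which is what happens for this sample.
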
    Some languages have longish words which are very common, for example Greek's είναι (4th) and German's vielleicht (104th).
    However, in the Voynich manuscript the most common word is Daiin, and people seem to say that it plays more of a verb role (this may be because it looks like an infinitive in a Western European language).

    Obviously the handwriting in the manuscript is unclear and thus we don't know where all of the gaps are, but the length of the word may be an important step towards the decipherment.
    Regards, Alex
  • RE: Relationship between word frequency and word length

    Koen G > 01-10-2023, 02:03 PM

    The problem is that we don't know the length of the word. [Daiin] doesn't exist; it is just a way we (using the EVA system) talk about these glyphs. But arguments could be made to see [iin] or even [aiin] as a single glyph, which would make it a short word.

    Not that this would fix Voynichese word length statistics overall. But we must never make the mistake of equating the EVA system with "Voynichese", especially not when it comes to statistics like word length. EVA is fairly analytical stroke-wise.
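
    To make the point concrete, here is a minimal Python sketch of how the measured length of the same EVA string changes with the choice of glyph units. The function name and the greedy longest-match rule are assumptions made for illustration, not part of EVA or of any established tool.

```python
def glyph_length(word, units=()):
    """Count glyphs in an EVA-transliterated word.

    Each string in `units` is treated as a single glyph; every other
    character counts as one glyph. At each position the longest matching
    unit is taken first (an illustrative choice, not an EVA rule).
    """
    ordered = sorted(units, key=len, reverse=True)
    position = 0
    glyphs = 0
    while position < len(word):
        for unit in ordered:
            if unit and word.startswith(unit, position):
                position += len(unit)  # consume the whole unit as one glyph
                break
        else:
            position += 1  # no unit matched: a single character is one glyph
        glyphs += 1
    return glyphs


for units in [(), ("iin",), ("aiin",)]:
    print(units, glyph_length("daiin", units))
```

    Read character by character, [daiin] has five glyphs. With [iin] as one glyph it has three, and with [aiin] as one glyph it has two, so any word-length statistic depends on this choice.
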
  • RE: Relationship between word frequency and word length

    MarcoP > 01-10-2023, 03:40 PM

    See Zipf's law of abbreviation.

    Currier B is even more puzzling, with words like qokeey and qokeedy being very common. I vaguely remember that the issue was discussed by Timm and Schinner.
  • RE: Relationship between word frequency and word length

    ReneZ > 01-10-2023, 03:53 PM

    This should be well known.... :
