The Voynich Ninja

Full Version: Relationship between word frequency and word length
I think that in European languages the most common words are short, whereas rarer words are longer on average. That is to say, word length tends to fall as word frequency rises.


Here are some examples:

English:
Code:
you 28787591
i 27086011
the 22761659
to 17099834
a 14484562
's 14291013
it 13631703
and 10572938
that 10203742
't 9628970
of 8915110
is 7400675
in 7337058
what 6900164
we 6755687
me 6444985
this 5739788
he 5516364
for 5174060
my 4938948
on 4821861
have 4764010
your 4610945
do 4419883
was 4401531
French:
Code:
de 8435682
je 8308698
est 6942248
pas 5833676
le 5305591
que 5083052
la 4825603
vous 4500162
tu 4434920
un 4360896
c' 4184576
à 4119959
et 4110855
il 4025241
a 3679682
l' 3675406
ne 3343288
les 3046663
j' 2981242
en 2925411
on 2756346
ça 2742676
une 2717481
d' 2603041
ce 2533544
German:
Code:
ich 5890279
sie 3806767
das 3122198
ist 3025610
du 2947020
nicht 2756783
die 2484854
es 2303025
und 2289891
der 1726001
wir 1721620
was 1696010
zu 1424706
er 1352161
ein 1315301
in 1231372
ja 1054114
mir 1019451
mit 1014541
wie 928920
den 920646
mich 893076
auf 881418
dass 879520
aber 854492
Italian:
Code:
e 7389373
non 6257811
che 6063914
di 5880995
la 3887197
il 3726599
un 3555300
a 3451723
per 2866558
è 2559257
in 2156931
una 2153925
mi 2013071
sono 2005178
ho 1823350
l' 1819882
si 1772472
ha 1662783
ma 1650248
lo 1507340
cosa 1462858
con 1440540
no 1433410
le 1425870
ti 1405833
Some languages have longish words that are very common, for example Greek είναι (4th) and German vielleicht (104th).
However, in the Voynich manuscript the most common word is Daiin, and people seem to say that it plays more of a verb role (this may be because it looks like an infinitive in a Western European language).

Obviously the handwriting in the manuscript is unclear, and thus we don't know all of the gaps, but the length of the word may be an important step towards the decipherment.
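The lists above can be turned into a number by parsing the `word count` lines and comparing the plain mean word length with the frequency-weighted mean. Here is a minimal Python sketch, assuming input in the same two-column format as the lists in this post. The function names are made up for illustration, and the sample is just the first five English entries:

```python
def parse_counts(text):
    """Parse lines of the form 'word count' into (word, count) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        # Split on the last run of whitespace: word on the left, count on the right.
        word, count = line.rsplit(None, 1)
        pairs.append((word, int(count)))
    return pairs


def mean_length(pairs):
    """Unweighted mean word length (each distinct word counts once)."""
    return sum(len(w) for w, _ in pairs) / len(pairs)


def weighted_mean_length(pairs):
    """Mean word length weighted by how often each word occurs."""
    total = sum(c for _, c in pairs)
    return sum(len(w) * c for w, c in pairs) / total


# First five entries of the English list above.
sample = """
you 28787591
i 27086011
the 22761659
to 17099834
a 14484562
"""
pairs = parse_counts(sample)
print(mean_length(pairs), weighted_mean_length(pairs))
```

On a top-25 list alone both numbers will be small. The comparison only becomes meaningful on a full frequency list that includes the rare words. Note that `len` counts every character, so the apostrophe in entries like 's or l' is counted as part of the word.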
Regards, Alex
The problem is that we don't know the length of the word. [Daiin] doesn't exist as such; it is just the way we (using the EVA system) talk about these glyphs. But arguments could be made to see [iin] or even [aiin] as a single glyph, which would make it a short word.

Not that this would fix Voynichese word length statistics overall. But we can never make the mistake of equating the EVA system with "Voynichese", especially not when it comes to statistics like word length. EVA is fairly analytical stroke-wise.
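To make the point concrete: the "length" of the same word changes with the glyph inventory you assume. Here is a rough Python sketch using greedy longest-match segmentation. The three inventories are illustrative assumptions for this example, not established analyses of the script, and `segment` is a made-up helper name:

```python
def segment(word, multi_glyphs):
    """Split an EVA string into glyphs, preferring the longest listed unit.

    Any character not covered by a multi-character unit is treated as a
    single glyph of its own.
    """
    units = sorted(multi_glyphs, key=len, reverse=True)
    glyphs = []
    i = 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                glyphs.append(unit)
                i += len(unit)
                break
        else:
            # No multi-character unit matched at position i.
            glyphs.append(word[i])
            i += 1
    return glyphs


# Hypothetical inventories, chosen only to illustrate the argument.
inventories = {
    "plain EVA": [],
    "[iin] as one glyph": ["iin"],
    "[aiin] as one glyph": ["aiin"],
}

for name, units in inventories.items():
    glyphs = segment("daiin", units)
    print(name, glyphs, len(glyphs))
```

Under these assumptions the same word comes out as five, three, or two glyphs long, which is why word length statistics computed directly on EVA strings need to be read with care.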
See Zipf's law of abbreviation.
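Zipf's law of abbreviation says that more frequent words tend to be shorter. One simple way to check a word list against it is a rank correlation between frequency and length, where a clearly negative value is what the law predicts. Here is a minimal sketch in plain Python (Spearman's rho computed as a Pearson correlation on average ranks). The function names and the toy counts are assumptions for illustration, not real corpus data:

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy counts, invented for the example.
counts = {"a": 900, "to": 700, "the": 650, "that": 300, "maybe": 40, "perhaps": 12}
words = list(counts)
rho = spearman([counts[w] for w in words], [len(w) for w in words])
print(rho)
```

The toy data is deliberately perfectly monotone, so it gives the strongest possible negative correlation. A real word list would give something weaker, and for Voynichese the result would also depend on how glyphs are counted.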

Currier B is even more puzzling, with words like qokeey and qokeedy being very common. I vaguely remember that the issue was discussed by Timm and Schinner.
This should be well known...