(13-08-2025, 11:01 PM)Antonio García Jiménez Wrote: If it's supposedly a language, how is it possible that the most frequently used word in the Voynich (daiin) is missing from several pages?
Statistics like word and character frequencies are not properties of a language, but of a text. In some texts the most common word may indeed be a word that is rare in others.
In English romances and treatises, like the Culpeper Herbal, the most common word is "the". But in the Medieval Towneley Plays, "the" is only in third place, after "I" and "that". The frequency of "I" in Culpeper is only 0.003, so there must be many pages without a single "I".
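To put a rough number on that last claim: if a word has relative frequency p, and we pretend that word occurrences are independent (real texts are not quite like that), then a page of n words has no occurrence of it with probability (1 - p)^n. Here is a minimal sketch. The function name is made up for illustration, and the page lengths are hypothetical, not measurements of the Culpeper text.

```python
def prob_page_without(p: float, n_words: int) -> float:
    """Probability that a page of n_words words contains no occurrence of a
    word with relative frequency p, assuming independent occurrences."""
    return (1.0 - p) ** n_words


# 0.003 is the frequency of "I" in Culpeper quoted above.
# The page lengths below are hypothetical, chosen only for illustration.
for n in (100, 200, 400):
    print(n, round(prob_page_without(0.003, n), 2))
```

Under these assumptions, a 200-word page has a bit better than even odds of containing no "I" at all. Real occurrences tend to cluster, which would make empty pages even more common.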
In an English novel or treatise the word "is" must be more common than "was". In a chronicle it may be the opposite.
Quote:Presumably, as in English or any other language, the most frequently used word in a text will be an article, a preposition, or a conjunction. Aren't there any of those grammatical elements in the pages cited by Torsten?
Articles are a weird feature of only some European languages. Latin, most Slavic languages (Russian, Czech, Croatian, ...), Finnish, Estonian, Latvian, and Turkish have no articles. Bulgarian has a definite article but it is attached to the end of the word. Hebrew and Arabic have definite articles but they are attached in front of the word. And East Asian languages (Tibetan, Burmese, Thai, Vietnamese, Chinese...) do not have articles either.
As for prepositions, Latin and many other languages use them sparingly, and inflect the words into "cases" instead. In some (if not all) East Asian languages the words that would be translated into English prepositions are also used in other functions, like verbs or adjectives. In Turkish, Japanese, and Korean the equivalents of prepositions are not separate words but suffixes attached to the end of words.
Here are the 7 most common words in some texts I happen to have:
Code:
Russian novel Roadside Picnic:
1716 и = and
961 не = not
951 в = in
743 на = on
568 он = he
461 с = with
458 что = what
Russian Pentateuch:
10036 и = and
2526 в = in
1686 на = on
1648 не = not
1477 его = his
1058 из = of
992 их = of them
Latin Pentateuch:
6818 et = and
3018 in = in
1479 ad = to
1319 est = is
1020 de = of
919 non = not
869 Dominus = Lord
Latin Ockham's Dialogus III.1.1-2:
1511 et = and
776 in = in
735 non = not
674 quod = that
621 est = is
405 ad = to
323 ut = as
Vietnamese Pentateuch:
3273 ngươi = you
3136 và = and
2936 người = people
2640 các = the
2542 cho = for
2126 đừc = men
2091 của = of
Towneley plays:
2894 I
1593 that
1492 the
1491 and
1419 to
1050 he
979 of
Culpeper herbal:
9195 the
7058 and
4791 of
3300 in
2502 to
2365 or
2160 is
Chinese novel Dream of the Red Mansion:
20799 阿 ā
15124 的 de
14590 不 bù
11745 一 yī
11192 来 lái
10979 得 de
10276 人 rén
Transcripts of Voice of America Chinese broadcasts:
2517 的 de
1664 国 guó
1325 是 shì
826 中 Zhōng
809 一 yī
777 在 zài
666 不 bù
(My Chinese texts use uncommon encodings; the Chinese characters and pinyin above were obtained via ChatGPT, so readers beware.)
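For anyone who wants to produce lists like the ones above for their own texts, here is a minimal sketch; it is not the script I used. The tokenization (normalize to NFC, lowercase, take runs of word characters) is an assumption and will not suit every script. Chinese, for instance, would need counting by character or a proper segmenter. The function names and the sample pages are hypothetical.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: NFC-normalize so accented letters are single code
    # points, lowercase, then take maximal runs of word characters.
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def top_words(text: str, n: int = 7) -> list[tuple[str, int]]:
    # The n most frequent tokens with their counts, most frequent first.
    return Counter(tokenize(text)).most_common(n)


def pages_without(pages: list[str], word: str) -> int:
    # Number of pages on which the given word never occurs.
    target = unicodedata.normalize("NFC", word).lower()
    return sum(1 for page in pages if target not in tokenize(page))


# Made-up sample "pages", only to show the calls.
pages = ["the cat and the dog", "I saw the cat", "and I ran"]
print(top_words(" ".join(pages), 3))
print(pages_without(pages, "the"))
```

Note that this lowercases everything, so "I" would be listed as "i". The second function addresses the original question directly: even in this tiny sample, the most common word is missing from one of the pages.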
All the best, --jorge