(23-04-2026, 03:57 PM)eggyk Wrote: Of course there are statistics for a language, and those statistics can be useful for comparing languages.
That is not true, and can be a disastrous assumption to make when comparing languages.
What one sees tabulated as "the frequencies of letters in English" or "the most common words in English" are actually statistics for specific types of English texts -- mainly novels and newspaper articles. But the numbers for other texts may deviate considerably from those.
Because, again, the frequency of a character or digraph in a text is determined by the frequencies of the words that contain that character or digraph; and the frequencies of words depend on the topic and style of the text. In a succinct herbal, the word "herb" may occur much more often than "the", and the digraph "rb" may be more frequent than "th".
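To make that dependence concrete, here is a minimal Python sketch that derives digraph counts from word counts. The function name `digraph_counts` and both word-count tables are made up for illustration; they are not measurements of any real text.

```python
from collections import Counter

def digraph_counts(word_counts):
    """Derive digraph counts from word counts: each word contributes
    its within-word digraphs once per occurrence of that word."""
    counts = Counter()
    for word, n in word_counts.items():
        for i in range(len(word) - 1):
            counts[word[i:i + 2]] += n
    return counts

# Hypothetical word counts, invented for illustration only.
terse_herbal = Counter({"herb": 40, "the": 10, "root": 15})
novel = Counter({"herb": 1, "the": 60, "then": 5})

for name, wc in [("terse herbal", terse_herbal), ("novel", novel)]:
    dg = digraph_counts(wc)
    print(name, "rb:", dg["rb"], "th:", dg["th"])
```

With these invented counts, "rb" beats "th" in the terse herbal and loses badly in the novel, even though both are the same language. Only the word frequencies changed.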
Quote:"the frequency of 'e' in English" is effectively saying "the frequency of 'e' in English texts on average" ... There is still a distribution of frequencies from all existing texts
It is not clear at all what "average" and "all existing texts" mean there. Should one include all emails, stock reports, advertisements, tweets, online catalogs, cash register slips? All texts created in the last 5 years, or the last 500 years? Should one count a book printed in a million copies as one text, or a million?
But anyway, that question is irrelevant. When comparing statistics of the VMS, one should use texts in other languages that are, as far as one can tell, of the same nature. Such texts are definitely not going to be like the "average" text, in any language.
The Herbal section should at least be compared to other Medieval herbals -- not to novels or theological treatises.
Even among herbals, even over the same herbs, there may be huge differences in word frequencies, due to differences in styles. Compare
"Herba Comica: This herb is good for mumps and ingrown toenails, if it is taken as infusion for two weeks, twice daily. If it is taken in excess, it will cause the belly button to turn green."
"Laughwort. Cures: mumps, ingrown toenails. Preparation: Tea. Dosage: one cup, 2 times per day, for 14 days. Effects of abuse: green belly button."
Note that the words "is" and "it" appear 3 times each in the first entry, but not once in the second one.
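For anyone who wants to check those counts, here is a small Python sketch that tallies the words in the two entries above. The tokenization (lowercase, runs of letters only) is my own simplifying assumption; a different tokenizer could give slightly different totals for other words.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lowercase alphabetic words; digits and punctuation are ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

verbose = ("Herba Comica: This herb is good for mumps and ingrown toenails, "
           "if it is taken as infusion for two weeks, twice daily. If it is "
           "taken in excess, it will cause the belly button to turn green.")
terse = ("Laughwort. Cures: mumps, ingrown toenails. Preparation: Tea. "
         "Dosage: one cup, 2 times per day, for 14 days. "
         "Effects of abuse: green belly button.")

v, t = word_counts(verbose), word_counts(terse)
for w in ("is", "it"):
    # Counter returns 0 for words that never occur.
    print(w, "verbose:", v[w], "terse:", t[w])
```

Both words come out at 3 in the verbose entry and 0 in the terse one, although the two entries carry the same information.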
Quote:The frequency of the word written "t h e" is significantly higher in an average english text compared with that of an average french text.
Well, "thé" means "tea" in French. So that word may actually be more common in the average French coffeehouse menu than "the" in the average English coffeehouse menu...
All the best, --stolfi