Koen G > 23-06-2020, 02:01 PM
bi3mw > 23-06-2020, 02:55 PM
(23-06-2020, 01:12 PM)Stephen Carlson Wrote: 5571 is more reasonable, but that's not the number on Rene's page. Where is the 5571 coming from?
Stephen Carlson > 23-06-2020, 03:57 PM
(23-06-2020, 02:55 PM)bi3mw Wrote:
(23-06-2020, 01:12 PM)Stephen Carlson Wrote: 5571 is more reasonable, but that's not the number on Rene's page. Where is the 5571 coming from?
It comes from total word types minus non-unique word types (those occurring more than once): 8078 - 2507 = 5571

Oh, you're referring to hapax legomena. I'm not talking about those, but that number is also surprisingly high in comparison with natural language texts.
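The subtraction quoted above can be checked directly. The following is a minimal sketch that uses only the two counts given in this thread (8078 word types, 2507 of them occurring more than once); the variable names are illustrative and not taken from anyone's actual script.

```python
# Counts as quoted in the thread; nothing here is re-derived from a transliteration.
total_types = 8078      # all word types (vocabulary size)
repeated_types = 2507   # word types that occur more than once

# Hapax legomena are the word types that occur exactly once.
hapax_types = total_types - repeated_types

repeated_share = 100 * repeated_types / total_types
hapax_share = 100 * hapax_types / total_types

print(hapax_types)  # 5571
print(f"{repeated_share:.2f}% repeated, {hapax_share:.2f}% hapax")
```

The repeated share comes out at about 31%, which agrees with the percentage bi3mw posted for the VMS, so the two figures describe the same split seen from opposite sides.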
Alin_J > 23-06-2020, 04:16 PM
(21-06-2020, 08:33 PM)bi3mw Wrote: During a study of the VMS, I found that only a relatively small percentage of Word Types occur more than once. Can anyone confirm this?
31.034909631% of all Word Types in the VMS occur more than once.
44.928611163% of all Word Types in the comparison text (Regimen Sanitatis) occur more than once.
bi3mw > 23-06-2020, 04:16 PM
(23-06-2020, 03:57 PM)Stephen Carlson Wrote: Oh, you're referring to hapax legomena. I'm not talking about those, but that number is also surprisingly high in comparison with natural language texts.
Alin_J > 23-06-2020, 04:30 PM
(23-06-2020, 04:16 PM)Alin_J Wrote:
(21-06-2020, 08:33 PM)bi3mw Wrote: During a study of the VMS, I found that only a relatively small percentage of Word Types occur more than once. Can anyone confirm this?
31.034909631% of all Word Types in the VMS occur more than once.
44.928611163% of all Word Types in the comparison text (Regimen Sanitatis) occur more than once.
If using the 101 transliteration (Glen Claston), the percentage of unique words (word types) that occur more than once in the Voynich manuscript is about 28%. So, 72% of the word types in the VMS are hapax legomena.
Alin_J > 23-06-2020, 04:50 PM
(23-06-2020, 11:16 AM)Stephen Carlson Wrote:
(22-06-2020, 04:07 PM)ReneZ Wrote: For these two points, I can recommend looking at Table 3 on [link not shown], which shows a great spread in the number of unique words (word types).
From the linked page:
Quote: A representative number of word types may be 9,000 - 10,000.
This strikes me as on the high side, at least for certain languages and genres.
It's a kind of obvious test, but has anyone compared the unique word count of the VM to that of other works in various languages? This page here [link not shown] puts the number of words per unique word for 7 different English-language novels at between 9 and 16.5. If I understand the VM stats right, it comes in between 3.6 and 4.3, depending on the transcription. It seems that the number of unique words in the denominator is about three or four times too high, but I'm curious about non-English works.
bi3mw > 23-06-2020, 05:02 PM
(23-06-2020, 04:30 PM)Alin_J Wrote: But then again, this is IMO nothing unusual for natural language texts.
Hmm, I would have rather thought that a ratio of 45% / 55% is the "normal case" in longer texts, but surely it depends strongly on the text genre and language.
Alin_J > 23-06-2020, 05:11 PM
(23-06-2020, 05:02 PM)bi3mw Wrote:
(23-06-2020, 04:30 PM)Alin_J Wrote: But then again, this is IMO nothing unusual for natural language texts.
Hmm, I would have rather thought that a ratio of 45% / 55% is the "normal case" in longer texts, but surely it depends strongly on the text genre and language.
RobGea > 23-06-2020, 08:32 PM
VoynichTT
Total words: 37759
Vocabulary : 8078
Hapax : 5571
68.9% of vocab is hapax 6.7 words per hapax Totalwords/Vocab ratio 4.67:1
-----------------------------------------------------------------------------------------------
la divina commedia di dante alighieri
Total words: 97344
Vocabulary : 19893
Hapax : 13750
69.1% of vocab is hapax 7.0 words per hapax Totalwords/Vocab ratio 4.89:1
--------------------------------------------------------------------------------------------------
Naturalis Historia books 1-4 pliny the elder (Thayer)
Total words: 35562
Vocabulary : 12596
Hapax : 8898
70.6% of vocab is hapax 3.9 words per hapax Totalwords/Vocab ratio 2.82:1
--------------------------------------------------------------------------------------------------
The Adventures of Tom Sawyer M.Twain
Total words: 71748
Vocabulary : 7578
Hapax : 3739
49.3% of vocab is hapax 19.1 words per hapax Totalwords/Vocab ratio 9.46:1
--------------------------------------------------------------------------------------------------
Tale of 2 cities C.Dickens
Total words: 136561
Vocabulary : 10137
Hapax : 4590
45.2% of vocab is hapax 29.7 words per hapax Totalwords/Vocab ratio 13.47:1
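
For anyone who wants to reproduce figures of this kind on another text, here is a minimal Python sketch. It is not the script used for the numbers above. The function name `hapax_stats` and the tokenization rule (lowercase, runs of letters with optional internal apostrophes) are assumptions made for illustration, and a different rule will shift every count. A Voynich transliteration would need its own word separator handling.

```python
import re
from collections import Counter


def hapax_stats(text):
    """Return token, type, and hapax counts for a text, plus derived ratios."""
    # Assumption: lowercase the text and treat runs of letters
    # (with optional internal apostrophes) as words.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    counts = Counter(tokens)

    total = len(tokens)   # running words (tokens)
    vocab = len(counts)   # distinct words (types)
    hapax = sum(1 for n in counts.values() if n == 1)  # types seen exactly once

    return {
        "total_words": total,
        "vocabulary": vocab,
        "hapax": hapax,
        "hapax_pct_of_vocab": 100 * hapax / vocab if vocab else 0.0,
        "words_per_hapax": total / hapax if hapax else float("inf"),
        "words_per_type": total / vocab if vocab else 0.0,
    }


# Toy input, only to show the shape of the result.
sample = "the cat sat on the mat and the dog sat too"
stats = hapax_stats(sample)
for key, value in stats.items():
    print(key, value)
```

On the toy sentence this gives 11 total words, 8 word types, and 6 hapax. Because all three ratios depend on text length, comparisons are most meaningful between texts of similar size.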