The Voynich Ninja

Full Version: [split] Percentage of word types that occur more than once
Thanks to RobGea for the post with verifiable numbers.

It confirms that, at least for this particular statistic, the Voynich MS is within the range of existing reference texts.
(23-06-2020, 08:32 PM)RobGea Wrote: Interestingly we can see that Dante has the closest numbers to the VoynichMS.
This is all the more surprising as the Divine Comedy is a poem.
(23-06-2020, 04:50 PM)Alin_J Wrote: This is also not unusual, at least not for non-English works... I found that the Swedish novel Inferno by August Strindberg has a number of 4.38 (total number of words about 46 000).
What's this ratio of? hapaxes or word types?

(23-06-2020, 08:37 PM)ReneZ Wrote: Thanks to RobGea for the post with verifiable numbers.

It confirms that, at least for this particular statistic, the Voynich MS is within the range of existing reference texts.
The Pliny reference is very interesting and a good potential point of comparison, but Latin is a heavily inflected language, which increases the number of unique words. As far as I know, however, all attempts to find inflection in Voynichese have failed.
(24-06-2020, 01:34 AM)Stephen Carlson Wrote:
Quote:This is also not unusual, at least not for non-English works... I found that the Swedish novel Inferno by August Strindberg has a number of 4.38 (total number of words about 46 000).
What's this ratio of? hapaxes or word types?

Not hapaxes. This is just the ratio of total number of words to word-types (in other words the inverse of Type-Token Ratio).
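
For anyone who wants to reproduce these numbers, here is a minimal sketch of the tokens-per-type ratio described above, together with the statistic in the thread title (percentage of word types that occur more than once). It assumes lowercased, whitespace-separated tokens with no punctuation handling, so real counts will depend on how the text is cleaned; the function name and sample sentence are made up for illustration. As a sanity check on scale, a ratio of 4.38 on about 46 000 words implies roughly 10 500 word types.

```python
from collections import Counter


def token_type_stats(text):
    """Return basic vocabulary statistics for a whitespace-tokenised text."""
    # Assumption: lowercasing and splitting on whitespace is "good enough";
    # punctuation is NOT stripped here.
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text contains no tokens")
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Hapaxes are word types that occur exactly once.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "tokens_per_type": n_tokens / n_types,  # the ratio quoted in this thread
        "ttr": n_types / n_tokens,              # type-token ratio (its inverse)
        "pct_types_repeated": 100 * (n_types - hapaxes) / n_types,
    }


# Toy sample, not taken from any of the texts discussed.
sample = "the cat saw the dog and the dog saw a bird"
stats = token_type_stats(sample)
print(stats)
```

Both ratios depend strongly on text length, so they are only comparable between texts of similar size.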
(24-06-2020, 04:48 AM)Alin_J Wrote: Not hapaxes. This is just the ratio of total number of words to word-types (in other words the inverse of Type-Token Ratio).
Thanks. Was this done on the French original or the English translation?
(24-06-2020, 05:31 AM)Stephen Carlson Wrote: Thanks. Was this done on the French original or the English translation?

Not the French original, but the Swedish translation (by Eugène Fahlstedt).
(24-06-2020, 05:41 AM)Alin_J Wrote: Not the French original, but the Swedish translation (by Eugène Fahlstedt).
OK, found a copy [link]. I'm a bit surprised at the statistic, however, since my Swedish is at the B2 level and yet I can read most of the text in the first couple of pages (making allowances for the older orthography). I wonder if the French and English versions have similar stats?
(24-06-2020, 01:34 AM)Stephen Carlson Wrote:
(23-06-2020, 08:37 PM)ReneZ Wrote: Thanks to RobGea for the post with verifiable numbers.

It confirms that, at least for this particular statistic, the Voynich MS is within the range of existing reference texts.
The Pliny reference is very interesting and a good potential point of comparison, but Latin is a heavily inflected language, which increases the number of unique words. As far as I know, however, all attempts to find inflection in Voynichese have failed.

The number of words that occur only once will vary depending on whether the vocabulary has been lemmatised or not. This is why we see lower figures for English than for Latin: Latin has more inflection, as you note. We're safer working with non-lemmatised vocabularies, as we don't know about this feature in the Voynich text. You could simulate lemmatisation by, for example, counting all words beginning with [q] as variants of the same word with [q] removed.

But as Rene said, the number of words only occurring once isn't outside the bounds of natural language. Only if we could prove that the Voynich language was non-inflected might this raise a concern. I don't know how we could prove such a thing at our current state of knowledge.
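
The simulated lemmatisation suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the tokens are EVA-style strings arranged by hand (not a line from any transcription), and strip_initial_q and hapax_share are hypothetical helper names. Treating every initial [q] as a removable prefix is itself the hypothesis being tested, not an established fact about the text.

```python
from collections import Counter


def hapax_share(tokens):
    """Fraction of word types that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        raise ValueError("no tokens given")
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)


def strip_initial_q(token):
    """Crude stand-in for lemmatisation: drop a leading 'q'.

    A bare 'q' is left unchanged so that no empty token is produced.
    """
    if token.startswith("q") and len(token) > 1:
        return token[1:]
    return token


# Illustrative EVA-style tokens, arranged by hand for this example.
tokens = "qokeedy okeedy qokain okain daiin chedy qokal".split()
merged = [strip_initial_q(t) for t in tokens]

print("before:", len(set(tokens)), "types, hapax share", hapax_share(tokens))
print("after: ", len(set(merged)), "types, hapax share", hapax_share(merged))
```

Comparing the hapax share before and after merging gives a rough feel for how much a single affix-like feature could inflate the count of unique words.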
All this is based on the assumption that the Voynich MS words are the equivalents of complete words in some natural language.

This is a most natural assumption that is made almost automatically by most people (at least by most people presenting solutions) but I am not at all sure that it is correct.

There are arguments in favour of it and arguments against it, but both types are rather weak.

In favour: adherence (more or less) to Zipf's law
Against: unusual distribution of repeating word sequences.
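
Both points can be checked roughly on any tokenised text. The sketch below fits a straight line to log(frequency) against log(rank), where a slope near -1 is the classic Zipf pattern, and counts immediate word repetitions. The second measure is only a crude proxy for the repeating-sequence argument, and the function names are made up for this example. It assumes the text has already been split into word tokens.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two word types to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def immediate_repeats(tokens):
    """Count positions where a word is directly followed by itself."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)


# Toy data with frequencies 12, 6, 4, 3 (exactly proportional to 1/rank).
toy = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
print("slope:", zipf_slope(toy))
print("immediate repeats:", immediate_repeats(["a", "a", "b", "a", "b", "b"]))
```

A fitted slope is sensitive to text length and to how the low-frequency tail is handled, so it is better suited to comparing texts of similar size than to reading as an absolute figure.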
(24-06-2020, 07:34 AM)Stephen Carlson Wrote:
(24-06-2020, 05:41 AM)Alin_J Wrote: Not the French original, but the Swedish translation (by Eugène Fahlstedt).
OK, found a copy [link]. I'm a bit surprised at the statistic, however, since my Swedish is at the B2 level and yet I can read most of the text in the first couple of pages (making allowances for the older orthography). I wonder if the French and English versions have similar stats?

Thanks for that link. By the way, this version seems to be written using older Swedish spelling, for example "af" instead of "av" and "hvilken" instead of "vilken" (the version I downloaded from Project Gutenberg uses the modern spelling). This should not affect the statistics much, though.