Vocabulary size by Illustration Type
The Voynich Ninja (https://www.voynich.ninja), Forum: Voynich Research, Analysis of the text, Thread: /thread-3829.html
Vocabulary size by Illustration Type - RobGea - 03-07-2022

Vocabulary size by Illustration Type
(Using slightly modified ZL2a transcription, uncertain spaces as spaces)
Any and all errors are mine. The following description sounds more complicated than it is.

In the EVA format there is a variable $I for Illustration type.
The Herbal type was further split into 2 types, Herbal_a and Herbal_b, following LisaFaginDavis's allocation of folios by Scribe.
Herbal_a is defined as having EVA $I = H and its folio is ascribed to Scribe_1.
Herbal_b is defined as having EVA $I = H and its folio is ascribed to any Scribe except Scribe_1.

Here the words in the folios of the same Illustration type were collected, giving a total word count for each of the 9 types.
Within each type, replicated words were removed, creating a set of words where each word is counted once. This is the vocabulary of that Illustration type, the 'type_vocab'.
Then, for each word in the 'type_vocab', if that word appeared in any of the other 8 type_vocabs, the word was removed, creating an 'unshared_vocab'.

The 'type_vocab' contains the words that appear once or more in folios that have the same Illustration type.
The 'unshared_vocab' contains words that appear once or more ONLY in folios that have the same Illustration type.
Any word that appears in more than one 'type_vocab' is removed completely. For instance, the word 'daiin' appears in several type_vocabs, and because of that it does not appear in any of the unshared_vocabs.

Key: Herbal_a ( Ha ); Herbal_b ( Hb ); Stars ( S ); Balneo ( B ); Pharma ( P ); Astro ( A ); Zodiac ( Z ); Text ( T ); Cosmo ( C ).

Code:
Type, total_words, type_vocab, unshared_vocab, unshared_vocab as % of type_vocab, Rank

Observations:
- HerbalA has the most unshared words, as expected because it is CurrierA.
- Pharma is also CurrierA, so its position at R5 is unexpected.
- Stars at R2 is 10% higher than the next rank, an anomaly with no obvious explanation.
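The type_vocab / unshared_vocab procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not RobGea's actual script: the function name `vocab_table`, the input layout (a dict from illustration type to a list of word tokens) and the toy word lists are all assumptions made for the example, and the toy counts have nothing to do with the real transcription.

```python
from collections import Counter

def vocab_table(words_by_type):
    """Build rows of (type, total_words, type_vocab, unshared_vocab, pct).

    words_by_type: hypothetical input, dict mapping an illustration
    type label to the list of word tokens found on its folios.
    """
    # type_vocab: each word counted once per illustration type
    type_vocab = {t: set(ws) for t, ws in words_by_type.items()}
    # spread[w] = number of type_vocabs that contain word w
    spread = Counter(w for vocab in type_vocab.values() for w in vocab)
    rows = []
    for t, vocab in type_vocab.items():
        # unshared_vocab: words found in this type_vocab and in no other
        unshared = {w for w in vocab if spread[w] == 1}
        pct = 100.0 * len(unshared) / len(vocab) if vocab else 0.0
        rows.append((t, len(words_by_type[t]), len(vocab), len(unshared), pct))
    # rank by unshared_vocab as % of type_vocab, highest first
    rows.sort(key=lambda r: r[4], reverse=True)
    return rows

# Toy data, invented for illustration only
toy = {
    "Ha": ["daiin", "chol", "chol", "shory"],
    "S": ["daiin", "qokeedy", "otaiin"],
    "B": ["daiin", "qokeedy", "shedy"],
}
rows = vocab_table(toy)
for rank, (t, total, n_vocab, n_unshared, pct) in enumerate(rows, 1):
    print(f"{t}, {total}, {n_vocab}, {n_unshared}, {pct:.1f}, R{rank}")
```

In the toy data 'daiin' occurs in all three types, so it drops out of every unshared_vocab, which matches the 'daiin' example in the description.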
Speculations: One possibility is that the Stars section is discussing something that is outside the range of the rest of the text.

RE: Vocabulary size by Illustration Type - Ruby Novacna - 03-07-2022

(03-07-2022, 03:30 PM) RobGea Wrote: -Stars at R2
R3?

(03-07-2022, 03:30 PM) RobGea Wrote: The Herbal Type was further split into 2 types
Shouldn't the text pages also be split between A and B?

RE: Vocabulary size by Illustration Type - Torsten - 03-07-2022

(03-07-2022, 03:30 PM) RobGea Wrote: Observations:
You should be aware that you are comparing dictionary sizes. If you take the text size into account, the numbers tell a slightly different story:

Code:
Type, total_words, type_vocab, unshared_vocab, type_vocab as % of total_words, unshared_vocab as % of total_words

RE: Vocabulary size by Illustration Type - RobGea - 03-07-2022

Thanks Torsten, doing it that way does indeed show a different story.
That's pretty interesting, I need to think this over.

RE: Vocabulary size by Illustration Type - ReneZ - 04-07-2022

Between the two views:
- ignoring sample text length
- dividing by sample text length
both views are imperfect. It is not clear to me which one of the two is the more indicative one.
Certainly, dictionary size increases very non-linearly with sample text size. It is only a problem when text lengths are significantly different. That is the case here, of course.

RE: Vocabulary size by Illustration Type - Aga Tentakulus - 04-07-2022

Hmmmm.....
Never ask a brother if he is a professor of mathematics and computer science.
Educate yourself and ask again later.

RE: Vocabulary size by Illustration Type - R. Sale - 04-07-2022

Can you list the unique vords of Herbal A and determine a frequency of use for those that were used multiple times within that section? In other words, is there a specific set of vords that uniquely define the "topics" of Herbal A?

Can the herbals be combined and then combined with Pharma to look for unique common terms not found in other parts of the VMs? If the whole botany and Pharma bit were all about leaves, then that would be a shared term probably not used in the cosmic and zodiac parts.

If terms unique to Herbal A are used multiple times, is each use similar or unique? And also in the combinations?

RE: Vocabulary size by Illustration Type - Torsten - 05-07-2022

(04-07-2022, 10:26 PM) R. Sale Wrote: Can you list the unique vords of Herbal A and determine a frequency of use for those that were used multiple times within that section? In other words, is there a specific set of vords that uniquely define the "topics" of Herbal A.
Usually such vords are rarely used.

Herbal A (1426 unshared word types; note: the text samples were taken from Takahashi's transliteration)
Code:
1262 vords only occur once

Quire 13 Bio (634 unshared word types)
Code:
581 vords only occur once

Quire 20 Stars (1663 unshared word types)
Code:
1496 vords only occur once

RE: Vocabulary size by Illustration Type - Ruby Novacna - 05-07-2022

(03-07-2022, 03:30 PM) RobGea Wrote: Vocabulary size by Illustration Type
As I don't usually do statistical calculations, I find it hard to follow: what precise point should this calculation of unshared words clarify? I couldn't find an explanation before the presentation of the results.

RE: Vocabulary size by Illustration Type - Torsten - 05-07-2022

(04-07-2022, 10:26 PM) R. Sale Wrote: Can the herbals be combined and then combined with pharma of look for unique common terms not found in other parts of the VMs? If the whole botany and Pharma bit were all about leaves, then that would be a shared term probably not used in cosmic and zodiac parts.
If you combine Herbal A with Herbal B, or Herbal A with Pharma, no word stands out. If illustrations do indicate topics, some common terms specific to a particular type of illustration or topic should exist. However, such terms do not exist.

Pharma A (459 unshared word types)
Code:
438 x occurs only once

Herbal (A) + Pharma (A) (1940 unshared word types)
Code:
1700 x occurs only once

Herbal B (419 unshared word types)
Code:
408 x occurs only once

Herbal A + Herbal B (1887 unshared word types)
Code:
1670 x occurs only once
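
The "occurs only once" tallies and the combined-section counts in the last posts can be reproduced along these lines. Again this is a sketch under assumptions, not Torsten's code: `unshared_counts`, `once_only`, the `merge` parameter and the toy word lists are hypothetical names and data chosen for the example. Sections listed in `merge` are pooled into one section before the unshared words are determined, which is one way to read "Herbal A + Pharma A".

```python
from collections import Counter

def unshared_counts(words_by_section, merge=()):
    """Return {section: Counter of its unshared words with frequencies}.

    words_by_section: hypothetical input, dict of section -> word list.
    merge: section names to pool into a single combined section first.
    """
    pooled = {}
    for name, words in words_by_section.items():
        key = "+".join(merge) if name in merge else name
        pooled.setdefault(key, []).extend(words)
    counts = {s: Counter(ws) for s, ws in pooled.items()}
    # spread[w] = number of (possibly pooled) sections containing w
    spread = Counter(w for c in counts.values() for w in c)
    return {
        s: Counter({w: n for w, n in c.items() if spread[w] == 1})
        for s, c in counts.items()
    }

def once_only(counter):
    """Number of word types that occur exactly once."""
    return sum(1 for n in counter.values() if n == 1)

# Toy data, invented for illustration only
toy = {
    "Herbal A": ["daiin", "chol", "chol", "shory"],
    "Pharma A": ["daiin", "chol", "okol"],
    "Stars": ["daiin", "qokeedy", "otaiin", "otaiin"],
}
separate = unshared_counts(toy)
combined = unshared_counts(toy, merge=("Herbal A", "Pharma A"))
for label, result in (("separate", separate), ("combined", combined)):
    for section, c in result.items():
        print(f"{label}: {section} ({len(c)} unshared word types), "
              f"{once_only(c)} occur only once")
```

In the toy data 'chol' is shared between Herbal A and Pharma A when they are counted separately, but becomes an unshared word used three times once the two are pooled. A word like that is the kind of section-specific common term the posts above report not finding in the real text.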