Mark Knowles > 24-09-2025, 01:42 PM
Mark Knowles > 24-09-2025, 09:26 PM
Mark Knowles > 24-09-2025, 09:51 PM
Mauro > 25-09-2025, 12:40 PM
(24-09-2025, 09:51 PM)Mark Knowles Wrote: A question that I am keen to answer is how many examples of repeated words there should be, on average, in a typical Latin manuscript of the period. I am inclined to assume that the answer is very few.
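Mark's question can be checked mechanically on any plain-text transcription. Below is a minimal Python sketch. It assumes the text is a plain string and treats a "word" as any run of letters, so punctuation, numerals and line breaks are ignored and case is folded. The function names `tokenize`, `count_adjacent_repeats` and `repeats_per_thousand` are hypothetical and do not come from any existing tool.

```python
import re

def tokenize(text: str) -> list[str]:
    # Assumption: a word is a maximal run of Unicode letters; case is folded.
    return re.findall(r"[^\W\d_]+", text.lower())

def count_adjacent_repeats(words: list[str]) -> int:
    # Count positions where a word equals the word just before it,
    # so a run like "et et et" contributes 2.
    return sum(1 for prev, cur in zip(words, words[1:]) if prev == cur)

def repeats_per_thousand(text: str) -> float:
    # Normalise by length so texts of different sizes can be compared.
    words = tokenize(text)
    if not words:
        return 0.0
    return 1000 * count_adjacent_repeats(words) / len(words)

if __name__ == "__main__":
    words = tokenize("in in principio erat verbum verbum verbum")
    print(len(words), count_adjacent_repeats(words))  # prints: 7 3
```

Because punctuation is discarded, "tres, tres" counts as a repeat here. A stricter count would keep punctuation tokens so that they break a run.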
Jorge_Stolfi > 25-09-2025, 12:52 PM
(25-09-2025, 12:40 PM)Mauro Wrote: I made a test with De Bello Gallico: it has 11030 word types, of which 6314 are hapax legomena. The whole text is 51503 words long, so most of the text (88%) would be culled by your procedure, but only ~43% of the vocabulary.
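Mauro's percentages follow from his counts. Culling every type that occurs more than once leaves only the hapax legomena, and each of those accounts for exactly one token. The culled share of the text is therefore (51503 - 6314)/51503 ≈ 87.7%, and the culled share of the vocabulary is (11030 - 6314)/11030 ≈ 42.8%. Below is a minimal Python sketch of the same computation. It assumes the text has already been tokenized into a list of words, and the function name `hapax_cull_stats` is hypothetical.

```python
from collections import Counter

def hapax_cull_stats(words: list[str]) -> dict[str, float]:
    """Statistics for culling every word type that occurs more than once."""
    if not words:
        raise ValueError("empty word list")
    counts = Counter(words)
    n_tokens = len(words)
    n_types = len(counts)
    # Each hapax legomenon is one type accounting for exactly one token.
    n_hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "hapax": n_hapax,
        # Share of running text removed (all non-hapax tokens).
        "text_culled": (n_tokens - n_hapax) / n_tokens,
        # Share of vocabulary removed (all non-hapax types).
        "vocab_culled": (n_types - n_hapax) / n_types,
    }

# Mauro's reported counts plugged into the same two formulas:
print(round((51503 - 6314) / 51503, 3), round((11030 - 6314) / 11030, 3))  # prints: 0.877 0.428
```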
Mark Knowles > 25-09-2025, 03:18 PM
(25-09-2025, 12:52 PM)Jorge_Stolfi Wrote: (25-09-2025, 12:40 PM)Mauro Wrote: I made a test with De Bello Gallico: it has 11030 word types, of which 6314 are hapax legomena. The whole text is 51503 words long, so most of the text (88%) would be culled by your procedure, but only ~43% of the vocabulary.
I understood that Mark considers deleting words that are repeated in consecutive positions.
Mauro > 25-09-2025, 08:46 PM
(25-09-2025, 03:18 PM)Mark Knowles Wrote: (25-09-2025, 12:52 PM)Jorge_Stolfi Wrote: (25-09-2025, 12:40 PM)Mauro Wrote: I made a test with De Bello Gallico: it has 11030 word types, of which 6314 are hapax legomena. The whole text is 51503 words long, so most of the text (88%) would be culled by your procedure, but only ~43% of the vocabulary.
I understood that Mark considers deleting words that are repeated in consecutive positions.
Exactly. The Voynich manuscript has many words that repeat consecutively, maybe twice or three times in succession. I suspect that such words are very very likely to be null or filler words, so I am curious as to what the Voynich text might look like with all such words removed.
By repeated words I don't mean words that occur more than once in the manuscript; they must be words that are written next to each other in sequence. Once some words are removed there may be other words that are repeated next to each other, so they must be removed and this process repeated until there are no words in the Voynich text that are repeated in succession. I may implement this myself, but I wanted to put it down as a thought first.
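The procedure described above can be sketched as a loop that deletes repeated runs until none remain. This minimal Python sketch rests on two assumptions the thread does not state. First, every copy in a run is deleted rather than collapsed to a single copy. Second, each pass deletes all runs present at that moment before checking again. The deletion order can change the result: `a b b a a c` ends as `a c` with simultaneous passes, but as `c` if the leftmost run is always removed first. The function names are hypothetical.

```python
def remove_runs_once(words: list[str]) -> tuple[list[str], bool]:
    """Delete every maximal run of two or more identical adjacent words."""
    out: list[str] = []
    removed = False
    i = 0
    while i < len(words):
        j = i
        # Extend j to the last word of the run that starts at i.
        while j + 1 < len(words) and words[j + 1] == words[i]:
            j += 1
        if j > i:
            removed = True  # run of length >= 2: drop every copy
        else:
            out.append(words[i])  # singleton: keep it
        i = j + 1
    return out, removed

def strip_consecutive_repeats(words: list[str]) -> list[str]:
    """Repeat passes until no word is followed by an identical word."""
    changed = True
    while changed:
        words, changed = remove_runs_once(words)
    return words

if __name__ == "__main__":
    print(strip_consecutive_repeats("a b b a c".split()))  # prints: ['c']
```

The loop always terminates, because every pass that changes anything makes the list strictly shorter.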
anyasophira > 26-09-2025, 07:06 AM
Jorge_Stolfi > 26-09-2025, 07:28 AM
(24-09-2025, 09:51 PM)Mark Knowles Wrote: A question that I am keen to answer is how many examples of repeated words there should be, on average, in a typical Latin manuscript of the period. I am inclined to assume that the answer is very few.
quimqu > 26-09-2025, 08:13 AM