quimqu > 18-09-2025, 11:31 AM
| Corpus | Distance | Tokens |
|---|---|---|
| Alchemical herbal (Latin) | 0.327 | 6,536 |
| De Docta Ignorantia (Latin) | 0.374 | 37,121 |
| Tirant lo Blanc (Catalan) | 0.395 | 419,309 |
| La Reine Margot (French) | 0.396 | 112,803 |
| Ambrosius Mediolanensis (Latin) | 0.402 | 117,734 |
| El Lazarillo de Tormes (Spanish) | 0.403 | 20,060 |
| Simplicius Simplicissimus (German) | 0.415 | 189,804 |
| Romeo and Juliet (English) | 0.451 | 24,822 |
| The English Physician (Culpeper) (English) | 0.460 | 135,362 |
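
For readers who want to try something similar, here is a minimal sketch of one way a distance like those in the table could be computed. It is not the exact code behind the numbers above. It assumes the comparison is made on the rank-ordered character frequencies at each position within a word (the shape of the distribution, with the character labels discarded), and it assumes total-variation distance averaged over positions as the metric. The function names `positional_shapes` and `shape_distance`, the `max_pos` cutoff, and the toy word lists are all hypothetical.

```python
from collections import Counter


def positional_shapes(words, max_pos=5):
    """For each character position 0..max_pos-1, return the relative
    frequencies of the characters seen there, sorted in descending order.
    The character labels are dropped; only the shape is kept."""
    shapes = []
    for pos in range(max_pos):
        counts = Counter(w[pos] for w in words if len(w) > pos)
        total = sum(counts.values())
        if total:
            shapes.append(sorted((c / total for c in counts.values()), reverse=True))
        else:
            shapes.append([])  # no word is long enough to reach this position
    return shapes


def shape_distance(shapes_a, shapes_b):
    """Mean total-variation distance between two lists of rank-ordered
    distributions (assumed metric, not necessarily the one used above)."""
    per_position = []
    for pa, pb in zip(shapes_a, shapes_b):
        n = max(len(pa), len(pb))
        # Pad the shorter distribution with zeros so ranks line up.
        pa = pa + [0.0] * (n - len(pa))
        pb = pb + [0.0] * (n - len(pb))
        per_position.append(0.5 * sum(abs(x - y) for x, y in zip(pa, pb)))
    return sum(per_position) / len(per_position) if per_position else 0.0


# Toy example: the second list is the first with every character relabeled,
# so the shapes match and the distance is zero.
words_a = ["ab", "ac", "bb"]
words_b = ["xy", "xz", "yy"]
d = shape_distance(positional_shapes(words_a, 2), positional_shapes(words_b, 2))
print(d)
```

Because only sorted frequencies are compared, two texts that differ by a simple substitution of characters come out identical, which is the point of ignoring the labels.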
Jorge_Stolfi > 18-09-2025, 12:15 PM
(18-09-2025, 11:31 AM)quimqu Wrote: In other words: at position 1 we might have 40% of one character, 30% of another, 20% of a third, and so on. Just the shape of the distribution, not the labels.
quimqu > 18-09-2025, 02:40 PM
(18-09-2025, 12:15 PM)Jorge_Stolfi Wrote: I don't put much faith in this kind of analysis...
Jorge_Stolfi > 19-09-2025, 01:24 AM
(18-09-2025, 08:41 PM)quimqu Wrote: Do you think we cannot squeeze the transliterations anymore?