julian > 05-10-2016, 06:21 AM
(05-10-2016, 12:19 AM)R. Sale Wrote: So, here's my method - fast and dirty. I took a Latin dictionary, and made a note where I happened to see a whole column of words starting with the same four letters.
Results: circ, coll, comm, comp, conc, conf, cong, cons, cont, conv, disp, diss, ibus, inve, perp, pers, pert, prae, proc, prop, pros, quad, quat, quin, semi
A few with five letters: inter, super, trans
Some additional, more common, three letter possibilities: acc, des, dis, exc, exp, exs, ill, inc, ins, per, pro, rec, rep, res, sub, tri
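For anyone who would rather do this with a wordlist than by eye, here is a rough sketch of the same idea. It is not R. Sale's actual procedure: the `min_count` threshold is my stand-in for "a whole column of words", and the sample list is a toy, so a real run would need a full Latin wordlist loaded from a file.

```python
from collections import Counter

def common_prefixes(words, length=4, min_count=3):
    """Return prefixes of `length` letters shared by at least `min_count` words."""
    # Only words longer than the prefix itself are counted.
    counts = Counter(w[:length].lower() for w in words if len(w) > length)
    return sorted(p for p, n in counts.items() if n >= min_count)

# Toy sample for illustration only.
sample = ["circa", "circum", "circulus", "commodus", "communis", "commendo",
          "praeda", "super", "supra"]
print(common_prefixes(sample))  # ['circ', 'comm']
```

Calling it with `length=3` or `length=5` would give candidates for the shorter and longer lists.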
nablator > 30-11-2017, 04:13 PM
(04-10-2016, 09:20 PM)julian Wrote: My Latin wordlist contains 14,000 words ... I'd love to get hold of more, but it's hard finding decent resources online: I've already plundered the obvious ones.
William Whitaker's wordlist is a good resource. More than a million words.
farmerjohn > 30-11-2017, 04:44 PM
(30-11-2017, 04:13 PM)nablator Wrote:
Curiously it lacks some words that are found in all dictionaries. No idea why.
nablator > 30-11-2017, 05:38 PM
(30-11-2017, 04:44 PM)farmerjohn Wrote: (30-11-2017, 04:13 PM)nablator Wrote: Curiously it lacks some words that are found in all dictionaries. No idea why.
Which ones for example?
-JKP- > 30-11-2017, 06:03 PM
nablator > 30-11-2017, 06:12 PM
julian > 30-11-2017, 06:22 PM
MarcoP > 30-11-2017, 08:07 PM
(30-11-2017, 06:12 PM)nablator Wrote: I am looking for a way to reward small improvements for the GA to chew on without selecting too much fake Latin. The idea is not to make a lorem ipsum generator. So I was thinking about something like this:
- count actual Latin words as correct with score = word length
- count common Latin bigrams and trigrams (a selection of 20-50 for example) as correct with score = a small percentage of their total length
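The two bullets above could be sketched roughly like this. This is only one reading of the idea, not nablator's code: the function names, the 10% `ngram_weight`, the tiny lexicon and the n-gram selection are all made up for illustration, and giving n-gram credit only to words that are not in the lexicon is my own assumption.

```python
def count_occurrences(word, gram):
    """Count (possibly overlapping) occurrences of `gram` inside `word`."""
    return sum(1 for i in range(len(word) - len(gram) + 1)
               if word[i:i + len(gram)] == gram)

def fitness(text, lexicon, ngrams, ngram_weight=0.1):
    """Score a candidate decoding: full credit for real words, partial for n-grams."""
    total = 0.0
    for word in text.lower().split():
        if word in lexicon:
            # Actual Latin word: score = word length.
            total += len(word)
        else:
            # Assumption: non-words earn a small fraction of each n-gram's length.
            for gram in ngrams:
                total += ngram_weight * len(gram) * count_occurrences(word, gram)
    return total

# Hypothetical inputs, for illustration only.
lexicon = {"autem", "super"}
ngrams = ["us", "um", "que"]
print(fitness("autem lorus", lexicon, ngrams))
```

Keeping `ngram_weight` small is what stops the GA from preferring strings that merely look Latin over strings that contain real words.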
Helmut Winkler > 30-11-2017, 08:45 PM
(04-10-2016, 10:44 AM)MarcoP Wrote: (04-10-2016, 06:55 AM)julian Wrote: I'd welcome suggestions of Latin abbreviations, prefixes and suffixes that I could include in (or remove from) the list in A) above (which I gleaned mostly from d'Imperio's summary of Cappelli).
Hello Julian, have you considered having a look at the full Cappelli book?
julian > 30-11-2017, 09:06 PM
(30-11-2017, 08:07 PM)MarcoP Wrote: (30-11-2017, 06:12 PM)nablator Wrote: I am looking for a way to reward small improvements for the GA to chew on without selecting too much fake Latin. The idea is not to make a lorem ipsum generator. So I was thinking about something like this:
- count actual Latin words as correct with score = word length
- count common Latin bigrams and trigrams (a selection of 20-50 for example) as correct with score = a small percentage of their total length
The statistics of Voynichese make clear that it cannot be a simple substitution code for Latin. Independently of the specific language, I think it could be a good idea to consider frequency as well:
autem and super should have higher scores than seror and loris.
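One possible way to fold frequency into the score, as a sketch only: multiply the word length by a log-damped corpus count, so that very common words score higher without swamping everything else. The `frequency_score` name, the `log1p` damping and the counts below are all assumptions for illustration; the counts are invented, not taken from any corpus.

```python
import math

def frequency_score(word, counts):
    """Word length weighted by a log-damped frequency; unknown words score 0."""
    n = counts.get(word.lower(), 0)
    if n == 0:
        return 0.0
    return len(word) * math.log1p(n)

# Invented counts, purely illustrative (not real corpus data).
counts = {"autem": 5000, "super": 3000, "seror": 2, "loris": 3}

for w in ["autem", "super", "seror", "loris"]:
    print(w, round(frequency_score(w, counts), 2))
```

With any counts where autem and super are far more frequent than seror and loris, the first two come out well ahead, which is the ordering asked for above.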