20-09-2016, 11:55 AM
Diane, thank you. Neal's regex is similar to the kind of thing I would like to build myself. I've managed a very wordy version of how words are built, but nothing quite so simple.
(20-09-2016, 08:34 AM)MarcoP Wrote: In general, I think the problem of defining word structure (similarly to Philip Neal's regex) is contiguous but not identical to the identification of "character classes". For instance, in Latin the phonetically similar "n" and "m" have very different positional statistics. The two letters have roughly the same number of occurrences, but "n" appears as the last letter in about 1% of the words that contain it, while "m" appears as the last letter in about 50% of the words with at least an "m".
That's true, presumably due to case endings. I suppose we need multiple tests to show whether two characters are alike or not.
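One such test, the word-final share MarcoP describes, could be sketched roughly like this. It is only a minimal illustration: the function name `final_position_share` and the word list are my own inventions, and the sample is a handful of made-up Latin words rather than a real corpus, so it will not reproduce the 1% and 50% figures quoted above.

```python
def final_position_share(words, letter):
    """Return the share of words containing `letter` in which it is the last character."""
    # Only words that contain the letter at least once are counted.
    containing = [w for w in words if letter in w]
    if not containing:
        return 0.0
    # Of those, count the words that end with the letter.
    ending = sum(1 for w in containing if w.endswith(letter))
    return ending / len(containing)


if __name__ == "__main__":
    # Hypothetical toy sample, for illustration only.
    sample = "rosam rosarum non nomen in manu dominum amant".split()
    for letter in ("n", "m"):
        share = final_position_share(sample, letter)
        print(f"{letter}: final in {share:.0%} of words containing it")
```

Run over a real word list, this would give one number per character. Comparing two characters would presumably need several such measures (word-initial share, neighbouring characters and so on), not just this one.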

