17-03-2025, 05:32 PM
(17-03-2025, 08:58 AM)MarcoP Wrote: But many of the inferred spaces do not match the actual text; an obvious problem is yq which is never detected as a break in the sequence, since in Q13 it’s a frequent bigram when spaces are ignored.
There are thousands of words that occur only a few times and are significantly longer than average. I found that these words were around twice as likely to have an l, s, or r in the middle of the word, and, most notably, 13 times as likely to have a y there. That makes me think y is definitely positional or a separator, if that wasn't already obvious from 40-45% of words ending in it.
I would consider trying the experiment again with the letter y removed.
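
In case anyone wants to check the mid-word numbers against their own transliteration, here is a rough sketch of that kind of comparison. It is not the exact script behind the figures above. The function name, the thresholds (at most 3 occurrences, at least 8 characters), the choice to count word types rather than tokens, and the toy word list are all assumptions made for illustration.

```python
from collections import Counter

def mid_glyph_rates(words, glyph, max_count=3, min_len=8):
    """Compare how often `glyph` appears mid-word in rare long word types
    versus all other word types. Thresholds are illustrative assumptions."""
    counts = Counter(words)

    def has_mid(word):
        # "Middle" here means anywhere except the first and last character.
        return glyph in word[1:-1]

    # Rare long types: few occurrences and at least `min_len` characters.
    rare_long = [w for w, c in counts.items() if c <= max_count and len(w) >= min_len]
    rare_set = set(rare_long)
    rest = [w for w in counts if w not in rare_set]

    def rate(group):
        # Share of word types in the group with the glyph mid-word.
        return sum(has_mid(w) for w in group) / len(group) if group else 0.0

    rare_rate, rest_rate = rate(rare_long), rate(rest)
    # The ratio is undefined when the glyph never appears mid-word in the rest.
    ratio = rare_rate / rest_rate if rest_rate else None
    return rare_rate, rest_rate, ratio

# Toy tokens for illustration only, not real counts from any transliteration.
sample = (["daiin"] * 5 + ["chedy"] * 5 + ["ol"] * 5 + ["otydy"] * 4
          + ["qokeedyqokain", "chedyqokaiin"])
print(mid_glyph_rates(sample, "y"))
```

Swapping "y" for "l", "s", or "r" gives the other comparisons, and changing max_count or min_len shows how sensitive the ratio is to what counts as rare and long.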