Emma May Smith > 29-02-2016, 07:49 PM
ReneZ > 01-03-2016, 05:09 AM
Sam G > 01-03-2016, 08:37 AM
-Job- > 01-03-2016, 09:16 AM
Davidsch > 01-03-2016, 01:21 PM
Quote: ReneZ: - whatever it is that causes the first word in each line to be different
Emma May Smith > 01-03-2016, 08:15 PM
(01-03-2016, 09:16 AM)-Job- Wrote: It's possible to determine the subset of characters which maximizes repeated word sequences, but the right questions would need to be asked.
For example we can expect that a smaller subset will typically result in more repeated sequences, so it would be preferable to ask which subset of n characters yields the most repetition, starting with large n.
There is a trade-off between subset size and the number of repeated sequences - the former should not be too small and the latter should not be too large.
It's not clear how the results would be interpreted. I suspect the brute-force search would be the easy part.
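To make the brute-force idea concrete, here is a rough sketch of what such a search might look like. It rests on assumptions that are not stated in the thread: that "using a subset" means deleting every character outside the subset from each word, that words left empty are dropped, and that repetition is scored by counting overlapping k-word windows that occur more than once. The function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def repeated_sequences(words, k=3):
    """Count overlapping k-word windows whose word sequence occurs more than once."""
    windows = Counter(tuple(words[i:i + k]) for i in range(len(words) - k + 1))
    return sum(c for c in windows.values() if c > 1)

def project(words, keep):
    """Assumption: keep only characters in `keep`; drop words that become empty."""
    reduced = ("".join(ch for ch in w if ch in keep) for w in words)
    return [w for w in reduced if w]

def best_subset(words, n, k=3):
    """Try every subset of n characters and return (score, subset) for the best one."""
    alphabet = sorted(set("".join(words)))
    best = None
    for subset in combinations(alphabet, n):
        score = repeated_sequences(project(words, set(subset)), k)
        if best is None or score > best[0]:
            best = (score, subset)
    return best

# Usage: start with large n and work downwards, as suggested above.
# words = open("transcription.txt").read().split()   # hypothetical file name
# for n in range(len(set("".join(words))), 0, -1):
#     print(n, best_subset(words, n))
```

The number of subsets grows combinatorially with the alphabet size, so this is only practical for small alphabets or values of n close to the full alphabet, and it says nothing about how to interpret the winner.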
-Job- > 02-03-2016, 10:07 AM
ReneZ > 02-03-2016, 11:22 AM
-Job- > 03-03-2016, 10:34 AM
(02-03-2016, 11:22 AM)ReneZ Wrote: That's an interesting statistic. How many words were in the text used? I guess you counted all overlapping sequences, so that the total number of 4-word sequences is N(word)-3.
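For reference, the overlapping count described in this quote can be sketched as follows. This is only an illustration of the N(word)-3 arithmetic, not the code actually used for the statistic; `window_stats` is a hypothetical name.

```python
from collections import Counter

def window_stats(words, k=4):
    """Return the total number of overlapping k-word windows and the repeated ones."""
    total = max(len(words) - k + 1, 0)          # N - 3 when k = 4
    counts = Counter(tuple(words[i:i + k]) for i in range(total))
    repeated = {seq: c for seq, c in counts.items() if c > 1}
    return total, repeated

# A 9-word toy text has 9 - 3 = 6 overlapping 4-word windows.
total, repeated = window_stats("a b c d a b c d a".split())
print(total, len(repeated))
```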
(02-03-2016, 11:22 AM)ReneZ Wrote: The next question would be what would be a 'normal' number for a known plain text.
(02-03-2016, 11:22 AM)ReneZ Wrote: And the next... since such a text would be edited and spell-checked, what would happen if one introduces arbitrary errors in this text, e.g. one arbitrary substitution every 80, 40 or 20 characters...
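One possible way to run the error experiment proposed in this quote is sketched below. The details are assumptions: one substitution is placed at a random non-space position inside each complete block of `every` characters, spaces are left alone so word boundaries survive, and replacement letters are drawn from a lowercase Latin alphabet. `inject_errors` is a hypothetical helper.

```python
import random
import string

def inject_errors(text, every=80, alphabet=string.ascii_lowercase, seed=0):
    """Substitute one random non-space character in each full block of `every` chars."""
    rng = random.Random(seed)                   # fixed seed for repeatable runs
    chars = list(text)
    for start in range(0, len(chars) - every + 1, every):
        positions = [i for i in range(start, start + every) if not chars[i].isspace()]
        if not positions:
            continue                            # block is all whitespace
        i = rng.choice(positions)
        # Always pick a different character so every block really changes.
        chars[i] = rng.choice([c for c in alphabet if c != chars[i]])
    return "".join(chars)

# Usage: corrupt a known plain text at the three suggested rates, then
# recount repeated word sequences on each version and compare.
# for rate in (80, 40, 20):
#     noisy_words = inject_errors(plain_text, every=rate).split()
```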
ReneZ > 03-03-2016, 11:16 AM