(22-06-2018, 01:59 PM) bi3mw wrote: I mostly use Bash scripts for searches. Finding strings across word boundaries is possible, but it's difficult to test the hits against the percentage ratio inside / outside the word boundaries. I will think about it.
Uhm, re-reading what I wrote, I think I have been particularly obscure. I guess my reference to a percentage shift was unnecessarily confusing.
What I meant is that you currently count a match only when two words sit in exactly the same position.
Comparing these two single-line pages:
Page 1: x a b f x x
Page 2: e a f e y y
If I understood correctly, you currently match the two words in position 2 (the two 'a's).
You could also allow the two 'f's to match, since their positions differ by just 1 (position 4 on page 1, position 3 on page 2). Just an idea, anyway. There are always so many interesting options to investigate!
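A rough sketch of the looser matching suggested above, written in Python only for illustration (the searches discussed in this thread are done with Bash scripts). It assumes each page is a list of whitespace-separated words and reports every pair of equal words whose 1-based positions differ by at most `max_shift`; the function name `tolerant_matches` is hypothetical.

```python
def tolerant_matches(page1, page2, max_shift=1):
    """Return (pos1, pos2, word) for every pair of equal words whose
    1-based positions differ by at most max_shift."""
    matches = []
    for i, word in enumerate(page1):
        # Only look at page2 positions inside the +/- max_shift window
        lo = max(0, i - max_shift)
        hi = min(len(page2), i + max_shift + 1)
        for j in range(lo, hi):
            if page2[j] == word:
                matches.append((i + 1, j + 1, word))
    return matches


page1 = "x a b f x x".split()
page2 = "e a f e y y".split()
print(tolerant_matches(page1, page2, max_shift=0))  # [(2, 2, 'a')]
print(tolerant_matches(page1, page2, max_shift=1))  # [(2, 2, 'a'), (4, 3, 'f')]
```

With `max_shift=0` this reduces to the exact-position matching. With wider windows a repeated word can pair with more than one neighbour, so one might want to let each word be matched at most once.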
I have run some more experiments limited to exact matches between f84r and f84v.
In the basic comparison, I find the same 9 matches as bi3mw.
(The number is the word's position within the page.)
17 qokeey qokeey
27 or or
36 ol ol
44 qokeedy qokeedy
74 qokedy qokedy
119 qokey qokey
235 daiin daiin
301 ol ol
305 lshedy lshedy
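The basic comparison can be sketched as follows, assuming both transcriptions have already been split into word lists (how the files are read and tokenised is not stated here). The word lists in the sketch are made-up toy data, not the real f84r/f84v text, and `exact_matches` / `report` are hypothetical names.

```python
def exact_matches(words_a, words_b):
    # zip() stops at the shorter page, so trailing words of the longer page are never compared
    return [(pos, a, b)
            for pos, (a, b) in enumerate(zip(words_a, words_b), start=1)
            if a == b]


def report(matches):
    # One line per hit, in the same "position word word" layout as the list above
    return "\n".join(f"{pos} {a} {b}" for pos, a, b in matches)


# Toy data only
page_r = "qokeey or chedy ol".split()
page_v = "qokeey ol chedy dy".split()
print(report(exact_matches(page_r, page_v)))
# 1 qokeey qokeey
# 3 chedy chedy
```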
I ran 100 tests, each time randomly shuffling the word order of f84v. Statistics for the number of matches:
min: 0, max: 10, average: 3.98
This confirms that 9 matches in the basic comparison is high, but it also shows that the result could still be due to chance: a random shuffle produced as many as 10.
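The shuffle test can be sketched like this: keep one page fixed, shuffle the word order of the other, and recount exact positional matches each time. The empirical p-value (the share of shuffles scoring at least as many matches as the real order) is my addition, not something reported above; the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def count_exact_matches(words_a, words_b):
    # Matches only at identical positions; zip() ignores the tail of the longer page
    return sum(1 for a, b in zip(words_a, words_b) if a == b)


def shuffle_test(words_a, words_b, trials=100, seed=None):
    rng = random.Random(seed)  # a fixed seed makes runs reproducible
    observed = count_exact_matches(words_a, words_b)
    shuffled = list(words_b)   # copy, so the caller's list is left untouched
    counts = []
    for _ in range(trials):
        rng.shuffle(shuffled)  # a fresh uniform random order each trial
        counts.append(count_exact_matches(words_a, shuffled))
    return {
        "observed": observed,
        "min": min(counts),
        "max": max(counts),
        "average": mean(counts),
        # share of random orders doing at least as well as the real one
        "p_value": sum(c >= observed for c in counts) / trials,
    }


# Toy data only; substitute the real word lists
page_r = "qokeey or ol qokeedy daiin ol shedy".split()
page_v = "qokeey ol ol chedy daiin or shedy".split()
print(shuffle_test(page_r, page_v, trials=100, seed=1))
```

With 100 trials the p-value can only be resolved in steps of 0.01, so more shuffles would give a sharper estimate of how unusual 9 matches really is.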
I also tried adding a single word at the beginning of one file at a time, so as to compare words with a 1-position shift in each direction.
With 1 word added to f84r, I find 4 matches:
104 daiin daiin
145 ol ol
223 shedy shedy
282 shey shey
With 1 word added to f84v, I find 10 matches:
43 qokedy qokedy
82 okedy okedy
106 checthy checthy
163 shedy shedy
168 okedy okedy
232 qokedy qokedy
258 ol ol
283 shey shey
286 shedy shedy
299 qokeedy qokeedy
These two results are inconclusive: 4 is about the average from a random shuffle, while 10 equals the maximum I observed in the random tests.
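Prepending one dummy word to a file is equivalent to comparing each word with the word one position over in the other file. A small sketch of that, using the two toy pages from the Page 1 / Page 2 example above; `shifted_matches` is a hypothetical helper, and positions are reported 1-based in each original (unpadded) page.

```python
def shifted_matches(words_a, words_b, shift):
    """Compare words_a[i] with words_b[i + shift].

    shift=+1 behaves like prepending one word to words_a,
    shift=-1 behaves like prepending one word to words_b.
    Returns (pos_a, pos_b, word) with 1-based positions in the unpadded pages.
    """
    matches = []
    for i, word in enumerate(words_a):
        j = i + shift
        # Skip positions that fall off either end of words_b
        if 0 <= j < len(words_b) and words_b[j] == word:
            matches.append((i + 1, j + 1, word))
    return matches


page1 = "x a b f x x".split()
page2 = "e a f e y y".split()
print(shifted_matches(page1, page2, 0))   # [(2, 2, 'a')]
print(shifted_matches(page1, page2, -1))  # [(4, 3, 'f')]
print(shifted_matches(page1, page2, +1))  # []
```

Running the helper on the real pages with `shift=+1` and `shift=-1` should give the same match counts as padding the files, though the position numbering may differ by one from the lists above. The shuffle test could then be repeated for each shift.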