Wladimir D > 04-03-2018, 08:08 PM
Anton > 04-03-2018, 08:45 PM
Wladimir D > 04-03-2018, 09:28 PM
Wladimir D > 06-03-2018, 03:40 PM
Davidsch > 06-03-2018, 04:33 PM
Wladimir D > 06-03-2018, 05:26 PM
(06-03-2018, 04:33 PM)Davidsch Wrote: Let me start by saying that the idea is a good one, but:
>>N - is the number of identical words.
Are you saying that the number of words found on both of the compared pages is only N?
So if you compare 104v and 115r, only 93 words are the same?
I do not have my data at hand, but it seems very low. What is a 'word' in your perception: 2 characters long, or 1 character, or...?
And do you count among the unique words of the MS overall, or among the unique words per page? (I assume you did that, because it gives the lowest number.)
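The two ways of counting raised in the questions above can be made concrete with a minimal sketch. It assumes each page is a plain string of whitespace-separated words; the function name `shared_counts` and the sample text are hypothetical and are not taken from anyone's data in this thread.

```python
from collections import Counter

def shared_counts(page_a, page_b):
    """Return (shared_types, shared_tokens) for two pages.

    shared_types: distinct words that occur on both pages.
    shared_tokens: occurrences in common, counting each word
    min(count on A, count on B) times.
    """
    a = Counter(page_a.split())
    b = Counter(page_b.split())
    shared_types = len(a.keys() & b.keys())
    shared_tokens = sum((a & b).values())  # Counter & Counter keeps the minimum count
    return shared_types, shared_tokens

# Made-up sample text for illustration only, not real transcription data.
page_a = "daiin ol chedy daiin qokeedy ol"
page_b = "ol daiin shedy ol ol chedy"
print(shared_counts(page_a, page_b))  # (3, 4)
```

The two numbers differ as soon as any shared word repeats on both pages, so a reported N only means something once the counting rule is stated.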
Wladimir D > 10-03-2018, 07:57 PM
Davidsch > 11-03-2018, 05:00 PM
Wladimir D > 11-03-2018, 06:11 PM
(11-03-2018, 05:00 PM)Davidsch Wrote: That is extremely high, but again: it depends very much on the chosen 'groups' or 'words'. If you include small groups such as 'or' and 'ol', of course there is an overlap of 50% or higher.
On the other hand, it is unclear what you are calculating: if you take 98% of all groups, find them on one page, and compare those with the groups on another page,
the difference that you get is simply the difference in the counts of those groups.
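The point about short groups inflating the overlap can be illustrated with a small sketch. It assumes whitespace-separated words and measures overlap relative to the page with fewer distinct words; that normalisation, the name `overlap_ratio`, and the `min_len` parameter are assumptions made here for illustration, not the method used in the thread.

```python
def overlap_ratio(page_a, page_b, min_len=1):
    """Share of distinct words in common, relative to the smaller vocabulary.

    Words shorter than min_len characters are ignored on both pages.
    """
    a = {w for w in page_a.split() if len(w) >= min_len}
    b = {w for w in page_b.split() if len(w) >= min_len}
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return len(a & b) / smaller

# Made-up sample text for illustration only, not real transcription data.
page_a = "or ol daiin chedy"
page_b = "or ol shedy qokeedy"
print(overlap_ratio(page_a, page_b))             # 0.5
print(overlap_ratio(page_a, page_b, min_len=3))  # 0.0
```

In this toy case the whole overlap comes from the two-character groups, which is why the choice of minimum word length should be reported alongside any overlap figure.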