(18-03-2026, 03:32 PM)ReneZ Wrote: The earlier result was a clearly statistical outcome. Is that still the case in the newer analysis, or is that more like an impression?
The key point is that what changed is not the data, but the definition of what I call the "core."
In the earlier analysis, the "core" was defined operationally as the first word of the burst (supposition A: the writing and generation of the burst started at the first burst word on the page). Under that definition, the result was clearly statistical: the first word tends to be rarer, simply because it is the earliest occurrence, not necessarily the generative source (supposition A may be wrong). In the newer analysis, the core is no longer fixed as the first word, but is instead selected based on how well it explains the other members of the burst.
To answer your question "Is that still the case in the newer analysis, or is that more like an impression?": it is still statistical. The difference is that the statistic is now computed under a different definition of core. When we allow any member of the burst to be the core, and select it based on its relationship to the others, we observe that in a substantial number of cases the best candidate is not the first word, and is often more frequent. So this is not an impression, but a result that depends on how the core is defined and selected.
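To make the "select the core by how well it explains the others" idea concrete, here is a minimal sketch. It is not my actual code: it assumes that "explains" means "has the smallest total edit distance to the other burst members", and the names `edit_distance`, `best_core` and `fraction_core_not_first` are made up for the illustration. The words in the usage note are toy strings, not manuscript data.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (assumed here as the similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_core(burst: list[str]) -> str:
    """Return the burst member with the smallest total distance to all members.

    Ties go to the earliest word, so the first word wins unless another
    member is strictly better.
    """
    def total_distance(candidate: str) -> int:
        return sum(edit_distance(candidate, word) for word in burst)

    return min(burst, key=total_distance)


def fraction_core_not_first(bursts: list[list[str]]) -> float:
    """Share of bursts whose best core is not the first word."""
    differs = sum(1 for burst in bursts if best_core(burst) != burst[0])
    return differs / len(bursts)
```

With the toy burst `["xabc", "abc", "abcd", "abce"]`, `best_core` picks `"abc"` rather than the first word, because it is one edit away from each of the others. Counting such cases over all bursts with `fraction_core_not_first` is the kind of statistic I mean, under whatever similarity measure is actually used.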
(18-03-2026, 03:32 PM)ReneZ Wrote: Shouldn't the cases, where a second core word is a more frequent word, have been included in the overall statistics shown before? In that case they are still a minority.
In the earlier analysis, those cases were not treated as alternative cores, so they were not counted in that way. In the newer analysis, they are explicitly considered.
To answer you: yes, they are still a minority. In some bursts, the first word remains a reasonable candidate. However, there is a consistent subset of cases where another word provides a better explanation of the rest of the burst, and this subset is large enough to be statistically visible, not just anecdotal.
So the result is something like "the first word is not always the best core, and this happens often enough to matter". This is why I say the newer result is still statistical: it is based on counting how often the best core differs from the first word, under a different definition of core.
(18-03-2026, 03:32 PM)ReneZ Wrote: This depends how the search was done. After having completed the analysis of any (potential) core word, is the next word to be tested the word immediately following it, or is the search continued at the end of the string of core with variants?
In the analysis I showed yesterday, the search was sequential: starting from a word, building a burst around it, and then continuing after that burst. So in that sense it corresponds to your first option.
This means that each word is only used once as a starting point, and words inside a burst are not re-tested as independent cores. That is why the first word naturally becomes the reference in that setup.
In the newer version, the idea is to relax that constraint and allow words inside the burst to be reconsidered as potential cores, which can lead to a different result.
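The difference between the two search strategies can be sketched as follows. This is only an illustration under stated assumptions: a burst is taken to be the contiguous run of following words within `max_dist` edits of the starting word, which is a simplification and not necessarily the rule I used, and `burst_end`, `sequential_bursts` and `relaxed_bursts` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (assumed similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def burst_end(words: list[str], start: int, max_dist: int = 1) -> int:
    """Exclusive end index of the burst that begins at `start`.

    Assumption: the burst extends while each following word stays within
    `max_dist` edits of the starting word.
    """
    end = start + 1
    while end < len(words) and edit_distance(words[start], words[end]) <= max_dist:
        end += 1
    return end


def sequential_bursts(words: list[str], max_dist: int = 1) -> list[tuple[int, int]]:
    """Sequential search: after a burst is built, continue after its end.

    Words inside a burst are never tested as starting points themselves.
    """
    bursts = []
    i = 0
    while i < len(words):
        end = burst_end(words, i, max_dist)
        bursts.append((i, end))
        i = end
    return bursts


def relaxed_bursts(words: list[str], max_dist: int = 1) -> list[tuple[int, int]]:
    """Relaxed search: every word, including burst members, is tested as a start."""
    return [(i, burst_end(words, i, max_dist)) for i in range(len(words))]
```

On the toy sequence `["abc", "abd", "xyz", "xyw", "q"]`, the sequential version only ever starts at positions 0, 2 and 4, so the first word of each burst is the reference by construction. The relaxed version also starts at positions 1 and 3, which is what allows a word inside a burst to be evaluated as a core in its own right.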
As a side note (related to my post 5 minutes before your last one), when looking at the whole manuscript the structure seems much more interconnected, which suggests that this sequential approach may indeed be too restrictive.