kckluge > Yesterday, 04:05 AM
Koen G > Yesterday, 10:55 AM
MarcoP > Yesterday, 01:19 PM
Quote:dimensionality reduction using a heuristic PCA-like method rather than PCA
kckluge > 6 hours ago
(Yesterday, 01:19 PM) MarcoP Wrote: Quote:dimensionality reduction using a heuristic PCA-like method rather than PCA
Why not use PCA? That would make for a more meaningful comparison with Rene's results.
Quote:The following procedure will not necessarily find this maximum, but it will find something near to the maximum.
- Locate the bigram which has the most varying distribution over all pages. This means: locate the component of all vectors which has the largest rms about the mean.
- For all bigrams, find out their covariance with this bigram. This means: estimate a linear relationship (without offset) which best fits the value of each component vs. the value of the component found above. Save the slope of this linear relationship.
- The vector containing all these slopes has the value 1 at the component found above, and values between -1 and 1 at all other components. Normalise this vector, and the base vector for the prime direction has been found.
Now, the coefficient along this base vector may be computed for each page vector. The contribution of this base vector may then be subtracted from each of the page vectors, meaning that the hypercloud collapses to a space with dimension one less than before.
After this last step, the procedure may be repeated, and it will automatically find the next most important base vector, which will be perpendicular to the first. This whole procedure may be repeated several times.
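For anyone who wants to experiment with the quoted procedure, here is a minimal sketch in Python with NumPy. It is only one reading of the description, not Rene's actual code. The function name `heuristic_components`, the pages-by-bigrams layout of `X`, and the choice to subtract the mean page vector once up front are all assumptions. With mean-centred data the no-offset slope equals covariance divided by variance, which is what keeps the slopes between -1 and 1.

```python
import numpy as np

def heuristic_components(X, n_components=2):
    """Hypothetical sketch of the quoted PCA-like heuristic.

    X: array of shape (pages, bigrams), one page vector per row.
    Returns (basis, coeffs): basis has one unit vector per row,
    coeffs has one column of page coefficients per base vector.
    """
    # Assumption: centre each bigram component on its mean over all pages.
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    basis, coeffs = [], []
    for _ in range(n_components):
        # Step 1: component with the largest rms about the mean.
        rms = np.sqrt((Xc ** 2).mean(axis=0))
        k = int(np.argmax(rms))
        ref = Xc[:, k]
        # Step 2: slope of the best no-offset linear fit of every
        # component against component k (equals 1 at component k).
        slopes = (Xc.T @ ref) / (ref @ ref)
        # Step 3: normalise the slope vector to get the base vector.
        v = slopes / np.linalg.norm(slopes)
        # Coefficient of each page along v, then remove that contribution,
        # which collapses the cloud by one dimension.
        c = Xc @ v
        Xc = Xc - np.outer(c, v)
        basis.append(v)
        coeffs.append(c)
    return np.array(basis), np.array(coeffs).T
```

Calling `heuristic_components(X, 2)` on a page-by-bigram count matrix gives two base vectors and the page coordinates for a 2-D scatter plot. Because each slope vector is a linear combination of the already deflated page vectors, successive base vectors come out perpendicular, as the quoted text says, but they are not guaranteed to match the true principal components.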
kckluge > 4 hours ago
(Yesterday, 10:55 AM) Koen G Wrote: Very interesting and comprehensible analysis, thank you. Quire 8 pages misbehaving comes as no surprise.
I wonder if the cloud of Zodiac pages can be seen as somewhat bridging the gap? Maybe it behaves differently because it's purely circular text?
Likewise, which pages of Pharma and Stars are part of the Herbal A cloud? Might this also have something to do with the ratio of paragraph text vs. labels/circular text?
ReneZ > 3 hours ago
(Yesterday, 04:05 AM) kckluge Wrote: It is important to be careful about drawing conclusions from linear projections of higher dimensional data onto lower dimensional spaces. If two clumps of points are separable in the lower dimensional projection then they are also separable in the full dimensional space, but the converse is not true -- two clumps of points that overlap in some projection do not necessarily overlap in the full space.
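A toy illustration of the quoted point, as a minimal sketch in Python with NumPy. The data are made up for the example and have nothing to do with the actual page vectors, and the names `clump_a`, `clump_b` and `ranges_overlap` are hypothetical. Two clumps in three dimensions differ only in their third coordinate, so they coincide exactly when projected onto the first two axes, yet a plane separates them cleanly in the full space.

```python
import numpy as np

def ranges_overlap(p, q):
    """True if the 1-D value ranges of p and q intersect."""
    return p.min() <= q.max() and q.min() <= p.max()

# Synthetic clumps: same first two coordinates, different third coordinate.
t = np.linspace(-1.0, 1.0, 50)
clump_a = np.column_stack([t, t ** 2, np.zeros_like(t)])
clump_b = np.column_stack([t, t ** 2, np.ones_like(t)])

# Linear projection onto the first two axes: the clumps land on top of each other.
overlap_in_projection = np.allclose(clump_a[:, :2], clump_b[:, :2])

# In the full space the third coordinate separates them (e.g. the plane z = 0.5).
separable_in_full_space = not ranges_overlap(clump_a[:, 2], clump_b[:, 2])

print(overlap_in_projection, separable_in_full_space)
```

So overlap in one particular 2-D plot says little on its own; only a clean separation in the projection carries over to the full space.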