ReneZ > Today, 12:52 AM
MarcoP > 7 hours ago
u/Miseryy Wrote: Your UMAP is massively overfit. In general tight strings that curve around are just indicative of a very small # of neighbors used and too small a distance threshold. You can replicate this effect with ~any dataset.
Also, I'm of the opinion you should never do clustering on UMAP, ever. Furthermore "UMAP clustering" isn't a noun that exists. UMAP can be used as an initial preprocessing step, and then a standard clustering algorithm can be used. But again, I think it's terrible methodology, since you can ~always tune UMAP to achieve the clusters you want in the first place. People do it though, no denying that.
I'd suggest going with the default parameters unless you really know what you're doing and have a good justification (read: mathematical reason) to adjust them. The parameters affect the math.
Don't focus too much on meaning. If you want meaning, use PCA, and look at the vectors. The only interpretable meaning of UMAP is relative positioning. And even that is sketchy. You really should be taking away: there are groups that can be visually separated and appear to be distinct. UMAP is not proof of anything.
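For anyone wondering what "use PCA, and look at the vectors" could look like in practice, here is a minimal sketch using plain NumPy. The data matrix is made up for illustration (100 samples, 3 features with deliberately different spreads), and names like `X`, `loadings`, and `top_feature` are placeholders, not anything from the original analysis.

```python
import numpy as np

# Hypothetical toy data: 100 samples x 3 features.
# Feature 0 is given the largest spread on purpose.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([10.0, 1.0, 0.1])

# PCA via SVD of the mean-centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance captured by each component.
explained = S**2 / np.sum(S**2)

# Rows of Vt are the principal axes; each entry is the weight
# of one original feature in that component.
loadings = Vt

# The feature with the largest absolute weight on the first axis.
top_feature = int(np.argmax(np.abs(loadings[0])))

print("explained variance ratio:", explained)
print("first component weights:", loadings[0])
print("feature dominating PC1:", top_feature)
```

The point is that every component is a weighted sum of the original features, so the weights can be read directly. A UMAP axis has no comparable reading.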
I would recommend that you just do clustering on your data. Not "UMAP clustering". How about starting with a simple hierarchical clustering and then looking at what you get? You can cluster either across genes or samples, and observe what falls into what group.
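A rough sketch of the "simple hierarchical clustering" suggested above, assuming SciPy is available. The expression matrix here is synthetic (10 samples by 20 genes, two artificial groups), and the average linkage, the Euclidean metric, the cut into two clusters, and the helper name `cluster_rows` are all illustrative choices, not recommendations from the post.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy matrix: rows are samples, columns are genes.
# Two artificial groups of 5 samples each, shifted apart on purpose.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(5, 20))
group_b = rng.normal(loc=5.0, scale=0.5, size=(5, 20))
expr = np.vstack([group_a, group_b])


def cluster_rows(matrix, n_clusters):
    """Agglomerative clustering of the rows, cut into at most n_clusters groups."""
    tree = linkage(matrix, method="average", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Cluster across samples (rows) ...
sample_labels = cluster_rows(expr, 2)
# ... or across genes, by transposing the matrix first.
gene_labels = cluster_rows(expr.T, 2)

print("sample clusters:", sample_labels)
print("gene clusters:", gene_labels)
```

This runs on the data itself rather than on an embedding, so the groups do not depend on any UMAP parameter. Passing the same linkage result to `scipy.cluster.hierarchy.dendrogram` shows which samples or genes fall into which group.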
quimqu > 1 hour ago