(08-11-2025, 12:21 AM)Rafal Wrote: Are you able to interpret pca1 and pca2, the new 2 dimensions that emerged from your analysis? What features of text do they describe?
Yes, I won't bother you with numbers, but if we check the components we can say something like this:
For directed graphs (A to B) (I'm attaching the plots again for clarity):
[attachment=12099]
High PCA1: networks where words link back and forth easily, short cycles, high clustering and entropy. More symmetric flow of information.
Low PCA1: networks with strong directional chains, longer paths, less reciprocity. More syntactically constrained, hierarchical language flow.
PCA1 measures the degree of reciprocity vs. hierarchy in the word graph: how balanced the directional connections are.
High PCA2: more fragmented, less cohesive structures.
Low PCA2: dense, compact graphs with a strong central core and tightly connected components.
PCA2 describes how compact or fragmented the word network is: low values mean a dense, cohesive core, while high values indicate a loose, fragmented structure.
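To make the reciprocity end of PCA1 concrete, here is a minimal sketch of how a reciprocity score could be computed for a directed word graph. It assumes that "A to B" means an edge from each token to the token that immediately follows it; the helper names `directed_edges` and `reciprocity` are made up for illustration and are not the code behind the plots.

```python
def directed_edges(tokens):
    """Unique directed edges A -> B for each pair of adjacent tokens.

    Assumption: an edge links a word to the word right after it.
    Self-loops (a word followed by itself) are ignored.
    """
    return {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}


def reciprocity(edges):
    """Share of directed edges whose reverse edge is also present."""
    if not edges:
        return 0.0
    mutual = sum(1 for (a, b) in edges if (b, a) in edges)
    return mutual / len(edges)


# Toy sequences: one links back and forth, the other is a pure chain.
back_and_forth = "a b a b a c a c".split()
chain = "a b c d e f g h".split()

score_back_and_forth = reciprocity(directed_edges(back_and_forth))
score_chain = reciprocity(directed_edges(chain))
```

The back-and-forth sequence gets the maximum score and the chain gets the minimum, which is the contrast PCA1 picks up on (together with the other features, such as clustering and entropy, that are not computed here).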
For co-occurrence graphs (words connected when they appear within a window of 5 tokens):
[attachment=12100]
High PCA1: high reciprocity, high clustering, steeper Zipf slope. Words tend to co-occur in repeated local patterns (dense and regular).
Low PCA1: high type–token ratio, strong modularity, high degree assortativity. More lexical variety and topic segmentation.
In this case, PCA1 captures lexical diversity vs. repetition and regularity in the co-occurrence network. It measures whether a text’s structure is broad and modular (many distinct word groups) or tight and repetitive (dense clusters of recurring pairs).
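For anyone who wants to try this on their own text, a minimal sketch of a windowed co-occurrence graph and the type-token ratio follows. The window convention is an assumption: each token is linked to the next 4 tokens, so any linked pair fits inside a span of 5 tokens. The names `cooccurrence_edges` and `type_token_ratio` are hypothetical helpers, not the exact code used for the plots.

```python
def cooccurrence_edges(tokens, window=5):
    """Undirected edges between distinct words that share a window.

    Assumption: a window of 5 means a token is linked to the 4 tokens
    that follow it. Edges are stored as frozensets so order is ignored.
    """
    edges = set()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:
            if a != b:
                edges.add(frozenset((a, b)))
    return edges


def type_token_ratio(tokens):
    """Number of distinct words divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


varied = "a b c d e f".split()       # every token is a new word
repetitive = "a b a b a b".split()   # two words recurring as a pair

varied_edges = cooccurrence_edges(varied)
repetitive_edges = cooccurrence_edges(repetitive)
```

The varied sequence has a high type-token ratio and many distinct edges, while the repetitive one collapses into a single recurring pair, which mirrors the low vs. high PCA1 contrast described above.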
High PCA2: more evenly distributed connections, higher resilience, and longer paths. The network tolerates removal of nodes without breaking apart.
Low PCA2: high inequality of degree (large gini_deg), meaning a few dominant hubs control most co-occurrences.
PCA2 describes core centralization vs. distributed connectivity: how much the network depends on key hubs.
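The hub dependence behind low PCA2 can be illustrated with the Gini coefficient of the degree distribution (the gini_deg feature mentioned above). This is a minimal sketch using the standard sorted-values formula; whether gini_deg was computed exactly this way is an assumption, and `degrees` and `gini` are illustrative helper names.

```python
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list of (u, v) pairs."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return list(counts.values())


def gini(values):
    """Gini coefficient: 0 for perfectly equal values, higher for
    more unequal ones.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and i starting at 1.
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Star: one hub connected to four leaves (hub-dominated).
star = [("hub", leaf) for leaf in "abcd"]
# Ring: five nodes, each with exactly two neighbours (evenly distributed).
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]

gini_star = gini(degrees(star))
gini_ring = gini(degrees(ring))
```

The star graph scores clearly higher than the ring, whose degrees are all equal, so a large gini_deg points to a few hubs carrying most of the co-occurrences.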