Network Scores

Alexander P. Christensen and Hudson Golino

2021-02-16

Introduction

This vignette shows you how to compute network scores using the state-of-the-art psychometric network algorithms in R. The vignette will walk through an example that compares latent network scores to latent variable scores computed by confirmatory factor analysis (CFA) and will explain the similarities and differences between the two.

To get started, a few packages need to be installed (if you don’t have them already) and loaded.

# Install 'NetworkToolbox' package
install.packages("NetworkToolbox")
# Install 'lavaan' package
install.packages("lavaan")
# Load packages
library(EGAnet)
library(NetworkToolbox)
library(lavaan)

Estimate Dimensions

To estimate dimensions, we’ll use exploratory graph analysis (Golino & Demetriou, 2017; Golino & Epskamp, 2017; Golino, Shi, et al., 2020). EGA first computes a network using either the graphical least absolute selection and shrinkage operator (GLASSO; Friedman, Hastie, & Tibshirani, 2008, 2014) with extended Bayesian information criterion (EBIC) from the R package qgraph (EBICglasso; Epskamp & Fried, 2018) or the triangulated maximally filtered graph (TMFG; Massara, Di Matteo, & Aste, 2016) from the NetworkToolbox package (Christensen, 2018). EGA then applies the Walktrap community detection algorithm (Pons & Latapy, 2006) from the igraph package (Csardi & Nepusz, 2006). Below is the code to estimate EGA using the EBICglasso method with the NEO PI-3 openness to experience data (n = 802) in the NetworkToolbox package.

# Run EGA
ega <- EGA(neoOpen, model = "glasso", algorithm = "louvain", plot.EGA = FALSE)

As depicted above, EGA estimates there to be 7 dimensions of openness to experience. Using these dimensions, we can now estimate network scores; but first, we’ll go into details about how these are estimated.

Network Loadings (Christensen & Golino, 2021)

Network loadings are roughly equivalent to factor loadings and differ only in the association measures used to compute them. For networks, the centrality measure node strength is used to compute the sum of the connections to a node. Previous simulation studies have reported that node strength is generally redundant with CFA factor loadings (Hallquist, Wright, & Molenaar, 2019). Importantly, Hallquist and colleagues (2019) found that a node’s strength represents a combination of dominant and cross-factor loadings. To mitigate this issue, I’ve developed a function called net.loads, which computes the node strength for each node in each dimension, parsing out the connections that represent dominant and cross-dimension loadings. Below is the code to compute standardized ($std; unstandardized, $unstd) network loadings.

# Standardized
n.loads <- net.loads(A = ega)$std

To provide mathematical notation, Let \(W\) represent a symmetric \(m \times m\) matrix where \(m\) is the number of terms. Node strength is then defined as:

\[S_i = \sum_{j = 1}^m |w_{ij}|,\]

\[L_{if} = \sum_{j \in f}^F |w_{ij}|,\]

where \(|w_{ij}|\) is the absolute weight (e.g., partial correlation) between node \(i\) and \(j\), \(S_i\) is the sum of the edge weights connected to node \(i\) across all nodes (\(n\); i.e., node strength for node \(i\)), \(L_{if}\) is the sum of edge weights in factor \(f\) that are connected to node \(i\) (i.e., node \(i\)’s loading for factor \(f\)), and \(F\) is the number of factors (in the network). This measure can be standardized using the following formula:

\[z_{L_{if}} = \frac{L_{if}}{\sqrt{\sum L_{.f}}},\]

where the denominator is equal to the square root of the sum of all the weights for nodes in factor \(f\). Notably, the standardized loadings are absolute weights with the signs being added after the loadings have been computed (following the same procedure as factor loadings; Comrey & Lee, 2013). In contrast to factor loadings, the network loadings are computed after the number of factors have been extracted from the network’s structure. Variables are deterministically assigned to specific factors via a community detection algorithm rather than the traditional factor analytic standard of their largest loading in the loading matrix. This means that some nodes may not have any connections to nodes in other factors in the network, leading some variables to have zeros for some factors in the network loading matrix.

These standardized network loadings summarize the information in the edge weights and so they depend on the type of association represented by the edge weight (e.g., partial correlations, zero-order correlations). Thus, the meaning of these network loadings will change based on the type of correlation used. Specifically, partial correlations represent the unique measurement of a factor whereas zero-order correlations represent the unique and shared contribution of a variable’s measurement of a factor.

Network Scores (Golino et al., 2020)

These network loadings form the foundation for computing network scores .Because the network loadings represent the middle ground between a saturated (EFA) and simple (CFA) structure, the network scores accommodate the inclusion of only the most important cross-loadings in their computation. This capitalizes on information often lost in typical CFA structures but reduces the cross-loadings of EFA structures to only the most important loadings.

To compute network scores, the following code can be used:

# Network scores
net.scores <- net.scores(data = neoOpen, A = ega)

The net.scores function will return three objects: scores, commCor, and loads. scores contain the network scores for each dimension and an overall score. commCor contains the partial correlations between the dimensions in the network (and with the overall score). Finally, loads will return the standardized network loadings described above. Below we detail the mathematical notation for computing network scores.

First, we take each community and identify items that do not have loadings on that community equal to zero:

\[z_{tf} = z_{NL_{i \in f}} \neq 0,\]

where \(z_{NL_{f}}\) is the standardized network loadings for community \(f\), and \(z_{tf}\) is the network loadings in community \(f\),that are not equal to zero. Next, \(z_{tf}\) is divided by the standard deviation of the corresponding items in the data, \(X\):

\[wei_{tc} = \frac{z_{tf}}{\sqrt{\frac{\sum_{i=1}^{t \in f} (X_i - \bar{X_t})^2}{n - 1}}},\]

where the denominator, \(\sqrt{\frac{\sum_{i=1}^{t \in f} (X_i - \bar{X_t})^2}{n - 1}}\), corresponds to the standard deviation of the items with non-zero network loadings in community \(f\), and \(wei_{tf}\) is the weight for the non-zero loadings in community \(f\). These can be further transformed into relative weights for each non-zero loading:

\[relWei_{tf} = \frac{wei_{t \in f}}{\sum^F_f wei_{t \in f}},\]

where \(\sum^F_f wei_{t \in f}\) is the sum of the weights in community \(f\), and \(relWei_{tf}\) is the relative weights for non-zero loadings in community \(f\). We then take these relative weights and multiply them to the corresponding items in the data, \(X_{t \in f}\), to obtain the community (e.g., factor) score:

\[\hat{\theta_f} = \sum\limits^F_{f} X_{t \in f} \times relWei_{t \in f},\]

where \(\hat{\theta_f}\) is the network score for community \(f\).

Comparison to CFA Scores

It’s important to note that CFA scores are typically computed using a simple structure (items only load on one factor) and regression techniques. Network scores, however, are computed using a complex structure and are a weighted composite rather than a latent factor.

# Latent variable scores
# Estimate CFA
cfa.net <- CFA(ega, estimator = "WLSMV", data = neoOpen)

# Compute latent variable scores
lv.scores <- lavPredict(cfa.net$fit)

# Initialize correlations vector
cors <- numeric(ega$n.dim)

# Compute correlations
for(i in 1:ega$n.dim)
{cors[i] <- cor(net.scores$std.scores[,i], lv.scores[,i], method = "spearman")}
Correlations
Factor 1 0.989
Factor 2 0.990
Factor 3 0.987
Factor 4 0.983
Factor 5 0.984
Factor 6 0.983

As shown in the table, the network scores strongly correlate with the latent variable scores. Because Spearman’s correlation was used, the orderings of the values take precedence. These large correlations between the scores reflect considerable redundancy between these scores.

References

Christensen, A. P. (2018). NetworkToolbox: Methods and measures for brain, cognitive, and psychometric network analysis in R. The R Journal, 10, 422–439. https://doi.org/10.32614/RJ-2018-065

Christensen, A. P., & Golino, H. (2021). On the equivalency of factor and network loadings. Behavior Research Methods. https://doi.org/10.3758/s13428-020-01500-6

Comrey, A. L., & Lee, H. B. (2013). A first course in factor analysis (2nd ed.). New York, NY: Psychology Press.

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695, 1–9.

Epskamp, S., & Fried, E. I. (2018). A tutorial on regularized partial correlation networks. Psychological Methods, 23, 617–634. https://doi.org/10.1037/met0000167

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441. https://doi.org/10.1093/biostatistics/kxm045

Friedman, J., Hastie, T., & Tibshirani, R. (2014). glasso: Graphical lasso – estimation of Gaussian graphical models. Retrieved from https://CRAN.R-project.org/package=glasso

Golino, H., Christensen, A. P., Moulder, R., Kim, S., & Boker, S. M. (2020). Modeling latent topics in social media using Dynamic Exploratory Graph Analysis: The case of the right-wing and left-wing trolls in the 2016 US elections. PsyArXiv. https://doi.org/10.31234/osf.io/tfs7c

Golino, H., & Demetriou, A. (2017). Estimating the dimensionality of intelligence like data using Exploratory Graph Analysis. Intelligence, 62, 54–70. https://doi.org/10.1016/j.intell.2017.02.007

Golino, H., & Epskamp, S. (2017). Exploratory Graph Analysis: A new approach for estimating the number of dimensions in psychological research. PloS ONE, 12, e0174035. https://doi.org/10.1371/journal.pone.0174035

Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., … Martinez-Molina, A. (2020). Investigating the performance of Exploratory Graph Analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. Psychological Methods, 25, 292–320. https://doi.org/10.1037/met0000255

Hallquist, M., Wright, A. C. G., & Molenaar, P. C. M. (2019). Problems with centrality measures in psychopathology symptom networks: Why network psychometrics cannot escape psychometric theory. Multivariate Behavioral Research. https://doi.org/10.1080/00273171.2019.1640103

Massara, G. P., Di Matteo, T., & Aste, T. (2016). Network filtering for big data: Triangulated Maximally Filtered Graph. Journal of Complex Networks, 5, 161–178. https://doi.org/10.1093/comnet/cnw015

Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10, 191–218. https://doi.org/10.7155/jgaa.00185