TY - JOUR
T1 - Detecting outlying variables in multigroup data
T2 - A comparison of different loading similarity coefficients
AU - Gvaladze, Sopiko
AU - De Roover, Kim
AU - Tuerlinckx, Francis
AU - Ceulemans, Eva
N1 - Funding Information:
The research leading to the results reported in this paper was supported by the Research Fund of KU Leuven (C14/19/054). For the simulations, we used the infrastructure of the VSC (Flemish Supercomputer Center), funded by the Hercules Foundation and the Flemish Government, department EWI.
PY - 2021
Y1 - 2021
AB - Multivariate multigroup data are collected in many fields of science, where the so-called groups pertain to, for instance, experimental groups or countries the participants are nested in. To summarize the main information in such data, principal component analysis (PCA) is highly popular. PCA reduces the variables to a few components that are linear combinations of the original variables. Researchers usually assume those components to be the same across the groups and aim to apply a simultaneous component analysis. To investigate whether this assumption is reasonable, one often analyzes the groups separately and computes a similarity index between the group-specific component loadings of the variables. In many cases, however, most variables have highly similar loadings across the groups, but a few variables, which we will call "outlying variables," behave differently, indicating that a simultaneous analysis is not warranted. In such cases, the outlying variables should be removed before proceeding with the simultaneous analysis. To do so, the variables are ranked according to their relative outlyingness. Although some procedures have been proposed that yield such an outlyingness ranking, they might not be optimal, because they all rely on the same choice of similarity coefficient without evaluating other alternatives. In this paper, we give an overview of other options and report extensive simulations in which we investigate how this choice affects the correctness of the outlyingness ranking. We also illustrate the added value of the outlying variable approach by means of sensometric data on different bread samples.
KW - component similarity
KW - multivariate multigroup data
KW - PCA
KW - SCA
KW - Tucker's congruence
KW - SIMULTANEOUS COMPONENT ANALYSIS
KW - CONGRUENCE COEFFICIENTS
KW - ROTATION
KW - MULTIBLOCK
KW - MODELS
KW - NUMBER
KW - COMMON
UR - http://www.scopus.com/inward/record.url?scp=85082659714&partnerID=8YFLogxK
U2 - 10.1002/cem.3233
DO - 10.1002/cem.3233
M3 - Article
SN - 1099-128X
VL - 35
JO - Journal of Chemometrics
JF - Journal of Chemometrics
IS - 2
M1 - e3233
ER -