TY - JOUR
T1 - The impact of ordinal scales on Gaussian mixture recovery
AU - Haslbeck, Jonas M.B.
AU - Vermunt, Jeroen K.
AU - Waldorp, Lourens J.
N1 - Funding Information: This work was supported by NWO Vici grant nr. 181.029.
PY - 2023
Y1 - 2023
N2 - Gaussian mixture models (GMMs) are a popular and versatile tool for exploring heterogeneity in multivariate continuous data. Arguably the most popular way to estimate GMMs is via the expectation–maximization (EM) algorithm combined with model selection using the Bayesian information criterion (BIC). If the GMM is correctly specified, this estimation procedure has been demonstrated to have high recovery performance. However, in many situations the data are not continuous but ordinal, for example when assessing symptom severity in medical data or modeling the responses in a survey. For such situations, it is unknown how well the EM algorithm and the BIC perform in GMM recovery. In the present paper, we investigate this question by simulating data from various GMMs, thresholding them into ordinal categories, and evaluating recovery performance. We show that the number of components can be estimated reliably if the number of ordinal categories and the number of variables are high enough. However, the estimates of the parameters of the component models are biased regardless of sample size. Finally, we discuss alternative modeling approaches that might be adopted in situations in which estimating a GMM is not acceptable.
AB - Gaussian mixture models (GMMs) are a popular and versatile tool for exploring heterogeneity in multivariate continuous data. Arguably the most popular way to estimate GMMs is via the expectation–maximization (EM) algorithm combined with model selection using the Bayesian information criterion (BIC). If the GMM is correctly specified, this estimation procedure has been demonstrated to have high recovery performance. However, in many situations the data are not continuous but ordinal, for example when assessing symptom severity in medical data or modeling the responses in a survey. For such situations, it is unknown how well the EM algorithm and the BIC perform in GMM recovery. In the present paper, we investigate this question by simulating data from various GMMs, thresholding them into ordinal categories, and evaluating recovery performance. We show that the number of components can be estimated reliably if the number of ordinal categories and the number of variables are high enough. However, the estimates of the parameters of the component models are biased regardless of sample size. Finally, we discuss alternative modeling approaches that might be adopted in situations in which estimating a GMM is not acceptable.
KW - Gaussian Mixture Modeling
KW - Misspecification
KW - Mixture modeling
KW - Model selection
KW - Ordinal scales
UR - http://www.scopus.com/inward/record.url?scp=85134327496&partnerID=8YFLogxK
U2 - 10.3758/s13428-022-01883-8
DO - 10.3758/s13428-022-01883-8
M3 - Article
C2 - 35831565
AN - SCOPUS:85134327496
SN - 1554-351X
VL - 55
SP - 2143
EP - 2156
JO - Behavior Research Methods
JF - Behavior Research Methods
IS - 4
ER -