The impact of ordinal scales on Gaussian mixture recovery

Jonas M.B. Haslbeck*, Jeroen K. Vermunt, Lourens J. Waldorp

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

Gaussian mixture models (GMMs) are a popular and versatile tool for exploring heterogeneity in multivariate continuous data. Arguably the most popular way to estimate GMMs is via the expectation–maximization (EM) algorithm combined with model selection using the Bayesian information criterion (BIC). If the GMM is correctly specified, this estimation procedure has been demonstrated to have high recovery performance. However, in many situations the data are not continuous but ordinal, for example when assessing symptom severity in medical data or modeling the responses in a survey. In such situations, it is unknown how well the EM algorithm and the BIC perform in GMM recovery. In the present paper, we investigate this question by simulating data from various GMMs, thresholding them into ordinal categories, and evaluating recovery performance. We show that the number of components can be estimated reliably if the number of ordinal categories and the number of variables are high enough. However, the estimates of the parameters of the component models are biased regardless of sample size. Finally, we discuss alternative modeling approaches that might be adopted in situations where estimating a GMM is not acceptable.
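The procedure described in the abstract (simulate from a GMM, threshold into ordinal categories, estimate with EM, select the number of components with the BIC) can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes a univariate mixture rather than the multivariate models studied in the paper, equally spaced thresholds, and a lower bound on the component standard deviations to keep EM from collapsing onto a single category. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics

def simulate_gmm(n, weights, means, sds, rng):
    """Draw n values from a univariate Gaussian mixture."""
    comps = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[c], sds[c]) for c in comps]

def to_ordinal(xs, n_categories, lo, hi):
    """Map values to categories 1..n_categories using equally spaced
    thresholds between lo and hi (an assumed thresholding scheme)."""
    width = (hi - lo) / n_categories
    out = []
    for x in xs:
        cat = int((x - lo) // width) + 1
        out.append(min(max(cat, 1), n_categories))  # clamp the tails
    return out

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(x, weights, means, sds):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def fit_em(xs, k, n_iter=100, min_sd=0.1):
    """Fit a k-component univariate GMM by EM.
    Returns (log-likelihood, weights, means, sds)."""
    n = len(xs)
    srt = sorted(xs)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [max(statistics.pstdev(xs), min_sd)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior component probabilities for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: weighted updates of weights, means, and sds.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), min_sd)  # assumed variance floor
    loglik = sum(math.log(mixture_density(x, weights, means, sds) or 1e-300)
                 for x in xs)
    return loglik, weights, means, sds

def bic(loglik, k, n):
    """BIC with k means, k sds, and k - 1 free mixing weights."""
    n_params = 3 * k - 1
    return -2.0 * loglik + n_params * math.log(n)

def select_k(xs, candidates=(1, 2, 3)):
    """Return the candidate k with the lowest BIC, plus all BIC scores."""
    scores = {k: bic(fit_em(xs, k)[0], k, len(xs)) for k in candidates}
    return min(scores, key=scores.get), scores

rng = random.Random(1)
continuous = simulate_gmm(300, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0], rng)
ordinal = to_ordinal(continuous, n_categories=5, lo=-4.0, hi=4.0)
print("k selected on continuous data:", select_k(continuous)[0])
print("k selected on ordinal data:", select_k([float(v) for v in ordinal])[0])
```

Varying `n_categories` in this sketch gives a rough feel for how coarser scales change the selected number of components and the estimated component parameters, although a single univariate run says nothing about the multivariate results reported in the paper.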

Original language: English
Pages (from-to): 2143-2156
Journal: Behavior Research Methods
Volume: 55
Issue number: 4
Publication status: Published - 2023

Keywords

  • Gaussian Mixture Modeling
  • Misspecification
  • Mixture modeling
  • Model selection
  • Ordinal scales
