Bayesian latent class models for the multiple imputation of categorical data

Davide Vidotto*, Jeroen K. Vermunt, Katrijn Van Deun

*Corresponding author for this work

Research output: Contribution to journalArticleScientificpeer-review

7 Citations (Scopus)
331 Downloads (Pure)


Latent class analysis has beer recently proposed for the multiple imputation (MI) of missing categorical data, using either a standard frequentist approach or a nonparametric Bayesian model called Dirichlet process mixture of multinomial distributions (DPMM). The main advantage of using a latent class model for multiple imputation is that it is very flexible in the sense that it car capture complex relationships in the data given that the number of latent classes is large enough. However, the two existing approaches also have certain disadvantages. The frequentist approach is computationally demanding because it requires estimating many LC models: first models with different number of classes should be estimated to determine the required number of classes and subsequently the selected model is reestimated for multiple bootstrap samples to take into account parameter uncertainty during the imputation stage. Whereas the Bayesian. Dirichlet process models perform the model selection and the handling of the parameter uncertainty automatically, the disadvantage of this method is that it tends to use a too small number of clusters during the Gibbs sampling, leading to an underfitting model yielding invalid imputations. In this paper, we propose an alternative approach which combined the strengths of the two existing approaches; that is, we use the Bayesian standard latent class model as an imputation model. We show how model selection can be performed prior to the imputation step using a single run of the Gibbs sampler and, moreover, show how underfitting is prevented by using large values for the hyperparameters of the mixture weights. The results of two simulation studies and one real-data study indicate that with a proper setting of the prior distributions, the Bayesian latent class model yields valid imputations and outperforms competing methods.

Original languageEnglish
Pages (from-to)56-68
JournalMethodology: European Journal of Research Methods for the Behavioral and Social Sciences
Issue number2
Publication statusPublished - 2018


  • Bayesian mixture models
  • latent class models
  • missing data
  • multiple imputation


Dive into the research topics of 'Bayesian latent class models for the multiple imputation of categorical data'. Together they form a unique fingerprint.

Cite this