A comparison of incomplete data methods for categorical data

D.W. van der Palm, L.A. van der Ark, J.K. Vermunt

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

We studied four methods for handling incomplete categorical data in statistical modeling: (1) maximum likelihood estimation of the statistical model with incomplete data, (2) multiple imputation using a loglinear model, (3) multiple imputation using a latent class model, and (4) multivariate imputation by chained equations. Each method has advantages and disadvantages, and it is unknown which method should be recommended to practitioners. We reviewed the merits of each method and investigated their effect on the bias and stability of the parameter estimates and on the bias of the standard errors. We found that multiple imputation using a latent class model with many latent classes was the most promising method for handling incomplete categorical data, especially when the number of variables used in the imputation model is large.
Original language: English
Pages (from-to): 754-774
Journal: Statistical Methods in Medical Research
Volume: 25
Issue number: 2
DOI: 10.1177/0962280212465502
Publication status: Published - 2016

Cite this

van der Palm, D.W.; van der Ark, L.A.; Vermunt, J.K. / A comparison of incomplete data methods for categorical data. In: Statistical Methods in Medical Research. 2016; Vol. 25, No. 2. pp. 754-774.
@article{c4c7bd8c54d94a018fccb3e90033e506,
title = "A comparison of incomplete data methods for categorical data",
author = "{van der Palm}, D.W. and {van der Ark}, L.A. and Vermunt, J.K.",
year = "2016",
doi = "10.1177/0962280212465502",
language = "English",
volume = "25",
pages = "754--774",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "Sage Publications Ltd",
number = "2",

}

TY - JOUR

T1 - A comparison of incomplete data methods for categorical data

AU - van der Palm, D.W.

AU - van der Ark, L.A.

AU - Vermunt, J.K.

PY - 2016

Y1 - 2016

AB - We studied four methods for handling incomplete categorical data in statistical modeling: (1) maximum likelihood estimation of the statistical model with incomplete data, (2) multiple imputation using a loglinear model, (3) multiple imputation using a latent class model, and (4) multivariate imputation by chained equations. Each method has advantages and disadvantages, and it is unknown which method should be recommended to practitioners. We reviewed the merits of each method and investigated their effect on the bias and stability of the parameter estimates and on the bias of the standard errors. We found that multiple imputation using a latent class model with many latent classes was the most promising method for handling incomplete categorical data, especially when the number of variables used in the imputation model is large.

U2 - 10.1177/0962280212465502

DO - 10.1177/0962280212465502

M3 - Article

VL - 25

SP - 754

EP - 774

JO - Statistical Methods in Medical Research

JF - Statistical Methods in Medical Research

SN - 0962-2802

IS - 2

ER -