Multiple imputation in data that grow over time

A comparison of three strategies

X. M. Kavelaars, S. van Buuren, J. R. van Ginkel

Research output: Working paper › Other research output

Abstract

Multiple imputation is a highly recommended technique for dealing with missing data, but it can be applied to longitudinal datasets in several ways. When a new wave of longitudinal data arrives, we can treat the combined data of all waves as a new missing data problem and overwrite the existing imputations with new values (re-imputation). Alternatively, we can keep the existing imputations and impute only the new data, performing either a full multiple imputation (nested) or a single imputation (appended) on the new data for each imputed set. This study compares these three strategies by means of simulation. All three strategies resulted in valid inference under a monotone missingness pattern. Under a non-monotone missingness pattern, nested and appended imputation led to regression coefficients that were biased and not confidence valid, depending on the correlation structure of the data: correlations within timepoints must be stronger than correlations between timepoints to obtain valid inference. In an empirical example, the three strategies performed similarly. We conclude that appended imputation is especially beneficial in longitudinal datasets that suffer from dropout.
Original language: English
Publisher: arXiv.org
Publication status: Published - 2019
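
The three strategies named in the abstract differ mainly in which data are passed to the imputation step and in how many completed datasets result. The following is a minimal sketch of that bookkeeping only, not the authors' implementation. The function names, the numbers of imputations `m` and `n`, and the toy imputer `impute_once` (which draws from a normal distribution fitted to each column's observed values) are all assumptions made for illustration; a real analysis would use a proper imputation model.

```python
import numpy as np


def impute_once(data, rng):
    """Toy stand-in imputer (assumption, not a real imputation model):
    fill each NaN with a draw from a normal distribution fitted to the
    observed values of the same column."""
    completed = data.copy()
    for j in range(completed.shape[1]):
        col = completed[:, j]  # view into `completed`, so edits persist
        missing = np.isnan(col)
        if missing.any():
            observed = col[~missing]
            col[missing] = rng.normal(observed.mean(), observed.std(), missing.sum())
    return completed


def reimpute(wave1, wave2, m, rng):
    """Re-imputation: discard old imputations and impute the combined
    incomplete data m times."""
    combined = np.hstack([wave1, wave2])
    return [impute_once(combined, rng) for _ in range(m)]


def nested(wave1_imputed, wave2, n, rng):
    """Nested: keep each existing completed wave-1 set and impute the
    new wave n times per set, giving m * n datasets."""
    return [
        impute_once(np.hstack([w1, wave2]), rng)
        for w1 in wave1_imputed
        for _ in range(n)
    ]


def appended(wave1_imputed, wave2, rng):
    """Appended: keep each existing completed wave-1 set and impute the
    new wave once per set, giving m datasets."""
    return nested(wave1_imputed, wave2, 1, rng)


# Hypothetical example data: two waves of two variables, about 20% missing.
rng = np.random.default_rng(0)
wave1 = rng.normal(size=(50, 2))
wave1[rng.random(wave1.shape) < 0.2] = np.nan
wave2 = rng.normal(size=(50, 2))
wave2[rng.random(wave2.shape) < 0.2] = np.nan

m, n = 5, 3
wave1_imputed = [impute_once(wave1, rng) for _ in range(m)]  # existing imputations

re_sets = reimpute(wave1, wave2, m, rng)
nested_sets = nested(wave1_imputed, wave2, n, rng)
appended_sets = appended(wave1_imputed, wave2, rng)
```

In this sketch, nested and appended imputation leave the wave-1 columns of each completed dataset untouched, whereas re-imputation redraws them together with the new wave.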

Keywords

  • stat.ME

Cite this

@techreport{b3df0d352c844af7a903b0942db88425,
title = "Multiple imputation in data that grow over time: A comparison of three strategies",
abstract = "Multiple imputation is a highly recommended technique for dealing with missing data, but it can be applied to longitudinal datasets in several ways. When a new wave of longitudinal data arrives, we can treat the combined data of all waves as a new missing data problem and overwrite the existing imputations with new values (re-imputation). Alternatively, we can keep the existing imputations and impute only the new data, performing either a full multiple imputation (nested) or a single imputation (appended) on the new data for each imputed set. This study compares these three strategies by means of simulation. All three strategies resulted in valid inference under a monotone missingness pattern. Under a non-monotone missingness pattern, nested and appended imputation led to regression coefficients that were biased and not confidence valid, depending on the correlation structure of the data: correlations within timepoints must be stronger than correlations between timepoints to obtain valid inference. In an empirical example, the three strategies performed similarly. We conclude that appended imputation is especially beneficial in longitudinal datasets that suffer from dropout.",
keywords = "stat.ME",
author = "Kavelaars, {X. M.} and Buuren, {S. van} and Ginkel, {J. R. van}",
note = "15 pages, 5 tables, 1 figure",
year = "2019",
language = "English",
publisher = "arXiv.org",
type = "WorkingPaper",
institution = "arXiv.org",
}
