A comparison of methods for creating multiple imputations of nominal variables

K.M. Lang, Wei Wu

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Many variables that are analyzed by social scientists are nominal in nature. When missing data occur on these variables, optimal recovery of the analysis model's parameters is a challenging endeavor. One of the most popular methods to deal with missing nominal data is multiple imputation (MI). This study evaluated the capabilities of five MI methods that can be used to treat incomplete nominal variables: multiple imputation with chained equations (MICE) using polytomous regression as the elementary imputation method; MICE based on classification and regression trees (CART); MICE based on nested logistic regressions; the ranking procedure described by Allison (2002); and a joint modeling approach based on the general location model. We first motivate our inquiry with an applied example and then present the results of a Monte Carlo simulation study that compared the performance of the five imputation methods under conditions of varying sample size, percentage of missing data, and number of nominal response categories. We found that MICE with polytomous regression was the strongest performer while the Allison (2002) ranking procedure and MICE with CART performed poorly in most conditions.
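The core step of the method that performed best, MICE with polytomous regression, can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study. It assumes a single fully observed continuous predictor `x` and fits a multinomial logit by plain gradient descent. It also uses a bootstrap of the observed cases as one common way to let the coefficients differ across imputations. The function names `fit_multinomial` and `impute_nominal` are hypothetical. A full MICE run would cycle this kind of update over every incomplete variable, using all other variables as predictors.

```python
import math
import random


def softmax(scores):
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_multinomial(xs, ys, n_classes, lr=0.1, epochs=500):
    """Fit a one-predictor multinomial logit by batch gradient descent.

    Every category gets its own (intercept, slope) pair with no reference
    category. The softmax is invariant to shifting all of them, which does
    not matter for predicted probabilities.
    """
    coefs = [[0.0, 0.0] for _ in range(n_classes)]
    n = len(xs)
    for _ in range(epochs):
        grads = [[0.0, 0.0] for _ in range(n_classes)]
        for x, y in zip(xs, ys):
            probs = softmax([b0 + b1 * x for b0, b1 in coefs])
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the linear score.
                err = probs[k] - (1.0 if y == k else 0.0)
                grads[k][0] += err
                grads[k][1] += err * x
        for k in range(n_classes):
            coefs[k][0] -= lr * grads[k][0] / n
            coefs[k][1] -= lr * grads[k][1] / n
    return coefs


def impute_nominal(xs, ys, n_classes, n_imputations=5, seed=1):
    """Return n_imputations completed copies of ys (None marks a missing value)."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    completed = []
    for _ in range(n_imputations):
        # Assumption: resample observed cases so each imputation is based on
        # different coefficients, approximating parameter uncertainty.
        boot = [rng.choice(observed) for _ in observed]
        coefs = fit_multinomial([b[0] for b in boot], [b[1] for b in boot], n_classes)
        filled = []
        for x, y in zip(xs, ys):
            if y is None:
                probs = softmax([b0 + b1 * x for b0, b1 in coefs])
                # Draw a category from the predicted probabilities instead of
                # taking the most probable one.
                y = rng.choices(range(n_classes), weights=probs)[0]
            filled.append(y)
        completed.append(filled)
    return completed


# Usage: a three-category nominal variable with three missing values.
xs = [i / 10 for i in range(-30, 30)]
ys = [0 if x < -1 else (1 if x < 1 else 2) for x in xs]
for i in (5, 30, 55):
    ys[i] = None
imputations = impute_nominal(xs, ys, n_classes=3)
```

Drawing each missing category from its predicted probabilities, rather than assigning the most probable category, is what lets the completed data sets differ from one another so that the pooled analysis reflects the uncertainty due to missing data.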

Original language: English
Pages (from-to): 290-304
Journal: Multivariate Behavioral Research
Volume: 52
Issue number: 3
DOIs: 10.1080/00273171.2017.1289360
Publication status: Published - 2017

Keywords

  • General location model
  • missing data
  • multiple imputation
  • multiple imputation with chained equations
  • nominal variables
  • MISSING-DATA
  • MULTIVARIATE IMPUTATION
  • SELF-DETERMINATION
  • REGRESSION
  • VALUES

Cite this

@article{42c11da962114c759d97dc3a62d6f984,
title = "A comparison of methods for creating multiple imputations of nominal variables",
abstract = "Many variables that are analyzed by social scientists are nominal in nature. When missing data occur on these variables, optimal recovery of the analysis model's parameters is a challenging endeavor. One of the most popular methods to deal with missing nominal data is multiple imputation (MI). This study evaluated the capabilities of five MI methods that can be used to treat incomplete nominal variables: multiple imputation with chained equations (MICE) using polytomous regression as the elementary imputation method; MICE based on classification and regression trees (CART); MICE based on nested logistic regressions; the ranking procedure described by Allison (2002); and a joint modeling approach based on the general location model. We first motivate our inquiry with an applied example and then present the results of a Monte Carlo simulation study that compared the performance of the five imputation methods under conditions of varying sample size, percentage of missing data, and number of nominal response categories. We found that MICE with polytomous regression was the strongest performer while the Allison (2002) ranking procedure and MICE with CART performed poorly in most conditions.",
keywords = "General location model, missing data, multiple imputation, multiple imputation with chained equations, nominal variables, MISSING-DATA, MULTIVARIATE IMPUTATION, SELF-DETERMINATION, REGRESSION, VALUES",
author = "K.M. Lang and Wei Wu",
year = "2017",
doi = "10.1080/00273171.2017.1289360",
language = "English",
volume = "52",
pages = "290--304",
journal = "Multivariate Behavioral Research",
issn = "0027-3171",
publisher = "ROUTLEDGE JOURNALS, TAYLOR \& FRANCIS LTD",
number = "3",

}

A comparison of methods for creating multiple imputations of nominal variables. / Lang, K.M.; Wu, Wei.

In: Multivariate Behavioral Research, Vol. 52, No. 3, 2017, p. 290-304.

TY - JOUR

T1 - A comparison of methods for creating multiple imputations of nominal variables

AU - Lang, K.M.

AU - Wu, Wei

PY - 2017

Y1 - 2017

AB - Many variables that are analyzed by social scientists are nominal in nature. When missing data occur on these variables, optimal recovery of the analysis model's parameters is a challenging endeavor. One of the most popular methods to deal with missing nominal data is multiple imputation (MI). This study evaluated the capabilities of five MI methods that can be used to treat incomplete nominal variables: multiple imputation with chained equations (MICE) using polytomous regression as the elementary imputation method; MICE based on classification and regression trees (CART); MICE based on nested logistic regressions; the ranking procedure described by Allison (2002); and a joint modeling approach based on the general location model. We first motivate our inquiry with an applied example and then present the results of a Monte Carlo simulation study that compared the performance of the five imputation methods under conditions of varying sample size, percentage of missing data, and number of nominal response categories. We found that MICE with polytomous regression was the strongest performer while the Allison (2002) ranking procedure and MICE with CART performed poorly in most conditions.

KW - General location model

KW - missing data

KW - multiple imputation

KW - multiple imputation with chained equations

KW - nominal variables

KW - MISSING-DATA

KW - MULTIVARIATE IMPUTATION

KW - SELF-DETERMINATION

KW - REGRESSION

KW - VALUES

U2 - 10.1080/00273171.2017.1289360

DO - 10.1080/00273171.2017.1289360

M3 - Article

VL - 52

SP - 290

EP - 304

JO - Multivariate Behavioral Research

JF - Multivariate Behavioral Research

SN - 0027-3171

IS - 3

ER -