Calibrated imputation for multivariate categorical data

T. de Waal*, J. Daalmans

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

Non-response is a major problem for anyone collecting and processing data. A commonly used technique to deal with missing data is imputation, where missing values are estimated and filled into the dataset. Imputation becomes challenging if the variable to be imputed has to comply with a known total. It is even more challenging when several variables in the same dataset need to be imputed and, in addition to matching known totals, the imputed data have to satisfy logical restrictions between variables. In this paper, we develop an approach for a broad class of imputation methods for multivariate categorical data such that previously published totals are preserved while logical restrictions on the data are satisfied. The approach can be used in combination with any imputation model that estimates imputation probabilities, i.e. the probability that imputing a certain category for a variable in a certain unit yields the correct value for this variable and unit.
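The abstract does not spell out how imputation probabilities are made consistent with known totals. As a minimal sketch, assuming a model has already produced strictly positive imputation probabilities for a single categorical variable, and that each known total has been converted into a target count for the units being imputed (known total minus the count among respondents), one standard technique is iterative proportional fitting (raking). This is not necessarily the authors' method: it matches the totals only in expectation and ignores logical restrictions (edit rules). The function name `calibrate_probabilities` and its parameters are hypothetical.

```python
from typing import List


def calibrate_probabilities(probs: List[List[float]], targets: List[float],
                            max_iter: int = 1000, tol: float = 1e-10) -> List[List[float]]:
    """Rescale imputation probabilities by iterative proportional fitting (raking).

    Hypothetical helper, not taken from the paper.
    probs[i][k]: assumed model probability (strictly positive) that category k
                 is the correct value for unit i, which is to be imputed.
    targets[k]:  assumed number of imputed units that should receive category k
                 so that the known total for category k is preserved.
    Returns probabilities whose rows sum to 1 and whose column sums equal
    targets, i.e. expected imputed counts match the known totals.
    """
    n_units, n_cats = len(probs), len(targets)
    if abs(sum(targets) - n_units) > 1e-9:
        raise ValueError("targets must sum to the number of units to impute")
    p = [row[:] for row in probs]  # work on a copy
    for _ in range(max_iter):
        # Column step: scale each category so its expected count hits its target.
        for k in range(n_cats):
            col = sum(p[i][k] for i in range(n_units))
            if col > 0:
                factor = targets[k] / col
                for i in range(n_units):
                    p[i][k] *= factor
        # Row step: renormalise each unit's probabilities so they sum to 1.
        for i in range(n_units):
            s = sum(p[i])
            p[i] = [x / s for x in p[i]]
        # Stop once the column sums are within tolerance of the targets.
        max_dev = max(abs(sum(p[i][k] for i in range(n_units)) - targets[k])
                      for k in range(n_cats))
        if max_dev < tol:
            break
    return p


if __name__ == "__main__":
    # Three units to impute, two categories; the known totals require
    # one imputed unit in category 0 and two in category 1.
    calibrated = calibrate_probabilities([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]], [1.0, 2.0])
    for row in calibrated:
        print([round(x, 3) for x in row])
```

Raking preserves the odds ratios of the original model between units, so units the model considers more likely to belong to a category stay more likely after calibration. Imputed values could then be drawn from the calibrated probabilities; guaranteeing that realised counts equal the totals exactly and that edit rules hold would require an additional step not covered by this sketch.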
Original language: English
Number of pages: 32
Journal: AStA Advances in Statistical Analysis
Early online date: 2023
Publication status: E-pub ahead of print (2023)

Keywords

  • Edit rules
  • Fully conditional specification
  • Mass imputation
  • Non-response
