Abstract
Non-response is a major problem for anyone collecting and processing data. A commonly used technique to deal with missing data is imputation, where missing values are estimated and filled into the dataset. Imputation can become challenging if the variable to be imputed has to comply with a known total. Even more challenging is the case where several variables in the same dataset need to be imputed and, in addition to known totals, logical restrictions between variables have to be satisfied. In this paper, we develop an approach for a broad class of imputation methods for multivariate categorical data such that previously published totals are preserved while logical restrictions on the data are satisfied. The developed approach can be used in combination with any imputation model that estimates imputation probabilities, i.e., the probability that imputing a certain category for a variable in a certain unit yields the correct value for that variable and unit.
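To make the setting concrete, the following is a minimal sketch of imputing one categorical variable so that a known total is met exactly. It is not the approach developed in the paper: the greedy strategy, the function name `impute_with_totals`, and the example categories and probabilities are all assumptions chosen for illustration. The imputation probabilities are taken as given, as if some imputation model had already estimated them, and logical restrictions are represented simply as a set of permitted categories per unit.

```python
def impute_with_totals(probs, remaining, allowed=None):
    """Greedy illustration (hypothetical helper, not the paper's algorithm).

    probs:     one dict per unit with a missing value, mapping
               category -> estimated imputation probability
    remaining: dict mapping category -> number of units that must still
               receive it (known total minus observed count)
    allowed:   optional list of sets; allowed[i] holds the categories that
               the logical restrictions permit for unit i
    """
    n = len(probs)
    if sum(remaining.values()) != n:
        raise ValueError("remaining counts must sum to the number of units")

    # All permitted (probability, unit, category) candidates, most probable first.
    candidates = sorted(
        (
            (p, i, c)
            for i, unit in enumerate(probs)
            for c, p in unit.items()
            if allowed is None or c in allowed[i]
        ),
        key=lambda t: -t[0],
    )

    left = dict(remaining)
    result = [None] * n
    for p, i, c in candidates:
        # Accept a candidate only if the unit is still open and the
        # category has not yet reached its required count.
        if result[i] is None and left.get(c, 0) > 0:
            result[i] = c
            left[c] -= 1

    if None in result:
        # A greedy pass can get stuck even when a valid assignment exists.
        raise RuntimeError("greedy pass found no complete assignment")
    return result


# Made-up example: three units with a missing employment status.
probs = [
    {"employed": 0.7, "unemployed": 0.3},
    {"employed": 0.6, "unemployed": 0.4},
    {"employed": 0.2, "unemployed": 0.8},
]
# The published total leaves room for one more "employed", two "unemployed".
remaining = {"employed": 1, "unemployed": 2}
imputed = impute_with_totals(probs, remaining)
```

The greedy pass is only meant to show how totals and restrictions constrain the choice of imputed categories. It can fail to complete an assignment even when one exists, and with several variables imputed jointly a more careful procedure is needed.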
Original language | English |
---|---|
Number of pages | 32 |
Journal | AStA Advances in Statistical Analysis |
Early online date | 2023 |
Publication status | E-pub ahead of print - 2023 |
Keywords
- Edit rules
- Fully conditional specification
- Mass imputation
- Non-response