Estimating classification error under edit restrictions in combined survey-register data using Multiple Imputation Latent Class modelling (MILC)

L. Boeschoten, D.L. Oberski, A.G. de Waal

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Both registers and surveys can contain classification errors. These errors can be estimated by making use of information that is obtained when making use of a combined dataset. We propose a new method based on latent class modelling that estimates the number of classification errors in the multiple sources, and simultaneously takes impossible combinations with other variables into account. Furthermore, we use the latent class model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of this method is investigated by a simulation study, which shows that whether the method can be applied depends on the entropy of the LC model and the type of analysis a researcher is planning to do. Furthermore, the method is applied to a combined dataset from Statistics Netherlands.

Original language: English
Pages (from-to): 921–962
Journal: Journal of Official Statistics
Volume: 33
Issue number: 4
DOIs: 10.1515/jos-2017-0044
Publication status: Published - 2017

Cite this

@article{89834a44e4544aea91509426f018a987,
title = "Estimating classification error under edit restrictions in combined survey-register data using Multiple Imputation Latent Class modelling (MILC)",
abstract = "Both registers and surveys can contain classification errors. These errors can be estimated by making use of information that is obtained when making use of a combined dataset. We propose a new method based on latent class modelling that estimates the number of classification errors in the multiple sources, and simultaneously takes impossible combinations with other variables into account. Furthermore, we use the latent class model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of this method is investigated by a simulation study, which shows that whether the method can be applied depends on the entropy of the LC model and the type of analysis a researcher is planning to do. Furthermore, the method is applied to a combined dataset from Statistics Netherlands.",
author = "L. Boeschoten and D.L. Oberski and {de Waal}, A.G.",
year = "2017",
doi = "10.1515/jos-2017-0044",
language = "English",
volume = "33",
pages = "921--962",
journal = "Journal of Official Statistics",
issn = "0282-423X",
publisher = "De Gruyter Open Ltd.",
number = "4",
}

Estimating classification error under edit restrictions in combined survey-register data using Multiple Imputation Latent Class modelling (MILC). / Boeschoten, L.; Oberski, D.L.; de Waal, A.G.

In: Journal of Official Statistics, Vol. 33, No. 4, 2017, p. 921–962.

TY  - JOUR
T1  - Estimating classification error under edit restrictions in combined survey-register data using Multiple Imputation Latent Class modelling (MILC)
AU  - Boeschoten, L.
AU  - Oberski, D.L.
AU  - de Waal, A.G.
PY  - 2017
Y1  - 2017
N2  - Both registers and surveys can contain classification errors. These errors can be estimated by making use of information that is obtained when making use of a combined dataset. We propose a new method based on latent class modelling that estimates the number of classification errors in the multiple sources, and simultaneously takes impossible combinations with other variables into account. Furthermore, we use the latent class model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of this method is investigated by a simulation study, which shows that whether the method can be applied depends on the entropy of the LC model and the type of analysis a researcher is planning to do. Furthermore, the method is applied to a combined dataset from Statistics Netherlands.
AB  - Both registers and surveys can contain classification errors. These errors can be estimated by making use of information that is obtained when making use of a combined dataset. We propose a new method based on latent class modelling that estimates the number of classification errors in the multiple sources, and simultaneously takes impossible combinations with other variables into account. Furthermore, we use the latent class model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of this method is investigated by a simulation study, which shows that whether the method can be applied depends on the entropy of the LC model and the type of analysis a researcher is planning to do. Furthermore, the method is applied to a combined dataset from Statistics Netherlands.
U2  - 10.1515/jos-2017-0044
DO  - 10.1515/jos-2017-0044
M3  - Article
VL  - 33
SP  - 921
EP  - 962
JO  - Journal of Official Statistics
JF  - Journal of Official Statistics
SN  - 0282-423X
IS  - 4
ER  -