Estimating classification error under edit restrictions in combined survey-register data

Research output: Working paper, Discussion paper, Other research output

Abstract

Both registers and surveys can contain classification errors. These errors can be estimated using the information available in a dataset that combines both sources. We propose a new method based on latent class (LC) modelling that estimates the number of classification errors in the multiple sources while simultaneously taking impossible combinations with other variables (edit restrictions) into account. In addition, we use the LC model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of the method is investigated in a simulation study, which shows that whether the method can be applied depends on the entropy R² of the LC model and on the type of analysis the researcher plans to carry out. Finally, the method is applied to a combined dataset from Statistics Netherlands.
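The abstract describes the approach only in outline. As an illustration of the general idea, and not the authors' implementation, the following Python sketch (standard library only) fits a two-class latent class model by EM under simplifying assumptions of our own: a binary true value X is measured by several error-prone binary indicators (for example, one register and two survey versions of the same variable) that are conditionally independent given X, and the combination X = 1 with a binary covariate z = 1 is impossible. This edit restriction is enforced by fixing P(X = 1 | z = 1) at 0. The function names fit_lc_with_edit, entropy_r2, expected_errors and multiply_impute are hypothetical. The entropy R² uses the common definition 1 − Σ_i H(p_i) / (n ln K), where H(p_i) is the entropy of unit i's posterior class probabilities and K = 2 classes here.

```python
import math
import random


def fit_lc_with_edit(Y, z, n_iter=500, tol=1e-8, seed=0):
    """EM for a two-class latent class model with one edit restriction.

    Assumptions of this sketch: Y is a list of rows of 0/1 indicators that are
    conditionally independent given the latent true value X; z is a list of
    0/1 covariate values, and X = 1 together with z = 1 is impossible, so
    P(X=1 | z=1) is fixed at 0 and never re-estimated.
    Returns pi[v] = P(X=1 | z=v), theta[k][j] = P(Y_j=1 | X=k) and
    post[i] = P(X=1 | Y_i, z_i).
    """
    rng = random.Random(seed)
    J = len(Y[0])
    pi = [0.5, 0.0]  # pi[1] is the restricted cell and stays at 0
    # Start class 1 with high response probabilities so it reads as "X = 1".
    theta = [[rng.uniform(0.05, 0.3) for _ in range(J)],
             [rng.uniform(0.7, 0.95) for _ in range(J)]]
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: posterior P(X=1 | Y_i, z_i) and the log-likelihood.
        post, ll = [], 0.0
        for y, v in zip(Y, z):
            lik = [math.prod(t if yj else 1 - t for t, yj in zip(theta[k], y))
                   for k in (0, 1)]
            joint0 = (1 - pi[v]) * lik[0]
            joint1 = pi[v] * lik[1]
            post.append(joint1 / (joint0 + joint1))
            ll += math.log(joint0 + joint1)
        if ll - prev < tol:
            break
        prev = ll
        # M-step: only P(X=1 | z=0) is free; P(X=1 | z=1) remains 0.
        p_free = [p for p, v in zip(post, z) if v == 0]
        pi[0] = sum(p_free) / len(p_free)
        w1 = sum(post)
        w0 = len(post) - w1
        for j in range(J):
            s1 = sum(p * y[j] for p, y in zip(post, Y))
            s0 = sum((1 - p) * y[j] for p, y in zip(post, Y))
            # Clip away from 0 and 1 to keep every likelihood positive.
            theta[1][j] = min(max(s1 / w1, 1e-6), 1 - 1e-6)
            theta[0][j] = min(max(s0 / w0, 1e-6), 1 - 1e-6)
    return pi, theta, post


def entropy_r2(post):
    """Entropy R^2 for two classes: 1 - sum_i H(post_i) / (n * ln 2)."""
    def h(p):
        return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)
    return 1 - sum(h(p) for p in post) / (len(post) * math.log(2))


def expected_errors(Y, post):
    """Expected number of units whose indicator j disagrees with X."""
    return [sum((1 - p) * y[j] + p * (1 - y[j]) for p, y in zip(post, Y))
            for j in range(len(Y[0]))]


def multiply_impute(post, m=5, seed=1):
    """Draw m imputations of X from its posterior (one list per imputation).

    Simplification: the estimated parameters are treated as known.
    """
    rng = random.Random(seed)
    return [[int(rng.random() < p) for p in post] for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(42)
    n = 2000
    z = [int(rng.random() < 0.3) for _ in range(n)]
    # Simulated true X obeys the edit: X = 0 whenever z = 1.
    x = [0 if v else int(rng.random() < 0.6) for v in z]
    # Three indicators, each misclassifying X with probability 0.1.
    Y = [[xi ^ int(rng.random() < 0.1) for _ in range(3)] for xi in x]
    pi, theta, post = fit_lc_with_edit(Y, z)
    print("P(X=1 | z=0):", round(pi[0], 3))
    print("entropy R2:", round(entropy_r2(post), 3))
    print("expected errors per source:",
          [round(e) for e in expected_errors(Y, post)])
    imputations = multiply_impute(post, m=5)
    print("X=1 count in each imputation:", [sum(r) for r in imputations])
```

Because the restricted cell has prior probability 0, every unit with z = 1 receives posterior probability 0 for X = 1, so no imputed dataset contains the impossible combination even though the observed indicators may. The imputation step treats the estimated parameters as known; a full multiple-imputation procedure would also propagate parameter uncertainty, for example by refitting the model on bootstrap samples, which this sketch omits.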
Original language: English
Publisher: Statistics Netherlands
Number of pages: 24
Publication status: Published, 2016

Publication series

Name: CBS Discussion Paper
Publisher: Statistics Netherlands

Keywords

  • latent class models
  • multiple imputation
  • measurement errors
  • multisource statistics

Cite this

@techreport{375a6db74038400087d682e72ccd3bd8,
title = "Estimating classification error under edit restrictions in combined survey-register data",
abstract = "Both registers and surveys can contain classification errors. These errors can be estimated by making use of information that is obtained when making use of a combined dataset. We propose a new method based on latent class modelling that estimates the number of classification errors in the multiple sources, and simultaneously takes impossible combinations with other variables into account. Furthermore, we use the latent class model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of this method is investigated by a simulation study, which shows that whether the method can be applied depends on the entropy R2 of the LC model and the type of analysis a researcher is planning to do. Furthermore, the method is applied to a combined dataset from Statistics Netherlands.",
keywords = "latent class models, multiple imputation, measurement errors, multisource statistics",
author = "L. Boeschoten and D.L. Oberski and {de Waal}, A.G.",
year = "2016",
language = "English",
series = "CBS Discussion Paper",
publisher = "Statistics Netherlands",
type = "WorkingPaper",
institution = "Statistics Netherlands",

}

Estimating classification error under edit restrictions in combined survey-register data. / Boeschoten, L.; Oberski, D.L.; de Waal, A.G.

Statistics Netherlands, 2016. (CBS Discussion Paper).

TY  - UNPB
T1  - Estimating classification error under edit restrictions in combined survey-register data
AU  - Boeschoten, L.
AU  - Oberski, D.L.
AU  - de Waal, A.G.
PY  - 2016
Y1  - 2016
N2  - Both registers and surveys can contain classification errors. These errors can be estimated by making use of information that is obtained when making use of a combined dataset. We propose a new method based on latent class modelling that estimates the number of classification errors in the multiple sources, and simultaneously takes impossible combinations with other variables into account. Furthermore, we use the latent class model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of this method is investigated by a simulation study, which shows that whether the method can be applied depends on the entropy R2 of the LC model and the type of analysis a researcher is planning to do. Furthermore, the method is applied to a combined dataset from Statistics Netherlands.
AB  - Both registers and surveys can contain classification errors. These errors can be estimated by making use of information that is obtained when making use of a combined dataset. We propose a new method based on latent class modelling that estimates the number of classification errors in the multiple sources, and simultaneously takes impossible combinations with other variables into account. Furthermore, we use the latent class model to multiply impute a new variable, which enhances the quality of statistics based on the combined dataset. The performance of this method is investigated by a simulation study, which shows that whether the method can be applied depends on the entropy R2 of the LC model and the type of analysis a researcher is planning to do. Furthermore, the method is applied to a combined dataset from Statistics Netherlands.
KW  - latent class models
KW  - multiple imputation
KW  - measurement errors
KW  - multisource statistics
M3  - Discussion paper
T3  - CBS Discussion Paper
BT  - Estimating classification error under edit restrictions in combined survey-register data
PB  - Statistics Netherlands
ER  - 