Correcting selection bias in big data by pseudo-weighting

An-Chiao Liu*, Sander Scholtus, Ton de Waal

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Nonprobability samples, for example observational studies, online opt-in surveys, or register data, do not come from a sampling design and therefore may suffer from selection bias. To correct for selection bias, Elliott and Valliant (EV) proposed a pseudo-weight estimation method that applies a two-sample setup for a probability sample and a nonprobability sample drawn from the same population, sharing some common auxiliary variables. By estimating the propensities of inclusion in the nonprobability sample given the two samples, we may correct the selection bias by (pseudo) design-based approaches. This paper expands the original method, allowing for large sampling fractions in either sample or for high expected overlap between selected units in each sample, conditions often present in administrative data sets and more frequently occurring with Big Data.
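The two-sample idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover the extension to large sampling fractions or overlapping samples. It assumes the original setting (small sampling fractions, negligible overlap), a single categorical auxiliary variable, and cell counts in place of a fitted propensity model. Under those assumptions the pseudo-inclusion probability in a cell is taken as proportional to the probability-sample inclusion probability times the odds of appearing in the nonprobability sample rather than the probability sample. All function names, variable names, and data are hypothetical.

```python
from collections import Counter, defaultdict


def pseudo_weights(prob_sample, nonprob_cells):
    """Return one pseudo-weight per covariate cell of the nonprobability sample.

    prob_sample   : list of (cell, design_weight) pairs from the probability sample
    nonprob_cells : list of cell labels, one per nonprobability-sample unit
    Assumes small sampling fractions and negligible overlap between the samples.
    """
    n_prob = Counter(cell for cell, _ in prob_sample)
    n_nonprob = Counter(nonprob_cells)

    # Sum of design weights per cell; this estimates the population size of the cell.
    weight_total = defaultdict(float)
    for cell, design_weight in prob_sample:
        weight_total[cell] += design_weight

    weights = {}
    for cell, count in n_nonprob.items():
        if n_prob[cell] == 0:
            raise ValueError(f"no probability-sample units in cell {cell!r}")
        # Odds of being in the nonprobability sample versus the probability
        # sample, estimated from the stacked samples within the cell.
        odds = count / n_prob[cell]
        # Inclusion probability of the probability sample (inverse mean design weight).
        pi_prob = n_prob[cell] / weight_total[cell]
        # Pseudo-inclusion probability is pi_prob * odds; the weight is its inverse.
        weights[cell] = 1.0 / (pi_prob * odds)
    return weights


def weighted_mean(values, cells, weights):
    """Pseudo-weighted mean of `values` observed in the nonprobability sample."""
    total = sum(weights[c] * v for v, c in zip(values, cells))
    return total / sum(weights[c] for c in cells)


# Hypothetical data: "young" units are over-represented in the nonprobability sample.
prob_sample = [("young", 50.0)] * 2 + [("old", 100.0)] * 3
nonprob_cells = ["young"] * 10 + ["old"] * 5
nonprob_values = [1.0] * 10 + [0.0] * 5

w = pseudo_weights(prob_sample, nonprob_cells)
estimate = weighted_mean(nonprob_values, nonprob_cells, w)
```

In this cell-based simplification the pseudo-weight reduces to the estimated population size of a cell divided by the number of nonprobability-sample units in it, so the pseudo-weighted mean is pulled toward the population composition implied by the probability sample.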

Original language: English
Pages (from-to): 1181-1203
Journal: Journal of Survey Statistics and Methodology
Volume: 11
Issue number: 5
Publication status: Published - 2023

Keywords

  • Big Data
  • Nonprobability sample
  • Propensity score
  • Pseudo population bootstrap
  • Selection bias