Quantifying selection bias in flash estimates derived from cumulative non-probability samples

  • Santiago Gomez-Echeverry*
  • Arnout Van Delden
  • Ton De Waal
  • Dimitris Pavlopoulos
  • Reinoud Stoel

*Corresponding author for this work

Research output: Contribution to journal, Article, Scientific, peer-review

Abstract

The increasing reliance on non-probability samples, including administrative data, big data, and surveys with selective non-response, has brought the issue of selection bias to the forefront of applied research and official statistics production. This study investigates selection bias in the context of flash estimates, focusing on their application in short-term economic indicators such as the Gross Domestic Product and the Consumer Price Index. Leveraging prominent theoretical frameworks for modeling selection bias, we develop and evaluate a series of estimators that incorporate information from a fully-observed lagged target variable, current auxiliary variables, or combined data sources. Through simulations and a case study using turnover (i.e., revenue) data from Statistics Netherlands, we assess these estimators across diverse scenarios, including variations in variable distribution, selectivity levels, and correlations between auxiliary and target variables. Results indicate that estimators combining lagged and current auxiliary information provide more consistent results than those relying on a single data source. Additionally, estimators based on combined data sources perform relatively well under high selectivity and non-normal target distributions. These findings provide practical and easily implementable tools to address selection bias in non-probability samples, enhancing the reliability and timeliness of official statistics.
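To make the selection-bias problem in the abstract concrete, the following is a minimal simulation sketch. It is not one of the estimators developed in the article. It assumes a standard-normal target, a logistic early-response mechanism that depends on the current target value, and a simple linear regression on the fully observed lagged target as the correction. The function name `simulate_flash_bias` and the parameters `rho` and `selectivity` are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import fmean


def simulate_flash_bias(n=20000, rho=0.8, selectivity=1.0, seed=0):
    """Compare a naive early-respondent mean with a lag-based regression estimate.

    Illustrative assumptions only: normal variables, logistic response
    propensity driven by the current target value.
    """
    rng = random.Random(seed)

    # Lagged target (assumed fully observed) and current target, correlated by rho.
    y_lag = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_sd = math.sqrt(1.0 - rho ** 2)
    y_now = [rho * lag + noise_sd * rng.gauss(0.0, 1.0) for lag in y_lag]

    # Selective early response: units with larger current values respond sooner.
    responded = [
        rng.random() < 1.0 / (1.0 + math.exp(-selectivity * y)) for y in y_now
    ]
    resp_lag = [lag for lag, r in zip(y_lag, responded) if r]
    resp_now = [y for y, r in zip(y_now, responded) if r]

    true_mean = fmean(y_now)

    # Naive flash estimate: mean of the current target among early respondents.
    naive = fmean(resp_now)

    # Regression estimate: fit current on lagged values among respondents,
    # then evaluate the fitted line at the full-population lagged mean.
    mean_lag, mean_now = fmean(resp_lag), fmean(resp_now)
    sxy = sum((l - mean_lag) * (y - mean_now) for l, y in zip(resp_lag, resp_now))
    sxx = sum((l - mean_lag) ** 2 for l in resp_lag)
    slope = sxy / sxx
    intercept = mean_now - slope * mean_lag
    regression = intercept + slope * fmean(y_lag)

    return {
        "response_rate": len(resp_now) / n,
        "naive_bias": naive - true_mean,
        "regression_bias": regression - true_mean,
    }


result = simulate_flash_bias()
print(result)
```

Under these assumptions the naive mean overstates the population mean, and the lag-based regression reduces the bias without removing it, because response depends on the current value itself rather than only on the lagged one. Varying `rho` and `selectivity` gives a rough feel for the kind of scenarios the abstract describes.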
Original language: English
Article number: smaf016
Number of pages: 27
Journal: Journal of Survey Statistics and Methodology
Publication status: E-pub ahead of print - Sept 2025

Keywords

  • Flash estimates
  • Non-probabilistic sampling
  • Official statistics
  • Selection error
  • Survey non-response
