TY - JOUR
T1 - Quantifying selection bias in flash estimates derived from cumulative non-probability samples
AU - Gomez-Echeverry, Santiago
AU - Van Delden, Arnout
AU - De Waal, Ton
AU - Pavlopoulos, Dimitris
AU - Stoel, Reinoud
PY - 2025/9
Y1 - 2025/9
N2 - The increasing reliance on non-probability samples, including administrative data, big data, and surveys with selective non-response, has brought the issue of selection bias to the forefront of applied research and official statistics production. This study investigates selection bias in the context of flash estimates, focusing on their application in short-term economic indicators such as the Gross Domestic Product and the Consumer Price Index. Leveraging prominent theoretical frameworks for modeling selection bias, we develop and evaluate a series of estimators that incorporate information from a fully observed lagged target variable, current auxiliary variables, or combined data sources. Through simulations and a case study using turnover (i.e., revenue) data from Statistics Netherlands, we assess these estimators across diverse scenarios, including variations in variable distribution, selectivity levels, and correlations between auxiliary and target variables. Results indicate that estimators combining lagged and current auxiliary information provide more consistent results than those relying on a single data source. Additionally, estimators based on combined data sources perform relatively well under high selectivity and non-normal target distributions. These findings provide practical and easily implementable tools to address selection bias in non-probability samples, enhancing the reliability and timeliness of official statistics.
KW - Flash estimates
KW - Non-probabilistic sampling
KW - Official statistics
KW - Selection error
KW - Survey non-response
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=wosstart_imp_pure20230417&SrcAuth=WosAPI&KeyUT=WOS:001563466500001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1093/jssam/smaf016
DO - 10.1093/jssam/smaf016
M3 - Article
SN - 2325-0984
JO - Journal of Survey Statistics and Methodology
JF - Journal of Survey Statistics and Methodology
M1 - smaf016
ER -