TY - JOUR

T1 - StatBreak

T2 - Identifying “lucky” data points through genetic algorithms

AU - Rosenbusch, Hannes

AU - Hilbert, Leon

AU - Evans, Anthony

AU - Zeelenberg, Marcel

PY - 2020

Y1 - 2020

N2 - Sometimes interesting statistical findings are produced by a small number of “lucky” data points within the tested sample. To address this issue, researchers and reviewers are encouraged to investigate outliers and influential data points. Here, we present StatBreak, an easy-to-apply method based on a genetic algorithm, which identifies the observations that most strongly contributed to a hypothesized finding (e.g., effect size, model fit, p-value, Bayes factor). Within a given sample, StatBreak searches for the largest subsample in which a previously observed pattern is not present or reduced below a specifiable threshold. Thus, it answers the question: “Which (and how few) ‘lucky’ cases would need to be excluded from a given sample for a data-based conclusion to change?” StatBreak consists of a simple R-function and flags the luckiest data points for any form of statistical analysis. Here, we demonstrate the effectiveness of the method with simulated and real data across a range of study designs and analyses. Additionally, we describe StatBreak’s R-function and explain how researchers and reviewers can apply the method to the data they are working with.

AB - Sometimes interesting statistical findings are produced by a small number of “lucky” data points within the tested sample. To address this issue, researchers and reviewers are encouraged to investigate outliers and influential data points. Here, we present StatBreak, an easy-to-apply method based on a genetic algorithm, which identifies the observations that most strongly contributed to a hypothesized finding (e.g., effect size, model fit, p-value, Bayes factor). Within a given sample, StatBreak searches for the largest subsample in which a previously observed pattern is not present or reduced below a specifiable threshold. Thus, it answers the question: “Which (and how few) ‘lucky’ cases would need to be excluded from a given sample for a data-based conclusion to change?” StatBreak consists of a simple R-function and flags the luckiest data points for any form of statistical analysis. Here, we demonstrate the effectiveness of the method with simulated and real data across a range of study designs and analyses. Additionally, we describe StatBreak’s R-function and explain how researchers and reviewers can apply the method to the data they are working with.

UR - https://osf.io/fmnxp/

M3 - Article

JO - Advances in Methods and Practices in Psychological Science

JF - Advances in Methods and Practices in Psychological Science

SN - 2515-2459

ER -