Crowdsourcing hypothesis tests: Making transparent how design choices shape research results

Crowdsourcing Hypothesis Tests Collaboration

    Research output: Contribution to journal › Article › Scientific › peer-review

    Abstract

    To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N > 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = -0.37 to + 0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.

    Original language: English
    Journal: Psychological Bulletin
    DOI: 10.1037/bul0000220
    Publication status: Accepted/In press - 2020

    Cite this

    @article{f9d6055dd1c840f195ee0eb13223caec,
    title = "Crowdsourcing hypothesis tests: Making transparent how design choices shape research results",
    abstract = "To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N > 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = -0.37 to + 0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.",
    author = "{Crowdsourcing Hypothesis Tests Collaboration} and Landy, {Justin F} and Jia, {Miaolei Liam} and Ding, {Isabel L} and Domenico Viganola and Warren Tierney and Anna Dreber and Magnus Johannesson and Thomas Pfeiffer and Ebersole, {Charles R} and Gronau, {Quentin F} and Alexander Ly and {van den Bergh}, Don and Maarten Marsman and Koen Derks and Eric-Jan Wagenmakers and Andrew Proctor and Bartels, {Daniel M} and Bauman, {Christopher W} and Brady, {William J} and Felix Cheung and Andrei Cimpian and Simone Dohle and Donnellan, {M Brent} and Adam Hahn and Hall, {Michael P} and William Jim{\'e}nez-Leal and Johnson, {David J} and Lucas, {Richard E} and Beno{\^i}t Monin and Andres Montealegre and Elizabeth Mullen and Jun Pang and Jennifer Ray and Reinero, {Diego A} and Jesse Reynolds and Walter Sowden and Daniel Storage and Runkun Su and Tworek, {Christina M} and {Van Bavel}, {Jay J} and Daniel Walco and Julian Wills and Xiaobing Xu and Yam, {Kai Chi} and Xiaoyu Yang and Cunningham, {William A} and Martin Schweinsberg and Molly Urwitz and Uhlmann, {Eric L}",
    year = "2020",
    doi = "10.1037/bul0000220",
    language = "English",
    journal = "Psychological Bulletin",
    issn = "0033-2909",
    publisher = "American Psychological Association",

    }

    Crowdsourcing hypothesis tests: Making transparent how design choices shape research results. / Crowdsourcing Hypothesis Tests Collaboration.

    In: Psychological Bulletin, 2020.

    Research output: Contribution to journal › Article › Scientific › peer-review

    TY - JOUR

    T1 - Crowdsourcing hypothesis tests

    T2 - Making transparent how design choices shape research results

    AU - Crowdsourcing Hypothesis Tests Collaboration

    AU - Landy, Justin F

    AU - Jia, Miaolei Liam

    AU - Ding, Isabel L

    AU - Viganola, Domenico

    AU - Tierney, Warren

    AU - Dreber, Anna

    AU - Johannesson, Magnus

    AU - Pfeiffer, Thomas

    AU - Ebersole, Charles R

    AU - Gronau, Quentin F

    AU - Ly, Alexander

    AU - van den Bergh, Don

    AU - Marsman, Maarten

    AU - Derks, Koen

    AU - Wagenmakers, Eric-Jan

    AU - Proctor, Andrew

    AU - Bartels, Daniel M

    AU - Bauman, Christopher W

    AU - Brady, William J

    AU - Cheung, Felix

    AU - Cimpian, Andrei

    AU - Dohle, Simone

    AU - Donnellan, M Brent

    AU - Hahn, Adam

    AU - Hall, Michael P

    AU - Jiménez-Leal, William

    AU - Johnson, David J

    AU - Lucas, Richard E

    AU - Monin, Benoît

    AU - Montealegre, Andres

    AU - Mullen, Elizabeth

    AU - Pang, Jun

    AU - Ray, Jennifer

    AU - Reinero, Diego A

    AU - Reynolds, Jesse

    AU - Sowden, Walter

    AU - Storage, Daniel

    AU - Su, Runkun

    AU - Tworek, Christina M

    AU - Van Bavel, Jay J

    AU - Walco, Daniel

    AU - Wills, Julian

    AU - Xu, Xiaobing

    AU - Yam, Kai Chi

    AU - Yang, Xiaoyu

    AU - Cunningham, William A

    AU - Schweinsberg, Martin

    AU - Urwitz, Molly

    AU - Uhlmann, Eric L

    PY - 2020

    Y1 - 2020

    N2 - To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N > 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = -0.37 to + 0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.

    AB - To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N > 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = -0.37 to + 0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.

    U2 - 10.1037/bul0000220

    DO - 10.1037/bul0000220

    M3 - Article

    C2 - 31944796

    JO - Psychological Bulletin

    JF - Psychological Bulletin

    SN - 0033-2909

    ER -