The prevalence of statistical reporting errors in psychology (1985-2013)

M.B. Nuijten, C.H.J. Hartgerink, M.A.L.M. van Assen, S. Epskamp, J.M. Wicherts

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package “statcheck.” statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.
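The consistency check described in the abstract can be sketched in a few lines: recompute the p-value from the reported test statistic and compare it to the reported p-value, flagging a gross inconsistency when the mismatch flips the significance decision. The sketch below is a simplified, hypothetical illustration, not statcheck's actual implementation (statcheck is an R package that parses APA-formatted t, F, r, χ², and Z results; here only the Z case is shown, and the function names and rounding tolerance are assumptions):

```python
from statistics import NormalDist

def recomputed_p(z):
    """Two-tailed p-value for a reported Z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def check(z, reported_p, alpha=0.05, decimals=2):
    """Classify a reported (Z, p) pair, roughly in the spirit of statcheck:
    'consistent' if the recomputed p matches the reported one up to rounding,
    'gross inconsistency' if the mismatch flips the significance decision,
    'inconsistency' otherwise. Tolerance handling is a simplification."""
    p = recomputed_p(z)
    tol = 0.5 * 10 ** -decimals  # rounding slack for a p reported to `decimals` places
    if abs(p - reported_p) <= tol:
        return "consistent"
    if (p < alpha) != (reported_p < alpha):
        return "gross inconsistency"
    return "inconsistency"
```

For example, a reported "Z = 2.20, p = .03" recomputes to p ≈ .028, which rounds to .03 and is consistent; a reported "Z = 1.70, p = .04" recomputes to p ≈ .089, so the claimed significance does not hold and the pair would count as grossly inconsistent.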
Original language: English
Pages (from-to): 1205-1226
Journal: Behavior Research Methods
Volume: 48
Issue number: 4
Early online date: 23 Oct 2015
DOI: 10.3758/s13428-015-0664-2
Publication status: Published - 2016

Cite this

@article{cea401a829ee43428731a4a49fba3054,
title = "The prevalence of statistical reporting errors in psychology (1985--2013)",
abstract = "This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package ``statcheck.'' statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called ``co-pilot model,'' and to use statcheck to flag possible inconsistencies in one's own manuscript or during the review process.",
author = "M.B. Nuijten and C.H.J. Hartgerink and {van Assen}, M.A.L.M. and S. Epskamp and J.M. Wicherts",
year = "2016",
doi = "10.3758/s13428-015-0664-2",
language = "English",
volume = "48",
pages = "1205--1226",
journal = "Behavior Research Methods",
issn = "1554-351X",
publisher = "Springer",
number = "4",
}

The prevalence of statistical reporting errors in psychology (1985-2013). / Nuijten, M.B.; Hartgerink, C.H.J.; van Assen, M.A.L.M.; Epskamp, S.; Wicherts, J.M.

In: Behavior Research Methods, Vol. 48, No. 4, 2016, p. 1205-1226.


TY  - JOUR
T1  - The prevalence of statistical reporting errors in psychology (1985-2013)
AU  - Nuijten, M.B.
AU  - Hartgerink, C.H.J.
AU  - van Assen, M.A.L.M.
AU  - Epskamp, S.
AU  - Wicherts, J.M.
PY  - 2016
Y1  - 2016
N2  - This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package “statcheck.” statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.
AB  - This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package “statcheck.” statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions for the high prevalence of reporting inconsistencies could be to encourage sharing data, to let co-authors check results in a so-called “co-pilot model,” and to use statcheck to flag possible inconsistencies in one’s own manuscript or during the review process.
UR  - https://osf.io/e9qbp/
U2  - 10.3758/s13428-015-0664-2
DO  - 10.3758/s13428-015-0664-2
M3  - Article
C2  - 26497820
VL  - 48
SP  - 1205
EP  - 1226
JO  - Behavior Research Methods
JF  - Behavior Research Methods
SN  - 1554-351X
IS  - 4
ER  -