A (Partial) Solution to the Replicability Crisis: Replacing P-Values

March 9, 2016

Press contact: Jenesse Miller, (213) 810-8554

There is an ongoing crisis over the replicability of scientific findings (here is one recent example and a nice discussion of it).

Many factors contribute to this situation. An important part of the problem is the conventional approach of treating the p-value as a measure of the strength of evidence against the null hypothesis, and treating the null hypothesis as false whenever the p-value is less than 0.05 (i.e., the finding is “statistically significant”). This method of hypothesis testing, when applied to standard research designs in many sciences, rejects the null hypothesis too readily and therefore generates an unintentionally high rate of false positives.
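To make the false-positive concern concrete, here is a minimal sketch of the standard arithmetic behind it. It is an illustration, not a calculation taken from the ASA statement or from our comment. The function name `false_positive_share`, the prior probabilities that a tested effect is real, and the 50% power figure are all assumed values chosen for the example.

```python
def false_positive_share(prior_true, power, alpha=0.05):
    """Share of 'significant' findings for which the null is actually true.

    prior_true: assumed fraction of tested hypotheses where the effect is real
    power:      assumed probability of detecting a real effect
    alpha:      significance threshold (0.05 by convention)
    """
    false_pos = alpha * (1.0 - prior_true)  # true nulls rejected at level alpha
    true_pos = power * prior_true           # real effects that reach significance
    return false_pos / (false_pos + true_pos)


# Illustrative (assumed) scenarios: vary how often tested effects are real.
for prior_true in (0.5, 0.2, 0.1):
    share = false_positive_share(prior_true, power=0.5)
    print(f"prior P(effect is real) = {prior_true:.1f}: "
          f"{share:.0%} of significant results are false positives")
```

Under these assumptions, when only one tested hypothesis in ten reflects a real effect and studies have 50% power, nearly half of the “statistically significant” findings are false positives, even though every individual test used the 0.05 threshold.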

In response to the recent crisis over the reproducibility of scientific findings, the American Statistical Association took the remarkable step of issuing an official “Statement on p-values and statistical significance (starts on p.8),” published on March 7, 2016. This statement summarizes the consensus reached by an impressive group of leading statisticians and scientists representing a range of disciplines. The statement endorses six principles:

1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

This statement has deservedly been receiving a lot of attention.

Panel members were invited to write comments on the statement, and these comments were published alongside it.

Jim Berger (a prominent statistician who served on the panel) and I wrote one of these comments. In it, we summarized our recent proposal to replace p-values with an alternative statistic that is just as easy to calculate and more accurately conveys the strength of evidence against the null hypothesis. With permission from the American Statistical Association, our comment is reproduced here:

*List of statisticians and scientists: Naomi Altman, Jim Berger, Yoav Benjamini, Don Berry, Brad Carlin, John Carlin, Marie Davidian, Steve Fienberg, Andrew Gelman, Steve Goodman, Sander Greenland, Guido Imbens, John Ioannidis, Valen Johnson, Michael Lavine, Michael Lew, Rod Little, Deborah Mayo, Chuck McCulloch, Michele Millar, Sally Morton, Regina Nuzzo, Hilary Parker, Kenneth Rothman, Don Rubin, Stephen Senn, Uri Simonsohn, Dalene Stangl, Philip Stark, Steve Ziliak