Look Out for ‘p-hacking’

Psychologists do it, biologists do it, and even squeaky clean economists do it. How can you find out whether a group of scientists are guilty of making lots of comparisons and then selecting positive results for publication? You can draw a graph using all the p values in all the papers, or all the p values relating to the central question in all the papers. You might find it looks like this:

DC - Bullshit Detectors Fig 1

That would be a reassuring result. But what if it looked like this?

DC - Bullshit Detectors Fig 2

Such a result shows clustering of p values just below the conventional significance level of 0.05 and is strongly indicative of p-hacking.
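One rough way to put a number on that clustering is to compare how many p values fall just below 0.05 with how many fall just above it, an approach sometimes called a caliper test. The sketch below is illustrative only and is not taken from the sources discussed here: the function name `caliper_test`, the window width of 0.01 and the sample p values are all assumptions. It also rests on the simplifying assumption that, without selective reporting, a p value landing in either narrow window is about equally likely. A lopsided count is a prompt for closer inspection rather than proof of p-hacking.

```python
from math import comb


def caliper_test(p_values, threshold=0.05, width=0.01):
    """Compare counts of p values just below and just above a threshold.

    Hypothetical helper for illustration. Returns (below, above, tail),
    where tail is the one-sided binomial probability of seeing at least
    `below` values in the lower window if each value in the two windows
    were equally likely to fall on either side of the threshold.
    """
    below = sum(1 for p in p_values if threshold - width <= p < threshold)
    above = sum(1 for p in p_values if threshold <= p < threshold + width)
    n = below + above
    if n == 0:
        # Nothing near the threshold, so there is no evidence either way.
        return below, above, 1.0
    # P(X >= below) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n
    return below, above, tail


# Made-up p values, chosen to show a pile-up just under 0.05.
sample = [0.041, 0.043, 0.045, 0.047, 0.049, 0.044,
          0.046, 0.048, 0.052, 0.2, 0.6]
below, above, tail = caliper_test(sample)
print(f"just below: {below}, just above: {above}, one-sided tail: {tail:.4f}")
```

With real data you would pass in the p values harvested from the papers in question. The narrower the window, the more defensible the equal-likelihood assumption, but the fewer p values there are to count.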

In an important blog post,[1] prominent statistician Uri Simonsohn shows that the first, satisfactory pattern is roughly what is found when all p values in all papers are analysed. But when the sample of p values is ‘enriched’ by selecting principal findings, or those p values associated with a covariate, the signal emerges from the noise, as in the second figure.

