P Values – Yet Again This Deceptively Slippery Concept

The nature of the P value has recently come up in the New England Journal of Medicine. Pocock, a statistician, is quoted as saying that “a P value of 0.05 carries a 5% risk of a false positive result.”[1]

Such a statement is obviously wrong, and Daniel Hu complains, correctly, that it is a “misconception”.[2] Pocock and Stone reply that a P value of 0.05 carries a 5% risk of a false positive result “when there is no true difference between treatments.”[3] This is correct, provided it is understood that “false positive” here does not mean that there is a 5% probability that the treatment is ineffective. When is it reasonable to suppose that there is absolutely no true difference between treatments? Hardly ever. So the P value is not very useful to decision makers. The CLAHRC WM Director cautions statisticians not to neglect prior consideration of how likely or realistic a null hypothesis is. Homeopathy aside, it is seldom a plausible prior hypothesis.
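The gap between "5% risk of a false positive when there is no true difference" and "5% probability that the treatment is ineffective" can be made concrete with Bayes' rule. The sketch below is only an illustration: the function name, the prior probabilities and the 80% power are assumptions chosen for the example, not figures from the NEJM correspondence, and it conditions on "significant at the 5% level" rather than on a P value of exactly 0.05.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Probability that there is no true difference, given a significant result.

    prior_null: assumed prior probability that the null hypothesis is true
    alpha:      significance threshold (chance of significance when the null is true)
    power:      assumed chance of significance when there is a true difference
    """
    false_positives = prior_null * alpha           # null true, yet significant
    true_positives = (1.0 - prior_null) * power    # real effect, and significant
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Illustrative priors only: the answer depends heavily on how plausible
    # the null hypothesis was before the trial was run.
    for prior_null in (0.1, 0.5, 0.9):
        share = prob_null_given_significant(prior_null, alpha=0.05, power=0.8)
        print(f"prior P(null) = {prior_null:.1f} -> P(null | significant) = {share:.3f}")
```

Under these assumed inputs the same 5% threshold gives very different answers depending on the prior, which is why the plausibility of the null hypothesis cannot be left out of the interpretation.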

— Richard Lilford, CLAHRC WM Director


  1. Pocock SJ & Stone GW. The primary outcome is positive – is that good enough? N Engl J Med. 2016; 375: 971-9.
  2. Hu D. The Nature of the P Value. N Engl J Med. 2016; 375: 2205.
  3. Pocock SJ & Stone GW. Author’s Reply. N Engl J Med. 2016; 375: 2205-6.

2 thoughts on “P Values – Yet Again This Deceptively Slippery Concept”

  1. hmm, see http://andrewgelman.com/2016/03/07/29212/

    “Ultimately the problem is not with p-values but with null-hypothesis significance testing, that parody of falsificationism in which straw-man null hypothesis A is rejected and this is taken as evidence in favor of preferred alternative B (see Gelman, 2014). Whenever this sort of reasoning is being done, the problems discussed above will arise. Confidence intervals, credible intervals, Bayes factors, cross-validation: you name the method, it can and will be twisted, even if inadvertently, to create the appearance of strong evidence where none exists.”

  2. A decent article on this issue is Goodman’s 2008 paper “A dirty dozen: twelve P-value misconceptions”.

    He provides this definition:
    The definition of the P value is: The probability of the observed result, plus more extreme results, if the null hypothesis were true.

    The conclusion raises important points:

    The most important foundational issue to appreciate is that there is no number generated by standard methods that tells us the probability that a given conclusion is right or wrong. The determinants of the truth of a knowledge claim lie in combination of evidence both within and outside a given experiment, including the plausibility and evidential support of the proposed underlying mechanism. If that mechanism is unlikely, as with homeopathy or perhaps intercessory prayer, a low P value is not going to make a treatment based on that mechanism plausible. It is a very rare single experiment that establishes proof.

    The second principle is that the size of an effect matters, and that the entire confidence interval should be considered as an experiment’s result, more so than the P value or even the effect estimate. The confidence interval incorporates both the size and imprecision in effect estimated by the data

    He finishes by talking about using Bayes, but frankly that’s far too radical. 🙂

    Colegrave and Ruxton’s 2003 article emphasises confidence intervals, in this context demolishing post hoc power calculations.
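The definition of the P value quoted in the comment above (the probability of the observed result, plus more extreme results, if the null hypothesis were true) can be worked through on a toy case. This is a minimal sketch under stated assumptions: the data (9 of 10 patients favouring one treatment), the null value of 0.5 and the helper name `binomial_upper_tail` are all hypothetical, and the tail is one-sided.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided P value: P(X >= successes) when X ~ Binomial(trials, p_null).

    Sums the probability of the observed count and of every more extreme
    count, all computed as if the null hypothesis were true.
    """
    return sum(
        comb(trials, k) * p_null**k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical data: 9 of 10 patients favour treatment A; null = no preference.
    p_value = binomial_upper_tail(successes=9, trials=10)
    print(f"one-sided P value = {p_value:.4f}")  # 11/1024, about 0.0107
```

Note what the calculation does and does not use: it takes the null hypothesis as given and asks how surprising the data are, so it says nothing by itself about how probable the null hypothesis is.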
