We have argued in this News Blog that the time has come to stop using frequentist confidence intervals as (usually implicit) decision rules, and to bring all salient knowledge into parameter estimation through the prior. Only in that way can Bradford Hill’s criteria [1] be brought explicitly into quantitative estimates. Only by Bayesian methods can the probabilities needed for decision-making – that is, the probability of hypotheses given the data – be calculated.[2] [3] [4]
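To make "bringing knowledge in through the prior" concrete, here is a minimal sketch of the simplest Bayesian update: a conjugate normal prior combined with a normally distributed trial estimate. The effect scale, the prior and the trial numbers are all hypothetical, chosen only for illustration. The output is the kind of probability the paragraph above refers to, the probability of a hypothesis ("the treatment helps") given the data.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_normal(prior_mean: float, prior_sd: float,
                  data_mean: float, data_se: float) -> tuple[float, float]:
    """Conjugate normal-normal update.

    The posterior precision is the sum of the prior and data precisions, and the
    posterior mean is the precision-weighted average of the prior and data means.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / data_se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical inputs on an arbitrary effect scale where values > 0 mean benefit.
# The prior (mean 0, sd 0.5) stands in for prior knowledge such as Hill's criteria.
post_mean, post_sd = update_normal(prior_mean=0.0, prior_sd=0.5,
                                   data_mean=0.3, data_se=0.2)

# Probability of the hypothesis "the treatment helps" given the data: P(effect > 0 | data).
p_benefit = 1.0 - normal_cdf((0.0 - post_mean) / post_sd)
print(f"posterior mean={post_mean:.3f}, sd={post_sd:.3f}, P(benefit)={p_benefit:.3f}")
```

The prior pulls the trial estimate of 0.3 towards zero in proportion to how confident the prior is; a more diffuse prior (larger `prior_sd`) would leave the data nearly untouched.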

But what then? How should decision-makers use the probability distributions that pop out of the computer once ‘priors’ have been updated? Should they simply take the mid-point of the distribution as the quantity of relevance? Well, it depends on who the decision-maker is. If the decision-maker is a patient considering mutilating surgery, or a parent choosing a method of pedagogy, then the mid-point of the probability distribution may be the appropriate probability for decision-making purposes, on the ‘best guess’ principle, irrespective of how wide the credible interval is. Some individual decision-makers may have a form of risk aversion in which they regard a unit loss as worse than a failure to realise an equivalent unit gain. But they can make their own adjustment for that.
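One way an individual might make that adjustment, offered here only as an assumption since the text does not specify one, is to weight losses more heavily than gains when averaging over the posterior. The sketch below assumes a normal posterior for the benefit of an option and a hypothetical `loss_weight` parameter. With `loss_weight = 1` the value is exactly the posterior mean, the ‘best guess’, whatever the width of the interval; only a loss-averse weighting makes the width matter.

```python
import math


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def loss_weighted_value(mu: float, sigma: float, loss_weight: float = 1.0) -> float:
    """E[gain] - loss_weight * E[loss] for an effect X ~ Normal(mu, sigma).

    Gains are max(X, 0) and losses are max(-X, 0). With loss_weight = 1 this is
    simply E[X] = mu, independent of the width sigma.
    """
    z = mu / sigma
    expected_gain = mu * normal_cdf(z) + sigma * normal_pdf(z)
    expected_loss = -mu * normal_cdf(-z) + sigma * normal_pdf(z)
    return expected_gain - loss_weight * expected_loss


# Hypothetical posterior (mean 0.2, sd 0.5) and a hypothetical loss weight of 2.
print(loss_weighted_value(0.2, 0.5))                   # approximately 0.2, the mean
print(loss_weighted_value(0.2, 0.5, loss_weight=2.0))  # smaller: losses count double
```

Because the expected loss grows with `sigma`, a loss-averse individual applying this rule would penalise wide intervals, whereas a risk-neutral one would not.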

For a decision-maker acting on behalf of a health system, matters are more complex. Such a decision-maker must consider the effect of their behaviour across decisions in general, not just the current decision in particular. A premium should then be placed on probabilities with narrower credible intervals over those with broader intervals, *ceteris paribus*. This is because the system as a whole will be weakened if decisions are made while the evidence base is immature, in the sense that probability distributions are wide: in the limit, nearly 50% of decisions would be wrong, so the system could never learn. So a corporate decision-maker should exhibit an aversion to imprecision and place a premium on increased precision. Of course, this raises the question of how large that premium should be: what function should relate improvements in precision to their value? The form of this function requires further study. In the meantime, rationing decisions should demonstrate an aversion to wide credible intervals. In such a scenario, there really is a case for “more research required”.
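The claim that nearly half of decisions would be wrong in the limit can be checked with a short calculation. The sketch assumes a normal posterior and a simple rule, adopt the option if the posterior mean favours it; a decision is "wrong" if the true effect has the opposite sign. Holding a hypothetical mid-point fixed and widening the posterior shows the error probability climbing towards 0.5. It does not propose the premium function, which the text leaves open.

```python
import math


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_wrong_direction(post_mean: float, post_sd: float) -> float:
    """Posterior probability that the true effect has the opposite sign to the
    posterior mean, i.e. that a decision taken on the mid-point turns out wrong."""
    return normal_cdf(-abs(post_mean) / post_sd)


# Same hypothetical mid-point (0.1), increasingly wide (immature) evidence.
for sd in (0.05, 0.1, 0.5, 2.0, 100.0):
    print(f"sd={sd:>6}: P(wrong) = {prob_wrong_direction(0.1, sd):.3f}")
```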

*— Richard Lilford, CLAHRC WM Director
— Sam Watson, Research Fellow*

- Hill AB. The environment and disease: Association or causation? *Proc R Soc Med*. 1965;**58**(5):295-300.
- Lindley DV. The Philosophy of Statistics. *J Roy Stat Soc D-Sta*. 2000;**49**(3):293-337.
- Goodman SN. Toward Evidence-Based Medical Statistics. 2: The Bayes Factor. *Ann Intern Med*. 1999;**130**:1005-13.
- Lilford RJ, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. *BMJ*. 1996;**313**(7057):603-7.

I also think that there is value to be gained from Bayesian methods and have done for some time. I’ve been looking, without success, for a research opportunity in this area.

Richard, Sam

Your comment that “the system would never learn” is right in one sense (once a decision is made, the likelihood of further research to narrow the interval falls) but probably wrong in another (surely systems learn from mistakes as much as humans do?). One reason for the lack of further research is the high financial and psychological cost of disinvestment, which increases the potential negative consequences of actual mistakes. Mistakes at service-policy level can also carry very high health costs and clearly should be avoided wherever possible, but the reasons for such mistakes (e.g. why actual effectiveness lay at the lower bound of a wide confidence/credible interval) should be sought and used when similar decisions are made in the future.

Really interesting thoughts and discussion – thanks.

But... with respect to policy-makers, if the evidence base is limited, then the best estimate from well-conducted research (lying at the centre of the confidence interval) may still be the best basis for a current decision. As Bland and Altman point out, the observed effect estimate, if not systematically biased, is still the most likely value, regardless of the width of the confidence interval.

And... with respect to individuals, is the middle much help? Our problem is that the middle describes the group (which is why it helps group-level policy-makers), not the individual. Even if the NNT (number needed to treat) is 4, does that not mean an individual is still more likely not to benefit from the intervention than to benefit?
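The arithmetic behind that question can be sketched as follows, under two simplifying assumptions not stated in the comment: benefit is all-or-nothing, and nobody is harmed by treatment, so the absolute risk reduction (1/NNT) equals the share of treated patients who benefit. The function name is hypothetical.

```python
def individual_benefit_probability(nnt: float) -> float:
    """Chance that a given treated patient benefits, assuming all-or-nothing
    benefit, no harm from treatment, and a patient exchangeable with the trial
    population: the absolute risk reduction, 1 / NNT."""
    if nnt < 1:
        raise ValueError("NNT must be at least 1")
    return 1.0 / nnt


p = individual_benefit_probability(4)
print(f"P(benefit) = {p:.2f}, P(no benefit) = {1 - p:.2f}")
# prints: P(benefit) = 0.25, P(no benefit) = 0.75
```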