Service Delivery Research: Researcher-Led or Manager-Led?

The implication behind much Service Delivery Research is that it is researcher-led. After all, it is called “research”. But is this the correct way to conceptualise such research when its purpose is to evaluate an intervention?

For a start, the researcher might not have been around when the intervention was promulgated; many, perhaps most, service interventions are evaluated retrospectively. In the case of such ex-post evaluations the researcher has no part in the intervention and cannot be held responsible for it in any way – the responsibilities of the researchers relate solely to research, such as data, security and analysis. The researcher cannot accept responsibility for the intervention itself. For instance, it would be absurd to hold Nagin and Pepper [1] responsible for the death penalty by virtue of their role in evaluating its effect on homicide rates! Responsibility for selection, design, and implementation of interventions must lie elsewhere.

But even when the study is prospective, for instance, involving a cluster RCT, it does not follow that the researcher is responsible for the intervention. Take, for instance, the Mexican Universal Health Insurance trial.[2] The Mexican Government promulgated the intervention and Professor King and his colleagues had to scramble after the fact, to ensure that it was introduced over an evaluation framework. CLAHRCs work closely with health service and local authority managers, helping to supply their information needs and evaluate service delivery interventions to improve the quality / efficiency / accountability / acceptability of health care. The interventions are ‘owned’ by the health service, in the main.

This makes something of a nonsense of the Ottawa Statement on the ethics of cluster trials – for instance, it says that the researcher must ensure that the study intervention is “adequately justified” and “researchers should protect cluster interests.”[3]

Such a statement seems to misplace the responsibility for the intervention. That responsibility must lie with the person who has the statutory duty of care and who is employed by the legal entity charged with protecting client interests. The Chief Executive or her delegate – the ‘Cluster Guardian’ – must bear this responsibility.[4] Of course, that does not let researchers off the hook. For a start, the researcher has responsibility for the research itself: design, data collation, etc. Also, researchers may advise or even recommend an intervention, in which case they have a vicarious responsibility.

Advice or suggestions offered by researchers must be sound – the researcher should not advocate a course of action that is clearly not in the cluster interest and should not deliberately misrepresent information or mislead / wrongly tempt the cluster guardian. But the cluster guardian is the primary moral agent with responsibility to serve the cluster interest. The ethics of doing so are the ethics of policy-making and service interventions generally. Policy-makers are often not very good at making policy, as pointed out by King and Crewe in their book “The Blunders of Our Governments”.[5] But that is a separate topic.

— Richard Lilford, CLAHRC WM Director


  1. Nagin DS & Pepper JV. Deterrence and the Death Penalty. Washington, D.C.: The National Academies Press, 2012.
  2. King G, Gakidou E, Imai K, et al. Public policy for the poor? A randomised assessment of the Mexican universal health insurance programme. Lancet. 2009; 373(9673):1447-54.
  3. Weijer C, Grimshaw JM, Eccles MP, et al. The Ottawa Statement on the Ethical Design and Conduct of Cluster Randomized Trials. PLoS Med. 2012; 9(11): e1001346.
  4. Edwards SJL, Lilford RJ, Hewison J. The ethics of randomised controlled trials from the perspectives of patients, the public, and healthcare professionals. BMJ. 1998; 317(7167): 1209-12.
  5. King A & Crewe I. The Blunders of Our Governments. London: Oneworld Publications, 2013.

Researchers Continue to Consistently Misinterpret p-values

For as long as there have been p-values there have been people misunderstanding p-values. Their nuanced definition eludes many researchers, statisticians included, and so they end up being misused and misinterpreted. The situation recently prompted the American Statistical Association (ASA) to produce a statement on p-values.[1] Yet they are still widely viewed as the most important bit of information in an empirical study, and careers are still built on ‘statistically significant’ findings. A paper in Management Science,[2] recently discussed on Andrew Gelman’s blog,[3] reports the results of a number of surveys of top academics about their interpretations of the results of hypothetical studies. They show that these researchers, who include authors in the New England Journal of Medicine and American Economic Review, generally only consider whether the p-value is above or below 0.05; they consider p-values even when they are not relevant; they ignore the actual magnitude of an effect; and they use p-values to make inferences about the effect of an intervention on future subjects. Interestingly, the statistically untrained were less likely to make the same errors of judgement.

As the ASA statement and many, many other reports emphasise, p-values do not indicate the ‘truth’ of a result, nor do they imply clinical or economic significance. They are often presented for tests that are completely pointless, and they cannot be interpreted in isolation from all the other information about the statistical model and possible data analyses. It is possible that in the future the p-value will be relegated to a subsidiary statistic, where it belongs, rather than the main result, but until that time statistical education clearly needs to improve.
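The point that a p-value says nothing about the magnitude of an effect can be illustrated with a minimal sketch. The two studies below are hypothetical, the numbers are invented for illustration, and the calculation assumes a simple two-sided z-test for a difference in means with a known common standard deviation; the function name `two_sided_p` is ours.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(difference, sd, n_per_arm):
    """Two-sided p-value for a difference in means (z-test, known common SD)."""
    standard_error = sd * sqrt(2.0 / n_per_arm)
    z = difference / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical study A: a trivially small difference in a very large sample.
p_tiny_effect = two_sided_p(difference=0.5, sd=10.0, n_per_arm=20000)

# Hypothetical study B: ten times the difference, but in a small sample.
p_large_effect = two_sided_p(difference=5.0, sd=10.0, n_per_arm=20)

print(f"Study A: difference 0.5, p = {p_tiny_effect:.2g}")
print(f"Study B: difference 5.0, p = {p_large_effect:.2g}")
```

Under these assumptions the trivial difference falls below 0.05 while the much larger one does not, so a reader who looks only at which side of 0.05 the p-value falls would rank the two studies the wrong way round.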

— Sam Watson, Research Fellow


  1. Wasserstein RL & Lazar NA. The ASA’s Statement on p-Values: Context, Process, and Purpose. Am Stat. 2016; 70(2). [ePub].
  2. McShane BM & Gal D. Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence. Manage Sci., 2015; 62(6): 1707-18.
  3. Gelman A. More evidence that even top researchers routinely misinterpret p-values. Statistical Modeling, Causal Inference, and Social Science. 26 July 2016.

Update on Zika for News Blog Readers

A recent review of epidemiological evidence from the Centers for Disease Control and Prevention (CDC) in Atlanta confirms the association of Zika arbovirus infections during pregnancy with microcephaly in the infant, with a risk of about one in 100.[1] It is probable that the risk of neurological effects less serious than microcephaly is also increased. A recent BMJ paper [2] analyses a cohort of microcephalic children born of mothers with Zika virus infection in pregnancy. The authors did not just measure the size of the head relative to length and weight. All babies underwent CT scan, MRI, or both. They all manifested strikingly similar features on neuro-imaging, and these features are largely distinct from the other known causes of microcephaly, including those associated with infections with other viruses, such as cytomegalovirus. The famous philosopher of science William Whewell argued that if information of different types all corroborates the same theory, then that is powerful support in its favour.[3] The CLAHRC WM Director thinks a causal role for the virus is pretty much settled – we may assume that the Zika virus is indeed a cause of severe (and perhaps less severe) neurological damage in the foetus.

— Richard Lilford, CLAHRC WM Director


  1. Rasmussen SA, Jamieson DJ, Honein MA, Petersen LR. Zika Virus and Birth Defects — Reviewing the Evidence for Causality. N Engl J Med. 2016; 374: 1981-7.
  2. Aragao MFV, van der Linden V, Brainer-Lima AM, et al. Clinical features and neuroimaging (CT and MRI) findings in presumed Zika virus related congenital infection and microcephaly: retrospective case series study. BMJ. 2016; 353: i1901.
  3. Whewell W & Butts RE. William Whewell’s Theory of Scientific Method. Pittsburgh: University of Pittsburgh Press. 1968.

Inequalities: Your Next Exciting Instalment

One month ago we cited the majestical study of health and wealth published in JAMA.[1] A fortnight ago we cited Angus Deaton’s insightful commentary on this study.[2] This week we draw your attention to a study of wealth and health inequalities, based on panel data (derived from national censuses) in eleven European countries covering two decades from 1990 to 2010.[3] The study was designed to look for associations between socio-economic class recorded in the censuses and deaths, overall and in major categories, such as cardiovascular disease and cancer. The authors also re-categorised deaths into classes that may indicate behaviours, such as smoking and alcohol. An overall reduction in age-specific mortality was observed over the study period. The study also showed that inequalities were growing wider when relative risks were compared, but absolute differences declined in nine of the eleven countries (including England and Wales). Absolute inequalities in smoking-related deaths declined, but they increased for alcohol-related deaths.

— Richard Lilford, CLAHRC WM Director


  1. Chetty R, Stepner M, Abraham S, et al. The Association Between Income and Life Expectancy in the United States, 2001-2014. JAMA. 2016; 315(16): 1750-66.
  2. Deaton A. On Death and Money. History, Facts, and Explanations. JAMA. 2016; 315(16): 1703-5.
  3. Mackenbach JP, Kulhánová I, Artnik B, et al. Changes in Mortality Inequalities over Two Decades: Register Based Study of European Countries. BMJ. 2016; 353: i1732.

Digital Health and Telehealth

News Blog readers know that the CLAHRC WM Director blows hot and cold about digital health – while it holds immense potential to improve the quality, safety and efficiency of care, these advantages are seldom realised because of poor design, and the technology has immense potential to damage the all-precious doctor-patient relationship (see previous post). Some very encouraging results are reported in a recent viewpoint article in JAMA.[1] First, a tele-monitoring programme in the community reduced admissions for patients with cardiac failure by detecting deterioration early – quite an unusual result, and it would be interesting to find out why this application succeeded where so many others have failed. Second, a tele-ICU programme claimed to reduce mortality, length of stay, and malpractice claims across a number of hospitals – we will report on this study in more detail in a future blog. However, the viewpoint article also shows that the smartphone app industry is out of control, with 100,000 health applications available on iTunes alone. Two thirds of apps to calculate insulin dosages provided the wrong answer, for example. Whether this genie can be squeezed back into the bottle is very doubtful.

 — Richard Lilford, CLAHRC WM Director


  1. Agboola SO, Bates DW, Kvedar JC. Digital Health and Patient Safety. JAMA. 2016; 315(16): 1697-8.

Tackling Malaria

CLAHRC Africa is planning a study with Anja Terlouw and Linda Mipando of the Malawi, Liverpool Wellcome Trust Centre, to reduce the prevalence of malaria in villages in Africa. Artemisinin therapy for clinical cases is the single most cost-effective measure for malaria control, while treatment of pregnant women can also bring an important health gain.[1] Major works to drain swamps and remove standing water are beyond scope. So we are considering community-based interventions.

There are many different community-based approaches to the scourge of malaria.[2] Improving the uptake of bed nets is a very widely used approach (for a beautiful map of how the use of bed nets has improved since 2000, see this Tweet by Bill Gates). Bed nets are impregnated with insecticides that are harmless to humans, and can reduce the load of infected vectors in a locality, as well as protect individuals. But uptake is not universal, in part because they are hard to use in the absence of a bed and many people, especially children, sleep on mats. We plan to investigate methods to mitigate the problems, perhaps including an erectable protective dome, like a small tent, for children:

[Image: child underneath small tent]

One problem with bed nets is that anopheline mosquitoes are developing resistance to the insecticide.

Other approaches include regular indoor residual spraying so that surfaces are coated in a substance lethal to mosquitoes, but this requires fastidious application of the insecticide and is expensive. Yet another approach is mass treatment of whole populations, as discussed in a recent edition of Science.[3] However, this risks promoting resistant strains on a large scale, so a modification of the mass treatment approach, based on screening and treatment, has been advocated. However, that may be ruinously expensive.

Of course, there are approaches aimed at reducing breeding grounds for the vector, which are certainly effective if they can be implemented.[4] Diagnosing and treating pregnant women is an important strategy.[5] Malaria vaccines are starting to look promising,[6] while we wait for widespread application and evaluation of existential approaches, such as introducing sterile males into the unsuspecting anopheline population.

In the meantime, our plan is to select the most propitious community-based method and roll it out in collaboration with authorities, as part of a cluster RCT, perhaps using a stepped-wedge design.

— Richard Lilford, CLAHRC WM Director


  1. Morel CM, Lauer JA, Evans DB. Cost effectiveness analysis of strategies to combat malaria in developing countries. BMJ. 2005; 331: 1299.
  2. Salam RA, Das JK, Lassi ZS, Bhutta ZA. Impact of community-based interventions for the prevention and control of malaria on intervention coverage and health outcomes for the prevention and control of malaria. Infect Dis Poverty. 2014; 3: 25.
  3. Roberts L. Rubber workers on the front lines. Science. 2016; 352(6284): 404-5.
  4. Keiser J, Singer BH, Utzinger J. Reducing the burden of malaria in different eco-epidemiological settings with environmental management: a systematic review. Lancet Infect Dis. 2005; 5: 695-708.
  5. Hill J, Hoyt J, van Eijk AM, et al. Factors affecting the delivery, access, and use of interventions to prevent malaria in pregnancy in sub-Saharan Africa: a systematic review and meta-analysis. PLoS Med. 2013; 10(7): e1001488.
  6. Garner P, Gelband H, Graves P, et al. Systematic Reviews in Malaria: Global Policies Need Global Reviews. Infect Dis Clin N Am. 2009; 23: 387-404.

Migraine and Cardiovascular Disease

It is well established that migraine sufferers have an increased risk of both thrombotic and haemorrhagic stroke. A follow-up of 115,541 women in the famous Nurses’ Health Study II shows that the risk of myocardial infarction is also substantially increased in migraine sufferers, even after adjusting for risk factors such as obesity. The authors think that migraine is a marker for some sort of endothelial malfunction.[1] The CLAHRC WM Director wondered if migraine sufferers have a higher risk of preeclampsia, on the grounds that this is also thought to be an endotheliopathy. He is too late (again) – eight of ten studies in a systematic review found a positive association.[2]

— Richard Lilford, CLAHRC WM Director


  1. Kurth T, Winter AC, Eliasson AH, et al. Migraine and risk of cardiovascular disease in women: prospective cohort study. BMJ. 2016; 353: i2610.
  2. Adeney KL & Williams MA. Migraine Headaches and Preeclampsia: An Epidemiologic Review. Headache. 2006; 46(5): 794-803.

Ask not whether, but why, before the bell tolls!

In the last News Blog we mentioned the recent overview of trials of different teaching methods. It turns out that frequent interaction with the class is important.[1] Teachers in England tend to ask ‘what’ questions. However, it is more effective to stimulate the minds of pupils with ‘why’ questions, as teachers in Singapore or Shanghai do (the world’s premier cities for pedagogy). As one of the Shanghai teachers said – “I don’t teach physics, I teach my pupils how to learn physics.” In the next News Blog we will summarise the evidence on Problem-Based Learning (PBL).

— Richard Lilford, CLAHRC WM Director


  1. Hattie J. The Applicability of Visible Learning to Higher Education. Scholarship of Teaching and Learning in Psychology. 2015; 1(1): 79-91.


Why the Finding that ‘Most Published Research Findings are False’ is False

The above titled paper is one of the most widely cited in clinical epidemiology.[1] It was published in PLoS Medicine in 2005 and has 3,805 citations on Google Scholar, over 1.7m views online, and has been widely quoted in the media all over the world. The author, John Ioannidis, is arguably the world’s premier clinical epidemiologist.

The essence of Ioannidis’s argument turns on the notion of false positive study results. A false positive study result can arise in two ways: as a result of bias, or because of an alpha error. Ioannidis, in this paper, is not greatly concerned with traditional biases, such as those resulting from selection effects or lack of blinding. He is concerned, however, with dissemination bias in general, including the particular form of bias called p-hacking. P-hacking arises when an investigator performs multiple statistical tests, but selectively reports those with positive results. Since the denominator – the total number of statistical tests carried out – is not declared, a statistical adjustment for multiple comparisons is not possible and the published finding is a skewed sample of all the findings. Ioannidis’s article is packed with examples from clinical and epidemiological research.
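To see why the undeclared denominator matters, here is a minimal sketch of how quickly the chance of at least one ‘positive’ result grows with the number of tests. It rests on strong simplifying assumptions that are ours, not Ioannidis’s: every null hypothesis is true, the tests are independent, and the investigator reports a finding whenever the smallest p-value falls below 0.05. The function names are illustrative only.

```python
import random


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_selective_reporting(n_tests, n_studies=20000, alpha=0.05, seed=1):
    """Share of simulated studies able to report a 'significant' finding.

    Under a true null hypothesis a p-value is uniform on (0, 1), so each
    test is simulated as a single uniform random draw.
    """
    rng = random.Random(seed)
    studies_with_a_hit = 0
    for _ in range(n_studies):
        p_values = [rng.random() for _ in range(n_tests)]
        if min(p_values) < alpha:  # only the best-looking test gets reported
            studies_with_a_hit += 1
    return studies_with_a_hit / n_studies


for k in (1, 5, 20):
    analytic = familywise_error_rate(k)
    simulated = simulate_selective_reporting(k)
    print(f"{k:>2} tests: analytic {analytic:.3f}, simulated {simulated:.3f}")
```

A reader who sees only the reported test has no way of telling whether it was the one test performed or the best of twenty, which is exactly why the adjustment cannot be made after the fact.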

In the flurry of excitement over the somewhat sensational findings, and perhaps partly because of the awe in which the author is held, nobody seems to have asked the obvious question: why, if most research findings are false, has medical research been so spectacularly successful? How can it be that so many lives are saved by transplantation surgery, vaccination, chemotherapy, control of haemorrhage, and so on, if most research is false?

The fundamental flaw in Ioannidis’s argument is that research is not interpreted entirely by statistical convention. Scientific inference is not simply a question of acting on research results like a pilot reacting to instruments. The point about science is not to count up positive results (weighted by their size), but to generate a scientific theory. The crucial point about Semmelweis’s findings lay not in the number of postpartum deaths that he observed, but in the germ theory that he inferred.[2] The lesson is obvious: we should stand back from individual research findings, and consider research findings in the round to develop scientific theory. On a lighter note, one is reminded of Bertrand Russell’s chicken, who laid an egg each morning and was rewarded by the farmer. Empirically, evidence pointed to a very satisfactory relationship, until Christmas day! The chicken misinterpreted the data because she did not perceive the underlying (socioeconomic) structure of which she was a part.

None of this means, of course, that Ioannidis is wrong to warn us about the perils of p-hacking, but his argument does remind us of a deep flaw in our current, and hopefully transient, way of interpreting scientific data in research – the convention of dichotomising scientific results on the notion of statistical significance. This, of course, is quite wrong, as argued elsewhere in the News Blog. And to be fair, Ioannidis does take a swipe at p-values along the way. What he doesn’t do is to draw a proper distinction between statistical results and scientific inference; a p-value is not a scientific finding, but an input to inference. This is not semantics; it cuts to the heart of the problem. Dichotomising results on whether or not some confidence limit excludes the null value is atheoretical and prone to mislead. The proper way to interpret a given finding is to build on theory using Bayesian ideas. Theoretical knowledge is encapsulated in a prior probability density. When the data relate directly to the parameter of interest they are used to update this prior (if necessary after adjustment for potential bias using the method of Turner, et al.[3]). If the data are indirectly related to the parameter of interest, then the updating can be done subjectively, or, if possible, through a Bayesian network analysis. Either way, when data are interpreted in this epistemologically sound way, study results are not ‘true’ or ‘false’ – they are simply the results.
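The simplest case of updating a prior with directly relevant data can be sketched as follows. This is an illustration under assumptions of our own: both the prior and the study estimate are treated as normal distributions on a log odds ratio scale, all numbers are hypothetical, and no bias adjustment of the kind Turner et al. describe is attempted. The function name `update_normal_prior` is ours.

```python
from math import sqrt
from statistics import NormalDist


def update_normal_prior(prior_mean, prior_sd, estimate, standard_error):
    """Conjugate normal-normal update: returns posterior mean and SD.

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the study estimate.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / standard_error ** 2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_mean * prior_precision + estimate * data_precision
    ) / posterior_precision
    return posterior_mean, sqrt(1.0 / posterior_precision)


# Hypothetical sceptical prior: centred on no effect, large effects unlikely.
# Hypothetical study: estimated effect 0.5 with standard error 0.25.
post_mean, post_sd = update_normal_prior(
    prior_mean=0.0, prior_sd=0.2, estimate=0.5, standard_error=0.25
)

# Posterior probability that the true effect is greater than zero.
prob_benefit = 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)

print(f"Posterior mean {post_mean:.3f}, SD {post_sd:.3f}")
print(f"Probability the effect exceeds zero: {prob_benefit:.2f}")
```

The output is a revised distribution for the effect, pulled part of the way from the study estimate towards the prior, rather than a verdict of ‘true’ or ‘false’.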

Of course, none of this is tantamount to disregarding the importance of p-hacking, which is a topic we have discussed before.[4] [5]

— Richard Lilford, CLAHRC WM Director


  1. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med. 2005; 2(8): e124.
  2. Best M, Neuhauser D. Ignaz Semmelweis and the birth of infection control. Qual Saf Health Care. 2004;13:233–4.
  3. Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser A Stat Soc. 2009;172(1):21-47.
  4. Lilford RJ. Bullshit Detectors: Look out for ‘p-hacking’. NIHR CLAHRC West Midlands News Blog. 11 September 2015.
  5. Lilford RJ. More on ‘p-hacking’. NIHR CLAHRC West Midlands News Blog. 18 December 2015.

Income, Relative Income and Health

News Blog readers who enjoyed my analysis of Chetty’s monumental JAMA article on income and longevity at age 40 [1] may wish to read a commentary by last year’s Nobel prize winner for economics, Angus Deaton.[2] Some points:

  1. Studies correlating income with longevity over-estimate the association between wealth and longevity because they assume that people to whom the results are extrapolated will remain in their income groups.
  2. The association between wealth and health overestimates the causal effect of wealth on health because health also influences wealth to a degree.
  3. While the life expectancy of poor people varies widely by locality, that of rich people does not.
  4. Given the poor health of middle-aged Americans, especially white Americans from low socio-economic levels, we can expect to see health disparities of adults widen in the short-term. Health disparities in children in America are declining (see previous post).
  5. In setting policy – especially tax rates – be guided by absolute not relative income disparities. Every society has a top and bottom percentile and always will have, just as no more than half of people can be above the median.
  6. Be careful when someone tells you that health disparities are growing – often (as now) relative disparities widen as absolute disparities decline. This can happen because the same relative risk reduction has a bigger (absolute) effect when baseline rates of ill-health are high (as among poor people) than when they are low (as among the financially better-off).
  7. Education and cognitive ability are independent predictors of both health and wealth. Since parents are important educators, the regress is hard to break.
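The arithmetic behind point 6 can be checked with a small sketch. The mortality rates below are invented for illustration (deaths per 1,000 per year) and are not taken from Deaton or Chetty; the function name `disparity` is ours.

```python
def disparity(rate_poor, rate_rich):
    """Return (absolute gap, relative risk) between poor and rich groups."""
    return rate_poor - rate_rich, rate_poor / rate_rich


# Hypothetical baseline mortality rates per 1,000 per year.
baseline_poor, baseline_rich = 20.0, 10.0
gap_before, ratio_before = disparity(baseline_poor, baseline_rich)

# Case 1: the same 25% relative reduction in both groups.
# The poor gain more in absolute terms because their baseline is higher.
absolute_gain_poor = baseline_poor * 0.25
absolute_gain_rich = baseline_rich * 0.25

# Case 2: rates fall to 12 (poor) and 5 (rich). Both groups improve, and
# the poor gain more in absolute terms, yet the relative risk rises.
gap_after, ratio_after = disparity(12.0, 5.0)

print(f"Case 1 absolute gains: poor {absolute_gain_poor}, rich {absolute_gain_rich}")
print(f"Case 2 absolute gap: {gap_before} to {gap_after}")
print(f"Case 2 relative risk: {ratio_before} to {ratio_after}")
```

In the second case a headline could truthfully say that inequality has widened (in relative terms) while the absolute gap in deaths has narrowed, which is why it pays to ask which measure is being quoted.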

— Richard Lilford, CLAHRC WM Director


  1. Chetty R, Stepner M, Abraham S, et al. The Association Between Income and Life Expectancy in the United States, 2001-2014. JAMA. 2016; 315(16): 1750-66.
  2. Deaton A. On Death and Money. History, Facts, and Explanations. JAMA. 2016; 315(16): 1703-5.
