Bayesian Analysis of Clinical Trial Results: Coming of Age

I have long argued for greater use of Bayesian interpretation of clinical trial results – I suggested this approach with respect to trials of treatment for rare diseases back in 1994.[1] This approach is now advocated in a recent report in JAMA.[2] The approach advocated is the use of a neutral, an enthusiastic, and a sceptical prior. The authors also outline an approach to be used when the data is not compatible with the prior; in such a scenario they advocate a ‘data wins’ rule. I would wish to ensure that the trial was of impeccable design with complete follow up before accepting such an approach. For instance, I would allow my sceptical prior to dominate positive results from a potentially biased trial of homeopathy.
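To make the three-prior idea concrete, here is a minimal sketch. It assumes a normal approximation on the log odds ratio scale and a conjugate normal prior. The trial estimate, its standard error, and all three priors are hypothetical numbers chosen for illustration; none of them comes from the JAMA report.

```python
import math


def posterior_normal(prior_mean, prior_sd, estimate, std_err):
    """Conjugate normal-normal update; returns posterior mean and SD."""
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior
    w_data = 1.0 / std_err ** 2     # precision of the trial estimate
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)


def prob_benefit(post_mean, post_sd):
    """Posterior probability that the log odds ratio is below zero."""
    z = (0.0 - post_mean) / post_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical trial result: odds ratio 0.70, standard error 0.20 on the log scale.
ESTIMATE, STD_ERR = math.log(0.70), 0.20

# Hypothetical priors on the log odds ratio, given as (mean, SD).
PRIORS = {
    "neutral": (0.0, 1.0),
    "enthusiastic": (math.log(0.70), 0.30),
    "sceptical": (0.0, 0.15),
}


def summarise():
    """Return {prior name: (posterior odds ratio, P(benefit))}."""
    results = {}
    for name, (mean, sd) in PRIORS.items():
        post_mean, post_sd = posterior_normal(mean, sd, ESTIMATE, STD_ERR)
        results[name] = (math.exp(post_mean), prob_benefit(post_mean, post_sd))
    return results


if __name__ == "__main__":
    for name, (odds_ratio, p) in summarise().items():
        print(f"{name:12s} posterior OR {odds_ratio:.2f}, P(benefit) {p:.3f}")
```

Under these assumed numbers the sceptical prior pulls the estimate back towards no effect, and a tighter sceptical prior (as one might hold for homeopathy) would pull it further still.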

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ, Thornton JG, Braunholtz D. Clinical Trials and Rare Diseases – a Way Out of a Conundrum. BMJ. 1995; 311: 1621-5.
  2. Quintana M, Viele K, Lewis RJ. Bayesian Analysis: Using Prior Information to Interpret the Results of Clinical Trials. JAMA. 2017; 318(16): 1605-6.

Importance of Cohort Rather Than Cross-Sectional Studies to Determine the Heritability of Conditions

In a recent News Blog we showed how longitudinal studies could improve on cross-sectional epidemiological studies, using alcohol-induced effects on lexical cognition as our example.[1]

Cross-sectional twin studies may also under-estimate the heritability of disorders, such as Autism Spectrum Disorder (ASD), because the control twin may be in a ‘yet to be diagnosed’ state. An interesting article in JAMA shows that the heritability of ASD is much higher if the correct (longitudinal) method is used.[2] All studies agree that familial environment has hardly any effect on the probability of ASD.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Alcohol and its Effects. NIHR CLAHRC West Midlands News Blog. 18 August 2017.
  2. Sandin S, Lichtenstein P, Kuja-Halkola R, Hultman C, Larsson H, Reichenberg A. The Heritability of Autism Spectrum Disorder. JAMA. 2017; 318(12): 1182-4.

Is it Possible to Teach Empathy?

News blog readers will know that I am fascinated by the question of whether it is possible to teach people to be kinder, more patient-centered, and to show more empathy. A recent meta-analysis of RCTs sheds important light on the critical issue of empathy training.[1] Unlike previous systematic reviews, this study included only experimental studies. Overall, 19 studies met the inclusion criteria for the meta-analysis.

One important issue concerns how the endpoint was measured. In 11 of the 19 included studies the outcome was an objective measure, while in the remainder the outcome was self-reported.

Overall, educational interventions produced a positive benefit that was statistically significant. When the authors made an adjustment for possible publication bias, the effect size was only slightly reduced, remaining highly significant statistically.

I expected to find that the effect size was greater for the self-reported outcomes than for objective outcomes. In fact, the effect size was larger and more highly significant for the objective measures of effect.

Some people classify empathy training in two forms: cognitive and affective, to cover the intellectual and emotional aspects of empathy. Others have questioned this dichotomy, arguing that the emotional and the cognitive parts have to interact to produce empathetic behaviour. As it turned out, all studies included a cognitive component.

This is a very interesting and important study. My main problem with the study is that they do not give a breakdown according to whether the objective measure was self-reported or objective. Also, the results do not tell us how enduring the effects were. I have argued before that one of the main criteria of good communication and compassionate care is the desire to achieve these objectives. The most important thing to instil is a deep-seated desire to do a better job. It would seem that training has a part to play in achieving this objective. However, sustained exposure to excellent role models is also critically important and a crucial part of the education of health professionals.

— Richard Lilford, CLAHRC WM Director


  1. Teding van Berkhout E & Malouff JM. The Efficacy of Empathy Training: A Meta-analysis of Randomized Controlled Trials. J Counsel Psychol. 2016; 63(1): 32-41.

A Debt of Gratitude

For a while now I have been working closely with the African Population and Health Research Center (APHRC), which is based in Nairobi, Kenya. Last week’s issue of the Lancet carried an article on the APHRC, where they paid tribute to the outgoing director Alex Ezeh.[1] Alex had the wisdom to identify the enormous challenge posed by the rapidly expanding slums in African cities. He and his colleagues have produced ground-breaking work on health dynamics in urban slums. He was my inspiration, and I followed where he led. Together we compiled a Lancet Series summarising the state of the literature regarding the health of people living in the slums and proposing models to inform future research and policy making.[2] [3] These studies were recently summarised in the African version of ‘The Conversation’.[4] Our work has resulted in the award of a NIHR unit to study the provision of healthcare in slums in Africa and Asia. We have also secured funds from the Rockefeller foundation to run a Bellagio conference on statistical aspects of slum health. Currently, we are pursuing research into water and sanitation in slums, as this is one of the biggest problems leading to diarrhoea, stunting and death, especially in children under the age of five.

I have an enormous debt of gratitude to APHRC in general and Alex Ezeh in particular. I look forward to my ongoing association with Alex and to working very closely with his outstanding successor, Catherine Kyobutungi, who was also profiled in last week’s Lancet.[5]

— Richard Lilford, CLAHRC WM Director


    1. Green A. The African Population and Health Research Center. Lancet. 2017; 390: 1940.
    2. Ezeh A, Oyebode O, Satterthwaite D, et al. The history, geography, and sociology of slums and the health problems of people who live in slums. Lancet. 2017; 389: 547-58.
    3. Lilford RJ, Oyebode O, Satterthwaite D, et al. Improving the Health and Welfare of People Living in Slums. Lancet. 2017; 389: 559-70.
    4. Ezeh A, Sewankambo N, Piot P. Why the Path to Longer and Healthier Lives for all Africans is in Reach. The Conversation. 13 September 2017.
    5. Berman P. Catherine Kyobutungi: leading African health research capacity. Lancet. 2017; 390: 1942.


Breastfeeding and SIDS

Over the years many studies have shown an association between breastfeeding and decreased risk of sudden infant death syndrome (SIDS), with a previous meta-analysis showing an adjusted odds ratio of 0.55 (95% CI 0.44-0.69), which increased to 0.27 (95% CI 0.24-0.31) with exclusive breastfeeding.[1] However, it has been difficult to identify just how long breastfeeding needs to continue to realise this benefit. This is because duration of breastfeeding has not been correlated with reduction in risk. As a follow-up to their original meta-analysis, Thompson and colleagues worked in cooperation with the authors of the included studies to obtain individual-level data.[2] They were able to glean information on duration of breastfeeding so that the association between duration and effect could be examined. In total 9,104 infants were analysed from eight case-control studies. Although analysis showed some protection against SIDS associated with any breastfeeding up to 2 months, this was not statistically significant after controlling for potential confounders. When confounders were controlled for, analysis found that any breastfeeding for at least 2 months, compared to no breastfeeding, had an adjusted odds ratio (aOR) of 0.60 (95% CI 0.44-0.82), while it was a similar aOR of 0.61 (95% CI 0.42-0.87) for exclusive breastfeeding. The aOR for any amount of breastfeeding compared to none improved with increased duration – an aOR of 0.40 (95% CI 0.26-0.63) with 4-6 months breastfeeding, and 0.36 (95% CI 0.22-0.61) with at least 6 months breastfeeding. A similar improvement was seen with at least 4 months of exclusive breastfeeding (aOR 0.46, 95% CI 0.29-0.74).
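For readers less familiar with odds ratios, the sketch below shows how an unadjusted odds ratio and its 95% confidence interval can be computed from a case-control 2x2 table using the standard log (Woolf) method. The counts are hypothetical, and the function name is an illustrative choice. The adjusted odds ratios quoted above come from regression models that control for confounders, which this simple calculation does not attempt.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    # Standard error of the log odds ratio.
    se_log = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: "exposed" means breastfed for at least 2 months.
    or_, lo, hi = odds_ratio_ci(30, 70, 50, 50)
    print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An interval that lies wholly below 1, as in this made-up table, is what is meant by a statistically significant protective association.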

In order to lower the incidence of SIDS it is important that new mothers are encouraged to breastfeed and to continue for at least 2 months, even if they are unable to do so exclusively, as any amount of breastfeeding seems to confer more protection than none.

— Peter Chilton, Research Fellow


  1. Hauck FR, Thompson JM, Tanabe KO, Moon RY, Vennemann MM. Breastfeeding and reduced risk of sudden infant death syndrome: a meta-analysis. Pediatrics. 2011; 128(1): 103–10.
  2. Thompson JMD, Tanabe K, Moon RY, Mitchell EA, McGarvey C, Tappin D, Blair PS, Hauck FR. Duration of Breastfeeding and Risk of SIDS: An Individual Participant Data Meta-analysis. Pediatrics. 2017: e20171324.

Machine Learning and the Demise of the Standard Clinical Trial!

An increasing proportion of evaluations are based on database studies. There are many good reasons for this. First, there simply is not enough capacity to do randomised comparisons of all possible treatment variables.[1] Second, some treatment variables, such as ovarian removal during hysterectomy, are directed by patient choice rather than experimental imperative.[2] Third, certain outcomes, especially those contingent on diagnostic tests,[3] are simply too rare to evaluate by randomised trial methodology. In such cases, it is appropriate to turn to database studies. And when conducting database studies it is becoming increasingly common to use machine learning rather than standard statistical methods, such as logistic regression. This article is concerned with strengths and limitations of machine learning when databases are used to look for evidence of effectiveness.

When conducting database studies, it is right and proper to adjust for confounders and look for interaction effects. However, there is always a risk that unknown or unmeasured confounders will result in residual selection bias. Note that two types of selection are in play:

  1. Selection into the study.
  2. Once in the study, selection into one arm of the study or another.

Here we argue that while machine learning has advantages over RCTs with respect to the former type of bias, it cannot (completely) solve the problem of selection to one type of treatment vs. another.

Selection into a Study and Induction Across Place and Time (External Validity)
A machine learning system based on accumulating data across a health system has advantages with respect to the representativeness of the sample and generalisations across time and space.

First, there are no exclusions by potential participant or clinician choice that can make the sample non-representative of the population as a whole. It is true that the selection is limited to people who have reached the point where their data become available (it cannot include people who did not seek care, for example), but this caveat aside, the problem of selection into the study is strongly mitigated. (There is also the problem of ‘survivor bias’, where people are ‘missing’ from the control group because they have died, become ineligible or withdrawn from care. We shall return to this issue.)
Second, the machine can track (any) change in treatment effect over time, thereby providing further information to aid induction. For example, as a higher proportion of patients / clinicians adopt a new treatment, any change in the intervention effect can be examined. Of course, the problem is not totally solved, because the possibility of different effects in other health systems (not included in the database) still exists.

Selection Once in a Study (Internal Validity)
However, the machine cannot do much about selection to intervention vs. control conditions (beyond, perhaps, enabling more confounding variables to be taken into account). This is because it cannot get around the cause-effect problem that randomisation neatly solves by ensuring that unknown variables are distributed at random (leaving only lack of precision to worry about). Thus, machine learning might create the impression that a new intervention is beneficial when it is not. If the new intervention has nasty side-effects or high costs, then many patients could end up getting treatment that does more harm than good, or which fails to maximise value for money. Stability of results across strata does not vitiate the concern.

It could be argued, however, that selection effects are likely to attenuate as the intervention is rolled out over an increasing proportion of the population. Let us try a thought experiment. Consider the finding that accident victims who receive a transfusion have worse outcomes than those who do not, even after risk-adjustment. Is this because transfusion is harmful, or because clinicians can spot those who need transfusion, net of variables captured in statistical models? Let us now suppose that, in response to the findings, clinicians subsequently reduce use of transfusion. It is then possible that changes in the control rate and in the treatment effect can provide evidence for or against cause and effect explanations. The problem here is that bias may change as the proportions receiving one treatment or the other change. There are thus two possible explanations for any set of results – a change in bias or a change in effectiveness, as a wider range of patients / clinicians receive the experimental intervention. It is difficult to come up with a convincing way to resolve the cause and effect problem. I must leave it to someone cleverer than myself to devise a theorem that might shed at least some light on the plausibility of the competing explanations – bias vs. cause and effect. But I am pessimistic for this general reason. As a treatment is rolled out (because it seems effective) or withdrawn (because it seems ineffective or harmful), so the beneficial or harmful effect (even in relative risk terms) is likely to attenuate. But the bias is also likely to attenuate because less selection is taking place. Thus the two competing explanations may be confounded.
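The transfusion thought experiment can be illustrated with a small simulation. This is only a sketch under stated assumptions: a single unmeasured "severity" variable drives both the clinician's decision to treat and the risk of death, and the treatment itself has no effect at all. All probabilities and the function name are hypothetical.

```python
import random


def risk_difference(n, randomised, seed=0):
    """Death risk in treated minus untreated, for a treatment with NO true effect."""
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        severity = rng.random()  # unmeasured by the database, uniform on [0, 1)
        if randomised:
            treated = rng.random() < 0.5       # coin toss, unrelated to severity
        else:
            treated = rng.random() < severity  # clinicians treat sicker patients
        # Risk of death depends on severity only, never on treatment.
        died = rng.random() < 0.05 + 0.40 * severity
        totals[treated] += 1
        deaths[treated] += died
    return deaths[True] / totals[True] - deaths[False] / totals[False]


if __name__ == "__main__":
    print("observational:", round(risk_difference(200_000, randomised=False), 3))
    print("randomised:   ", round(risk_difference(200_000, randomised=True), 3))
```

In the observational arm the inert treatment appears harmful because sicker patients are more likely to receive it; under randomisation the apparent effect is close to zero. No amount of data removes the observational bias while severity remains unmeasured.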

There is also the question of whether database studies can mitigate ‘survivor bias’. When the process (of machine learning) starts, then survivor bias may exist. But, by tracking estimated treatment effect over time, the machine can recognise all subsequent ‘eligible’ cases as they arise. This means that the problem of survivor bias should be progressively mitigated over time?

So what do I recommend? Three suggestions:

  1. Use machine learning to provide a clue to things that you might not have suspected or thought of as high priority for a trial.
  2. Nest RCTs within database studies, so that cause and effect can be established at least under specified circumstances, and then compare the results with what you would have concluded by machine learning alone.
  3. Use machine learning on an open-ended basis with no fixed stopping point or stopping rule, and make data available regularly to mitigate the risk of over-interpreting a random high. This approach is very different to the standard ‘trial’ with a fixed starting and end date, data-monitoring committees,[4] ‘data-lock’, and all manner of highly standardised procedures. Likewise, it is different to resource heavy statistical analysis, which must be done sparingly. Perhaps that is the real point – machine learning is inexpensive (has low marginal costs) once an ongoing database has been established, and so we can take a ‘working approach’, rather than a ‘fixed time point’ approach to analysis.
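The risk of over-interpreting a 'random high' under open-ended monitoring can be sketched with a simulation. It assumes a treatment with no true effect, data arriving in equal batches, and a naive rule that flags any look at which the z-statistic exceeds 1.96. The batch size, number of looks and function name are hypothetical choices for illustration.

```python
import math
import random


def false_positive_rates(n_sims=5000, n_looks=10, batch=20, z_crit=1.96, seed=1):
    """Share of null data streams flagged at ANY look vs. at the final look only."""
    rng = random.Random(seed)
    any_look = 0
    final_look = 0
    for _ in range(n_sims):
        total = 0.0
        crossed = False
        z = 0.0
        for look in range(1, n_looks + 1):
            # The sum of `batch` independent N(0, 1) observations is N(0, batch).
            total += rng.gauss(0.0, math.sqrt(batch))
            z = total / math.sqrt(look * batch)
            if abs(z) > z_crit:
                crossed = True
        any_look += crossed
        final_look += abs(z) > z_crit
    return any_look / n_sims, final_look / n_sims


if __name__ == "__main__":
    any_rate, final_rate = false_positive_rates()
    print(f"flagged at any look: {any_rate:.3f}; at final look only: {final_rate:.3f}")
```

A single pre-specified look keeps the false-positive rate near the nominal 5%, whereas acting on the first apparent signal among many looks inflates it considerably. That is one reason to treat any single peak in a continuously updated analysis with caution.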

— Richard Lilford, CLAHRC WM Director


    1. Lilford RJ. The End of the Hegemony of Randomised Trials. 30 Nov 2012. [Online].
    2. Mytton J, Evison F, Chilton PJ, Lilford RJ. Removal of all ovarian tissue versus conserving ovarian tissue at time of hysterectomy in premenopausal patients with benign disease: study using routine data and data linkage. BMJ. 2017; 356: j372.
    3. De Bono M, Fawdry RDS, Lilford RJ. Size of trials for evaluation of antenatal tests of fetal wellbeing in high risk pregnancy. J Perinat Med. 1990; 18(2): 77-87.
    4. Lilford RJ, Braunholtz D, Edwards S, Stevens A. Monitoring clinical trials—interim data should be publicly available. BMJ. 2001; 323: 441.


A Very Interesting Paper Using Mendelian Randomisation to Determine the Effect of Extra Years of Education on Heart Disease

It turns out that there are a number of genes, all associated with aspects of neurodevelopment, that predict how many years a person will spend in formal education.[1] It is already very well established that more years of education are associated with large reductions in coronary heart disease (CHD) (mediated by behaviour such as lower calorie intake, less smoking, more exercise).[2] So the authors of a recent well-written and most interesting BMJ paper did the obvious thing.[3] [4] They related the (random) presence or absence of educational propensity genes to CHD. Bingo, they measured a large effect (the genes that predispose to longer durations of formal education associate with reduced CHD). Now, the thing with Mendelian randomisation is that the genotype must not be linked to the outcome (CHD in this case), other than through the putative explanatory variable (duration of education in this case). The authors are aware that it is quite possible that education genes are linked to the outcome (CHD), net of (any) effect on education. To deal with this possibility they perform sensitivity analyses. They examine the association between genetic variants associated with education and the behaviours that lead to CHD. If the effects on education and on CHD behaviours are similar across the genetic variants this suggests that the effect on CHD is through education and not through another variable. And so it was. They also looked to see whether genetic variants already known to be associated with CHD (genes for high cholesterol, etc.) were also associated with education. If the genes associated with education do not associate with these other risk factors, then that favours a cause and effect explanation. There was no association. However, such an association would only be expected if there was a ‘massive’ effect of ‘education genes’ that bypassed education.

This all falls short of proof. Since the educational genes lead to education through mental processes, it is reasonable to suppose that almost all genetic variants that affect education also affect behaviour. Thus, they would affect CHD, even if there was no extra education. The authors say that their conclusion is strongly supported by identical twin studies where one twin stayed longer in education than the other, but this too ignores the fact that these twins are different, for all that their inherited genotype is the same, and so these differences could be the cause of both increased education and decrease in the behaviours that lead to heart disease.
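The concern about a path from gene to CHD that bypasses education can be illustrated with a simulation of the simplest Mendelian randomisation estimator, the Wald ratio (the gene-outcome association divided by the gene-exposure association). This is a sketch under assumed linear effects with made-up coefficients. It is not the analysis used in the BMJ paper, and the variable and function names are hypothetical.

```python
import random

TRUE_EFFECT = -0.3  # assumed causal effect of education on a CHD risk score


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def wald_ratio(n, direct_effect, seed=0):
    """Estimate the education -> CHD effect using the gene as an instrument.

    `direct_effect` is the gene's effect on CHD that does NOT pass through
    education (for example via behaviour); zero means the instrument is valid.
    """
    rng = random.Random(seed)
    gene, edu, chd = [], [], []
    for _ in range(n):
        g = 1.0 if rng.random() < 0.5 else 0.0
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder of education and CHD
        e = 0.5 * g + u + rng.gauss(0.0, 1.0)
        c = TRUE_EFFECT * e - 0.5 * u + direct_effect * g + rng.gauss(0.0, 1.0)
        gene.append(g)
        edu.append(e)
        chd.append(c)
    return covariance(gene, chd) / covariance(gene, edu)


if __name__ == "__main__":
    print("valid instrument:  ", round(wald_ratio(100_000, 0.0), 2))
    print("gene affects CHD directly:", round(wald_ratio(100_000, -0.2), 2))
```

With a valid instrument the estimate recovers the assumed effect despite the unmeasured confounder. When the gene also influences CHD directly, the estimate overstates the benefit of education, and nothing in the ratio itself reveals that this has happened.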

One more point – even if years of education really are causative, this might well apply only to people genetically predisposed to more education and may not apply among those not so predisposed – there may be an interaction between the genes that predispose to education and response to that education. After all, why would one persist in the classroom if one were not predisposed to benefit from the experience? People not predisposed would find being coerced to do so most unpalatable, and such an approach could even have a perverse effect. This is an excellent article and is beautifully presented. But I am a little more sceptical than the authors. I would like to see a debate on the issues.

— Richard Lilford, CLAHRC WM Director


  1. Okbay A, Beauchamp JP, Fontana MA, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016; 533(7604): 539–42.
  2. Veronesi G, Ferrario MM, Kuulasmaa K, et al. Educational class inequalities in the incidence of coronary heart disease in Europe. Heart. 201635895865.
  3. Tillmann T, Vaucher J, Okbay A, et al. Education and coronary heart disease: Mendelian randomisation study. BMJ. 2017; 358: j3542.
  4. Richards JB & Evans DM. Back to School to Protect Against Coronary Heart Disease? BMJ. 2017; 358: j3849.

The Rush Towards a Paperless Health Service: Stop the Music

I have written repeatedly on the harms that bringing IT into the consultation room can bring. I carried out a trial of computerised data entry during consultations showing that it undermined the clinician/patient relationship – this was over three decades ago.[1] This finding has been replicated many times since, with vivid accounts in Bob Wachter’s outstanding book.[2] It turns out that having to use IT during consultations is one of the main causes of ‘burn-out’ among doctors in the USA. A recent NIHR Programme study, in which CLAHRC WM collaborated, showed that IT is likely undermining patient safety in some aspects of practice (diagnosis; personalised care), even as it improves it in others (prescribing error).[3] Meanwhile, Fawdry has repeatedly argued [4] that the problems in integrating computers across institutions arise not because the IT systems themselves are ‘incompatible’ or because we do not have common ‘ontologies’, but because the underlying medical logic does not synchronise when you slap two systems together. Data in records can be shared (lab results, x-rays, etc.), but that is different to sharing records. So, do not force the pace – let a paperless system evolve. Apply the CLAHRC WM test; never implement an IT system module until you have examined how it affects the pattern of clinician/patient interactions in real world settings. We are sleep-walking into a digital future and completely ignoring the cautionary evidence that is becoming stronger by the year. Remember, nothing in health care – and I really do mean nothing – is as important as the relationships between clinician and patient.

— Richard Lilford, CLAHRC WM Director


  1. Brownbridge G, Lilford RJ, Tindale-Biscoe S. Use of a computer to take booking histories in a hospital antenatal clinic. Acceptability to midwives and patients and effects on the midwife-patient interaction. Med Care. 1988; 26(5): 474-87.
  2. Wachter R. The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age. New York, NY: McGraw-Hill Education. 2015.
  3. Lilford RJ. Introducing Hospital IT Systems – Two Cautionary Tales. NIHR CLAHRC West Midlands News Blog. 4 August 2017.
  4. Fawdry R. Paperless records are not in the best interest of every patient. BMJ. 2013; 346: f2064.

Context is Everything in Service Delivery Research

I commonly come across colleagues who say that context is all in service delivery research. They argue that summative quantitative studies are not informative because there is so much variation by context that the average is meaningless. I think that this is lazy thinking. If context were all, then there would be no point in studying anything by any means; any one instance would be precisely that – one instance. If the effects of an intervention were entirely context-specific then it would never be permissible to extrapolate from one situation to another, irrespective of the types of observations made. But nobody thinks that.

A softer anti-quantitative view accepts the idea of generalising across contexts, but holds that such generalisations / extrapolations can be built up solely from studies of underlying mechanisms, and that in-depth qualitative studies can tell us all we need to know about those mechanisms. Proponents of this view hold that quantitative epidemiological studies are, at best, extremely limited in what they can offer. It is true that some things cannot easily be studied in a quantitative comparative way – an historian interested in the cause of the First World War cannot easily compare the candidate explanatory variables over lots of instances. In such a case, exploration of various individual factors that may have combined to unleash the catastrophe may be all that is available. But accepting this necessity is not tantamount to eschewing quantitative comparisons when they are possible. It is unsatisfying to study just the mechanisms by which improved nurse ratios may reduce falls or pressure ulcers without measuring whether the incidence of these outcomes is, in fact, correlated with nurse numbers.

Of course, concluding that quantification is important is not tantamount to concluding that quantification alone is adequate. It never is and cannot be, as the famous statistician, Sir Austin Bradford Hill, implied in his famous speech.[1] Putative causal explanations are generally strengthened when theory generated from one study yields an hypothesis that is supported by another study (Hegel’s thesis, antithesis, synthesis idea). Alternatively, or in addition, situations arise when evidence for a theory, and for hypotheses that are contingent on that theory, may arise within the framework of a single study. This can happen when observations are made across a causal chain. For example, a single study may follow up heavy, light and non-drinkers and examine the size of the memory centre in the brain (by MRI) and their memory (through a cognitive test).[2] The theory that alcohol affects memory is supported by the finding that memory declines faster in drinkers than teetotallers, and yet further support comes from alcohol’s effect on the size of the memory centre (the hippocampus). Similarly, a single study may show that improving the nurse to patient ratio results in a lower probability of unexpected deaths and more diligent monitoring of patients’ vital signs. Here the primary hypothesis that the explanatory variable (nurse/patient ratio) is correlated with the outcome variable (unexpected hospital death) is reinforced by also finding a correlation between the intervening / mediatory variable (diligence in monitoring vital signs) and the outcome variable (hospital deaths) (see Figure 1). In a previous News Blog we have extolled the virtues of Bayesian networking in quantifying these contingent relationships.[3]

Figure 1: Causal chain linking explanatory variable (intervention) and outcome

Observations relating to various primary and higher order hypotheses may be quantitative or qualitative. Qualitative observations on their own are seldom sufficient to test a theory and make reliable predictions. But measurement without a search for mechanisms – without representation / theory building – is effete. The practical value of science depends on ‘induction’ – making predictions over time and space. Such predictions across contexts require judgement, and such judgement cannot gain purchase without an understanding of how an intervention might work. Putting these thoughts together (the thesis, antithesis, synthesis idea and the need for induction), we end up with a ‘realist’ epistemology – the idea here is to make careful observations, interpret them according to the scientific canon, and then represent the theory – the underlying causal mechanisms. In such a framework, qualitative observations complement quantitative observations and vice-versa.

It is because results are sensitive to context that mechanistic / theoretical understanding is necessary. Context refers to things that vary from place to place and that might influence the (relative or absolute) effects of an intervention. It is also plausible to argue that context is more influential with respect to some types of intervention than others. Arguably, context is (even) more important in service delivery research than in clinical research. In that case, one might say that understanding mechanisms is even more important in service delivery research than in clinical research. At the (absolute) limit, if B always follows A, then sound predictions may be made in the absence of an understanding of mechanisms – the Sun was known to always come up in the East, even before rotation of the Earth was discovered. But scientific understanding requires more than just following the numbers. A chicken may be too quick to predict that a meal will follow egg-laying just because that has happened on 364 consecutive days, while failing to appreciate the underlying socioeconomic mechanisms that might land her on a dinner plate on the 365th day, in Bertrand Russell’s evocative example.[4]

Moving on from a purely epistemological argument, there is plenty of empirical data to show that many quantitative findings are replicated across a sufficient range of contexts to provide a useful guide to action. Here are some examples. The effect of ‘user fees’ and co-payments on consumption of health care is quite predictable – demand is elastic with respect to price, meaning that a relatively small increase in price, relative to average incomes, suppresses demand. Moreover, this applies irrespective of medical need,[5] and across low- and high-income countries.[6] Audit and feedback as a mechanism to improve the effectiveness of care has consistently positive, but small (about 8% change in relative risk) effects.[7] Self-care for diabetes is effective across many contexts.[8] Placing managers under risk of sanction has a high risk of inducing perverse behaviour when managers do not believe they can influence the outcome.[9] It is sometimes claimed that behavioural / organisational sciences are qualitatively distinct from natural sciences because they involve humans, and humans have volition. Quite apart from the fact that we are not the only animals with volition (we share this feature with other primates and cetaceans), the existence of self-determination does not mean that interventions will not have typical / average effects across groups or sub-groups of people.

The diabetes example, cited above, is particularly instructive because it makes the point that the role of context is amenable to quantitative evaluation – context may have no effect, it may modify an effect (but not vitiate it), it may obliterate an effect, or even reverse the direction of an effect. Tricco’s iconic diabetes study [8] combined over 120 RCTs of service interventions to improve diabetes care (there are now many more studies and the review is being updated). The study shows not just how the effect of interventions varies by intervention type, but also how the intervention effect itself varies by context. It is thus untenable to claim, as some do, that ‘what works for whom, under what circumstances’ is discernible only by qualitative methods.[10] The development economist, Abhijit Banerjee, goes further, arguing that the main purpose of RCTs is to generate unbiased point estimates of effectiveness for use in observational studies of the moderating effect of context on intervention effects.[11]

We have defined context as all the things that might vary from place to place and that might affect intervention effects. Some people conflate context with how an intervention is taken up / modified in a system. This is a conceptual error – how the intervention is applied in a system is an effect of the intervention and, like other effects, it may be influenced by context. Likewise, everything that happens ‘downstream’ of an intervention as a result of the intervention is a potential effect, and again, this effect may be affected by context.[12] Context includes upstream variables (see Figure 2) and any downstream variable at baseline. All that having been said, it is not always easy to distinguish whether a change in a downstream variable is caused by the intervention, or whether it is a change that would have happened anyway (i.e. a temporal effect). Note that a variable such as the nurse-patient ratio may be an intervention in one study (e.g. a study of nurse-patient ratios) and a context variable in another (e.g. a study of an educational intervention to reduce falls in hospital). Context is defined by its role in the inferential cause / effect framework, not by the kind of variable it is.

Figure 2: How to conceptualise the intervention, the effects downstream, and the context.

— Richard Lilford, CLAHRC WM Director


  1. Hill AB. The environment and disease: Association or causation? Proc R Soc Med. 1965; 58(5): 295-300.
  2. Topiwala A, Allan C, Valkanova V, et al. Moderate alcohol consumption as risk factor for adverse brain outcomes and cognitive decline: longitudinal cohort study. BMJ. 2017; 357: j2353.
  3. Lilford RJ. Statistics is Far Too Important to Leave to Statisticians. NIHR CLAHRC West Midlands News Blog. 27 June 2014.
  4. Russell B. Chapter VI. On Induction. In: Problems of Philosophy. New York, NY: Henry Holt and Company, 1912.
  5. Watson SI, Wroe EB, Dunbar EL, Mukherjee J, Squire SB, Nazimera L, Dullie L, Lilford RJ. The impact of user fees on health services utilization and infectious disease diagnoses in Neno District, Malawi: a longitudinal, quasi-experimental study. BMC Health Serv Res. 2016; 16(1): 595.
  6. Lagarde M & Palmer N. The impact of user fees on health service utilization in low- and middle-income countries: how strong is the evidence? Bull World Health Organ. 2008; 86(11): 839-48.
  7. Effective Practice and Organisation of Care (EPOC). EPOC Resources for review authors. Oslo: Norwegian Knowledge Centre for the Health Services; 2015.
  8. Tricco AC, Ivers NM, Grimshaw JM, Moher D, Turner L, Galipeau J, et al. Effectiveness of quality improvement strategies on the management of diabetes: a systematic review and meta-analysis. Lancet. 2012; 379: 2252–61.
  9. Lilford RJ. Discontinuities in Data – a Neat Statistical Method to Detect Distorted Reporting in Response to Incentives. NIHR CLAHRC West Midlands News Blog. 1 September 2017.
  10. Pawson R & Tilley N. Realistic Evaluation. London: Sage. 1997.
  11. Banerjee AV & Duflo E. The Economic Lives of the Poor. J Econ Perspect. 2007; 21(1): 141-67.
  12. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.

“Why Can’t a Man Be More Like a Woman” – Revisited

In a previous News Blog [1] I reorganised Henry Higgins’s famous line from ‘My Fair Lady’ in response to a paper in JAMA based on Medicare records showing that SMRs (standardised mortality ratios) following acute medical admissions were slightly lower when the admitting physician was a woman rather than a man.[2] So what about surgery then? Same pattern, I am afraid, blokes! The adjusted odds ratio for harm was slightly lower (0.96) for patients of female surgeons.[3] True? Probably, since women outperform men on many tasks requiring a combination of care and cognition, as per the above News Blog. But results of this sort may be ephemeral – gender-based predilections are notoriously labile as different selection and cultural effects play out in society. For example, the proportion of women studying and excelling in STEM (Science, Technology, Engineering and Mathematics) subjects has been rising steadily.[4] The proportion of women who become boxers or get incarcerated is also rising.[5][6] It seems that women and men are becoming more like each other! But will they ever become the same as each other? The effect of gender on surgical outcomes has been heavily debated and was the topic of the Editor in Chief’s editorial,[7] yet this point about change in the attributes of men vs. women over time was not discussed.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Are Female Doctors Better Than Male Doctors? NIHR CLAHRC West Midlands News Blog. 13 January 2017.
  2. Tsugawa Y, Jena AB, Figueroa JF, et al. Comparison of Hospital Mortality and Readmission Rates for Medicare Patients Treated by Male vs Female Physicians. JAMA Intern Med. 2017; 177(2): 206-13.
  3. Wallis CJD, Ravi B, Coburn N, et al. Comparison of postoperative outcomes among patients treated by male and female surgeons: a population based matched cohort study. BMJ. 2017; 359: j4366.
  4. WISE. Women in Science, Technology, Engineering and Mathematics: The Talent Pipeline from Classroom to Boardroom. UK Statistics 2014. Bradford: WISE; July 2015.
  5. Sport England. Record Number of Women Get Active. 8 December 2016.
  6. Swavola E, Riley K, Subramanian R. Overlooked: Women and Jails in an Era of Reform. New York, NY: Vera Institute of Justice. August 2016.
  7. Marx C. Improving patient outcomes after surgery. BMJ. 2017; 359: j4580.