Tag Archives: Director & Co-Directors’ Blog

Electronic Patient Notes and Patient Safety

In a previous news blog we drew attention to the psychological consequences of insinuating a computer into the clinician-patient consultation. The deleterious effect of computers on the clinician-patient interaction was reported by the CLAHRC WM Director nearly three decades ago.[1] This concern has been corroborated in Robert Wachter’s recent book.[2]

In this news blog we will focus on another disadvantage of the current generation of electronic clinical notes. This is the threat to patient safety that results from the inchoate nature of electronic clinical record systems. In short, they do not reflect the heuristic patterns that determine appropriate clinical care. This failure to follow medical logic is most dangerous in the context of diagnosis.

While clinical diagnosis may sometimes be clear from a single episode of care, the correct diagnosis frequently depends on the pattern of data as it emerges over time. Just as an understanding of politics requires an understanding of history, so a clinical diagnosis frequently requires an understanding, not just of the current symptoms and signs, but also of their provenance.

Building on this safe premise, one has to conclude that the way clinical record systems are organised can either facilitate or hinder accurate diagnosis. Diagnostic errors are the single largest threat to patient safety [3]; it is all very well to promote evidence-based care and safe prescribing, but if the patient is on the wrong pathway, then they are heading for a big disaster. Diagnosis lies at the heart of clinical medicine and a system that impedes accurate diagnosis is likely to do more harm than good. A recent study of electronic notes carried out by Aziz Sheikh, in collaboration with CLAHRC WM, showed that big reductions in medication error could be achieved by means of electronic prescribing and decision support [in press]. However, careful qualitative research showed that the electronic prescribing systems disrupted the normal flow of knowledge, and that doctors had to implement numerous ‘work-arounds’. This way be dragons!

The problems of disorganised clinical information have been the subject of investigation for over half a century; Professor Lawrence Weed’s system of problem-orientated notes inspired the CLAHRC WM Director when he was a young doctor (a long time ago)! The most urgent requirement in modern computing is not to saturate the health service with electronic records, but to develop them in a way that preserves the logic of medical practice. In the meantime we should rely on structured paper-based notes, such as those recommended by Rupert Fawdry,[4] and confine the use of computers to things that they really are good at, such as electronic prescribing and disseminating medical images.

— Richard Lilford, CLAHRC WM Director


  1. Brownbridge G, Lilford RJ, Tindale-Biscoe S. Use of a computer to take booking histories in a hospital antenatal clinic. Acceptability to midwives and patients and effects on the midwife-patient interaction. Med Care. 1988;26(5):474-87.
  2. Wachter R. The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age. New York, NY: McGraw-Hill Education. 2015.
  3. Singh H, Schiff GD, Graber ML, Onakpoya I, Thompson MJ. The global burden of diagnostic errors in primary care. BMJ Qual Saf. 2017; 26: 484-94.
  4. Lilford RJ. The WISDAM of Rupert Fawdry. NIHR CLAHRC West Midlands News Blog. 5 September 2014.

The Health Economics of Infertility Treatment

Recently I found myself holding forth on the above topic in a plenary talk at the International Federation of Fertility Societies combined meeting with the African Fertility Society in Kampala. I made the point that the health economics of infertility raises a number of issues that are not generally considered in the standard canon for the economic evaluation of health technologies, as used in health technology assessment (HTA).[1] Four issues stand out:

  1. The benefits of infertility treatment are more difficult to capture on a single quality of life (QoL) scale than is the case for standard HTA.
  2. The standard practice of discounting benefits can be questioned.
  3. The beneficiaries are more diverse, potentially extending to many family members.
  4. The issue of whether the lifelong utility of the potential child should be included is controversial.

I shall briefly consider these in turn.

  1. Benefit – generic quality of life (QoL) scales do not seem up to the job. First, it is very difficult to capture the benefits over a lifetime. The ‘area under the curve’ is the relevant quantity and this is not well captured in cross-sectional studies. Second, QoL deteriorates when a sub-fertile couple have a baby, as it does for fertile people. I discovered this many years ago in a collaborative study with the Health Economics department at the University of York (unpublished). This finding reinforces the importance of a lifetime perspective. Third, it is doubtful that the dimensions captured in a generic QoL scale are the things that people wish to maximise when they decide to have children – there is a deeper purpose in play. So, a utility function based on a direct trade-off would be preferable to a standard generic QoL scale, such as the SF-12 or EQ-5D. This way, the respondent can take a lifetime perspective and factor in all the valued benefits and disbenefits of treatment. Torrance used a standard gamble in a large study of US citizens and measured a disutility of 0.07 (utility 0.93).[2] That is to say, the average respondent would run up to a 7% risk of death to enable them to have a first child. Such a standard gamble method would likely underestimate the utility loss for those who actually experience infertility, for reasons David Arnold and I explicated elsewhere.[3] A perhaps better method to capture the benefit over a lifetime would be willingness-to-pay studies and here, in addition to studies at the population level (say using discrete choice methods), studies of revealed preferences are possible. This is because much IVF takes place in an entirely private market. This enables the ‘market clearing’ price for infertility services to be observed (ideally in relation to disposable income). The high proportion of disposable income that infertile people allocate to infertility treatment, sometimes amounting to catastrophic losses,[4] provides some evidence that Torrance’s study underestimates the trade-offs people will make in order to have children.
  2. Choice of discount rate – The fact that benefits continue to accrue, and may increase, over time, suggests that discounting may not be normative. Given that disbenefits generally precede benefits, it makes little sense to discount from the point of intervention. That said, it is important to factor disbenefits of treatment and downstream costs into the analysis. Disbenefits include the cost and discomfort of treatment and knock-on costs, for example, resulting from an increased risk of prematurity. Conversely, there may be hidden benefits beyond the joys of parenthood – for example, in reduced Intimate Partner Violence.[5]
  3. Diverse beneficiaries – In ‘normal’ health economics, benefits are hypothecated on the affected person, even though loved ones also stand to benefit. Loved ones benefit through the improved health of the affected person. Ignoring third party benefits can be condoned on the ‘level playing field’ principle – in a comparison across diseases of middle-age, beneficiaries of various alternative treatments are in a roughly similar position – they have similar numbers of loved ones on average. On this basis, the decision tree can be ‘pruned’. However, this reasoning arguably breaks down when comparisons are made across generational lines. In the particular case of infertility, mothers and fathers get direct benefits, as do grandparents and others, not only through the ‘affected’ person, but directly from the child that results from the treatment. For instance, the father is just as much a beneficiary as the mother. Grandparents are not far behind – I can attest to that. On the other hand, factoring these beneficiaries into the equation seems to tilt the playing field too far the other way, i.e. towards infertility services. Factoring in the benefits that accrue to all these people would weight services for children in general, and infertility services in particular, very strongly. This is a topic requiring more philosophical analysis and, perhaps, empirical investigation.
  4. What about the child who would otherwise not have existed – the question of the utility of hypothetical lives is vexed. Certainly, no one counts the utility loss from contraception, even when no later ‘replacement child’ is envisaged. On the other hand, the utility of neonatal survival is included in standard economic practice. My preference is not to include this utility, but I am hard-pressed to defend this on a bottom-up, philosophical basis. Richard Hare, the famous Oxford philosopher, did attempt such an analysis and his conclusions support my instinct. Certainly, including the lifetime utility of the child massively improves the cost-benefit ratio of infertility services,[6-8] and if the tax return from the child is included, then a treatment such as IVF becomes a ‘no-brainer’ since it ‘dominates’ – it saves money and yields benefit down to a very low success rate (<6% or so).[9]

What would happen if we:

  1. Accepted a utility of 0.9 for the infertile state (close to that of Torrance).
  2. Ignored other beneficiaries, including the child?

We present such an analysis below. Even under these relatively conservative constraints, IVF is cost-effective in most countries, and could be cost-effective in low- and middle-income countries (LMICs) if some new idea, such as incubation within the vagina, were used.

Say we gave infertility a disutility of 0.1 over 50 years, undiscounted.
Then, the utility in an infertile couple successfully treated = 5 QALYs undiscounted and 1.1 discounted at 3%.
Let’s say society will pay $100 per QALY.
Then a treatment with a 25% success rate can have a net cost of up to $125 undiscounted, but less than $28 discounted.
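
The arithmetic above can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the two discounting conventions are assumptions, since the text does not say how the discounted figure was derived. Treating the whole 5 QALY gain as a single amount received at year 50 gives about 1.1 QALYs, close to the figure quoted; discounting each year's 0.1 gain separately gives about 2.6 QALYs.

```python
def undiscounted_qalys(gain_per_year, years):
    """Total QALYs gained with no discounting."""
    return gain_per_year * years


def lump_discounted_qalys(gain_per_year, years, rate):
    """Assumption: the whole gain is one amount received at the end of the horizon."""
    return gain_per_year * years / (1 + rate) ** years


def stream_discounted_qalys(gain_per_year, years, rate):
    """Assumption: each year's gain is discounted separately, at the end of years 1..years."""
    return sum(gain_per_year / (1 + rate) ** t for t in range(1, years + 1))


def max_net_cost(qalys, willingness_to_pay, success_rate):
    """Highest net cost per treatment attempt that still meets the threshold."""
    return qalys * willingness_to_pay * success_rate


GAIN, YEARS, RATE = 0.1, 50, 0.03  # disutility, horizon and discount rate from the text
WTP, SUCCESS = 100, 0.25           # $ per QALY and success rate from the text

for label, qalys in [
    ("undiscounted", undiscounted_qalys(GAIN, YEARS)),
    ("lump discounted", lump_discounted_qalys(GAIN, YEARS, RATE)),
    ("stream discounted", stream_discounted_qalys(GAIN, YEARS, RATE)),
]:
    cost = max_net_cost(qalys, WTP, SUCCESS)
    print(f"{label}: {qalys:.2f} QALYs, maximum net cost ${cost:.0f}")
```

Whichever convention is chosen, the cost ceiling scales linearly with the willingness-to-pay threshold and the success rate, so the comparison between the discounted and undiscounted cases does not depend on the $100 figure.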

— Richard Lilford, CLAHRC WM Director

I thank Sheryl van der Poel for sending some of the references quoted in this article.


  1. Drummond MF. Methods for the economic evaluation of health care programmes. Oxford: Oxford University Press. 2005.
  2. Torrance GW. Measurement of Health State Utilities for Economic Appraisal. J Health Econ. 1986; 5: 1-30.
  3. Arnold D, Girling A, Stevens A, Lilford R. Comparison of direct and indirect methods of estimating health state utilities for resource allocation: review and empirical analysis. BMJ; 2009; 339: b2688.
  4. Wu AK, Odisho AY, Washington III SL, Katz PP, Smith JF. Out-of-Pocket Fertility Patient Expense: Data from a Multicenter Prospective Infertility Cohort. J Urology. 2014; 191(2): 427-32.
  5. Stellar C, Garcia-Moreno C, Temmerman M, van der Poel S. A systematic review and narrative report of the relationship between infertility, subfertility, and intimate partner violence. Int J Gynecol Obstet, 2016; 133: 3-8.
  6. Connolly MP, Pollard MS, Hoorens S, Kaplan BR, Oskowitz SP, Silber SJ. Long-term Economic Benefits Attributed to IVF-conceived Children: A Lifetime Tax Calculation. Am J Manag Care. 2008; 14(9): 598-604.
  7. Svensson A, Connolly M, Gallo F, Hägglund L. Long-term fiscal implications of subsidizing in-vitro fertilization in Sweden: A lifetime tax perspective. Scand J Pub Health. 2008; 36: 841-9.
  8. Fragoulakis V & Maniadakis N. Estimating the long-term effects of in vitro fertilization in Greece: an analysis based on a lifetime-investment model. Clinicoecon Outcomes Res. 2013; 5: 247-55.
  9. Baird DT, Collins J, Egozcue J, et al. Fertility and Ageing. Hum Reprod Update. 2005; 11(3): 261-76.

Traditional Healers and Mental Health

The case for traditional healers in mental health

There are two arguments for traditional healer involvement in mental health provision: one pragmatic and one theoretical. The pragmatic argument turns on the huge shortfall in human resources to deal with mental health problems in low- and middle-income countries (LMICs).[1] Traditional healers could make up for this shortage in human resources in the formal sector. A theoretical argument for the role of traditional healers turns on cultural factors. The argument here is that traditional healers are ideally placed to intervene in conditions with social origins, or when symptoms are coloured by cultural assumptions. Traditional healers, one might suppose, can tap into the beliefs and expectations of local people to reach parts of the mind that are simply inaccessible under a ‘medical model’. According to this argument modern medicine is the appropriate vehicle for the diagnosis and management of conditions that are mainly of the body. It would be unwise, for example, to rely on traditional healers for the treatment of an acutely febrile child, or for provision of contraceptive advice. However, the traditional healer might be the appropriate first port of call for people with conditions of the mind.

The case against traditional healers in mental health

An argument against the above position is that the most serious types of mental health condition, psychotic illnesses, require modern pharmacotherapy, at least to stabilise patients. While all psychiatric conditions are of both brain and mind, psychotic conditions can be closer in form to those of standard medical diseases and the effects of properly targeted drug therapy can be dramatic. There are many well documented cases where access to appropriate pharmacological therapy was denied or cruelly delayed while patients were treated unsuccessfully by traditional healers. From this perspective one should no more consult a traditional healer for a mental illness than for suspected malaria.

Reconciling the case for and against: a topic for investigation and research

On the one hand, traditional healers can offer culturally sensitive treatment for non-psychotic conditions, while on the other hand, severe mental illness requires medical services. It could be argued that traditional and modern medical services should be integrated so that traditional healers could treat the majority of patients, i.e. those with non-psychotic diseases, while allopathic clinicians treat the more severe cases. Moreover, different people have different preferences, and individuals may wish to receive care from both types of providers, even for the same illness. These would seem to be further arguments to integrate traditional and allopathic services within the same system and, indeed, in an integrated reimbursement system. Before implementing such a system it would surely be sensible to evaluate the effectiveness of traditional healers in the treatment of various psychiatric conditions and to ensure that, with the appropriate education, they would be able to refer cases that need medical treatment.

Philosophical problems in collaboration between traditional healing and modern medicine

The CLAHRC WM Director is keen to explore the relative effectiveness of traditional and allopathic treatments for non-psychotic mental illness but he is concerned that there may be irreconcilable philosophical differences in the traditional versus allopathic approach. This concern arises from different ontologies that underpin the different kinds of service. That is to say these traditions have different views on what counts as truth. Modern medical practice is very much a product of what might be called ‘enlightenment thinking’; practice built on an understanding of biological mechanisms / scientific explanations.[2] Such a world view is a far cry from the assumptions that underpin traditional healing, which are guided by a set of traditional beliefs, often of a religious nature. So the question is whether it is possible to truly integrate systems with such different sets of underpinning assumptions. This is partly an empirical question – different systems could be examined to understand how well they can work together. The CLAHRC WM Director understands that moves are afoot to integrate allopathic medicine with traditional Chinese medicine in China, and with Ayurvedic medicine in India. It would be interesting to make independent studies of these systems. But in the meantime I would suggest a thought experiment. Let us imagine a proposed trial of rose-hip water vs. anti-depressant medication taking place in an integrated hospital. The allopathic practitioners present this as a placebo-controlled trial, while the traditional healers present this as a trial of two effective alternatives – the underlying belief systems determine how the treatments are presented. The CLAHRC WM Director suspects that it is very difficult to really integrate two systems based on very different philosophical premises. It is one thing to make irenic statements about mutual respect and so on, but another to suppress tensions that seem likely to arise from fundamentally irreconcilable philosophical assumptions.

Living with contradictions

The question of integrating these different systems of thought is, perhaps, unresolvable. The systems have existed side by side for a hundred years or more. In high-income countries there is a thriving industry in complementary therapies and the list of alternative methods is almost too long to recite. Likewise traditional medicine and modern medicine have existed side by side quite happily in Africa, South Asia and China for many years. The populations in all these countries seem, on the whole, pretty savvy at working out which method is more appropriate for them in which condition. I have never heard of anyone going to a homeopath for their family planning needs. But systems co-existing in society is one thing; integrating them in common administrative and reimbursement systems is another. Every now and then there is an attempt to unite religion and science around a common purpose – the Lancet commission is currently involved in such a process.[3] [4] However, it may be the case that, like religion and science, traditional and allopathic medicine can live happily side by side within the same community and within the same individual. Whether and how they can really be brought together in a structural / organisational sense, for example in the same institution or within the same reimbursement system, is a matter for analysis and exploration. One thing I am sure of is that policy should not be made as though this were a technical issue and without considering the very different world views that lie behind each type of provision. Maybe the best that can be accomplished is for the systems to become more aware of each other and cross-refer when necessary, but to continue to make their own independent contributions?

— Richard Lilford, CLAHRC WM Director


  1. Rathod S, Pinninti N, Irfan M, Gorczynski P, Rathod P, Gega L, Naeem F. Mental Health Service Provision in Low- and Middle-Income Countries. Health Serv Insights. 2017; 10:
  2. Spary EC. Health and Medicine in the Enlightenment. In: Jackson M (ed). The Oxford Handbook of the History of Medicine. 2011.
  3. Horton R. When The Lancet went to the Vatican. Lancet. 2017; 389: 1500.
  4. Lee N, Remuzzi G, Horton R. The Vatican-Mario Negri-Lancet Commission on the value of life. Lancet. 2017; 390: 1573.

50 Year Anniversary of the First Human Heart Transplant: Lessons for Today

On 3 December we commemorated the 50 year anniversary of the world’s first heart transplant. The operation took place in the early hours of a Saturday morning at the Groote Schuur hospital in Cape Town, South Africa. Christiaan Barnard sutured Denise Darvall’s donated heart into the chest of the recipient, Louis Washkansky. Barnard restarted the new heart with an electric shock and then tried to wean the recipient off the heart and lung machine. But the new heart could not take the strain and Washkansky had to go back on the machine. The second attempt also failed, but when the heart and lung machine was turned off for the third time the recipient’s blood pressure started to climb. It kept on climbing, and soon Denise Darvall’s small heart had taken over the perfusion of Louis Washkansky’s large frame. Later that morning the world woke to the news of the world’s first heart transplant. Looking back over fifty years what should we make of Barnard’s achievement?

The transplant in an historical perspective

The two decades preceding the heart transplant have sometimes been referred to as the golden age of medical discovery.[1] The transplant can be ‘fitted’ retrospectively as the culmination of this golden age just as Neil Armstrong’s moon walk, two years later, can be seen as the crowning achievement of the space race. They belong to a number of technical achievements, including the first “test tube” baby and the first man in space, which are emblematic of human progress. They generate great public interest and media attention, but differ from more fundamental intellectual discoveries, such as the double helix structure of DNA or the Higgs boson, that are rewarded with Nobel prizes.

The heart transplant in the ‘heroic’ medical age

In his book ‘One Life’ Barnard provides an interesting cameo of the power and autonomy of the medical profession in his time.[2] He recalls writing up the routine operation note that must follow any surgical procedure. The anaesthetist, ‘Oz’, suggested that Dr Jacobus Burger, the hospital superintendent, should be informed. Barnard asked whether he should wake him so early in the morning, but Oz replied that the night’s events warranted such an intrusion. At first the befuddled Dr Burger, aware of work in the animal lab, thought that he was being informed about another heart transplant in dogs. However, even when he learned that the transplant involved a human heart, he cryptically thanked the surgeon and replaced the receiver. Nowadays, the idea of carrying out a procedure of such novelty, cost and risk without formal sanction would be unfathomable. The vignette from the doctor’s tearoom vividly illustrates how the relationship between the medical profession and the broader society has changed over one generation. Rene Amalberti argues [3] that many professions progressed through a heroic age in the twentieth century before gradually becoming more formalised and regulated – aviation followed a similar trajectory following Charles Lindbergh’s dramatic flight across the Atlantic in 1927.

Gradually changing ethical norms

The ethics of heart transplants relate mainly to organ donation. In ‘One Life’ Barnard describes the tense atmosphere in the operating room as the team waited for the donor heart to stop after turning off Darvall’s ventilator. In fact, they did not wait, and Barnard’s brother Marius has stated he persuaded Christiaan to stop the donor heart by injecting a concentrated dose of potassium in order to give Washkansky the best chance of survival. Today two different doctors need to independently carry out tests to confirm the donor is brain stem dead before the heart can be removed, as opposed to waiting for death by the whole-body standard, i.e. when there is brain death and the heart has stopped beating.

Public views of heart transplants, then and now

Following the operation the exhausted Barnard went home for a sleep. In the afternoon he returned to the hospital where he was surprised to find his route obstructed by a large crowd of reporters. He had unleashed a tide of publicity and acclaim that resonated for many decades, but dissenting voices were also heard. Some, notably Malcolm Muggeridge, the editor of Punch magazine, attacked the operation on the basis of a near mystical reverence for the human heart and to this Barnard had a succinct response: “it’s merely a pump.” Others worried about the allocation of scarce resources to such a high-tech solution when people were dying from malnutrition and malaria. Defence of the procedure came, albeit years later, from the economics profession when it was shown that the operation has a highly favourable cost-to-benefit ratio (at least in a high-income country).[4] The procedure not only extends life by many years on average, but greatly improves the quality of that life. In fact, patients feel much better from the moment they regain consciousness after the operation despite pain from the sternotomy. The operation is now uncontroversial and is performed routinely in high-income countries. It was long predicted that a mechanical pump would supplant the need for transplantation. Mechanical hearts have improved,[5] but they are largely seen as a bridge to transplantation, rather than a better alternative.

If Christiaan Barnard had not performed his operation, heart transplants would have developed anyway (the second transplant was carried out independently by Adrian Kantrowitz in the USA on 6 December). I was a schoolboy with hopes of getting into medical school when Washkansky received his new heart. I was among the many millions who were swept up in the wonder of the event and it still stirs my imagination half a century later. And my family knows that I wish to donate my own heart if the circumstances arise.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Future Trends in NHS. NIHR CLAHRC West Midlands. 25 November 2016.
  2. Barnard C & Pepper CB. One Life. Toronto, Canada: Macmillan; 1969.
  3. Amalberti R. The paradoxes of almost totally safe transportation systems. Saf Sci. 2001; 37(2-3): 109-26.
  4. O’Brien BJ, Buxton MJ, Ferguson BA. Measuring the effectiveness of heart transplant programmes: Quality of life data and their relationship to survival analysis. J Chron Dis. 1987; 40(s1): s137-53.
  5. Girling AJ, Freeman G, Gordon JP, Poole-Wilson P, Scott DA, Lilford RJ. Modeling payback from research into the efficacy of left-ventricular assist devices as destination therapy. Int J Technol Assess Health Care. 2007; 23(2): 269-77.

Is Research Productivity on the Decline Internationally?

I have written previously on the so-called ‘golden age of medical research,’ [1] which coincides roughly with the first two decades of my life – 1950-1970. The premise of a golden age entails the conclusion that it is followed by a less spectacular age where marginal returns are lower per unit of input – say per researcher. So, where does the truth lie – is research becoming ever more efficient, or is the productivity of research declining? This subject has been carefully examined by a number of scholars, most recently by Bloom and others.[2] First they looked at aggregate supply of researchers and economic output across the US economy, and they found a relationship that looks like this:

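[Figure 1 (image not reproduced): aggregate number of researchers and productivity per researcher in the US economy over time, on log scales.]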

So, productivity per researcher appears to decline with time and does so quite rapidly – the graph uses log scales. The drop in unit productivity has been fully compensated by growth in the number of researchers.
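
The relationship described above can be sketched numerically. The example below assumes that research productivity is defined as growth delivered per researcher, so that growth equals productivity multiplied by the number of researchers. Every number in it (the 2% growth rate, the starting headcount and the 4% annual rise in researchers) is hypothetical and chosen only to illustrate how constant growth with a rising headcount implies falling unit productivity.

```python
import math


def implied_productivity(growth_rate, researchers):
    """Growth delivered per researcher (assumed definition of research productivity)."""
    return growth_rate / researchers


def halving_time(researcher_growth):
    """Years for productivity to halve if output growth stays constant."""
    return math.log(2) / math.log(1 + researcher_growth)


GROWTH = 0.02             # hypothetical constant annual growth in output
START_RESEARCHERS = 1000  # hypothetical starting headcount
RESEARCHER_GROWTH = 0.04  # hypothetical annual growth in headcount

for year in (0, 10, 20, 30):
    headcount = START_RESEARCHERS * (1 + RESEARCHER_GROWTH) ** year
    productivity = implied_productivity(GROWTH, headcount)
    print(f"year {year}: {headcount:.0f} researchers, productivity {productivity:.2e}")

print(f"productivity halves roughly every {halving_time(RESEARCHER_GROWTH):.1f} years")
```

On a log scale the two series would appear as straight lines with equal and opposite slopes, which is one way a decline in unit productivity can be exactly offset by growth in the number of researchers.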

Given the obvious problems of studying this phenomenon at the aggregate level, the researchers turn to individual topics, such as the number of transistors packed onto a single chip. It turns out that keeping Moore’s law going takes a rapidly increasing number of researchers. However, diminishing returns are not just observed in electronics; the authors found the same phenomenon in agriculture and medicine. Research productivity in the pharmaceutical industry is one-tenth of what it was in 1970, and mortality gains have peaked in cancer and in heart disease. To some extent one can see this effect in the number of authors of medical papers, such as those in genetic epidemiology – they often run literally into hundreds. It would appear that ideas really are getting harder to find and/or when found they portend smaller gains.

I have previously made the obvious point that improved care reduces the headroom for future improvements.[3] Of course, economic growth and further improvement in health still turn on new knowledge and technology without which the supply-side of the economy must stagnate. The phenomenal growth of some emerging economies has been possible because of the non-rivalrous nature of previous discoveries made elsewhere. But we need to continue to advance, even though advances are hard to make. One of these advances concerns making optimal use of existing knowledge, and that is where CLAHRCs come into their own – we trade in knowledge about knowledge.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Future Trends in NHS. NIHR CLAHRC West Midlands. 25 November 2016.
  2. Bloom N, Jones CI, Van Reenen J, Webb M. Are Ideas Getting Harder to Find? Centre for Economic Performance Discussion Paper No. 1496. 2017.
  3. Lilford RJ. Patient Involvement in Patient Safety: Null Result from a High Quality Study. NIHR CLAHRC West Midlands. 18 August 2017.

Machine Learning and the Demise of the Standard Clinical Trial!

An increasing proportion of evaluations are based on database studies. There are many good reasons for this. First, there simply is not enough capacity to do randomised comparisons of all possible treatment variables.[1] Second, some treatment variables, such as ovarian removal during hysterectomy, are directed by patient choice rather than experimental imperative.[2] Third, certain outcomes, especially those contingent on diagnostic tests,[3] are simply too rare to evaluate by randomised trial methodology. In such cases, it is appropriate to turn to database studies. And when conducting database studies it is becoming increasingly common to use machine learning rather than standard statistical methods, such as logistic regression. This article is concerned with strengths and limitations of machine learning when databases are used to look for evidence of effectiveness.

When conducting database studies, it is right and proper to adjust for confounders and look for interaction effects. However, there is always a risk that unknown or unmeasured confounders will result in residual selection bias. Note that two types of selection are in play:

  1. Selection into the study.
  2. Once in the study, selection into one arm of the study or another.

Here we argue that while machine learning has advantages over RCTs with respect to the former type of bias, it cannot (completely) solve the problem of selection to one type of treatment vs. another.

Selection into a Study and Induction Across Place and Time (External Validity)
A machine learning system based on accumulating data across a health system has advantages with respect to the representativeness of the sample and generalisations across time and space.

First, there are no exclusions by potential participant or clinician choice that can make the sample non-representative of the population as a whole. It is true that the selection is limited to people who have reached the point where their data become available (it cannot include people who did not seek care, for example), but this caveat aside, the problem of selection into the study is strongly mitigated. (There is also the problem of ‘survivor bias’, where people are ‘missing’ from the control group because they have died, become ineligible or withdrawn from care. We shall return to this issue.)
Second, the machine can track (any) change in treatment effect over time, thereby providing further information to aid induction. For example, as a higher proportion of patients/clinicians adopt a new treatment, the intervention effect can be re-examined. Of course, the problem is not totally solved, because the possibility of different effects in other health systems (not included in the database) still exists.

Selection Once in a Study (Internal Validity)
However, the machine cannot do much about selection to intervention vs. control conditions (beyond, perhaps, enabling more confounding variables to be taken into account). This is because it cannot get around the cause-effect problem that randomisation neatly solves by ensuring that unknown variables are distributed at random (leaving only lack of precision to worry about). Thus, machine learning might create the impression that a new intervention is beneficial when it is not. If the new intervention has nasty side-effects or high costs, then many patients could end up getting treatment that does more harm than good, or which fails to maximise value for money. Stability of results across strata does not vitiate the concern.

It could be argued, however, that selection effects are likely to attenuate as the intervention is rolled out over an increasing proportion of the population. Let us try a thought experiment. Consider the finding that accident victims who receive a transfusion have worse outcomes than those who do not, even after risk-adjustment. Is this because transfusion is harmful, or because clinicians can spot those who need transfusion, net of variables captured in statistical models? Let us now suppose that, in response to the findings, clinicians subsequently reduce use of transfusion. It is then possible that changes in the control rate and in the treatment effect can provide evidence for or against cause and effect explanations. The problem here is that bias may change as the proportions receiving one treatment or the other change. There are thus two possible explanations for any set of results – a change in bias or a change in effectiveness, as a wider range of patients / clinicians receive the experimental intervention. It is difficult to come up with a convincing way to resolve the cause and effect problem. I must leave it to someone cleverer than myself to devise a theorem that might shed at least some light on the plausibility of the competing explanations – bias vs. cause and effect. But I am pessimistic for this general reason. As a treatment is rolled out (because it seems effective) or withdrawn (because it seems ineffective or harmful), so the beneficial or harmful effect (even in relative risk ratio terms) is likely to attenuate. But the bias is also likely to attenuate because less selection is taking place. Thus the two competing explanations may be confounded.
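The transfusion thought experiment can be made concrete with a small simulation. What follows is only a sketch under invented assumptions: the risk coefficients, the 30% treatment fraction and the function names `simulate` and `adjusted_risk_difference` are all hypothetical and are not drawn from any real dataset. Transfusion is given no causal effect at all, yet when clinicians select patients on a severity signal that the database does not record, the risk-adjusted comparison still makes transfusion look harmful.

```python
import random


def simulate(n, treat_fraction, selective, seed=0):
    """Return (recorded_severe, transfused, died) tuples.

    Transfusion has NO causal effect on death in this toy model.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        recorded_severe = rng.random() < 0.5  # severity flag captured in the database
        hidden = rng.random()                 # severity seen by the clinician, not recorded
        # Risk of death depends on severity only, never on transfusion (illustrative numbers).
        risk = 0.05 + 0.10 * recorded_severe + 0.30 * hidden
        if selective:
            # Clinicians transfuse the patients who look sickest to them.
            transfused = hidden > 1 - treat_fraction
        else:
            # Allocation by chance, as in a randomised trial.
            transfused = rng.random() < treat_fraction
        died = rng.random() < risk
        rows.append((recorded_severe, transfused, died))
    return rows


def adjusted_risk_difference(rows):
    """Death-rate difference (transfused minus not), stratified by recorded severity."""
    total, weight = 0.0, 0
    for level in (False, True):
        treated = [died for sev, tx, died in rows if sev == level and tx]
        control = [died for sev, tx, died in rows if sev == level and not tx]
        if not treated or not control:
            continue
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        size = len(treated) + len(control)
        total += diff * size
        weight += size
    return total / weight


if __name__ == "__main__":
    for label, selective in (("clinician-selected", True), ("randomised", False)):
        rows = simulate(40_000, 0.3, selective)
        print(label, round(adjusted_risk_difference(rows), 3))
```

Under these assumptions the clinician-selected comparison shows a clear excess of deaths among the transfused even after adjustment for the recorded variable, while the randomised comparison hovers around zero. The sketch illustrates the bias only; it does not resolve the harder question of how bias and true effect move together as uptake changes.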

There is also the question of whether database studies can mitigate ‘survivor bias’. When the process (of machine learning) starts, then survivor bias may exist. But, by tracking estimated treatment effect over time, the machine can recognise all subsequent ‘eligible’ cases as they arise. This means that the problem of survivor bias should be progressively mitigated over time?

So what do I recommend? Three suggestions:

  1. Use machine learning to provide a clue to things that you might not have suspected or thought of as high priority for a trial.
  2. Nest RCTs within database studies, so that cause and effect can be established at least under specified circumstances, and then compare the results with what you would have concluded by machine learning alone.
  3. Use machine learning on an open-ended basis with no fixed stopping point or stopping rule, and make data available regularly to mitigate the risk of over-interpreting a random high. This approach is very different to the standard ‘trial’ with fixed start and end dates, data-monitoring committees,[4] ‘data-lock’, and all manner of highly standardised procedures. Likewise, it is different to resource-heavy statistical analysis, which must be done sparingly. Perhaps that is the real point – machine learning is inexpensive (has low marginal costs) once an ongoing database has been established, and so we can take a ‘working approach’, rather than a ‘fixed time point’ approach to analysis.

— Richard Lilford, CLAHRC WM Director


    1. Lilford RJ. The End of the Hegemony of Randomised Trials. 30 Nov 2012. [Online].
    2. Mytton J, Evison F, Chilton PJ, Lilford RJ. Removal of all ovarian tissue versus conserving ovarian tissue at time of hysterectomy in premenopausal patients with benign disease: study using routine data and data linkage. BMJ. 2017; 356: j372.
    3. De Bono M, Fawdry RDS, Lilford RJ. Size of trials for evaluation of antenatal tests of fetal wellbeing in high risk pregnancy. J Perinat Med. 1990; 18(2): 77-87.
    4. Lilford RJ, Braunholtz D, Edwards S, Stevens A. Monitoring clinical trials—interim data should be publicly available. BMJ. 2001; 323: 441.


Context is Everything in Service Delivery Research

I commonly come across colleagues who say that context is all in service delivery research. They argue that summative quantitative studies are not informative because there is so much variation by context that the average is meaningless. I think that this is lazy thinking. If context were all, then there would be no point in studying anything by any means; any one instance would be precisely that – one instance. If the effects of an intervention were entirely context-specific then it would never be permissible to extrapolate from one situation to another, irrespective of the types of observations made. But nobody thinks that.

A softer anti-quantitative view accepts the idea of generalising across contexts, but holds that such generalisations / extrapolations can be built up solely from studies of underlying mechanisms, and that in-depth qualitative studies can tell us all we need to know about those mechanisms. Proponents of this view hold that quantitative epidemiological studies are, at best, extremely limited in what they can offer. It is true that some things cannot easily be studied in a quantitative comparative way – an historian interested in the cause of the First World War cannot easily compare the candidate explanatory variables over lots of instances. In such a case, exploration of various individual factors that may have combined to unleash the catastrophe may be all that is available. But accepting this necessity is not tantamount to eschewing quantitative comparisons when they are possible. It is unsatisfying to study just the mechanisms by which improved nurse ratios may reduce falls or pressure ulcers without measuring whether the incidence of these outcomes is, in fact, correlated with nurse numbers.

Of course, concluding that quantification is important is not tantamount to concluding that quantification alone is adequate. It never is and cannot be, as the famous statistician, Sir Austin Bradford Hill, implied in his famous speech.[1] Putative causal explanations are generally strengthened when theory generated from one study yields an hypothesis that is supported by another study (Hegel’s thesis, antithesis, synthesis idea). Alternatively, or in addition, situations arise when evidence for a theory, and for hypotheses that are contingent on that theory, may arise within the framework of a single study. This can happen when observations are made across a causal chain. For example, a single study may follow up heavy, light and non-drinkers and examine the size of the memory centre in the brain (by MRI) and their memory (through a cognitive test).[2] The theory that alcohol affects memory is supported by the finding that memory declines faster in drinkers than teetotallers, and yet further support comes from alcohol’s effect on the size of the memory centre (the hippocampus). Similarly, a single study may show that improving the nurse to patient ratio results in a lower probability of unexpected deaths and more diligent monitoring of patients’ vital signs. Here the primary hypothesis that the explanatory variable (nurse/patient ratio) is correlated with the outcome variable (unexpected hospital death) is reinforced by also finding a correlation between the intervening / mediatory variable (diligence in monitoring vital signs) and the outcome variable (hospital deaths) (see Figure 1). In a previous News Blog we have extolled the virtues of Bayesian networking in quantifying these contingent relationships.[3]


Figure 1: Causal chain linking explanatory variable (intervention) and outcome

Observations relating to various primary and higher order hypotheses may be quantitative or qualitative. Qualitative observations on their own are seldom sufficient to test a theory and make reliable predictions. But measurement without a search for mechanisms – without representation / theory building – is effete. The practical value of science depends on ‘induction’ – making predictions over time and space. Such predictions across contexts require judgement, and such judgement cannot gain purchase without an understanding of how an intervention might work. Putting these thoughts together (the thesis, antithesis, synthesis idea and the need for induction), we end up with a ‘realist’ epistemology – the idea here is to make careful observations, interpret them according to the scientific canon, and then represent the theory – the underlying causal mechanisms. In such a framework, qualitative observations complement quantitative observations and vice-versa.

It is because results are sensitive to context that mechanistic / theoretical understanding is necessary. Context refers to things that vary from place to place and that might influence the (relative or absolute) effects of an intervention. It is also plausible to argue that context is more influential with respect to some types of intervention than others. Arguably, context is (even) more important in service delivery research than in clinical research. In that case, one might say that understanding mechanisms is even more important in service delivery research than in clinical research. At the (absolute) limit, if B always follows A, then sound predictions may be made in the absence of an understanding of mechanisms – the Sun was known to always come up in the East, even before rotation of the Earth was discovered. But scientific understanding requires more than just following the numbers. A chicken may be too quick to predict that a meal will follow egg-laying just because that has happened on 364 consecutive days, while failing to appreciate the underlying socioeconomic mechanisms that might land her on a dinner plate on the 365th day, in Bertrand Russell’s evocative example.[4]

Moving on from a purely epistemological argument, there is plenty of empirical data to show that many quantitative findings are replicated across a sufficient range of contexts to provide a useful guide to action. Here are some examples. The effect of ‘user fees’ and co-payments on consumption of health care is quite predictable – demand is sensitive to price (price-elastic), meaning that a relatively small increase in price, relative to average incomes, suppresses demand. Moreover, this applies irrespective of medical need,[5] and across low- and high-income countries.[6] Audit and feedback as a mechanism to improve the effectiveness of care has consistently positive, but small (about 8% change in relative risk) effects.[7] Self-care for diabetes is effective across many contexts.[8] Placing managers under risk of sanction has a high risk of inducing perverse behaviour when managers do not believe they can influence the outcome.[9] It is sometimes claimed that behavioural / organisational sciences are qualitatively distinct from natural sciences because they involve humans, and humans have volition. Quite apart from the fact that we are not the only animals with volition (we share this feature with other primates and cetaceans), the existence of self-determination does not mean that interventions will not have typical / average effects across groups or sub-groups of people.
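The user-fee example above turns on how strongly demand responds to price, which economists summarise as the price elasticity of demand. The sketch below uses the standard midpoint (arc) formula; the fee and attendance figures are invented purely for illustration and are not taken from the studies cited, and the function names are hypothetical.

```python
def arc_elasticity(price_before, price_after, quantity_before, quantity_after):
    """Midpoint (arc) price elasticity of demand.

    Percentage change in quantity divided by percentage change in price,
    each measured against the average of the before and after values.
    """
    pct_quantity = (quantity_after - quantity_before) / ((quantity_after + quantity_before) / 2)
    pct_price = (price_after - price_before) / ((price_after + price_before) / 2)
    return pct_quantity / pct_price


def describe(elasticity):
    """Label demand as elastic when quantity moves proportionally more than price."""
    return "elastic" if abs(elasticity) > 1 else "inelastic or unit-elastic"


if __name__ == "__main__":
    # Hypothetical clinic: the fee rises from 10 to 11 and monthly visits fall from 100 to 80.
    e = arc_elasticity(10, 11, 100, 80)
    print(round(e, 2), describe(e))
```

With these made-up numbers, a fee rise of roughly 10% is accompanied by a fall in visits of roughly 22%, so the magnitude of the elasticity is above 1. That is the pattern the text describes: a small price increase producing a disproportionate drop in use.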

The diabetes example, cited above, is particularly instructive because it makes the point that the role of context is amenable to quantitative evaluation – context may have no effect, it may modify an effect (but not vitiate it), it may obliterate an effect, or even reverse the direction of an effect. Tricco’s iconic diabetes study [8] combined over 120 RCTs of service interventions to improve diabetes care (there are now many more studies and the review is being updated). The study shows not just how the effect of interventions varies by intervention type, but also how the intervention effect itself varies by context. It is thus untenable to claim, as some do, that ‘what works for whom, under what circumstances’ is discernible only by qualitative methods.[10] The development economist, Abhijit Banerjee, goes further, arguing that the main purpose of RCTs is to generate unbiased point estimates of effectiveness for use in observational studies of the moderating effect of context on intervention effects.[11]

We have defined context as all the things that might vary from place to place and that might affect intervention effects. Some people conflate context with how an intervention is taken up / modified in a system. This is a conceptual error – how the intervention is applied in a system is an effect of the intervention and like other effects, it may be influenced by context. Likewise, everything that happens ‘downstream’ of an intervention as a result of the intervention is a potential effect, and again, this effect may be affected by context.[12] Context includes upstream variables (see Figure 2) and any downstream variable at baseline. All that having been said, it is not always easy to distinguish when a change in a downstream variable is caused by the intervention, or whether it is a change in a variable that would have happened anyway (i.e. a temporal effect). Note that a variable such as the nurse-patient ratio may be an intervention in one study (e.g. a study of nurse-patient ratios) and a context variable in another (e.g. a study of an educational intervention to reduce falls in hospital). Context is defined by its role in the inferential cause / effect framework, not by the kind of variable it is.


Figure 2: How to conceptualise the intervention, the effects downstream, and the context.

— Richard Lilford, CLAHRC WM Director


  1. Hill AB. The environment and disease: Association or causation? Proc R Soc Med. 1965; 58(5): 295-300.
  2. Topiwala A, Allan C, Valkanova V, et al. Moderate alcohol consumption as risk factor for adverse brain outcomes and cognitive decline: longitudinal cohort study. BMJ. 2017; 357: j2353.
  3. Lilford RJ. Statistics is Far Too Important to Leave to Statisticians. NIHR CLAHRC West Midlands News Blog. 27 June 2014.
  4. Russell B. Chapter VI. On Induction. In: Problems of Philosophy. New York, NY: Henry Holt and Company, 1912.
  5. Watson SI, Wroe EB, Dunbar EL, Mukherjee J, Squire SB, Nazimera L, Dullie L, Lilford RJ. The impact of user fees on health services utilization and infectious disease diagnoses in Neno District, Malawi: a longitudinal, quasi-experimental study. BMC Health Serv Res. 2016; 16(1): 595.
  6. Lagarde M & Palmer N. The impact of user fees on health service utilization in low- and middle-income countries: how strong is the evidence? Bull World Health Organ. 2008; 86(11): 839-48.
  7. Effective Practice and Organisation of Care (EPOC). EPOC Resources for review authors. Oslo: Norwegian Knowledge Centre for the Health Services; 2015.
  8. Tricco AC, Ivers NM, Grimshaw JM, Moher D, Turner L, Galipeau J, et al. Effectiveness of quality improvement strategies on the management of diabetes: a systematic review and meta-analysis. Lancet. 2012; 379: 2252–61.
  9. Lilford RJ. Discontinuities in Data – a Neat Statistical Method to Detect Distorted Reporting in Response to Incentives. NIHR CLAHRC West Midlands News Blog. 1 September 2017.
  10. Pawson R & Tilley N. Realistic Evaluation. London: Sage. 1997.
  11. Banerjee AV & Duflo E. The Economic Lives of the Poor. J Econ Perspect. 2007; 21(1): 141-67.
  12. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.

Towards a Unifying Theory for the Development of Health and Social Services as the Economy Develops in Countries

In a previous news blog I proposed grassroots solutions to the transportation of critically ill patients to hospital.[1] Other work has demonstrated the effectiveness of community action groups in many contexts, such as maternity care.[2] More recently I have read that the Kenyan government is proposing a combination of local authority and community action (Water Sector Trust Fund) to improve water and sewage in urban settlements.[3] The idea is for the local authority to provide the basic pipe infrastructure and then for local communities to establish linkages to bring water and sewage into homes. The government does not merely lay pipes, but also stimulates local involvement, including local subsidies and micro-enterprises. This epitomises collaboration between authorities and community groups.

In an extremely poor, post-conflict country, such as South Sudan, it is hard to find activities where the authorities and local people work together to improve health and wellbeing. On the other hand, in extremely rich countries like Norway and Switzerland, the government provides almost all that is required; all the citizen has to do is walk into the bathroom and turn on the tap.

The idea that is provoked by these many observations is that different solutions suit different countries at different points in their development. So far, so obvious. Elaboration of the idea would go something like this. When a country is at the bottom end of the distribution for wealth, there is very little to be done other than put the basics of governance and law and order into place and try to reduce corruption. Once the country becomes more organised and slightly better off, a mixture of bottom-up and top-down solutions should be implemented. At this point, the tax base is simply too small for totally top-down, Norwegian style, solutions. In effect the bottom-up contribution makes good the tax deficit – it is a type of local and voluntary taxation. As the economy grows and as the middle class expands, the tax base increases and the government can take a larger role in funding and procuring (or providing) comprehensive services for its citizens.

This might seem anodyne written down as above. However, it is important to bear in mind that harm can be done by making the excellent the enemy of the good. Even before a substantial middle-class evolves in society, wealth is being generated. I recently visited a number of urban settlements (slums) in Nigeria, Pakistan and Kenya. All of these places were a hive of economic activity. This activity was mostly in the informal sector, generating small surpluses. Such wealth is invisible to the tax person, but it is there, and can be used. Using it requires organisation: “grit in the oyster”. The science base on how best to provide this ‘grit’ is gradually maturing. In order for it to do so, studies must be carried out across various types of community engagement and support. I expect this to be a maturing field of inquiry to which the global expansion of the CLAHRC message can contribute. Members of our CLAHRC WM team are engaging in such work through NIHR-funded programmes on health services and global surgery, and we hope to do so with regard to water and sanitation in the future.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Transport to Place of Care. NIHR CLAHRC West Midlands News Blog. 29 September 2017.
  2. Lilford RJ. Lay Community Health Workers. NIHR CLAHRC West Midlands News Blog. 10 April 2015.
  3. Water Sector Trust Fund, GIZ. Up-scaling Basic Sanitation for the Urban Poor (UBSUP) in Kenya. 2017.

Transport to Place of Care

Availability of emergency transport is taken for granted in high-income countries. The debate in such countries relates to such matters as the marginal advantages of helicopters over vehicle ambulances, and what to do when the emergency team arrives at the scene of an accident. But in low- or low-middle-income countries, the situation is very different – in Malawi, for example, there is no pretence that a comprehensive ambulance system exists. The subject of transport does not seem to get attention commensurate with its importance. Researchers love to study the easy stuff – role of particulates in lung disease; prevalence of diabetes in urban vs. rural areas; effectiveness of vaccines. But study selection should not depend solely on tractability – the scientific spotlight should also encompass topics that are more difficult to pin down, but which are critically important. Transport of critically ill patients falls into this category.[1]

Time is of the essence for many conditions. Maternity care is an archetypal example,[2] where delayed treatment in conditions such as placental abruption, eclampsia, ruptured uterus, and obstructed labour can be fatal for mother and child. The same applies to acute infections (most notably meningococcal meningitis) and trauma where time is critical (even if there is no abrupt cut-off following the so-called ‘golden hour’).[3] The outcome for many surgical conditions is affected by delay during which, by way of example, an infected viscus may rupture, an incarcerated hernia may become gangrenous, or a patient with a ruptured tubal pregnancy might exsanguinate. However, in many low-income countries less than one patient in fifty has access to an ambulance service.[4] What is to be done?

The subject has been reviewed by Wilson and colleagues in a maternity care context.[5] Their review identified a number of papers based on qualitative research. They find the themes that one might have anticipated – long delays, lack of infrastructure, and so on. They also report some less intuitive findings: people think that having an emergency vehicle at the ready could bring bad luck, and that it is shameful to expose oneself when experiencing vaginal bleeding.

Quite a lot of work has been done on the use of satellites to develop isochrones based on distances,[6] gradients, and road provision. But working out how long it should take to reach a hospital does not say much about how long it takes in the absence of a service for the transport of acutely sick patients.

We start from the premise that, for the time being at least, a fully-fledged ambulance service is beyond the affordability threshold for many low-income countries. However, we note that many people make it to hospital in an emergency even when no ambulance is available. This finding makes one think of ‘grass-roots’ solutions; finding ways to release the capacity inherent in communities in order to provide more rapid transfers. An interesting finding in Wilson’s paper is that few people, even very poor people, could not find the money for transfer to a place of care in a dire emergency. However, this does not square with work on acutely ill children in Malawi (Nicola Desmond, personal communication), nor work done by CLAHRC WM researchers showing the large effects that user fees have in suppressing demand, especially for children, in the Neno District of Malawi.[7] In any event, a grass-roots solution should be sought, pending the day when all injured or acutely ill people have access to an ambulance. Possible solutions include community risk-sharing schemes, incentives to promote local enterprises to transport sick people, and automatic credit transfer arrangements to reimburse those who provide emergency transport.

I am leading a work package for the NIHR Global Surgery Unit, based at the University of Birmingham, concerned with access to care. We will describe current practice across purposively sampled countries, work with local people to design a ‘solution’, conduct geographical and cost-benefit analyses, and then work with decision-makers to implement affordable and acceptable improvement programmes. These are likely to involve a system of local risk-sharing (community insurance), IT facilitated transfer of funds, promotion of local transport enterprises, community engagement, and awareness raising. We are very keen to collaborate with others who may be planning work on this important topic.

— Richard Lilford, CLAHRC WM Director


  1. United Nations. The Millennium Development Goals Report 2007. New York: United Nations; 2007.
  2. Forster G, Simfukwe V, Barber C. Use of intermediate mode of transport for patient transport: a literature review contrasted with the findings of Transaid Bicycle Ambulance project in Eastern Zambia. London: Transaid; 2009.
  3. Lord JM, Midwinter MJ, Chen Y-F, Belli A, Brohi K, Kovacs EJ, Koenderman L, Kubes P, Lilford RJ. The systemic immune response to trauma: an overview of pathophysiology and treatment. Lancet. 2014; 384(9952): 1455-65.
  4. Nyamandi V, Zibengwa E. Mobility and Health. 2007. In: Wilson A, Hillman S, Rosato M, Costello A, Hussein J, MacArthur C, Coomarasamy A. A systematic review and thematic synthesis of qualitative studies on maternal emergency transport in low- and middle-income countries. Int J Gynaecol Obstet. 2013; 122(3): 192-201.
  5. Wilson A, Hillman S, Rosato M, Skelton J, Costello A, Hussein J, MacArthur C, Coomarasamy A. A systematic review and thematic synthesis of qualitative studies on maternal emergency transport in low- and middle-income countries. Int J Gynaecol Obstet. 2013; 122(3): 192-201.
  6. Frew R, Higgs G, Harding J, Langford M. Investigating geospatial data usability from a health geography perspective using sensitivity analysis: The example of potential accessibility to primary healthcare. J Transp Health 2017 (In Press).
  7. Watson SI, Wroe EB, Dunbar EL, Mukherjee J, Squire SB, Nazimera L, Dullie L, Lilford RJ. The impact of user fees on health services utilization and infectious disease diagnoses in Neno District, Malawi: a longitudinal, quasi-experimental study. BMC Health Serv Res. 2016; 16(1): 595.

Stop Being Beastly to Malthus!

I never understand why people think that Malthus got it so badly wrong. His argument (the Malthusian trap) was that resources are finite and that, therefore, there must be some limit to the number of people that the world can feed.[1] While it certainly turned out that the world can feed many more people than he thought, this does not disprove the underlying theorem. At some point there must come a threshold, where food supply really fails to meet the demand. If we generalise from food to include water, then that point might not be as far away as complacent people think. Of course, we also have to take into account the environmental damage associated with feeding, transporting, and keeping a large number of people warm.

Malthus has become almost a figure of derision. While he may have been wrong about when, the jury is still out about whether. He was right about the generic point, that there is a limit to the carrying capacity of our planet. Food is central to this, because even if we do not run out of food, much environmental damage is caused in its production.

The world’s population will stabilise in about 50 years, although African populations will continue to expand for a while longer.[2] So we should mitigate the environmental effects of food production. I like to eat beef from time to time. However, the production of beef is very energy intensive and the methane released by cattle contributes about 20% of the total global warming.[3] So I favour a tax on all beef, similar to that on fuel. Such a tax is even more justifiable than a tax on sugar and tobacco. This is because consumption of sugar and tobacco does not have the strong externalities associated with fossil fuels and production of beef. There is no proper libertarian argument against taxation in circumstances where strong externalities apply.[4] Pigovian taxes are taxes designed to compensate for externalities and to reduce behaviour that harms others; they would seem entirely justified in this case. I am less of a fan of Pigovian taxes to deal with internalities – that is to stop people from harming themselves. But as it turns out, red meat is bad for our health, as discussed in a recent news blog.[5]

So let us give Malthus his due. He might have got the detail wrong, but his principle still stands. I vote for the rehabilitation of Malthus.

— Richard Lilford, CLAHRC WM Director


  1. Malthus TR. An Essay on the Principle of Population. London, UK: J. Johnson, 1798.
  2. Lilford RJ. The Population of the World – Will Depend on What Happens in Africa. NIHR CLAHRC West Midlands News Blog. 9 January 2015.
  3. Steinfeld H, Gerber P, Wassenaar T, Castel V, Rosales M, de Haan C. Livestock’s Long Shadow: Environmental Issues and Options. Rome, Italy: Food and Agriculture Organization, 2006.
  4. Capewell S, Lilford R. Are nanny states healthier states? BMJ. 2016; 355: i6341.
  5. Lilford RJ. An Issue of BMJ with Multiple Studies on Diet. NIHR CLAHRC West Midlands News Blog. 4 August 2017.