Engaging with Engagement

Engagement is easy. We are in a fortunate position in CLAHRC West Midlands that there is seemingly a long queue of people keen to talk to us about interesting and exciting health and social care projects. However, there is little point in engagement for engagement’s sake: our resources are too scarce to invest in projects or relationships with little or no return, and so meaningful engagement is much harder.

In putting together our application for hosting an Applied Research Collaboration we were faced with our perennial challenge of who to engage with and how. To do so we began to map our networks (see figure) and quickly realised even the number of NHS organisations (71) was too broad for us to work across in depth, never mind the wide range of academic, social care, voluntary sector and industry partners in the wider landscape beyond.

Our approach has been to work with partners who are keen to work with us; we make no apology for being a coalition of the willing. However, we have worked purposefully to ensure reach across all sectors, actively seeking out collaborators with whom we have had more limited interactions, but who we know can help deliver the reach we require for research and implementation. For instance, we have one of the best-performing and most forward-thinking ambulance services in the country, with paramedics working at the very interface between physical and mental health, social care and emergency medicine. Given that we know some of these problems are best addressed upstream, the ambulance service gives us the opportunity to head closer to where the river rises than ever before.

[Figure 1]

[1] Based on 2013/14 figures from RAWM
[2] Department of Business Enterprise Innovation and Skills, Business Population Estimates

In addition to this, we seek to use overarching bodies to help us reach across spaces that are too diffuse and fragmented for us to access directly (such as the voluntary, charitable and third sectors). Even then we will have to be selective among the 21 such bodies that exist when we seek to engage with voluntary groups (for example around priority setting, Public and Community Involvement Engagement and Participation, or co-production). Elsewhere, we utilise networks of networks, for example collaborating with the Membership Innovation Councils of the West Midlands Academic Health Science Network, which draw in representatives from a wide cross-section of organisations and professions who can then transmit our message to their respective organisations and local networks. Our experience tells us these vicarious contacts can often deliver some of the most useful engagement opportunities.

Finally, we have always been committed within CLAHRC to cross-site working and having our researchers and staff embedded as much as possible within healthcare organisations. This is in part to ensure our research remains grounded within the ‘real world’ of service delivery, rather than the dreaming spires (or concrete and glass tower blocks) of academia. However, we know that regardless of how well you plan and construct your network, some of the best ideas come about through chance encounters and corridor conversations. Nobel prize-winning economist Elinor Ostrom, much beloved by the CLAHRC WM team, elegantly described the value of ‘cheap talk’ in relation to collectively owned resources.[3] The visibility of our team can often prompt a brief exchange to rule in or out an idea for CLAHRC where a formal contact or approach might not have been made, making our ‘cheap talk’ meaningful through its context. Perhaps this is how we should see ourselves in CLAHRC West Midlands; as a finite but shared resource to the health and social care organisations within our region.

— Paul Bird, Head of Programmes (engagement)


  1. RAWM. The West Midlands Voluntary and Community Sector. 2015.
  2. Rhodes C. Business Statistics. Briefing Paper No. 06152. 2018.
  3. Ostrom E. Beyond Markets and States: Polycentric Governance of Complex Economic Systems. Am Econ Rev. 2010; 100(3): 641-72.

Demand-Led Research: A Taxonomy

We have previously discussed research where a service manager decides that an intervention should be studied prospectively. We have made the point that applied research centres, such as CLAHRCs/ARCs, should be responsive to requests for such prospective evaluation. Indeed, the request or suggestion from a service manager to evaluate their intervention provides a rich opportunity for scientific discovery, since the intervention is a charge to the service, not to the research funder. In some cases many of the outcomes of interest may be collected from routine data systems. In such circumstances the research can be carried out at a fraction of the usual cost for prospective evaluations. Nor should it be assumed that research quality must suffer. We give two examples below where randomised designs were possible: one where individual staff members were randomised to different methods to encourage uptake of the seasonal influenza vaccine, and the other where a stepped-wedge cluster design was used to evaluate the roll-out of a community health worker programme across a health facility catchment area in Malawi. Data from these studies have been collected and are being analysed.

(1) Improvement Project Around Staffs’ Influenza Vaccine Uptake [1]
At the time of this study staff at University Hospitals Birmingham NHS Foundation Trust were invited to take up the influenza vaccination every September, and then reminded regularly. This study randomised staff to receive one of four letters, to see whether the letter would directly influence vaccination uptake. One factor of the letters varied whether the invitation came from an authority figure; the other varied whether vaccination rates in peer hospitals were emphasised.
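For concreteness, the allocation logic of such a 2×2 factorial design can be sketched as follows. This is a minimal illustration, not the study's actual code; the factor labels, staff identifiers and use of simple (unstratified) randomisation are all hypothetical.

```python
import random

# Hypothetical 2x2 factorial allocation: crossing two message factors yields
# four letter variants, and each staff member is randomised to one of them.
FACTOR_A = ("standard sender", "authority figure")         # who signs the letter
FACTOR_B = ("no peer data", "peer-hospital uptake rates")  # social-norm message

def assign_letters(staff_ids, seed=42):
    """Simple randomisation of staff to the four letter variants."""
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    variants = [(a, b) for a in FACTOR_A for b in FACTOR_B]
    return {sid: rng.choice(variants) for sid in staff_ids}

allocation = assign_letters([f"staff-{i:03d}" for i in range(200)])
```

The efficiency of the factorial design comes at analysis time: each factor's main effect is estimated using every participant, rather than splitting the sample four ways.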

(2) Evaluating the impact of a CHW programme… in Malawi [2]
This study estimated the effect a CHW programme had on a number of health outcomes, including retention in care for patients with chronic non-communicable diseases, and uptake of women’s health services. Eleven health centres / hospitals were arranged into six clusters, which were then randomised to receive the intervention programme at various, staggered points. Each cluster crossed over from being a control group to an intervention group until all received the intervention.
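The logic of the stepped-wedge roll-out can be sketched as follows. The period structure is illustrative (assuming one cluster crosses over per period), not the Malawi study's actual timetable.

```python
# Illustrative stepped-wedge schedule: six clusters cross from control (0) to
# intervention (1) at staggered time points until all are exposed.
def stepped_wedge_schedule(n_clusters=6, n_periods=7):
    """schedule[c][t] is 1 if cluster c is under the intervention in period t.
    Cluster c crosses over at period c + 1, so period 0 is all-control and
    the final period is all-intervention."""
    return [[1 if t > c else 0 for t in range(n_periods)]
            for c in range(n_clusters)]

schedule = stepped_wedge_schedule()
for row in schedule:
    print(row)  # each successive cluster switches one period later
```

A design like this lets every cluster eventually receive the intervention while each also contributes control-period data, which is often what makes randomisation acceptable to service providers rolling out a programme anyway.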

In previous articles [3][4] we have examined the practical problems that can be encountered in obtaining ethical approvals and registering demand-led studies. These problems arise because of the implicit assumption that researchers, not service managers, are responsible for the interventions that are the subject of study. In particular we have criticised the Ottawa declaration on the ethics of cluster studies for making this assumption. We have pointed out the harm that rigid adherence to the tenets of this declaration could do by limiting the value that society could reap from evaluations of the large number of natural experiments that are all around us.

However, demand-led research is not homogeneous and so the demands on service manager and researcher vary from case to case. The purpose of this news blog article is to attempt a taxonomy of demand-led research. Since we are unlikely to get this right on our first attempt, we invite readers to comment further.

We discern two dimensions along which demand-led research may vary. First, the urgency dimension and second a dimension to describe the extent, if any, to which the researcher may have participated in the design of the intervention.

As a general rule, demand-led research is done under pressure of time. If there were no time pressure, then the research could be commissioned in the usual way through organisations such as the NIHR Service Delivery and Organisation Programme or the US Agency for Healthcare Research and Quality. Demand-led research is done under shorter lead times that are incompatible with the lengthy research cycle. However, permissible lead times for demand-led research vary from virtually no time to many months. In both of the studies above the possibility of the research was mooted only four or five months before roll-out of the index intervention was scheduled. We had to ‘scramble’ to develop protocols, obtain ethics approvals, and register the studies, as required for an experimental design, before roll-out ensued.

The second manner in which demand-led research may vary is in the extent of researcher involvement in design of the intervention itself. If the intervention is designed solely by the researcher, or is co-produced but at the researcher's initiative, then this cannot be classified as demand-led. However, the intervention may be designed entirely by the service provider, or it may be initiated by the service provider but with some input from the researcher. The vaccination intervention described above was initiated by the service, which wished to include an incentive as part of a package of measures but sought advice over the nature of the incentive from behavioural economists in our CLAHRC. On the other hand, the intervention to train and deploy community health workers in Malawi was designed entirely by the service team with no input from the evaluation team whatsoever.

The contribution dimension dominates ethically: if the researcher makes no contribution to the intervention, then the researcher bears little or no responsibility for it (the full argument is provided elsewhere).[4]

— Richard Lilford, CLAHRC WM Director


  1. Lilford R, Schmidtke KA, Vlaev I, et al. Improvement Project Around Staffs’ Influenza Vaccine Uptake. Clinicaltrials.gov. NCT03637036. 2018.
  2. Dunbar EL, Wroe EB, Nhlema B, et al. Evaluating the impact of a community health worker programme on non-communicable disease, malnutrition, tuberculosis, family planning and antenatal care in Neno, Malawi: protocol for a stepped-wedge, cluster randomised controlled trial. BMJ Open. 2018; 8(7): e019473.
  3. Lilford RJ. Demand-Led Research. NIHR CLAHRC West Midlands News Blog. 18 January 2019.
  4. Watson S, Dixon-Woods M, Taylor CA, Wroe EB, Dunbar EL, Chilton PJ, Lilford RJ. Revising ethical guidance for the evaluation of programmes and interventions not initiated by researchers. J Med Ethics. [In Press].

When Randomisation is not Enough: Masking for Efficacy Trials of Skin Disease, Ulcers and Wound Infections

In a previous News Blog [1] we discussed endpoint measurement for trials of wound infection, where the observers were not ‘blinded’ (not masked to the intervention group). Such an approach is simply not adequate, even if the observers use ‘strict criteria’.[1] This is because of subjectivity in the interpretation of the criteria and, more especially, because of reactivity. Reactivity means that observers are influenced, albeit sub-consciously, by knowledge of the group to which patients have been assigned (treatment or not). Such reactivity is an important source of bias in science.[2]

We are proposing a trial of a promising treatment for recurrent leprosy ulcers that we would like to carry out in the Leprosy Mission Hospital in Kathmandu, Nepal. We plan to conduct an efficacy trial of a regenerative medicine (RM) technique in which a paste is made from the buffy coat layer of the patient’s own blood. This is applied to the ulcer surface at the time of dressing change. The only difference in treatment will be whether or not the RM technique is applied when the regular change of wet dressing is scheduled. We will measure, amongst other things, the rate of healing of the ulcers and the time to complete healing and discharge from hospital.

Patients will be randomised so as to avoid selection bias and, as the primary endpoints in this efficacy trial are measured during the hospital sojourn (and patients seldom discharge themselves), we are mainly concerned with outcome bias as far as endpoints regarding ulcer size are concerned.

One obvious way to get around the problem of reactivity is to use a well described method in which truly masked observers, typically based off-site, measure ulcer size using photographs. Measurements are based on a sterile metal ruler positioned at the level of the ulcer to standardise the measurement irrespective of the distance of the camera. The measurement can be done manually or automated by computer (or both). But is that enough? It has been argued that bias can still arise, not at the stage where photographs are analysed, but rather at the earlier stage of photograph acquisition. This argument holds that, again perhaps sub-consciously, those responsible for taking the photograph can affect its appearance. The question of blinding / masking of medical images is a long-standing topic of debate.
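The value of the in-shot ruler is that it fixes the scale: a measurement in pixels can be converted to millimetres whatever the camera distance. A minimal sketch of that calibration step, with hypothetical numbers:

```python
# Illustrative calibration: a ruler of known length in the photograph converts
# pixel measurements to millimetres regardless of camera distance.
def calibrate(ruler_length_mm, ruler_length_px):
    """Millimetres per pixel for this photograph."""
    return ruler_length_mm / ruler_length_px

def ulcer_area_mm2(pixel_area, mm_per_px):
    """Convert a measured pixel area to mm^2 (the scale applies squared)."""
    return pixel_area * mm_per_px ** 2

scale = calibrate(ruler_length_mm=100.0, ruler_length_px=400.0)  # 0.25 mm/px
area = ulcer_area_mm2(pixel_area=8000, mm_per_px=scale)          # 500.0 mm^2
```

Because each photograph carries its own calibration, observers at different sites (or automated software) should recover the same physical dimensions from the same image.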

The ‘gold standard’ method is to have an independent observer arrive on the scene at the appropriate time to make the observations (and take any photographs). Such a method would be expensive (and logistically challenging over long distances). So, an alternative would be to deploy such an observer for a random sub-set of cases. This method may work but it has certain disadvantages. First, it would be tricky to choreograph as it would disrupt the workflow in settings such as that described above. Second, to act as a method of audit, it would need to be used alongside the existing method (making the method still more ‘unwieldy’). Third, the method of preparing the wound would still lie in the hands of the clinical team, and arguably still be subject to some sort of subconscious ‘manipulation’ (unless the observer also provided the clinical care). Fourth, given that agreement would not be exact between observers, a threshold would have to be agreed regarding the magnitude of difference between the standard method and the monitoring method that would be regarded as problematic. Fifth, it would not be clear how to proceed if such a threshold were crossed. While none of these problems are necessarily insurmountable, they are sufficiently problematic to invite consideration of further methods. What might augment or replace standard third party analysis of photographic material?

Here we draw our inspiration from a trial of surgical technique in the field of ophthalmology/orbital surgery.[3] In this trial, surgical operations were video-taped in both the intervention and control groups. With permission of patients, we are considering such an approach in our proposed trial. The vast majority of ulcers are on the lower extremities, so patients’ faces would not appear in the videos. The videos could be arranged so that staff were not individually identifiable, though they could be redacted if and where necessary. We would like to try to develop a method whereby the photographs were directed in real time by remote video link, but pending the establishment of such a link, we propose that each procedure (dressing change) is video-taped, adhering to certain guidelines (for example, shot in high-definition, moving the camera to give a full view of the limb from all sides, adequate lighting, a measurement instrument is included in the shot, etc.). We propose that measurements are made both in the usual way (from mobile phone photographs), and from ‘stills’ obtained from the video-tapes. Each could be scored by two independent, off-site observers. Furthermore the videos could be used as a method of ‘ethnographic’ analysis of the process to surface any material differences between patients in each trial arm in lighting, preparation of ulcer sites, time spent on various stages of the procedure and photograph acquisition, and so on.

Would this solve the problem? After all, local clinicians would still prepare the ulcer site for re-bandaging and, insofar as they may be able to subconsciously manipulate the situation, this risk has not been eliminated. However, we hypothesise that the video will work a little like a black box on an aeroplane; it cannot stop things happening, but it provides a powerful method to unravel what did happen. The problem we believe we face is not deliberate maleficence, but, at most, subtle bias. We think that by using the photographic approach, in accordance with guidelines for such an approach,[4] we already mitigate the risk of outcome measurement bias. We think that by introducing a further level of scrutiny, we reduce the risk of bias still further. Can the particular risk we describe here be reduced to zero? We think not. Replication remains an important safeguard to the scientific endeavour. We now turn our attention to this further safeguard.

Leprosy ulcers are far from the only type of ulcer to which the regenerative medicine solution proposed here is relevant. Diabetic ulcers, in particular, are similar to leprosy ulcers in that loss of neural sensation plays a large part in both. We have argued elsewhere that much can be learned by comparing the results of the same treatment across different disease classes. In due course we hope to collaborate with those who care for other types of skin ulcer so that we can compare and contrast and also to advance methodologically. Together we will seek the optimal method to limit expense and disruption of workflow while minimising outcome bias from reactive measurements.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Before and After Study Shows Large Reductions in Surgical Site Infections Across Four African Countries. NIHR CLAHRC West Midlands News Blog. 10 August 2018.
  2. Kazdin AE. Unobtrusive measures in behavioral assessment. J Appl Behav Anal. 1979; 12: 713–24.
  3. Feldon SE, Scherer RW, Hooper FJ, et al. Surgical quality assurance in the Ischemic Optic Neuropathy Decompression Trial (IONDT). Control Clin Trials. 2003; 24: 294-305.
  4. Bowen AC, Burns K, Tong SY, Andrews RM, Liddle R, O’Meara IM, et al. Standardising and assessing digital images for use in clinical trials: a practical, reproducible method that blinds the assessor to treatment allocation. PLoS One. 2014;9(11):e110395.

A Bigger Risk Than Climate Change?

There are many future risks to our planet. The risk that seems to cause most concern is climate change. I share these concerns. However, there are some risks which, if they materialised, would be even worse than those of climate change. An asteroid strike, such as the collision 66 million years ago, is in this category. But this is a most improbable event, essentially 0% in the next 50 years.[1] There is, however, a risk that would be absolutely catastrophic, and whose probability of occurring is not remote. I speak, of course, of nuclear strike.

There are two issues to consider: the degree of the catastrophe, and its probability of occurring. Regarding the extent of the catastrophe, one can refer to the website Nukemap. Here one finds evidence-based apocalyptic predictions. In order to make sense of these it is necessary to appreciate that nuclear bombs destroy human life in three zones radiating out from the epicentre: the fire ball; the shock wave; and the area of residual radiation (whose direction depends on prevailing winds). If a relatively small nuclear weapon, such as a 455 kiloton warhead from a nuclear submarine, landed on Glasgow, it would kill an estimated quarter of a million people and injure half a million, not taking into account radiation damage. The 50 megaton bomb detonated by the Soviets in the upper atmosphere (Tsar Bomba) would, if it landed on London, kill over 4.5 million people and injure 3 million more (again not including the radiation damage that would most likely spread across northern Europe). In the book ‘The Medical Implications of Nuclear War’, Daugherty, Levi and Von Hippel calculate that deployment of only 1% of the world’s nuclear armaments would cause up to 56 million deaths and 61 million casualties.[2] Clearly, larger conflagrations pose an existential threat that could wipe out the whole of the northern hemisphere. When I look at my lovely grandchildren, sleeping in their beds at night, I sometimes think of that. And all of the above harms exclude indirect effects resulting from collapse of law and order, financial systems, supply chains, and so on.

So, nuclear war could be catastrophic, but to calculate the net expected burden of disease and disability we need to know the probability of its occurrence. The risk of nuclear strike must be seen as material. During, and immediately following, the Cold War there were at least three points at which missile strikes were imminent. They were all a matter of miscalculation. The most likely cause of nuclear war is a false positive signal of a strike, perhaps simulated by a terrorist group. These risks are increasing since at least eight countries now have nuclear weapons. The risk of a single incident, leading to the death of, say, 1 million people, might be as high as 50% over the next 50 years according to some models.[3] Another widely cited figure, from Hellman, is 2% per year.[4] The risk of an attack with retaliatory strikes, and hence over 50 million dead, would be lower – say 10% over the next 50 years. Quantifying the risk of future events may seem quixotic, but not trying to do so is like the ostrich putting its head in the sand. Using slogans such as ‘alarmist’ is simply a way of avoiding uncomfortable thoughts better confronted. Let us say the risk of a strike with retaliation is indeed 10% over 50 years, and that 50 million casualties will result. If the average casualty is 40 years of age, then the expected life years lost over 50 years would be about 200,000,000 (50m x 40 x 0.1). This is without discounting, but why would one discount these lives on the basis of current time-preferences?
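The back-of-envelope arithmetic in that last step is simple enough to set out explicitly:

```python
# Expected (undiscounted) life-years lost = casualties x mean years lost x
# probability of the event over the horizon, as in the text's 50m x 40 x 0.1.
def expected_life_years_lost(casualties, mean_years_lost, probability):
    return casualties * mean_years_lost * probability

loss = expected_life_years_lost(50_000_000, 40, 0.10)
print(f"{loss:,.0f} life-years")  # 200,000,000 life-years
```

The same three-term calculation makes the sensitivity of the conclusion transparent: halving the assumed probability or the casualty count halves the expected loss.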

Given the high expected loss of life (life years multiplied by probability), it seems that preventing nuclear war is up there with climate change. The effects of nuclear war are immediate and destroy infrastructure, while climate change provides plenty of warning and infrastructure can be preserved, even if at high cost. Avoiding nuclear war deserves no less attention. In 2014 the World Health Organization published a report that estimated that climate change would be responsible for 241,000 additional deaths in the year 2030, which is likely an underestimate as their model could not quantify a number of causal pathways, such as economic damage, or water scarcity.[5] But we have time to adapt and reduce this risk – nuclear war would be sudden and would disrupt coping mechanisms, leading to massive social and economic costs, along with large numbers of deaths and people diseased or maimed for life. Nuclear strike is public health enemy number one in my opinion. It is difficult to pursue the possible options to reduce this risk without entering the world of politics, so they cannot be pursued further within the pages of your News Blog.

— Richard Lilford, CLAHRC WM Director


  1. Sentry: Earth Impact Monitoring. Impact Risk Data. 2018.
  2. Daugherty W, Levi B, Von Hippel F. Casualties Due to the Blast, Heat, and Radioactive Fallout from Various Hypothetical Nuclear Attacks on the United States. In: Solomon F & Marston RQ (eds.) The Medical Implications of Nuclear War. Washington, D.C.: National Academies Press (US); 1986.
  3. Barrett AM, Baum SD, Hostetler K. Analyzing and Reducing the Risks of Inadvertent Nuclear War Between the United States and Russia. Sci Glob Security. 2013; 21: 106-33.
  4. Hellman ME. Risk Analysis of Nuclear Deterrence. The Bent of Tau Beta Pi. 2008; Spring: 14-22.
  5. World Health Organization. Quantitative risk assessment of the effects of climate change on selected causes of death, 2030s and 2050s. Geneva: World Health Organization, 2014.


Health Service and Delivery Research – a Subject of Multiple Meanings

Never has there been a topic so subject to lexicological ambiguity as that of Service Delivery Research. Many of the terms it uses are subject to multiple meanings, making communication devilishly difficult; a ‘Tower of Babel’ according to McKibbon, et al.[1] The result is that two people may disagree when they agree, or agree when they are fundamentally at odds. The subject is beset with ‘polysemy’ (one word means different things) and, to an even greater extent, ‘cognitive synonyms’ (different words mean the same thing).

Take the very words “Service Delivery Research”. The study by McKibbon, et al. found 46 synonyms (or near synonyms) for the underlying construct, including applied health research, management research, T2 research, implementation research, quality improvement research, and patient safety research. Some people will make strong statements as to why one of these terms is not the same as another – they will tell you why implementation research is not the same as quality improvement, for example. But seldom will two protagonists agree and give the same explanation as to why they differ, and textual exegesis of the various definitions does not support separate meanings – they all tap into the same concept, some focussing on outcomes (quality, safety) and others on the means to achieve those outcomes (implementation, management).

Let us examine some widely used terms in more detail. Take first the term “implementation”. The term can mean two quite separate things:

  1. Implementation of the findings of clinical research (e.g. if a patient has a recent onset thrombotic stroke then administer a ‘clot busting’ medicine).
  2. Implementation of the findings from HS&DR (e.g. do not use incentives when the service providers targeted by the incentive do not believe they have any control over the target).[2][3]

Then there is my bête noire, “complex interventions”. This term concatenates separate ideas, such as the complexity of the intervention vs. the complexity of the system (e.g. health system) with which the intervention interacts. Alternatively, it may concatenate the complexity of the intervention components vs. the number of components it includes.

It is common to distinguish between process and outcome, à la Donabedian.[4] But this conflates two very different things – clinical process (such as prescribing the correct medicine, eliciting the relevant symptoms, or displaying appropriate affect), and service level (upstream) process endpoints (such as favourable staff/patient ratios, or high staff morale). We have described elsewhere the methodological importance of this distinction.[5]

Intervention description is famously conflated with intervention uptake/ fidelity/ adaptation. The intervention description should capture the intervention as specified (like the recipe), while the way the intervention is assimilated in the organisation is a finding (like the process the chef actually follows).[6]

These are just a few examples of words with multiple meanings that cause health service researchers to trip over their own feet. Some have tried to forge consensus over these various terms, but widespread agreement is yet to be achieved. In the meantime, it is important to explain precisely what is meant when we talk about implementation, processes, complexity, and so on.

— Richard Lilford, CLAHRC WM Director


  1. McKibbon KA, Lokker C, Wilczynski NL, et al. A cross-sectional study of the number and frequency of terms used to refer to knowledge translation in a body of health literature in 2006: a Tower of Babel? Implementation Science. 2010; 5: 16.
  2. Lilford RJ. Financial Incentives for Providers of Health Care: The Baggage Handler and the Intensive Care Physician. NIHR CLAHRC West Midlands News Blog. 2014 July 25.
  3. Lilford RJ. Two Things to Remember About Human Nature When Designing Incentives. NIHR CLAHRC West Midlands News Blog. 2017 January 27.
  4. Donabedian A. Explorations in quality assessment and monitoring. Health Administration Press, 1980.
  5. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.
  6. Brown C, Hofer T, Johal A, Thomson R, Nicholl J, Franklin BD, Lilford RJ. An epistemology of patient safety research: a framework for study design and interpretation. Part 3. End points and measurement. Qual Saf Health Care. 2008; 17: 170-7.

The Same Data Set Analysed in Different Ways Yields Materially Different Parameter Estimates: The Most Important Paper I Have Read This Year

News blog readers know that I have a healthy scepticism about the validity of econometric/regression models. In particular, I have stressed the importance of the distinction between confounding and mediating variables, the latter being variables that lie on the causal chain between explanatory and outcome variables. I therefore thank Dr Yen-Fu Chen for drawing my attention to an article by Silberzahn and colleagues.[1] They conducted a most elegant study in which 29 statistical teams analysed the same data set.

The data set concerns the game of soccer and the hypothesis that a player’s skin tone will influence propensity for a referee to issue a red card, which is some kind of reprimand to the player. The provenance of this hypothesis lies in shed loads of studies on preference for lighter skin colour across the globe and subconscious bias towards people of lighter skin colour. Based on access to various data sets that included colour photographs of players, each player’s skin colour was graded into four zones of darkness by independent observers with, as it turned out, high reliability (agreement between observers over and above that expected by chance).

The effect of skin colour tone on player censure by means of the red card was estimated by regression methods. Each team was free to select its preferred method, and could also select which of 16 available variables to include in the model.

The results across the 29 teams varied widely but were positive (in the hypothesised direction) in all but one case. The ORs varied from 0.89 to 2.93 with a median estimate of 1.31. Overall, twenty teams found a significant (in each case positive) relationship. This wide variability in effect estimates was all the more remarkable given that the teams peer-reviewed each other’s methods prior to analysis of the results.

All but one team took account of the clustering of players within referees, and the outlier was also the single team not to have a point estimate in the positive (hypothesised) direction. I guess this could be called a flaw in the methodology, but the remaining methodological differences between teams could not easily be classified as errors that would earn a low score in a statistics examination. Analytic techniques varied very widely, covering linear regression, logistic regression, Poisson regression, Bayesian methods, and so on, with some teams using more than one method. Regarding covariates, all teams included the number of games played under a given referee and 69% included player’s position on the field. More than half of the teams used a unique combination of variables. Use of interaction terms does not seem to have been studied.
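A toy example (with hypothetical counts, not the soccer data) shows how even the choice of effect measure, before any modelling decisions at all, changes the parameter estimate obtained from identical data:

```python
# Toy illustration: the same 2x2 table yields different parameter estimates
# depending on the chosen effect measure. Counts below are invented.
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: exposed events/non-events a/b, unexposed c/d."""
    return (a / b) / (c / d)

def risk_ratio(a, b, c, d):
    """RR for the same table: risk a/(a+b) versus c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical: group 1 has 30 red cards in 130 appearances; group 2 has 20 in 180.
a, b, c, d = 30, 100, 20, 160
OR = odds_ratio(a, b, c, d)   # 2.4
RR = risk_ratio(a, b, c, d)   # about 2.08
```

Regression-model choices (logistic versus Poisson, covariate sets, clustering adjustments) then add further layers of analyst discretion on top of this basic divergence.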

There was little systematic difference in results by the academic rank of the teams, and no association between teams’ prior beliefs about what the study would show and the magnitude of effect they estimated. This may make the results all the more remarkable, since there would have been no apparent incentive to exploit options in the analysis to produce a positive result.

What do I make of all this? First, it would seem to be good practice to use different methods to analyse a given data set, as CLAHRC West Midlands has done in recent studies,[2] [3] though this opens opportunities to selectively report methods that produce results convivial to the analyst. Second, statistical confidence limits in observational studies are far too narrow and this should be taken into account in the presentation and use of results. Third, data should be made publicly available so that other teams can reanalyse them whenever possible. Fourth, and a point surprisingly not discussed by the authors, the analysis should be tailored to a specific scientific causal model ex ante, not ex post. That is to say, there should be a scientific rationale for choice of potential confounders and explication of variables to be explored as potential mediating variables (i.e. variables that might be on the causal pathway).

— Richard Lilford, CLAHRC WM Director


  1. Silberzahn R, Uhlmann EL, Martin DP, et al. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Adv Methods Pract Psychol Sci. 2018; 1(3): 337-56.
  2. Manaseki-Holland S, Lilford RJ, Bishop JR, Girling AJ, Chen Y-F, Chilton PJ, Hofer TP; the UK Case Note Review Group. Reviewing deaths in British and US hospitals: a study of two scales for assessing preventability. BMJ Qual Saf. 2017; 26: 408-16.
  3. Mytton J, Evison F, Chilton PJ, Lilford RJ. Removal of all ovarian tissue versus conserving ovarian tissue at time of hysterectomy in premenopausal patients with benign disease: study using routine data and data linkage. BMJ. 2017; 356: j372.

Trials are Not Always Needed for Evaluation of Surgical Interventions: Does This House Agree?

I supported the above motion at a recent surgical trials meeting in Bristol. What were my arguments?

I argued that there were four broad categories of intervention where trials were not needed:

  1. Where causality is not in dispute

This scenario arises where, but for the intervention, a bad outcome was all but inevitable. Showing that such an outcome can be prevented in only a few cases is sufficient to put the substantive question to bed. Such an intervention is sometimes referred to as a ‘penicillin-type’ of intervention. Surgical examples include heart transplantation and in vitro fertilisation (for people both of whose Fallopian tubes have been removed). From a philosophy of science perspective, causal thinking requires a counterfactual: what would have happened absent the intervention? In most instances a randomised trial provides the best approximation to that counterfactual. However, when the counterfactual is near inevitable death, then a few cases will be sufficient to prove the principle. Of course, this is not the end of the story. Trials of different methods within a generic class will always be needed, along with trials of cases where the indication is less clear cut, and hence where the counterfactual cannot be predicted with a high level of certainty. Nevertheless, the initial introduction of heart transplantation and in vitro fertilisation took place without any randomised trial. Nor was such a trial necessary.

  2. Speculative procedures where there is an asymmetry of outcome

This is similar to the above category, but the justification is ethical rather than scientific. I described a 15-year-old girl who was born with no vagina but a functioning uterus. She was referred to me with a pyometra, having had an unsuccessful attempt to create a channel where the vagina should have been. The standard treatment in such a dire situation would have been hysterectomy. However, I offered to improvise and try an experimental procedure, using tissue expansion methods to stretch the skin at the vaginal opening and then using this skin to create a functioning channel linking the uterus to the exterior. The patient and her guardian accepted this procedure in the full knowledge that it was entirely experimental. In the event, I am glad to report that the operation was successful, producing a functional vagina and allowing regular menstruation.[1] The formal theory behind innovative practice in such dire situations comes from expected utility theory.[2] An example is explicated in the figure.

[Figure: Expected utility of a high-risk procedure compared with no intervention and RCT entry]

This example relates to a person with very low life expectancy and a high-risk procedure that may either prove fatal or extend their life for a considerable time. In such a situation, the expected value of the risky procedure considerably exceeds doing nothing and is preferable, from the point of view of the patient, to entry in an RCT. In fact, the expected value of the RCT (with a 1:1 randomisation ratio) is (0.5 x 0.25) + (0.5 x 1.0) = 0.625. While favourable in comparison to ‘no intervention’, it is inferior in comparison with the ‘risky intervention’.
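The arithmetic in this worked example can be laid out explicitly. The utility values (0.25 for no intervention, 1.0 for the risky procedure) are the illustrative figures used above, not empirical estimates.

```python
# Expected utility of each option for a patient with very low life
# expectancy facing a high-risk procedure (illustrative values from
# the example in the text).
u_no_intervention = 0.25   # expected utility of doing nothing
u_risky = 1.0              # expected utility of the risky procedure

# A 1:1 RCT allocates each arm with probability 0.5, so its expected
# utility is the simple average of the two arms.
u_rct = 0.5 * u_no_intervention + 0.5 * u_risky

print(u_rct)  # 0.625

# Favourable compared with no intervention, but inferior to simply
# choosing the risky intervention - the patient's rational preference.
assert u_no_intervention < u_rct < u_risky
```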

  3. When the intervention has not been well thought through

Here my example was full frontal lobotomy. Trials and other epidemiological methods can only work out how to reach an objective, not which objective to reach or prioritise. Taking away someone’s personality is not a fair price to pay for mental calmness.

  4. When the trial is poor value for money

Trials are often expensive, and we have made them more so with extensive procedural rules; collection of end-points through routine data systems is only part of the answer. Trials can therefore be a poor use of research resources: modelling shows that the value of the information they provide is sometimes exceeded by their opportunity cost.[3-5]

Of course, I am an ardent trialist. But informed consent must be fully informed, so that the preferences of the patient can come into play. I conducted an RCT of two methods of entering patients into an RCT and showed that more and better information reduced willingness to be randomised.[6] Trial entry is justified when equipoise applies – that is, when the ‘expected values’ of the alternative treatments are about the same.[7] The exception is when the new treatment is unlicensed. Then ‘equipoise plus’ should apply: the expected value of trial entry should equal or exceed that of standard treatment.[8]

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ, Sharpe DT, Thomas DFM. Use of tissue expansion techniques to create skin flaps for vaginoplasty. Case report. Br J Obstet Gynaecol. 1988; 95: 402-7.
  2. Lilford RJ. Trade-off between gestational age and miscarriage risk of prenatal testing: does it vary according to genetic risk? Lancet. 1990; 336: 1303-5.
  3. De Bono M, Fawdry RDS, Lilford RJ. Size of trials for evaluation of antenatal tests of fetal wellbeing in high risk pregnancy. J Perinat Med. 1990; 18(2): 77-87.
  4. Lilford R, Girling A, Braunholtz D. Cost-Utility Analysis When Not Everyone Wants the Treatment: Modeling Split-Choice Bias. Med Decis Making. 2007; 27(1): 21-6.
  5. Girling AJ, Freeman G, Gordon JP, Poole-Wilson P, Scott DA, Lilford RJ. Modeling payback from research into the efficacy of left-ventricular assist devices as destination therapy. Int J Technol Assess Health Care. 2007; 23(2): 269-77.
  6. Wragg JA, Robison EJ, Lilford RJ. Information presentation and decisions to enter clinical trials: a hypothetical trial of hormone replacement therapy. Soc Sci Med. 2000; 51(3): 453-62.
  7. Lilford RJ. Ethics of clinical trials from a Bayesian and decision analytic perspective: whose equipoise is it anyway? BMJ. 2003; 326: 980.
  8. Robinson EJ, Kerr CE, Stevens AJ, Lilford RJ, Braunholtz DA, Edwards SJ, Beck SR, Rowley MG. Lay public’s understanding of equipoise and randomisation in randomised controlled trials. Health Technol Assess. 2005; 9(8): 1-192.

Estimating Mortality Due to Low-Quality Care

A recent paper by Kruk and colleagues attempts to estimate the number of deaths caused by sub-optimal care in low- and middle-income countries (LMICs).[1] They do so by selecting 61 conditions that are highly amenable to healthcare. They estimate deaths from these conditions using the Global Burden of Disease studies. The proportion of deaths attributed to differences in health systems is estimated from the difference in death rates between LMICs and high-income countries (HICs). So if the death rate from stroke in people aged 70 to 75 is ten per thousand in HICs and 20 per thousand in LMICs, then ten deaths per thousand are deemed preventable. This ‘subtractive method’ of estimating deaths that could be prevented by improved health services simply answers the otiose question: “what would happen if low-income countries and their populations could be converted, by the wave of a wand, into high-income countries complete with populations enjoying high income from conception?” Such a reductionist approach simply replicates the well-known association between per capita GDP and life expectancy.[2]

The authors of the above paper do try to isolate the effect of institutional care from access to facilities. To make this distinction they need to estimate utilisation of services, which they do from various household surveys conducted at selected sites around the world. These surveys contain questions about service use. So a further subtraction is performed: if half of all people deemed to be having a stroke utilise care, then half of the difference in stroke mortality can be attributed to the quality of that care.
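The two-step subtraction described above can be sketched as follows, using the hypothetical stroke figures from the text (the 50% utilisation figure is likewise illustrative):

```python
# Deaths per 1,000 people aged 70-75, from the worked stroke example.
hic_rate = 10.0    # high-income countries
lmic_rate = 20.0   # low- and middle-income countries

# Step 1: excess mortality attributed to the health system overall.
amenable = lmic_rate - hic_rate                # 10 per 1,000

# Step 2: split by utilisation. If half of stroke patients reach care,
# half of the excess is attributed to poor quality of the care received,
# and the remainder to failure to utilise care at all.
utilisation = 0.5
due_to_quality = utilisation * amenable        # 5 per 1,000
due_to_non_use = (1 - utilisation) * amenable  # 5 per 1,000
```

Laying the calculation out makes the method’s fragility plain: nothing in it adjusts for case-mix, stage at presentation, or lifetime deprivation, which is precisely the criticism developed below.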

Based on this methodology the authors find that the lion’s share of deaths is caused by poor quality care, not by failure to access care. This conclusion is flawed because:

  1. The link between the databases is at a very coarse level – there is no individual linkage.
  2. As a result risk-adjustment is not possible.
  3. Further to the above, the method is crucially unable to account for delays in presentation and access to care preceding presentation that will inevitably result in large differences in prognosis at presentation.
  4. Socio-economic status and deprivation over a lifetime is associated with recovery from a condition, so differences in outcome are not due only to differences in care quality.[3]
  5. There are measurement problems at every turn. For example, Global Burden of Disease is measured in very different ways across HICs and LMICs – the latter rely heavily on verbal autopsy.
  6. Quality, as measured by crude subtractive methodologies, includes survival achieved by means of expensive high technology care. However, because of opportunity costs, introduction of effective but expensive treatments will do more harm than good in LMICs (until they are no longer LMICs).

The issue of delay in presentation is crucial. Take, for example, cancer of the cervix. In HICs the great majority of cases are diagnosed at an early, if not pre-invasive, stage. However, in low-income countries almost all cases are already far advanced when they present. To attribute the death rate difference to the quality of care is inappropriate. Deep in the discussion the authors state ‘comorbidity and disease history could be different between low and high income countries which can result in some bias.’ This is an understatement, and the problem cannot be addressed by a passing mention. Later they also assert that all sensitivity analyses support the conclusion that poor healthcare is a larger driver of amenable mortality than lack of utilisation of services. But it is really difficult to believe such sensitivity analyses when this bias is treated so lightly.

Let us be clear: there is abundant evidence that care is, in many respects, very sub-optimal in LMICs, and we care about trying to improve it. But we think such dramatic results, based on excessively reductionist analyses, are simply not justifiable, and in seeking attention in this way they risk undermining broader support for the important goal of improving care in LMICs. In areas from global warming to mortality during the Iraq war, we have seen the harm that marketing with unreliable methods and generalising beyond the evidence can do to a good cause, by giving fodder to those who do not want to believe that there is a problem. What is needed is careful observation and direct measurement of care quality itself, along with evaluations of the cost-effectiveness of methods to improve care. Mortality is a crude measure of care quality.[4] [5] Moreover, the extent to which healthcare reduces mortality is quite modest among older adults. The type of paper reported here topples over into marketing – it is as unsatisfying a scientific endeavour as it is sensational.

— Richard Lilford, CLAHRC WM Director

— Timothy Hofer, Professor in Division of General Medicine, University of Michigan


  1. Kruk ME, Gage AD, Joseph NT, Danaei G, García-Saisó S, Salomon JA. Mortality due to low-quality health systems in the universal health coverage era: a systematic analysis of amenable deaths in 137 countries. Lancet. 2018.
  2. Rosling H. How Does Income Relate to Life Expectancy? Gapminder. 2015.
  3. Pagano D, Freemantle N, Bridgewater B, et al. Social deprivation and prognostic benefits of cardiac surgery: observational study of 44,902 patients from five hospitals over 10 years. BMJ. 2009; 338: b902.
  4. Lilford R, Mohammed MA, Spiegelhalter D, Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet. 2004; 363: 1147-54.
  5. Girling AJ, Hofer TP, Wu J, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf. 2012; 21(12): 1052-6.

A framework for implementation science: organisational and psychological approaches

Damschroder and colleagues present a meta-analytic approach to the development of a framework to guide implementation of service interventions.[1] They call their framework a “consolidated framework for implementation research”. Their approach is based on a review of published theories concerning the implementation of service interventions. Since two-thirds of interventions to improve care fail, this is an important activity. They offer an over-arching typology of constructs that deal with barriers to effective implementation, building on Greenhalgh’s monumental study [2] of factors determining the diffusion, dissemination and implementation of innovations in health service delivery. Such frameworks are useful because they take an organisation-wide perspective, so that psychological frameworks of individual behaviour change, such as the trans-theoretical model [3] or COM-B [4], are subsumed within them. I proposed something similar with my “framework of frameworks”.[5]

In any event, the framework produced seems sensible enough. In effect it is an elaboration of the essential interactive dimensions of intervention, context and the process of implementation. Context can be divided into the external setting and the internal setting. This particular study goes further and ends up with five major domains, each broken up into a number of constructs – eight relating to the intervention itself.

This paper is carefully written and well researched, and is an excellent source of references to some of the icons of the organisational research literature. But is it useful? And will it be the last such framework? I rather think the answer to both questions is no. I once had a boss who said the important thing about science was ‘knowing what to leave out’! I think a much simpler framework would have sufficed in this case. Maybe I should have a go at producing one!

— Richard Lilford, CLAHRC WM Director


  1. Damschroder LJ, Aron DC, Keith RE, Kirsch SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implement Sci. 2009; 4: 50.
  2. Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O. Diffusion of innovations in service organizations: systematic review and recommendations. Milbank Q. 2004; 82: 581-629.
  3. Prochaska JO, Velicer WF. The transtheoretical model of health behaviour change. Am J Health Promot. 1997; 12(1): 38-48.
  4. Michie S, van Stralen M, West R. The behaviour change wheel: a new method for characterising and designing behaviour change interventions. Implement Sci. 2011; 6: 42.
  5. Lilford RJ. A Theory of Everything! Towards a Unifying Framework for Psychological and Organisational Change Models. NIHR CLAHRC West Midlands News Blog. 28 August 2015.

How Theories Inform our Work in Service Delivery Practice and Research

We have often written about theory in this News Blog.[1] [2] For instance, the ‘iron law’ of incentives: never use reward or sanction unless the manager concerned believes that they can influence the odds of success in reaching the required target.[3] [4] This law, sometimes called ‘expectancy theory’, was articulated by Victor Vroom back in 1964.[5] Here we review some of the theories that we have described, refined or enlarged over the course of the current CLAHRC, and which we shall pursue if we are successful in our Applied Research Collaboration (ARC) application. In each case we begin with the theory, then say how we have explicated it, and then describe how we plan to develop the theory further through ongoing empirical work. Needless to say, our summaries are an impoverished simulacrum of the full articles:

  1. The theory of ‘hybrid managers’. It is well known that many professionals develop hybrid roles in which they toggle between their professional and managerial duties, and that tension can arise when the roles conflict. In our work we found that organisational factors can determine the extent to which nurses retain a strong professional ethos when fulfilling managerial roles.[6] Simply put, the data suggest that nurses working in more successful healthcare institutions tend to hew closer to their professional ethos than nurses in less successful units. It is reasonable to infer that an environment that can accommodate a strong professional orientation among hybrid managers is more likely to encompass the checks and balances conducive to safe care than one that does not accommodate such a range of perspectives; most of us would choose to be treated in an environment where professional ethos is allowed a fair degree of expression. However, whether such a climate reflects better managers or a more difficult external environment is harder to discern. We now plan to examine this issue across many environments – for example, midwife hybrid managers balancing the need to expand choices of place of delivery against the logistical limitations on doing so. Similarly, improving care for people with learning difficulties will require clinical managers to have the freedom to innovate in order to improve services. Note that working with Warwick Business School enables us to locate our enquiries and theory development in the context of management in general, rather than just the management of health services. For example, the above study of nurse managers also encompassed tax inspectors, who now have to balance their traditional role in enforcing the tax code with one of helping the likes of us to make accurate declarations.
  2. Hybrid managers as knowledge brokers. Hybrid managers are known to act as a conduit between senior managers and frontline professionals in mediating the adoption of effective practice – i.e. knowledge brokering. It is also known that effecting change means overcoming structural, social and motivational barriers. The task of implementing state-of-the-art care practices is a delicate one and, prior to our research, the social dynamics of effecting change were poorly understood. In particular, the CLAHRC WM team wanted to study the role of status and perceived legitimacy in facilitating or inhibiting the knowledge broker’s task. We found that hierarchies are critically important – safe care is more than following rules, and requires a degree of initiative (sometimes called discretionary energy) by multiple actors across the hierarchy.[7] Nurses were often severely inhibited in using such personal initiative. The attitude of more senior staff is thus crucial in permitting, indeed encouraging, the use of initiative within a broader system of checks and balances. If the hierarchy within nursing is a barrier to progress, then that between doctors and nurses is a much bigger obstacle to the uptake of knowledge. Moreover, there was also evidence of a further barrier across medical specialities, with clinicians at the most action-oriented end of the spectrum (such as surgeons) showing lower levels of team-working than those with more reflective daily tasks (such as geriatricians).
The work pointed towards the effectiveness of creating opportunities for purposeful interaction across these various types of hierarchical barriers – what the researchers called “dyadic relationships between hybrid middle managers with clinical governance responsibility and doctors through engagement and participation in medical-oriented meetings”; Elinor Ostrom would call these opportunities for ‘cheap talk’.[8] This work is crucial in laying the foundation for our work on the integration of care, covering management of patients at the acute hospital / ambulance / community interface; care of patients with multiple diseases; care of the elderly; and care of people with rare diseases, to mention but a few. Clearly, such opportunities for structured interaction are only part of the story, and other factors that have been shown to be important (e.g. job design, performance management, education, patient empowerment, and data sharing) must be included in service improvement initiatives.
  3. Logics. Our third example concerns the unwritten assumptions that underpin what a person should do in their domain of work, and why they do it – so called ‘logics’. In a place like a hospital or university, many professions must co-exist, yet each will have a different ‘logic’. This idea applies across society, but CLAHRC WM investigator Graeme Currie wanted to examine how the professional logic and policy logic interact in a hospital setting.[9] The background to this study is the finding that policy logic has constrained and limited professional logic over the last few decades – doctors are no longer in charge of performance improvement, the management of waiting lists, etc. The researchers used the introduction of a new evidence-based, multi-component guideline as a lens through which to explore the interactions of different ‘logics’ in hospital practice. The implementation of a multi-component guideline is not a simple thing, and some intuitive cost-benefit calculations could justify, at least intellectually, massaging some aspects of the guideline to fit management practices rather than the reverse. However, the way this played out was not the same across contexts. As before, doctors were generally (but not invariably) less amenable to change than nurse practitioners with managerial responsibility. This study, published in a premier management journal,[9] identifies contingencies that will provide depth to our evaluations of different ways to reshape services. We will build on these insights when we examine a proposed service to use Patient Reported Outcome Measures, rather than simply elapsed time, to determine when patients should be seen in the outpatient department. An understanding of ‘logics’ is likely to come into play when we empower community and ambulance staff to elicit patient preferences and respect them even when to do so flies in the face of guidelines. 
At the level of the system, change is best viewed as an institutional problem of professional power and policy, around which change efforts need to orientate. It is not that systems and organisations cannot be changed, but subtle tactics and work may be required.[10] [11]
  4. Healthcare organisations viewed as political groupings, and the need to do ‘political work’ when implementing interventions. Trish Greenhalgh has recently provided an evidence-based framework that unpicks the reasons why IT implementations so often disappoint.[12] She points out that managers consistently underestimate the size of the task and the sheer difficulty of implementing IT systems so that they reach even some of their potential. Likewise, work conducted under an NIHR Programme grant that developed out of CLAHRC WM showed how new IT systems can introduce serious new hazards.[13] One method for avoiding failure in any large initiative, such as a large IT system, comes from a study of Italian hospitals conducted by the CLAHRC WM team,[14] which advocates an iterative process, time, and careful preparation of the ground by doing ‘political work’ to win hearts and minds and adapt interventions to context.[15] This type of approach will be critical to the development of complex interventions, such as those widening access to home birth and integrating patient feedback (including Patient Reported Outcome Measures) into patient care pathways.
  5. Absorptive capacity. Many CLAHRCs have relied on a knowledge brokering model to underpin the translation of research, through which key individuals ensure knowledge gets to the right people at the right time to benefit patient care.[16] However, such an approach may have a limited effect, and we need to consider how organisations and systems can be developed so that the efforts of knowledge brokers are leveraged and evidence informs patient care more widely. This is a matter of developing organisational and system ‘absorptive capacity’. Many of the implementation studies under our current CLAHRC have sought to develop the co-ordination capability of organisations and systems to translate evidence into practice. For example, public and patient involvement, GP involvement, and better business intelligence processes and structures are highlighted as ensuring clinical commissioning groups make evidence-informed decisions.[17] We have taken our work further to develop a ‘tool’ to assess the absorptive capacity of organisations.[18]

In this short review we have described how theoretical work, based on the development and evaluation of service interventions, can help us understand the reasons why an intervention may succeed or fail, and how this may vary from place to place. Increasingly we are applying Elinor Ostrom’s work on collaboration between managers whose incentives are not aligned to the problems of integrated care in the NHS.[19] Our work represents a successful collaboration between management and medical schools – and, indeed, a bridging of the difference in ‘logics’ between these organisations. This collaboration has taken time to mature, as have those between the services and academia more broadly. The essential point is that consideration of the wider organisational and system context will prove crucial to our efforts to continue broadening, accelerating and deepening the translation of evidence into practice in our proposed ARC.

— Richard Lilford, CLAHRC WM Director

— Graeme Currie, Professor of Public Management, CLAHRC WM Deputy Director


  1. Lilford RJ. A Theory of Everything! Towards a Unifying Framework for Psychological and Organisational Change Models. NIHR CLAHRC West Midlands News Blog. 28 August 2015.
  2. Lilford RJ. Demystifying Theory. NIHR CLAHRC West Midlands News Blog. 10 April 2015.
  3. Lilford RJ. Financial Incentives for Providers of Health Care: The Baggage Handler and the Intensive Care Physician. NIHR CLAHRC West Midlands News Blog. 25 July 2015.
  4. Lilford RJ. Two Things to Remember About Human Nature When Designing Incentives. NIHR CLAHRC West Midlands News Blog. 27 January 2017.
  5. Vroom VH. Work and motivation. Oxford, England: Wiley. 1964.
  6. Croft C, Currie G, Lockett A. The impact of emotionally important social identities on the construction of managerial leader identity: A challenge for nurses in the English NHS. Organ Stud. 2015; 36(1): 113-31.
  7. Currie G, Burgess N, Hayton JC. HR Practices and Knowledge Brokering by Hybrid Middle Managers in Hospital Settings: The Influence of Professional Hierarchy. Hum Res Manage. 2015; 54(5): 793-812.
  8. Lilford RJ. Polycentric Organisations. NIHR CLAHRC West Midlands News Blog. 25 July 2014.
  9. Currie G & Spyridonidis D. Interpretation of Multiple Institutional Logics on the Ground: Actors’ Position, their Agency and Situational Constraints in Professionalized Contexts. Organ Stud. 2016; 37(1): 77-97.
  10. Currie G, Lockett A, Finn R, Martin G, Waring J. Institutional work to maintain professional power: Recreating the model of medical professionalism. Organ Stud. 2012; 33(7): 937-62.
  11. Lockett A, Currie G, Waring J, Finn R, Martin G. The influence of social position on sensemaking about organizational change. Acad Manage J. 2014; 57(4): 1102-29.
  12. Lilford RJ. New Framework to Guide the Evaluation of Technology-Supported Services. NIHR CLAHRC West Midlands News Blog. 12 January 2018.
  13. Cresswell KM, Mozaffar H, Lee L, Williams R, Sheikh A. W. Workarounds to hospital electronic prescribing systems: a qualitative study in English hospitals. BMJ Qual Saf. 2017; 26: 542-51.
  14. Radaelli G, Currie G, Frattini F, Lettieri E. The Role of Managers in Enacting Two-Step Institutional Work for Radical Innovation in Professional Organizations. J Prod Innov Manag, 2017; 34(4): 450-70.
  15. Lilford RJ. Implementation Science at the Crossroads. BMJ Qual Saf. 2017; 27: 331-2.
  16. Rowley E, Morriss R, Currie G, Schneider J. Research into practice: Collaboration for Leadership in Applied Health Research and Care (CLAHRC) for Nottinghamshire, Derbyshire and Lincolnshire (NDL). Implement Sci. 2012; 7:
  17. Croft C & Currie G. ‘Enhancing absorptive capacity of healthcare organizations: The case of commissioning service interventions to avoid undesirable older people’s admissions to hospitals’. In: Swan J, Nicolini D, et al., Knowledge Mobilization in Healthcare. Oxford: Oxford University Press; 2016.
  18. Currie G, Croft C, Chen Y, Kiefer T, Staniszewska S, Lilford RJ. The capacity of health service commissioners to use evidence: a case study. Health Serv Del Res. 2018; 6(12).
  19. Lilford RJ. Evaluating Interventions to Improve the Integration of Care (Among Multiple Providers and Across Multiple Sites). NIHR CLAHRC West Midlands News Blog. 10 February 2017.