Tag Archives: Bias

Mandatory Publication and Reporting of Research Findings

Publication bias refers to the phenomenon whereby research findings that are statistically significant or perceived to be interesting or desirable are more likely to be published, while other findings are less likely to see the light of day.[1] The bias is a major threat to scientific integrity and can have serious implications for patient welfare and resource allocation. Progress has been made over the years in raising awareness and minimising the occurrence of such bias in clinical research: pre-registration of trials has been made compulsory by editors of leading medical journals [2] and subsequently by regulatory agencies. Evidence of a positive impact on the registration and reporting of findings from trials used to support drug licensing has started to emerge.[3,4] So can this issue be consigned to history now? Unfortunately, the clear answer is no.

A recent systematic review showed that, despite gradual improvement over the past two decades, the mean proportion of pre-registration among randomised controlled trials (RCTs) included in previous meta-epidemiological studies of trial registration only increased from 25% to 52% between 2005 and 2015.[5] A group of researchers led by Dr Ben Goldacre created the EU Trials Tracker (https://eu.trialstracker.net/), which uses automation to identify trials in the European Union Clinical Trials Register that are due to report their findings but have not done so.[6] Their estimates paint a similar picture: around half of completed trials have not reported their results. The findings of the Trials Tracker are presented in a league table that allows people to see which sponsors have the highest rates of unreported trials. You might suspect that pharmaceutical companies would be the top offenders, given the high-profile cases of suppressing drug trial data in the past. In fact the opposite is now true – major pharmaceutical companies are among the most compliant with trial reporting, whereas some universities and hospitals have achieved fairly low reporting rates. While there may be practical issues and legitimate reasons behind the absence or delay of reporting for some studies, the bottom line is that making research findings available is a moral duty for researchers irrespective of funding source. With improved trial registration and the enhanced power of data science, leaving research findings to perish and be forgotten in a file drawer or folder is neither an acceptable nor a feasible option.
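For readers curious about the mechanics, the league table boils down to a simple aggregation: for each sponsor, take the trials that are past their reporting deadline and compute the fraction with no posted results. A minimal sketch (with a made-up data layout – this is not the Trials Tracker's actual code) might look like this:

```python
import pandas as pd

# Toy registry extract: one row per trial (column names are illustrative only).
trials = pd.DataFrame({
    "sponsor":        ["Univ A", "Univ A", "Pharma B", "Pharma B", "Hosp C"],
    "due_to_report":  [True,     True,     True,       True,       False],
    "results_posted": [False,    True,     True,       True,       False],
})

# Restrict to trials whose reporting deadline has passed, then compute the
# proportion of those still unreported, per sponsor.
due = trials[trials["due_to_report"]]
league = (1 - due.groupby("sponsor")["results_posted"].mean()).sort_values(ascending=False)
print(league.rename("unreported_rate"))
```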

With slow but steady progress in tackling publication bias in clinical research, you might wonder about health services research, which is close to the heart of our CLAHRC. Literature on publication bias in this field is scant, but over the past two years we have been funded by the NIHR HS & DR Programme to explore the issue, and some interesting findings are emerging. Interested readers can access further details, including conference posters reporting our early findings, on our project website (warwick.ac.uk/publicationbias). We will share further results with News Blog readers in the near future, and in due course, publish them all!

— Yen-Fu Chen, Associate Professor

References:

  1. Song F, Parekh S, Hooper L, et al. Dissemination and publication of research findings: an updated review of related biases. Health Technol Assess. 2010;14(8):1-193.
  2. Laine C, De Angelis C, Delamothe T, et al. Clinical trial registration: looking back and moving ahead. Ann Intern Med. 2007;147(4):275-7.
  3. Zou CX, Becker JE, Phillips AT, et al. Registration, results reporting, and publication bias of clinical trials supporting FDA approval of neuropsychiatric drugs before and after FDAAA: a retrospective cohort study. Trials. 2018;19(1):581.
  4. Phillips AT, Desai NR, Krumholz HM, Zou CX, Miller JE, Ross JS. Association of the FDA Amendment Act with trial registration, publication, and outcome reporting. Trials. 2017;18(1):333.
  5. Trinquart L, Dunn AG, Bourgeois FT. Registration of published randomized trials: a systematic review and meta-analysis. BMC Medicine. 2018;16(1):173.
  6. Goldacre B, DeVito NJ, Heneghan C, et al. Compliance with requirement to report results on the EU Clinical Trials Register: cohort study and web resource. BMJ. 2018;362:k3218.

Preference Trials: an Old Subject Revisited

The CLAHRC WM Director tries to keep up to date with his literature summaries. However, from time to time, he dips into past literature. Recently he had reason to re-read a paper on preference trials by David Torgerson and Bonnie Sibbald.[1] They point out the shortcomings of the comprehensive cohort design, whereby randomised participants are followed up alongside those who manifest a preference and therefore decline randomisation. However, the non-randomised cohorts so generated are subject to selection bias. To get around this bias a trial is proposed in which all patients are randomised, but where preference is recorded prior to randomisation. Measuring patient preferences within a fully randomised design conserves all the advantages of a randomised study, with the further benefit of allowing the interaction between outcome and preference to be measured. But, of course, there is an ethical issue here, in that people who are not in equipoise are randomised. And why would you accept randomisation if not in equipoise? One reason is that the decision can be reversed later if the treatment does not appear to be successful. For example, a person might be happy to be randomised in a trial of a medicine to reduce the frequency of migraine, and to sacrifice a small quantum of expected utility to contribute to knowledge, secure in the belief that they can reverse the decision later. But to elicit such altruism in the face of a life-or-death treatment comparison – radiotherapy vs. surgery for prostate cancer, for example – is to privilege knowledge over individual welfare in a non-trivial way. So here is ‘Lilford’s rule’ – do not offer patients ‘preference trials’ when the outcome is severe and irrevocable. In such circumstances it is fine to offer randomisation, but those who have a preference – either way – should not be subtly coerced into accepting randomisation.[2] Further, the patient should be fully informed, because patients are more likely to accept randomisation when information is withheld.[3]

— Richard Lilford, CLAHRC WM Director

References:

  1. Torgerson D, Sibbald B. Understanding controlled trials: What is a patient preference trial? BMJ. 1998; 316: 360.
  2. Lilford RJ. Ethics of Clinical Trials from a Bayesian and Decision Analytic Perspective: Whose Equipoise Is It Anyway? BMJ. 2003; 326: 980.
  3. Wragg JA, Robinson EJ, Lilford RJ. Information presentation and decisions to enter clinical trials: a hypothetical trial of hormone replacement therapy. Soc Sci Med. 2000; 51(3): 453-62.

When Randomisation is not Enough: Masking for Efficacy Trials of Skin Disease, Ulcers and Wound Infections

In a previous News Blog [1] we discussed endpoint measurement for trials of wound infection where the observers were not ‘blinded’ (i.e. not masked to the group to which patients had been allocated). Such an approach is simply not adequate, even if the observers use ‘strict criteria’.[1] This is because of subjectivity in the interpretation of the criteria and, more especially, because of reactivity. Reactivity means that observers are influenced, albeit subconsciously, by knowledge of the group to which patients have been assigned (treatment or not). Such reactivity is an important source of bias in science.[2]

We are proposing a trial of a promising treatment for recurrent leprosy ulcers that we would like to carry out at the Leprosy Mission Hospital in Kathmandu, Nepal. We plan to conduct an efficacy trial of a regenerative medicine (RM) technique in which a paste is made from the buffy coat layer of the patient’s own blood and applied to the ulcer surface at the time of dressing change. The only difference in treatment will be whether or not the RM technique is applied when the regular change of wet dressing is scheduled. We will measure, amongst other things, the rate of healing of the ulcers, the time to complete healing, and discharge from hospital.

Patients will be randomised so as to avoid selection bias and, as the primary endpoints in this efficacy trial are measured during the hospital sojourn (and patients seldom discharge themselves), we are mainly concerned with outcome measurement bias with respect to the ulcer-size endpoints.

One obvious way to get around the problem of reactivity is to use a well-described method in which truly masked observers, typically based off-site, measure ulcer size from photographs. Measurements are standardised by placing a sterile metal ruler at the level of the ulcer, so that the measurement is independent of the distance of the camera. The measurement can be done manually or automated by computer (or both). But is that enough? It has been argued that bias can still arise, not at the stage where photographs are analysed, but rather at the earlier stage of photograph acquisition. This argument holds that, again perhaps subconsciously, those responsible for taking the photograph can affect its appearance. The question of blinding / masking of medical images is a long-standing topic of debate.
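To make the standardisation concrete, the in-frame ruler supplies a pixel-to-millimetre scale, so that the same ulcer photographed from different distances yields the same area. A minimal sketch of the principle (our own illustration, not the measurement software used in any trial):

```python
def mm_per_pixel(ruler_length_mm: float, ruler_length_px: float) -> float:
    """Scale factor derived from the ruler visible in the photograph."""
    return ruler_length_mm / ruler_length_px

def ulcer_area_mm2(ulcer_area_px: float, scale_mm_per_px: float) -> float:
    """Convert a traced ulcer area from pixels^2 to mm^2."""
    return ulcer_area_px * scale_mm_per_px ** 2

# Hypothetical example: a 100 mm ruler spans 800 pixels in the photograph,
# and the traced ulcer outline covers 12,000 pixels.
scale = mm_per_pixel(100, 800)                        # 0.125 mm per pixel
print(f"{ulcer_area_mm2(12_000, scale):.0f} mm^2")    # ~188 mm^2, whatever the camera distance
```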

The ‘gold standard’ method is to have an independent observer arrive on the scene at the appropriate time to make the observations (and take any photographs). Such a method would be expensive (and logistically challenging over long distances). So, an alternative would be to deploy such an observer for a random sub-set of cases. This method may work but it has certain disadvantages. First, it would be tricky to choreograph as it would disrupt the work flow in settings such as that described above. Second, to act as a method of audit, it would need to be used alongside the existing method (making the method still more ‘unwieldy’). Third, the method of preparing the wound would still lie in the hands of the clinical team, and arguably still be subject to some sort of subconscious ‘manipulation’ (unless the observer also provided the clinical care). Fourth, given that agreement would not be exact between observers, a threshold would have to be agreed regarding the magnitude of difference between the standard method and the monitoring method that would be regarded as problematic. Fifth, it would not be clear how to proceed if such a threshold was crossed. While none of these problems are necessarily insurmountable, they are sufficiently problematic to invite consideration of further methods. What might augment or replace standard third party analysis of photographic material?

Here we draw our inspiration from a trial of surgical technique in the field of ophthalmology/orbital surgery.[3] In this trial, surgical operations were video-taped in both the intervention and control groups. With permission of patients, we are considering such an approach in our proposed trial. The vast majority of ulcers are on the lower extremities, so patients’ faces would not appear in the videos. The videos could be arranged so that staff were not individually identifiable, though they could be redacted if and where necessary. We would like to try to develop a method whereby the photographs were directed in real time by remote video link, but pending the establishment of such a link, we propose that each procedure (dressing change) is video-taped, adhering to certain guidelines (for example, shot in high-definition, moving the camera to give a full view of the limb from all sides, adequate lighting, a measurement instrument is included in the shot, etc.). We propose that measurements are made both in the usual way (from mobile phone photographs), and from ‘stills’ obtained from the video-tapes. Each could be scored by two independent, off-site observers. Furthermore the videos could be used as a method of ‘ethnographic’ analysis of the process to surface any material differences between patients in each trial arm in lighting, preparation of ulcer sites, time spent on various stages of the procedure and photograph acquisition, and so on.

Would this solve the problem? After all, local clinicians would still prepare the ulcer site for re-bandaging and, insofar as they may be able to subconsciously manipulate the situation, this risk has not been eliminated. However, we hypothesise that the video will work a little like the black box on an aeroplane; it cannot stop things happening, but it provides a powerful method to unravel what did happen. The problem we believe we face is not deliberate maleficence, but at most subtle bias. We think that by using the photographic approach, in accordance with guidelines for such an approach,[4] we already mitigate the risk of outcome measurement bias. We think that by introducing a further level of scrutiny, we reduce the risk of bias still further. Can the particular risk we describe here be reduced to zero? We think not. Replication remains an important safeguard of the scientific endeavour. We now turn our attention to this further safeguard.

Leprosy ulcers are far from the only type of ulcer to which the regenerative medicine solution proposed here is relevant. Diabetic ulcers, in particular, are similar to leprosy ulcers in that loss of neural sensation plays a large part in both. We have argued elsewhere that much can be learned by comparing the results of the same treatment across different disease classes. In due course we hope to collaborate with those who care for other types of skin ulcer, so that we can compare and contrast and also advance methodologically. Together we will seek the optimal method to limit expense and disruption of workflow while minimising outcome bias from reactive measurements.

— Richard Lilford, CLAHRC WM Director

References:

  1. Lilford RJ. Before and After Study Shows Large Reductions in Surgical Site Infections Across Four African Countries. NIHR CLAHRC West Midlands News Blog. 10 August 2018.
  2. Kazdin AE. Unobtrusive measures in behavioral assessment. J Appl Behav Anal. 1979; 12: 713–24.
  3. Feldon SE, Scherer RW, Hooper FJ, et al. Surgical quality assurance in the Ischemic Optic Neuropathy Decompression Trial (IONDT). Control Clin Trials. 2003; 24: 294-305.
  4. Bowen AC, Burns K, Tong SY, Andrews RM, Liddle R, O’Meara IM, et al. Standardising and assessing digital images for use in clinical trials: a practical, reproducible method that blinds the assessor to treatment allocation. PLoS One. 2014;9(11):e110395.

The Same Data Set Analysed in Different Ways Yields Materially Different Parameter Estimates: The Most Important Paper I Have Read This Year

News Blog readers know that I have a healthy scepticism about the validity of econometric/regression models. In particular, I have stressed the importance of being aware of the distinction between confounding and mediating variables, the latter being variables that lie on the causal chain between the explanatory and outcome variables. I therefore thank Dr Yen-Fu Chen for drawing my attention to an article by Silberzahn and colleagues.[1] They conducted a most elegant study in which 26 statistical teams analysed the same data set.

The data set concerns the game of soccer and the hypothesis that a player’s skin tone will influence a referee’s propensity to issue a red card (the most severe sanction, under which the player is sent off). The provenance of this hypothesis lies in shed loads of studies on preference for lighter skin colour across the globe and subconscious bias towards people of lighter skin colour. Based on various data sets that included colour photographs of players, each player’s skin colour was graded into four zones of darkness by independent observers with, as it turned out, high reliability (agreement between observers over and above that expected by chance).

The effect of skin tone on player censure by means of the red card was estimated by regression methods. Each team was free to select its preferred method and to choose which of 16 available variables to include in its model.

The results across the 26 teams varied widely but were positive (in the hypothesised direction) in all but one case. The odds ratios (ORs) varied from 0.89 to 2.93, with a median estimate of 1.31. Overall, 20 teams found a significant (in each case positive) relationship. This wide variability in effect estimates is all the more remarkable given that the teams peer-reviewed each other’s methods prior to analysis of the results.

All but one team took account of the clustering of players within referees, and the outlier was also the single team not to have a point estimate in the positive (hypothesised) direction. I guess this could be called a flaw in the methodology, but the remaining methodological differences between teams could not easily be classified as errors that would earn a low score in a statistics examination. Analytic techniques varied very widely, covering linear regression, logistic regression, Poisson regression, Bayesian methods, and so on, with some teams using more than one method. Regarding covariates, all teams included the number of games played under a given referee, and 69% included the player’s position on the field. More than half of the teams used a unique combination of variables. Use of interaction terms does not seem to have been studied.

There was little systematic difference in results according to the academic rank of the teams, and no association between teams’ prior beliefs about what the study would show and the magnitude of the effect they estimated. This may make the results all the more remarkable, since there would have been no apparent incentive to exploit options in the analysis so as to produce a positive result.

What do I make of all this? First, it would seem to be good practice to use different methods to analyse a given data set, as CLAHRC West Midlands has done in recent studies,[2] [3] though this opens opportunities to selectively report methods that produce results congenial to the analyst. Second, statistical confidence limits in observational studies are far too narrow, and this should be taken into account in the presentation and use of results. Third, data should be made publicly available so that other teams can reanalyse them whenever possible. Fourth, and a point surprisingly not discussed by the authors, the analysis should be tailored to a specific scientific causal model ex ante, not ex post. That is to say, there should be a scientific rationale for the choice of potential confounders and explicit specification of variables to be explored as potential mediating variables (i.e. variables that might be on the causal pathway).
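To illustrate the first point, here is a small ‘multiverse’-style sketch on simulated data (not the Silberzahn data set, and ignoring the clustering of players within referees for brevity): the same outcome is modelled under several defensible specifications, and the odds ratio for skin tone shifts with each choice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 5_000
df = pd.DataFrame({
    "skin_tone": rng.integers(1, 5, n),   # graded 1 (light) to 4 (dark)
    "position":  rng.integers(0, 4, n),   # crude stand-in for playing position
    "games":     rng.poisson(20, n),      # games played under a given referee
})
# Simulate a modest true effect of skin tone on the log-odds of receiving a red card.
logit = -4 + 0.15 * df["skin_tone"] + 0.02 * df["games"]
df["red_card"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Three defensible specifications applied to the same data.
specs = [
    "red_card ~ skin_tone",
    "red_card ~ skin_tone + games",
    "red_card ~ skin_tone + games + C(position)",
]
for formula in specs:
    fit = smf.logit(formula, data=df).fit(disp=0)
    print(f"{formula:<45} OR = {np.exp(fit.params['skin_tone']):.2f}")
```

Even in this toy example the estimates differ across specifications; with 16 candidate covariates, several outcome models and real, messier data, the spread seen across the 26 teams becomes easy to understand.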

— Richard Lilford, CLAHRC WM Director

References:

  1. Silberzahn R, Uhlmann EL, Martin DP, et al. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Adv Methods Pract Psychol Sci. 2018; 1(3): 337-56.
  2. Manaseki-Holland S, Lilford RJ, Bishop JR, Girling AJ, Chen Y-F, Chilton PJ, Hofer TP; the UK Case Note Review Group. Reviewing deaths in British and US hospitals: a study of two scales for assessing preventability. BMJ Qual Saf. 2017; 26: 408-16.
  3. Mytton J, Evison F, Chilton PJ, Lilford RJ. Removal of all ovarian tissue versus conserving ovarian tissue at time of hysterectomy in premenopausal patients with benign disease: study using routine data and data linkage. BMJ. 2017; 356: j372.

The Sexual Politics of the Operating Theatre

When I led the Patient Safety Research Portfolio on behalf of the Chief Medical Officer, I commissioned an ethnographic study of the operating theatre environment from Steven Harrison of Manchester.[1] The study chronicled a tale of persistent interruption during surgical operations – telephones rang, messages were sent from the wards, people burst in with the latest cricket score, and so on. Harrison speculated that such a string of interruptions would be inimical to patient safety. He was right; we have cited evidence that frequent interruptions are indeed a threat to patient safety.[2] Repeated distraction intrudes on working memory and thereby predisposes to error – a factor long recognised in aviation (see also the following News Blog article).

It turns out that more subtle processes are also in play – gender mix has a large effect on behaviour, at least according to observations of 400 doctors and nurses during 200 surgical operations.[3] The study was carried out by experts on animal behaviour, and the findings showed that the behaviours of the doctors and nurses tended to mimic those of animals in the wild. Think of that next time you have a surgical operation! Conflict between individuals was twice as likely in teams led by men as in teams led by women. Regardless of who led the team, conflict was much less common when the team leader was the opposite gender to the rest of the team.

In previous issues of your News Blog we have cited comparisons between male and female doctors.[4] [5] In all cases the female doctors showed higher performance. As pointed out in those blogs, these findings are replicated over many complex tasks in the modern economy. I am led to the conclusion that the evolutionary characteristics of women are more conducive to high performance in the modern collaborative economy than those which males acquired in order to hunt animals and repel enemies. But all is not lost for us men. Awareness of our own foibles is the first step to adaptation and more effective functioning in the modern workplace.

— Richard Lilford, CLAHRC WM Director

References:

  1. Harrison S. Operating theatres – the threats to patient safety. PSRP Briefing Paper. PS008. 2006.
  2. Lilford RJ. Interruptions Lead to Errors. NIHR CLAHRC West Midlands News Blog. 23 March 2018.
  3. Jones LK, Jennings BM, Higgins MK, de Waal FBM. Ethological observations of social behavior in the operating room. Proc Nat Acad Sci. 2018.
  4. Lilford RJ. Are Female Doctors Better Than Male Doctors? NIHR CLAHRC West Midlands News Blog. 13 January 2017.
  5. Lilford RJ. “Why Can’t a Man be More Like a Woman” – Revisited. NIHR CLAHRC West Midlands News Blog. 27 October 2017.

Retrospective Study of the Quality of Care Given to Patients Who Have Died – A Design That Should Be Laid to Rest According to Bach, et al.

Prof Tim Hofer (University of Michigan) drew this important article to my attention.[1] Bach and colleagues consider studies of the quality of care given to people who have recently died. Such studies sound laudable, but they are a poor reflection of the care given to dying patients. Why is it that these apparently well-meaning studies provide biased results? There are two main problems. First, when patients are still alive it is not known who will die – many who die are not identified as dying, while many identified as dying do not die. Thus, dead patients are a poor reflection of dying patients; they are a highly skewed group. Second, when the care given to dead people is examined, a time interval prior to death must be specified, and the results obtained are highly sensitive to this arbitrary threshold. The study used real, population-based cohorts to show how differences in subject selection and in the chosen time interval lead to massive bias. Retrospective case series, based on events that precede eligibility for inclusion in the cohort, simply should not be used to quantify the quality of care.

— Richard Lilford, CLAHRC WM Director

Reference:

  1. Bach PB, Schrag D, Begg CB. Resurrecting Treatment Histories of Dead Patients. JAMA. 2004; 292: 2765-70.

Calling All Service Delivery Researchers

Daw and Hatfield draw attention to an important source of bias in non-experimental matched before-and-after studies.[1] Matching can introduce bias in these circumstances whenever regression to the mean is a possibility. Consider an intervention targeted at institutions with a high mortality rate. Matching will introduce bias because the matched intervention cluster has a low mortality relative to its own group, while the matched control cluster has a high mortality relative to its group – if this were not so, they would not have matched. So a difference between the two groups may emerge over time and be ascribed to the intervention, even when there was no intervention effect. Traditionally, we say that a confounder is associated with the intervention (treatment assignment) and the outcome. However, a confounder can also be associated with the intervention and with the propensity of the outcome to change over time. This applies in before-and-after studies (difference-in-differences studies; studies that control for baseline conditions). The article confirms this theoretical problem by means of Monte Carlo simulations, and a simple simulation along the same lines is sketched below. This seems a very important point for all health service researchers to be aware of.
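A minimal Monte Carlo sketch of the mechanism (our own illustration under assumed numbers, not the simulation reported by Daw and Hatfield): intervention sites are drawn from a higher-mortality pool, matched to controls on a noisy baseline, and a difference-in-differences estimate is computed even though no intervention effect is simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sites, n_sims, caliper = 300, 500, 0.01
did_estimates = []

for _ in range(n_sims):
    # Stable 'true' mortality rates: intervention sites come from a higher-risk pool.
    true_int = rng.normal(0.12, 0.01, n_sites)
    true_con = rng.normal(0.08, 0.01, n_sites)
    noise = lambda: rng.normal(0, 0.02, n_sites)   # independent sampling noise each period
    base_int, follow_int = true_int + noise(), true_int + noise()
    base_con, follow_con = true_con + noise(), true_con + noise()

    # 1:1 matching on observed baseline mortality, within a caliper.
    available = list(range(n_sites))
    pair_diffs = []
    for i in range(n_sites):
        if not available:
            break
        j = min(available, key=lambda k: abs(base_int[i] - base_con[k]))
        if abs(base_int[i] - base_con[j]) <= caliper:
            available.remove(j)
            # Change from baseline to follow-up: intervention minus matched control.
            pair_diffs.append((follow_int[i] - base_int[i]) - (follow_con[j] - base_con[j]))
    if pair_diffs:
        did_estimates.append(np.mean(pair_diffs))

print(f"Mean DiD estimate with no true intervention effect: {np.mean(did_estimates):+.4f}")
# The estimate is systematically non-zero: matched intervention sites had 'lucky'
# (low) baseline readings and matched controls 'unlucky' (high) ones, so both
# regress back towards their own population means and a gap re-opens that could
# be misattributed to the intervention.
```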

— Richard Lilford, CLAHRC WM Director

Reference:

  1. Daw JR & Hatfield LA. Matching and Regression to the Mean in Difference-in-Differences Analysis. Health Serv Res. 2018.

Conflicts of Interest in Textbooks

An important thing to consider when reading research, especially research reporting new and promising results, is to check the conflicts of interest of the authors. For example, we will naturally be more sceptical about a study on breastfeeding if it is authored by researchers with connections to the formula industry, or a study showcasing the effectiveness of a new drug authored by researchers working for the pharmaceutical company that makes it. That is not to say that such studies are inherently biased, but that they should be viewed in a different light to studies that do not have such conflicts. As such, it is alarming to read a recent study by Piper, et al. that found that a considerable proportion of the authors of healthcare textbooks had undisclosed conflicts of interest,[1] stemming from patents on medical devices and remuneration from medical product companies. The textbooks examined were all used in the education and training of physicians, pharmacists and dentists, and as references for treatments, etc. Perhaps textbook publishers need to follow the lead of academic journals and clearly state any conflicts. Until then, make sure to carefully consider what you read in textbooks.

— Peter Chilton, Research Fellow

Reference:

  1. Piper BJ, Lambert DA, Keefe RC, Smukler PU, Selemon NA, Duperry ZR. Undisclosed Conflicts of Interests among Biomedical Textbook Authors. AJOB Empir Bioeth. 2018. [ePub].

Biased Result from Machine Learning

In a recent blog I pointed out that machine learning could produce biased results.[1] This point has more recently been made in an insightful JAMA article by Verghese, et al.[2] They make the excellent point that machine prognosis may be very misleading if it does not capture all of the treatment variables that might be responsible for the outcome observed. This is analogous to the treatment paradox in clinical epidemiology, and is a great danger in ‘black box’ science. Using a machine to challenge a physician is one thing – doctors tend to give more optimistic prognoses than computer algorithms – but trying to supplant the physician is an altogether different matter. For those who want to cut the hubris and get real about the limited application of machine learning in clinical practice, I recommend an excellent paper by Brynjolfsson and Mitchell.[3]
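A toy simulation may make the treatment paradox concrete (our own illustration, not an example from either cited paper): if sicker patients are preferentially treated and the treatment works, a prognostic model that omits treatment will understate how dangerous severe disease really is.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000
severity = rng.normal(size=n)                                              # latent disease severity
treated = (rng.random(n) < 1 / (1 + np.exp(-2 * severity))).astype(int)    # sicker -> more often treated
logit_death = -2 + 1.5 * severity - 2.0 * treated                          # treatment strongly reduces mortality
death = (rng.random(n) < 1 / (1 + np.exp(-logit_death))).astype(int)

naive = LogisticRegression().fit(severity.reshape(-1, 1), death)
full = LogisticRegression().fit(np.column_stack([severity, treated]), death)

print(f"Severity coefficient, treatment ignored:  {naive.coef_[0][0]:.2f}")
print(f"Severity coefficient, treatment included: {full.coef_[0][0]:.2f} (true value 1.5)")
# The naive model attenuates the severity effect because treatment 'absorbs' part
# of the risk, so it would give falsely reassuring prognoses for severe patients
# who happen not to receive the treatment.
```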

— Richard Lilford, CLAHRC WM Director

References:

  1. Lilford RJ. Machine Learning and the Demise of the Standard Clinical Trial! NIHR CLAHRC West Midlands News Blog. 10 November 2017.
  2. Verghese A, Shah NH, Harrington RA. What This Computer Needs is a Physician. JAMA. 2018; 319(1): 19-20.
  3. Brynjolfsson E & Mitchell T. What can machine learning do? Workforce implications. Science. 2017; 358: 1530-4.

The Underestimated Issue of Contingency in Study Design and Interpretation

Introduction
The central dogma of evidence-based care is that parameter estimates from clinical studies should inform clinical decisions. In its archetypal form the parameter estimates are obtained from head-to-head comparisons in randomised controlled trials (RCTs). Consider first clinical treatments, such as drugs, devices and talking therapies. Here, a target clinical population is defined in whom the treatment is hypothesised to have a beneficial effect – for instance, the hypothesis that a left ventricular assist device for patients in Grade III heart failure may reduce the death rate within two years from 50% to 40% – a ten percentage-point improvement. Such a treatment effect can be detected with a sample of only 1,036 patients (false positive [alpha error] 5%; false negative [beta error] 10%). The same type of simple calculation can be performed across treatment types and outcomes – treatments to improve depression scores in people who are already depressed, or pedagogic methods to improve examination scores in children sitting a particular examination.
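As a quick check of that figure, the standard two-proportion sample size formula can be applied (a sketch using textbook normal-approximation arithmetic; the exact answer depends on the correction used):

```python
from scipy.stats import norm

def two_group_total_n(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Total sample size (two equal arms) to compare two proportions."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n_per_group = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    return 2 * n_per_group

print(round(two_group_total_n(0.50, 0.40)))   # ~1,030 in total, close to the 1,036 quoted above
```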

Contingent effects
Consider now a diagnostic/screening test (hereafter called a ‘test’). Here again a population of interest would be described – pregnant women, say, or febrile patients. However, in this case the purpose of testing the eligible population is to identify a further (sub)population: those eligible for treatment for the condition for which they have tested positive. Here we wish to compare outcomes across the whole population offered the test, but the benefit is contingent on the treatment effect among those who test positive. This means that the intervention effect is greatly ‘diluted’ by all the people who screen negative. Moreover, the dilution effect is not linear; for every halving of the absolute effect size, the sample size needs to quadruple – other factors being equal. To put all this another way, the proportion of true positives among all those tested sets an upper limit on the benefit of a test. Take, for example, a test for postnatal depression, which occurs with sufficient severity to warrant treatment in, say, 10% of women. Consider a population of 10,000 pregnant women – 1,000 can expect to get postnatal depression. Standard screening methods can identify 60% of these women – 600 in our ‘population’. A new genetic test comes along that might identify a further 20%. In that case 80% of affected women will be identified vs. 60% without the test. This amounts to 200 additional women in the original 10,000. Treatment can ‘cure’ depression in half of depressed women, so the incidence of depression in the screened population would drop by one percentage point. These crude, indicative figures are laid out below.

[Table 1: Indicative (not real) figures to calculate realistic outcomes for a population screening test.]

A trial to detect a one percentage-point difference in outcome would require around 14,200 participants; whereas a trial to determine the effect of a potential new treatment for use in screen-positive women, which could ‘cure’ 70% vs. a 50% control rate, would require a total of only about 248 participants, other things being equal (α = 0.05; power [1 − β] = 0.9; no loss to follow-up).
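The same formula shows how the contingency ‘dilutes’ the detectable effect and inflates the sample size (again a sketch; the 4% vs. 3% outcome rates for the screening comparison are an assumption chosen to reproduce a figure of roughly 14,200):

```python
from scipy.stats import norm

def two_group_total_n(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Total sample size (two equal arms) to compare two proportions."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

# Treatment trial confined to screen-positive women: 'cure' rate 70% vs. 50%.
print(round(two_group_total_n(0.70, 0.50)))   # roughly 240-250 women

# Whole-population screening trial: the benefit is diluted to about one
# percentage point (e.g. an assumed outcome rate of 4% vs. 3%), so the sample balloons.
print(round(two_group_total_n(0.04, 0.03)))   # of the order of 14,000 participants
```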

The same general principle applies to generic service delivery interventions that operate through a causal chain with contingent effects, such as this:

[Figure 1: causal chain for a generic service delivery intervention.]

The mathematics of these cases has been worked out by our group elsewhere.[1-3]

In conclusion, when the observed effect is contingent on upstream events in the causal chain, it is important to model plausible effect sizes in advance to check that sample size calculations are realistic. Causal thinking in clinical and service delivery research can help us identify achievable sample sizes for hypothesis tests.

— Richard Lilford, CLAHRC WM Director

References:

  1. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.
  2. Yao GL, Novielli N, Manaseki-Holland S, Chen YF, van der Klink M, Barach P, Chilton PJ, Lilford RJ, European HANDOVER Research Collaborative. Evaluation of a predevelopment service delivery intervention: an application to improve clinical handovers. BMJ Qual Saf. 2012; 21(s1):i29-i38.
  3. Watson SI & Lilford RJ. Essay 1: Integrating multiple sources of evidence: a Bayesian perspective. In: Challenges, solutions and future directions in the evaluation of service innovations in health care and public health. Southampton (UK): NIHR Journals Library, 2016.