Tag Archives: Trials

A Poorly Argued Article on the Results of Cluster RCTs in General Practice

A recent paper in the Journal of Clinical Epidemiology analysed the results of cluster RCTs in which general practices were the unit of randomisation.[1] Effect sizes were reported for 72 outcomes across 29 cluster RCTs. Fifteen of the 72 outcomes were statistically significant, and only one met or exceeded the effect size specified in the alternative hypothesis (delta). Disappointingly, the authors do not classify the trials properly, as we have recommended [2] – that is, by whether or not baseline measurements were taken and, if they were, whether the study was cross-sectional or cohort.[3] The authors seem to favour Bonferroni correction when there is more than one end-point, but this is unscientific. Where many study endpoints are part of a postulated causal chain, correspondence between the different observed endpoints should reinforce a positive conclusion, far from calling for ‘correction’ for multiple observations. Likewise, lack of correspondence should cast doubt on cause and effect conclusions. This process of triangulation between observations lies at the heart of causal thinking.[4] The logic is laid out in more detail elsewhere.[5] [6]

— Richard Lilford, CLAHRC WM Director


  1. Siebenhofer A, Paulitsch MA, Pregartner G, Berghold A, Jeitler K, Muth C, Engler J. Cluster-randomized controlled trials evaluating complex interventions in general practice are mostly ineffective: a systematic review. J Clin Epidemiol. 2018; 94: 85-96.
  2. Lamont T, Barber N, de Pury J, Fulop N, Garfield-Birkbeck S, Lilford R, Mear L, Raine R, Fitzpatrick R. New approaches to evaluating complex health and care systems. BMJ. 2016; 352: i154.
  3. Hemming K, Chilton PJ, Lilford RJ, Avery A, Sheikh A. Bayesian Cohort and Cross-Sectional Analyses of the PINCER Trial: A Pharmacist-Led Intervention to Reduce Medication Errors in Primary Care. PLOS ONE. 2012; 7(6): e38306.
  4. Lilford RJ. Beyond Logic Models. NIHR CLAHRC West Midlands News Blog. 2 September 2016.
  5. Watson SI, & Lilford RJ. Essay 1: Integrating multiple sources of evidence: a Bayesian perspective. In: Raine R, & Fitzpatrick R. (Eds). Challenges, solutions and future directions in the evaluation of service innovations in health care and public health. HS&DR Report No. 4.16. Southampton: NIHR Journals Library. 2016.
  6. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.

Why the CLAHRC WM Director Loves Orthopaedic Trials

I love them because:

  1. The outcomes are measured on a continuous functional scale, so that one-third of a standard deviation can be detected with a trial of about 500 patients, rather than the 5,000 needed for many mortality trials.
  2. Outcomes are usually short-term; we do not need to wait for recurrence or death, for example.
  3. It is possible to rapidly determine the effect of trial results on clinical practice through hospital databases, and confirm through orthopaedic registries.

CLAHRC WM is collaborating with CLAHRC East Midlands in evaluating all outputs from the NIHR HTA programme, and orthopaedic trials are proving particularly informative. We give examples in the table below:


| Trial | Title | Year | Findings |
| --- | --- | --- | --- |
| Rangan, et al. | Surgical versus non-surgical treatment for proximal fracture of the humerus. | 2015 | Robust, clinically relevant evidence shows that surgical intervention does not result in a better outcome for patients with a displaced fracture of the proximal humerus involving the surgical neck than non-surgical treatment. Surgery is currently the most widely used treatment. It is neither effective nor cost-effective. |
| Costa, et al. | RCT Kirschner wires versus plate fixation for displaced distal radius fractures. | 2015 | Trial contradicts both the increasing trend towards the use of locking plates in the treatment of distal radius fractures, and the findings of previous trials, which indicated that locking plates provide improved functional outcomes compared with K-wire fixation. |

— Richard Lilford, CLAHRC WM Director

The Beneficial Effects of Taking Part in International Research: an Old Chestnut Revisited

Two recent and well-written articles grapple with the question of whether taking part in clinical trials is beneficial, net of any benefit conferred by the therapeutic modalities evaluated in those trials.[1] [2]

The first study, from the Netherlands, concerns the effect of taking part in clinical trials, where the controls are people not participating in trials (presumably because they were not offered entry to a trial).[1] This is the topic of a rather extensive literature, including a study to which I contributed.[3] The latter study found that the putative ‘trial effect’ applied only in circumstances where care given to control patients was not protocol-directed. In other words, our results suggested that the ‘trial effect’ was really a ‘protocol effect’. In that case the effect should be ephemeral and disappear as greater proportions of care become protocolised. And that is what appears to have happened – Liu, et al.[1] report no benefit to trial participants versus non-trial patients for the highly protocolised disease Hodgkin lymphoma. They speculate that while participation in trials does not affect individual patient care in the short-term, hosting trials does sensitise clinicians at an institutional level, so that they are more likely than clinicians from non-participating hospitals to practise evidence-based care. However, they offer no direct evidence for this assertion. Such evidence is, however, provided by the next study.

The effect of high participation rates in clinical trials at the hospital level is evaluated in an elegant study recently published in the prestigious journal ‘Gut’.[2] The team of authors (which includes prominent civil servants and many distinguished cancer specialists and statisticians) compared outcomes from colorectal cancer according to the extent to which the hospital providing treatment participated in trials. This ingenious study was accomplished by linking the NIHR’s data on clinical trials participation to cancer registry data and Hospital Episode Statistics. It turned out that survival was significantly better in the high participation hospitals than in lower participation hospitals, even after substantial risk-adjustment. “Residual confounding” do I hear you say? Perhaps, but the authors have two further lines of evidence for the causal explanation. First, they documented a dose-response; the greater the level of participation, the greater the improvement in survival. Of course, an unknown confounder that was correlated with participation rates would produce just such a finding. The second line of evidence is more impressive – the longer the duration over which a hospital had sustained high participation rates, the greater the effect. Again, of course, this argument is not impregnable – duration might not serve as a good Instrumental Variable. How might the case be further strengthened (or refuted)? By unravelling the theoretical pathway between explanatory and outcome variables.[4] Since this is a database study, the process variables that might mediate the putative effect were not available to the authors. However, separate studies have indeed found an association between improved processes of care and trial participation.[5] Taken in the round, I think that a cause/effect explanation holds (>90% of my probability density favours the causal explanation).

— Richard Lilford, CLAHRC WM Director


  1. Liu L, Giusti F, Schaapveld M, et al. Survival differences between patients with Hodgkin lymphoma treated inside and outside clinical trials. A study based on the EORTC-Netherlands Cancer Registry linked data with 20 years of follow-up. Br J Haematol. 2017; 176: 65-75.
  2. Downing A, Morris EJA, Corrigan N, et al. High hospital research participation and improved colorectal cancer survival outcomes: a population-based study. Gut. 2017; 66: 89-96.
  3. Braunholtz DA, Edwards SJ, Lilford RJ. Are randomized clinical trials good for us (in the short term)? Evidence for a “trial effect”. J Clin Epidemiol. 2001; 54(3): 217-24.
  4. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.
  5. Selby P. The impact of the process of clinical research on health service outcomes. Ann Oncol. 2011; 22(s7): vii2-4.

An Intriguing Suggestion to Link Trial Data to Routine Data

When extrapolating from trial data to a particular context, it is important to compare the trial population to the target population. Given sufficient data, it is possible to examine treatment effect across important subgroups of patients. Then the trial results can be related to a specific sub-group, say with less severe disease than the average in the trial. One problem is that trial data are collected with greater diligence than routine data. Hence a suggestion to link trial data to routine data collected on the same patients. That way one can compare subgroups of trial and non-trial patients recorded in a broadly similar (i.e. routine) way.[1] This strikes me as a half-way house to the day when (most) trial data are collected by routine systems, and trials are essentially nested within routine data-collection systems.

— Richard Lilford, CLAHRC WM Director


  1. Najafzadeh M, Schneeweiss S. From Trial to Target Populations – Calibrating Real-World Data. N Engl J Med. 2017; 376: 1203-4.

An Article We All Had Better Read

Oh dear – the CLAHRC WM Director would so like to think that disease-specific mortality is the appropriate outcome for cancer screening trials, rather than all-cause mortality. But Black and colleagues have published a very sobering article.[1] They found 12 trials of cancer screening (yes, only 12) where both cancer-specific mortality and all-cause mortality are reported. The effect size (in relative risk terms) is bigger for cancer-specific than for all-cause mortality in seven trials, about the same in four, and the other way in one. This suggests that the benefit is greater, even relatively, for cancer-specific than for all deaths. There are two explanations for this – one that the CLAHRC WM Director had thought of, and the other that was new to him.

  1. Investigation and treatment of false positives (including cancers that would never have presented) may increase risk of death as a result of iatrogenesis and heightened anxiety. There is some evidence for this.
  2. According to the ‘sticky diagnosis theory’, once a diagnostic label has been assigned, then a subsequent death is systematically more likely to be attributed to that diagnosis than if that diagnosis had not been made. There is some evidence for this hypothesis too.

And here is the thing – in screening trials a very small proportion of people in either arm of the study die from the index disease. The corollary is that a small mortality increase among the majority not destined to die from the index disease has a relatively large effect on all-cause mortality.
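The arithmetic behind this corollary can be sketched as follows. Every figure here is hypothetical and chosen only to make the point visible; none is taken from Black and colleagues or from any screening trial.

```python
def deaths_in_arm(n, index_rate, other_rate, index_rr=1.0, other_rr=1.0):
    """Return (index-disease deaths, all-cause deaths) for one trial arm.

    index_rr and other_rr are relative risks applied to the baseline
    death rates from the index disease and from all other causes.
    """
    index_deaths = n * index_rate * index_rr
    other_deaths = n * other_rate * other_rr
    return index_deaths, index_deaths + other_deaths


# Hypothetical inputs (assumptions for illustration only).
N_PER_ARM = 100_000
INDEX_RATE = 0.005   # 0.5% die of the index cancer without screening
OTHER_RATE = 0.10    # 10% die of any other cause

control_index, control_all = deaths_in_arm(N_PER_ARM, INDEX_RATE, OTHER_RATE)

# Screening cuts index-disease deaths by 20% but raises other deaths by 1%.
screen_index, screen_all = deaths_in_arm(
    N_PER_ARM, INDEX_RATE, OTHER_RATE, index_rr=0.80, other_rr=1.01
)

print(f"Index-disease deaths averted: {control_index - screen_index:.0f}")
print(f"Change in all-cause deaths:   {screen_all - control_all:+.0f}")
```

Under these made-up inputs, a 1% relative increase in deaths from other causes is enough to cancel a 20% relative reduction in deaths from the index disease, simply because deaths from other causes are twenty times more common.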

So we have done many expensive trials, and implemented large, expensive screening programmes, yet our effects might have been nugatory. And there is a reason why so few trials have all-cause mortality outcomes – the trials have to be long and potential effects on this end-point are small and liable to be lost in the noise. Somewhere there is a ‘horizon of science’ where precision is hard to find, and where tiny biases can swamp treatment effects. At the risk of sounding nihilistic, the CLAHRC WM Director wonders whether cancer screening is such a topic.

— Richard Lilford, CLAHRC WM Director


  1. Black WC, Haggstrom DA, Welch HG. All-Cause Mortality in Randomized Trials of Cancer Screening. J Natl Cancer Inst. 2002; 94(3): 167-73.

Trial of Two Methods of Out-of-Hospital Resuscitation for Cardiac Arrest with an Interesting Design

Cluster trials very seldom use a cross-over design, because it is typically tricky to withdraw a cluster-level intervention once it has been introduced. However, as in clinical trials, the cross-over design is very powerful statistically (it yields precise estimates) in those situations where it is feasible. Such was the case in a cluster trial of methods for cardio-pulmonary resuscitation.[1] One hundred and fourteen clusters (emergency medical services) participated. Adults with non-trauma related cardiac arrest were managed (according to cluster and phase) with either:

  1. Continuous chest compressions with asynchronous ventilations ten times per minute (experimental method); or
  2. Compressions interrupted to provide ventilation at a ratio of 30 compressions to two ventilations (standard method).

Nearly 24,000 people with cardiac arrest were included in the study and the survival rate with continuous compressions was slightly lower (at 9.0%) than with the standard interrupted method (at 9.7%). The result was not quite significant on standard statistical analysis. The CLAHRC WM Director thought the interrupted method would seem to be the one to go for, but the accompanying editorial was equivocal [2] – it would appear that even a trial of 24,000 participants, albeit in clusters, was not enough to resolve the issue. However, the trial methodology is certainly interesting.
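To see why a difference of 9.0% versus 9.7% in roughly 24,000 people sits on the borderline, here is a deliberately naive sketch. It assumes an equal split of 12,000 people per method and treats every patient as independent, ignoring both the clustering and the cross-over; the trial's own analysis accounts for these, so this is not the authors' method and the numbers are only indicative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test (unpooled variance) for two independent proportions."""
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumption: equal allocation; the post reports only the total (~24,000).
N_PER_METHOD = 12_000

z, p_value = two_proportion_z_test(0.090, N_PER_METHOD, 0.097, N_PER_METHOD)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Even this simplified calculation gives a p-value a little above 0.05, in line with the "not quite significant" result described above.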

— Richard Lilford, CLAHRC WM Director


  1. Nichol G, Leroux B, Wang H, et al. Trial of Continuous or Interrupted Chest Compressions during CPR. New Engl J Med. 2015; 373(25): 2203-14.
  2. Koster RW. Continuous or Interrupted Chest Compressions for Cardiac Arrest. New Engl J Med. 2015; 373(25): 2278-9.

More on Ultra-Trials

Specific treatments for specific diseases are generally evaluated in trials of modest size – high hundreds or low thousands of participants. The outcomes of interest, when binary, typically have baseline (control) rates of 5% to 10%. Worthwhile improvements (say a 20% reduction in relative risk) can be detected with reasonable precision by trials of this size. When we move to screening, vaccinations, and mass treatment programmes, however, things become more difficult, and mega (10,000–100,000) or even ultra (>100,000) trials are necessary. The vitamin A trials in neonates discussed above collectively enrolled 100,038 participants, while the current cohort of vitamin D trials in adults is expected to enrol in excess of 100,000 participants,[1] and the UK Collaborative Trial of Ovarian Cancer Screening has just over 200,000 participants.[2] Given the shape of the graph relating marginal gains in precision to marginal increases in participants (the power function [3]) we may be reaching the ‘horizon of science’ in these topics.

— Richard Lilford, CLAHRC WM Director


  1. Manson JAE, & Bassuk SS. Vitamin D Research and Clinical Practice. At a Crossroads. JAMA. 2015; 313: 1311-2.
  2. United Kingdom Collaborative Trial of Ovarian Cancer Screening Overview. [Online]. 2015.
  3. Watson S. Mega- and Ultra-Trials. [Online]. 2015.

Mega- and Ultra-Trials

Randomised controlled trials (RCTs) are getting larger. Increased sample sizes enable researchers to achieve greater statistical precision and to detect ever smaller effect sizes against diminishing baseline rates of the primary outcomes of interest. This growth has led to what may be termed mega-trials (>1,000 participants) and even ultra-trials (>10,000 participants). The figure below shows the minimum detectable effect size (in terms of difference from baseline) for a trial versus its sample size (with α = 0.05, power 1 − β = 0.8 and a control group baseline of 0.1, i.e. 10%), along with a small selection of non-cluster RCTs published in the New England Journal of Medicine in the last three years. What this figure illustrates is that there are diminishing returns, in terms of statistical power, from larger sample sizes.

Figure 1. Minimum detectable effect size vs. sample size
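The shape of the curve in Figure 1 can be approximated with a short sketch. It uses the parameters stated in the text (α = 0.05, 80% power, 10% control baseline) together with some assumptions of my own: two equal arms, a two-sided test, and the usual normal approximation with the baseline variance applied to both arms. The function name is hypothetical, and this is not necessarily the calculation behind the original figure.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(total_n, baseline=0.10, alpha=0.05, power=0.80):
    """Approximate smallest absolute difference from baseline a trial can detect.

    Assumes two equal arms, a two-sided test and a normal approximation in
    which both arms share the baseline variance.
    """
    n_per_arm = total_n / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 * baseline * (1 - baseline) / n_per_arm)


# Quadrupling the sample size at each step.
for total_n in (1_000, 4_000, 16_000, 64_000):
    difference = minimum_detectable_difference(total_n)
    print(f"{total_n:>6} participants: detectable difference ~ {difference:.4f}")
```

Because the detectable difference scales with one over the square root of the sample size, each fourfold increase in participants only halves it, which is the diminishing return the figure illustrates.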

Nevertheless, with great statistical power comes great responsibility. Assuming that a sample is so large that a p-value greater than 0.05 is evidence that no (clinically significant) effect exists may lead to erroneous conclusions. For example, Fox et al. (2014) enrolled 19,102 participants to examine whether ivabradine improved clinical outcomes of patients with stable coronary artery disease.[1] The estimated hazard ratio for death from cardiovascular causes or acute myocardial infarction with the treatment was 1.08, with a p-value of 0.2, and so it was concluded that ivabradine did not improve outcomes. However, we might instead see this as evidence that ivabradine worsens outcomes. A crude calculation suggests that the minimum detectable hazard ratio in this study was 1.14 and, for a sample of this size, the results suggest that almost 50 more patients in the treatment group experienced this outcome (against a baseline of 6.3%). One might therefore actually see this as clinically significant.

Similarly, Roe et al. (2012) enrolled 7,243 patients to compare prasugrel and clopidogrel for acute coronary syndromes without revascularisation.[2] The hazard ratio for death with prasugrel was 0.91 with a p-value of 0.21. The authors concluded that prasugrel did not “significantly” reduce the risk of death. Yet, with the death rate in the clopidogrel group at 16%, a hazard ratio of 0.91, with a sample size this large, represents approximately 50 fewer deaths in the prasugrel group. Again, some may argue that this is clinically significant. Importantly, a quick calculation reveals that the minimum detectable effect size in this study was 0.89.
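The figures of roughly 50 events in each example can be checked with back-of-envelope arithmetic. The sketch below assumes equal allocation between arms and treats the hazard ratio as if it were a simple risk ratio applied to the control event rate; both are simplifying assumptions of mine, and the function name is hypothetical.

```python
def approximate_event_difference(total_n, control_rate, hazard_ratio):
    """Rough difference in event counts (treatment minus control).

    Assumes equal arms and treats the hazard ratio as a risk ratio.
    """
    n_per_arm = total_n / 2
    control_events = n_per_arm * control_rate
    treatment_events = control_events * hazard_ratio
    return treatment_events - control_events


# Inputs as quoted in the text above.
ivabradine = approximate_event_difference(19_102, 0.063, 1.08)
prasugrel = approximate_event_difference(7_243, 0.16, 0.91)

print(f"Ivabradine vs placebo:    {ivabradine:+.0f} events")
print(f"Prasugrel vs clopidogrel: {prasugrel:+.0f} deaths")
```

Both results come out close to 50 in absolute terms, in opposite directions, matching the approximate counts given for the two trials.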

Many authors have warned against using p-values to decide on whether an intervention has an effect or not. Mega- and ultra-trials do not reduce the folly of using p-values in this way and may even exacerbate the problem by providing a false sense of confidence.

— Samuel Watson, Research Fellow


  1. Fox K, Ford I, Steg PG, Tardif J-C, Tendera M, Ferrari R. Ivabradine in Stable Coronary Artery Disease without Clinical Heart Failure. New Engl J Med. 2014; 371(12): 1091-99.
  2. Roe MT, Armstrong PW, Fox KAA, et al. Prasugrel versus Clopidogrel for Acute Coronary Syndromes without Revascularization. New Engl J Med. 2012; 367: 1297-1309.

International comparison of trial results

Nearly two decades ago colleagues from the Cochrane Complementary and Alternative Medicine (CAM) Field found that a high proportion of trials originating from Eastern Asia or Eastern Europe reported positive results – nearly 100% in some cases.[1] This was the case not only for acupuncture trials but also for trials on other topics. More recently, a team led by Professor John Ioannidis, a renowned epidemiologist, conducted a meta-epidemiological study (a methodological study that examines and analyses data obtained from many systematic reviews/meta-analyses) and showed that treatment effects reported in trials conducted in less developed countries are generally larger than those reported in trials undertaken in more developed countries.[2] Many factors could have contributed to these observations, for example, publication bias, reporting bias, rigour of scientific conduct, differences in patient populations and disease characteristics, and genuine differences in intervention efficacy. While it is almost certain that the observations cannot be attributed to genuine differences in intervention efficacy alone, teasing out the influence of the various factors is not an easy task. Lately, colleagues from CLAHRC WM have compared results from cardiovascular trials conducted in Europe with those conducted in North America, and did not find a convincing difference between them.[3] Perhaps the more interesting findings will come from the comparison between trials from Europe/America and those from Asia. The results? The paper is currently in press, so watch this space!

— Yen-Fu Chen, Senior Research Fellow


  1. Vickers A, Goyal N, Harland R, Rees R. Do certain countries produce only positive results? A systematic review of controlled trials. Controlled Clin Trials. 1998; 19(2): 159-66.
  2. Panagiotou OA, Contopoulos-Ioannidis DG, Ioannidis JPA, Rehnborg CF. Comparative effect sizes in randomised trials from less developed and more developed countries: meta-epidemiological assessment. BMJ. 2013; 346: f707.
  3. Bowater RJ, Hartley LC, Lilford RJ. Are cardiovascular trial results systematically different between North America and Europe? A study based on intra-meta-analysis comparisons. Arch Cardiovasc Dis. 2015; 108(1):23-38.