Tag Archives: RCTs

A Poorly Argued Article on the Results of Cluster RCTs in General Practice

A recent paper in the Journal of Clinical Epidemiology analysed the results of cluster RCTs in which general practices were the unit of randomisation.[1] Effect sizes were reported for 72 outcomes across 29 cluster RCTs. Fifteen of the 72 outcomes were statistically significant, and only one met or exceeded the effect size postulated under the alternative hypothesis (delta). Disappointingly, the authors do not classify the trials properly, as we have recommended [2] – with or without baseline measurements and, where baseline measurements were used, whether the study was cross-sectional or cohort.[3] The authors seem to favour Bonferroni correction when there is more than one end-point, but this is unscientific. Where many study end-points form part of a postulated causal chain, then far from ‘correcting’ for multiple observations, correspondence between the observed end-points should reinforce a positive conclusion. Likewise, lack of correspondence should cast doubt on cause-and-effect conclusions. This process of triangulation between observations lies at the heart of causal thinking.[4] The logic is laid out in more detail elsewhere.[5] [6]
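The point can be made numerically. The toy calculation below (our illustration with made-up p-values, not an analysis from the paper) contrasts a Bonferroni correction with Fisher's combined-probability test for three hypothetical end-points on the same causal chain: each p-value individually misses the corrected threshold, yet their concordance yields strong combined evidence.

```python
# Toy contrast (hypothetical p-values): Bonferroni correction vs.
# Fisher's method for combining evidence across concordant end-points.
from math import exp, log

def fisher_combined_p(pvals):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0.
    For even degrees of freedom the chi-square survival function has a
    closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!."""
    x = -2.0 * sum(log(p) for p in pvals)
    k = len(pvals)                      # df = 2k, so df/2 = k
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):               # accumulate (x/2)^i / i! terms
        term *= half / i
        total += term
    return exp(-half) * total

pvals = [0.04, 0.06, 0.03]              # three end-points on one causal chain
alpha = 0.05
bonferroni_threshold = alpha / len(pvals)   # 0.0167: no single p-value passes
combined = fisher_combined_p(pvals)         # ~0.004: strong combined evidence
```

Fisher's test assumes independent end-points, so it overstates the evidence when they are correlated; the qualitative point stands, however – concordant results along a causal chain reinforce, rather than penalise, one another.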

— Richard Lilford, CLAHRC WM Director


  1. Siebenhofer A, Paulitsch MA, Pregartner G, Berghold A, Jeitler K, Muth C, Engler J. Cluster-randomized controlled trials evaluating complex interventions in general practice are mostly ineffective: a systematic review. J Clin Epidemiol. 2018; 94: 85-96.
  2. Lamont T, Barber N, de Pury J, Fulop N, Garfield-Birkbeck S, Lilford R, Mear L, Raine R, Fitzpatrick R. New approaches to evaluating complex health and care systems. BMJ. 2016; 352: i154.
  3. Hemming K, Chilton PJ, Lilford RJ, Avery A, Sheikh A. Bayesian Cohort and Cross-Sectional Analyses of the PINCER Trial: A Pharmacist-Led Intervention to Reduce Medication Errors in Primary Care. PLOS ONE. 2012; 7(6): e38306.
  4. Lilford RJ. Beyond Logic Models. NIHR CLAHRC West Midlands News Blog. 2 September 2016.
  5. Watson SI, & Lilford RJ. Essay 1: Integrating multiple sources of evidence: a Bayesian perspective. In: Raine R, & Fitzpatrick R. (Eds). Challenges, solutions and future directions in the evaluation of service innovations in health care and public health. HS&DR Report No. 4.16. Southampton: NIHR Journals Library. 2016.
  6. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.

Immunisation Against Rotavirus: At What Age Should it be Given?

A three-way RCT [1] from Indonesia shows that rotavirus vaccine is effective in reducing the incidence of diarrhoea in children (which we already knew), and that a neonatal schedule is no less effective, and probably more effective, than an infant schedule. Giving the vaccine early may also reduce the risk of intussusception – apparently a risk with the infant schedule.

— Richard Lilford, CLAHRC WM Director


  1. Bines JE, At Thobari J, Satria CD, et al. Human Neonatal Rotavirus Vaccine (RV3-BB) to Target Rotavirus from Birth. New Engl J Med. 2018; 378(8): 719-30.

An Extremely Fascinating Debate in JAMA

You really should read this debate. Steven Goodman, a statistician for whom I have the utmost regard, and his colleagues wrote a brilliant paper showing the importance of ‘design thinking’ in observational research.[1] The essence of their argument is that, in designing and interpreting observational studies, one should think about how the corresponding RCT would look. In this way one can spot survivorship bias, which arises when the intervention group has been depleted of the most susceptible cases. This way of thinking encourages comparison of new users of an intervention with new users of the comparator. Of course, it is not always possible to identify ‘new users’, but at least thinking in this ‘design way’ can alert the reader to the danger of false inference.
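A toy simulation (entirely hypothetical numbers, our illustration) shows how depletion of susceptibles manufactures a spurious benefit: the ‘drug’ below has no effect at all, yet prevalent users – who had to survive a period on the drug before entering the study – show a lower event rate than either controls or new users.

```python
# Null-effect simulation of survivorship (depletion-of-susceptibles) bias.
# The 'drug' does nothing; only the entry criterion differs between groups.
import random

random.seed(42)

def simulate(n=200_000):
    controls, new_users, prevalent_users = [], [], []
    for _ in range(n):
        frail = random.random() < 0.3        # 30% of people are susceptible
        p_event = 0.4 if frail else 0.05     # per-period risk (no drug effect)
        controls.append(random.random() < p_event)
        new_users.append(random.random() < p_event)   # followed from first use
        # Prevalent users were already on the drug for one prior period;
        # anyone who had the event then never enters the study.
        if random.random() >= p_event:       # survived the prior period
            prevalent_users.append(random.random() < p_event)
    rate = lambda xs: sum(xs) / len(xs)
    return rate(controls), rate(new_users), rate(prevalent_users)

ctrl_rate, new_rate, prev_rate = simulate()
# ctrl_rate and new_rate land near 0.155, but prev_rate near 0.125: an
# apparent 'protective effect' created purely by who survives into the cohort.
```

Comparing new users of the intervention with new users of the comparator, as Goodman and colleagues advocate, removes this artefact by design.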
One of the examples mentioned concerns hormone replacement therapy (HRT), where the largest RCT (the Women’s Health Initiative trial) gave a very different result from the largest observational study (the Nurses’ Health Study). The latter suggests a protective effect for HRT, while the former suggests the opposite. It looks as though this might not have been a very good example because, as Bhupathiraju and colleagues point out, there is a much simpler and more convincing explanation for the difference in the observed effects of HRT across the two studies.[2] Hormone replacement was given to much younger women in the observational study than in the trial. Subsequent meta-analysis of subgroups across all RCTs confirms that HRT is protective only in younger women (who do not have established coronary artery disease). Thus, HRT is probably effective if started sufficiently early after the menopause.

This does not mean, of course, that Goodman and colleagues are wrong in principle; they may simply have selected a bad example. This is a stimulating exchange, conducted politely between scholars, and rewarding from both the methodological and substantive points of view.

— Richard Lilford, CLAHRC WM Director


  1. Goodman SN, Schneeweiss S, Baiocchi M. Using design thinking to differentiate useful from misleading evidence in observational research. JAMA. 2017; 317(7): 705-7.
  2. Bhupathiraju SN, Stampfer MJ, Manson JE. Posing Causal Questions When Analyzing Observational Data. JAMA. 2017; 318(2): 201.

An Argument with Michael Marmot

About two decades ago I went head-to-head in an argument with the great Michael Marmot at the Medical Research Council. The topic of conversation was the information that should be routinely collected in randomised trials. Marmot argued that social class and economic information should be collected. He made a valid point that these things are correlated with outcomes. I pointed out that, although they may be correlated with outcomes, they are not necessarily correlated with treatment effects. Then came Marmot’s killer argument. He asked whether I thought that sex and ethnic group should be collected. When I admitted that they should be, he rounded on me, saying that this proved his point. We met only recently and he remembered the argument and stood by his point. However, it turns out that it is not really important to collect information on sex after all. Wallach and colleagues, writing in the BMJ,[1] cite evidence from meta-analyses of RCTs showing that sex makes no difference to treatment effects when averaged across all studies. So there we have it: a parsimonious data set is optimal for trial purposes, since it increases the likelihood of collecting the information essential to measuring the parameter of interest.

— Richard Lilford, CLAHRC WM Director


  1. Wallach JD, Sullivan PG, Trepanowski JF, Steyerberg EW, Ioannidis JPA. Sex based subgroup differences in randomized controlled trials: empirical evidence from Cochrane meta-analyses. BMJ. 2016; 355: i5826.


An Extremely Interesting Three-Way Experiment

News Blog readers know that the CLAHRC WM Director is always on the look-out for interesting randomised trials in health care and elsewhere. He has David Satterthwaite to thank for this one – an RCT carried out among applicants for low-level jobs in five industries in Ethiopia.[1] The applicants (n=1,000), all of whom qualified for the job on paper, were randomised to three conditions:

  1. Control;
  2. Accepted into the industrial job;
  3. Given training in entrepreneurship and about $1,000 (at purchasing power parity).

Surprisingly, the industrial jobs, while producing more secure incomes, did not yield higher incomes than the control condition; indeed, incomes were highest in the entrepreneur group. On intention-to-treat analysis the industrial jobs resulted in worse mental health than in the entrepreneurial group, and physical health was also slightly worse. Many participants left the firm jobs during the one-year follow-up period. In qualitative interviews many said that they had accepted industrial jobs only as a form of security while looking for other opportunities.

The authors, aware that rising minimum wages or increasing regulation have costs to society, are cautious in their conclusions. The paper is interesting nevertheless. The CLAHRC WM Director would like to run an RCT of paying the minimum wage vs. a slightly higher wage threshold to determine the effects on productivity and wellbeing.

— Richard Lilford, CLAHRC WM Director


  1. Blattman C & Dercon S. Occupational Choice in Early Industrializing Societies: Experimental Evidence on the Income and Health Effects of Industrial and Entrepreneurial Work. SSRN. 2016.

History of Controlled Trials in Medicine

Rankin and Rivest recently published a piece examining the use of clinical trials more than 400 years ago,[1] while Bothwell and Podolsky have produced a highly readable historical account of controlled trials.[2] Alternate-allocation designs became quite popular in the late nineteenth century, but Austin Bradford Hill was concerned about the risk of ‘cheating’ and carried out an iconic RCT to overcome the problem.[3] But what next for the RCT? It is time to move to a Bayesian approach,[4] automate trials within medical record systems, and widen credible limits to include the risk of bias when follow-up is incomplete, the therapist is not masked, or subjective outcomes are not effectively blinded.
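One way to ‘widen credible limits to include the risk of bias’ is to treat suspected bias as an extra variance component on top of the sampling variance, in the spirit of Bayesian bias-allowance models. The sketch below is our own illustration with made-up numbers, not a method taken from the cited papers.

```python
# Hypothetical trial: a log odds ratio with an allowance for suspected bias
# (e.g. incomplete follow-up or unblinded subjective outcomes) modelled as
# an additional normal variance component.
from math import sqrt

log_or = -0.25      # hypothetical point estimate (log odds ratio)
se = 0.10           # sampling standard error
bias_sd = 0.08      # assumed sd of the unaccounted bias (a judgement call)
z = 1.96            # ~95% interval under a normal model

naive = (log_or - z * se, log_or + z * se)
total_se = sqrt(se**2 + bias_sd**2)      # variances add for independent terms
widened = (log_or - z * total_se, log_or + z * total_se)
# The widened interval now crosses zero, tempering the 'significant'
# naive result.
```

The size of `bias_sd` cannot come from the trial itself; it must be elicited from judgements about the study design, which is exactly where the Bayesian framing earns its keep.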

— Richard Lilford, CLAHRC WM Director


  1. Rankin A & Rivest J. Medicine, Monopoly, and the Premodern State – Early Clinical Trials. N Engl J Med. 2016; 375(2): 106-9.
  2. Bothwell LE & Podolsky SH. The Emergence of the Randomized Controlled Trial. N Engl J Med. 2016; 375(6): 501-4.
  3. Hill AB. The environment and disease: Association or causation? Proc R Soc Med. 1965; 58(5): 295-300.
  4. Lilford RJ, & Edwards SJL. Why Underpowered Trials are Not Necessarily Unethical. Lancet. 1997; 350(9080): 804-7.

I Agree with Fiona

Dr Fiona Godlee, Editor-in-Chief of the BMJ, recently published a piece arguing that ‘data transparency is the only way’.[1] This News Blog has featured a number of posts where large RCTs have left a matter in contention – deworming children, clot-busters for stroke, and vitamin A prophylaxis in children. When this happens, a dispute typically opens up about nuances in the handling of the data that might have introduced bias; bias so small that it is material only when the trials are large and the confidence limits hence narrow. The right policy is to stick the anonymised data in the public domain so that everyone can have a go at it. What is not okay is to assume that one lot has the moral high ground – not industry, nor academics, nor editors, nor even CLAHRC Directors!

— Richard Lilford, CLAHRC WM Director


  1. Godlee F. Data transparency is the only way. BMJ. 2016; 352: i1261.

Another Study of Studies: Effectiveness using Routinely Collected Health Data vs. RCT Data

News Blog readers will be familiar with previous meta-epidemiological studies comparing the effectiveness of the same treatment when evaluated using Routinely Collected Data (RCD) vs. prospective experiments (RCTs).[1] [2] Here is another such study from John Ioannidis, the world’s number one clinical epidemiologist – masterful.[3]
The RCD studies all:

  1. Preceded their complementary RCTs;
  2. Used prediction score modelling; and
  3. Had mortality as their main outcome.

Sixteen primary routine database studies were complemented by a mean of just over two subsequent RCTs examining the same clinical question. The findings here are not as sanguine about database studies as those cited in previous posts. The direction of effect differed in five (31%) of the 16 studies; the confidence intervals of nine (56%) of the database studies did not include the effect observed in the complementary RCTs; and, where they differed, the database studies tended to over-estimate treatment effects relative to the RCT estimate by a substantial 30%. This reaffirms the received wisdom – experimental studies are the gold standard and database studies should not supplant them.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Yet Again RCTs Fail to Confirm Promising Effects of Dietary Factors in Observational Studies. CLAHRC WM News Blog. 25 September 2015.
  2. Lilford RJ. Very Different Results from RCT and Observational Studies? CLAHRC WM News Blog. 25 September 2015.
  3. Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JPA. Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey. BMJ. 2016; 352: i493.

Caution should be Exercised when Synthesising Evidence for Policy

Policy should be formulated from all the available evidence. For this reason systematic reviews and meta-analyses are undertaken. However, they are often not conclusive. Indeed, there have been notable articles published in the BMJ over the last two years which are critical of the evidence or conclusions of reviews that have been conducted to inform important contemporary public health decisions.

A key theme that often emerges from articles critical of reviews is that only evidence from randomised controlled trials (RCTs) is strong enough to support policy decisions. For example, Teicholz [1] claimed that a number of important RCTs were ignored by a recent report explaining changes in dietary guidance in the US. This claim has since been refuted by a large number of prominent researchers.[2] Kmietowicz [3] argued that there were flaws in a meta-analysis of observational patient data that supported the stockpiling of anti-flu medication for pandemic influenza, casting doubt on the decision to stockpile. A forthcoming analysis of clinical trial data was alluded to instead, despite these trials examining only seasonal flu. Recently, McKee and Capewell,[4] and later Gornall,[5] criticised the evidence underpinning a comprehensive review from Public Health England [6] on the relative harms of e-cigarettes. They noted that it “included only two randomised controlled trials” and that there were methodological weaknesses and potential conflicts of interest in the other available evidence. McKee and Capewell claim that “the burden of proof that it is not harmful falls on those taking an action.” However, this is illogical because any policy choice, even doing nothing, can be considered an action and can cause harm. The claim therefore amounts to saying that the policy chosen should be the one best supported by the evidence of its overall effects.

Public health decisions should be made on the basis of all the currently available evidence. What reasons, then, might one have for writing off a piece of evidence entirely? One might object to the conclusions reached from the evidence on ideological grounds, or one might view the evidence as useless. In the latter case, this opinion could be reached by taking a rigid interpretation of the ‘hierarchy of evidence’. RCTs may be the only way of knowing for sure what the effects are, but this is not tantamount to concluding that other evidence should be rejected. RCTs are often, correctly in our view, regarded as an antidote to ideology. However, it is important not to let matters get out of hand so that RCTs themselves become the ideology.

In a recent paper, Walach and Loef [7] argue that the hierarchy of evidence model, which places RCTs at the top of a hierarchy of study designs, is based on false assumptions. They argue that this model represents only degrees of internal validity, and that as internal validity increases, external validity decreases. We do not strictly agree: there is no necessary trade-off between internal and external validity. However, we do agree that in many cases, by virtue of the study designs, RCTs may provide greater internal validity and other designs greater external validity. How, then, could we know, in the case of a discrepancy between RCTs and observational studies, which results to rely on? The answer is that one would have to look outside the studies and piece together a story, i.e. a theory, rather than ignore the observational evidence – a principle recognised in Bradford Hill’s famous criteria.

The case of chorionic villus sampling, a test to detect foetal genetic abnormalities, serves as a good example of how different forms of evidence can provide different insights and be synthesised. Observational studies found evidence that chorionic villus sampling increased the risk of transverse limb deformities, which was not detected in any of the RCTs at the time. To make sense of the evidence, and to understand whether the findings from the observational studies were a result of random variation in the population or perhaps of poor study design, knowledge of developmental biology, teratology, and epidemiology was required. It turned out that the level of the transverse abnormality – fingers, hand, forearm, or upper arm – corresponded to the embryonic age at which the sampling was conducted, and hence to the stage of limb development at that point. This finding enabled a cause-and-effect conclusion to be drawn that explained all the evidence and resulted in recommendations for safer practice.[8] [9]

Knowledge gained from the scientific process can inform us of the possible consequences of different policy choices. The desirability of these actions or their consequences can then be assessed in a normative or political framework. The challenge for the scientist is to understand and synthesise the available evidence independently of their ideological stance. There often remains great uncertainty about the consequences of different policies. In some cases, such as with electronic cigarettes, there may be reason to maintain the current policy if, by doing so, the likelihood of collecting further and better evidence is enhanced. In other cases, however, such as stockpiling for pandemic influenza, the evidence depends on there being a pandemic, by which time it is too late. Accepting only RCT evidence, or adopting an ideological stance in reporting, may distort what is conveyed both to key policy decision makers and to individuals wishing to make an informed choice. It may even cause harm.

— Richard Lilford, CLAHRC WM Director
— Sam Watson, Research Fellow


  1. Teicholz N. The scientific report guiding the US dietary guidelines: is it scientific? BMJ. 2015; 351: h4962.
  2. Centre for Science in the Public Interest. Letter Requesting BMJ to Retract “Investigation”. Nov 5 2015.
  3. Kmietowicz Z. Study claiming Tamiflu saved lives was based on “flawed” analysis. BMJ. 2014; 348: g2228.
  4. McKee M, Capewell S. Evidence about electronic cigarettes: a foundation built on rock or sand? BMJ. 2015; 351: h4863.
  5. Gornall J. Public Health England’s troubled trail. BMJ. 2015; 351: h5826.
  6. McNeill A, Brose LS, Valder R, et al. E-cigarettes: an evidence update: a report commissioned by Public Health England. London: Public Health England, 2015.
  7. Walach H & Loef M. Using a matrix-analytical approach to synthesizing evidence solved incompatibility problem in the hierarchy of evidence. J Clin Epidemiol. 2015; 68(11): 1251-60.
  8. Olney RS. Congenital limb reduction defects: clues from developmental biology, teratology and epidemiology. Paediatr Perinat Epidemiol. 1998; 12: 358–9.
  9. Mowatt G, Bower DJ, Brebner JA, et al. When and how to assess fast-changing technologies: a comparative study of medical applications of four generic technologies. Health Technol Assess. 1996; 1: 1–149.


Adverse Effects of Well-Intentioned Interventions to Improve Outcomes in Adolescence

We recently reported on the evaluation of a study to reduce aberrant teenage behaviour that had a negative effect – i.e. it actually increased the behaviour it was designed to prevent. On further enquiry, it turns out that this is but one of a series of studies targeting adolescent behaviour that showed effects opposite to those intended.[1] The classic study, quoted by Dishion, McCord & Poulin, was the Cambridge-Somerville Youth Study.[2] [3] This was a randomised study of matched pairs of adolescent boys (irrespective of previous behaviour) in a run-down neighbourhood. The intervention consisted of visits (an average of twice a month) by counsellors, who also took the boys to sporting events, gave them driving lessons, and helped them and their family members apply for jobs. The intervention had harmful effects on arrests, alcohol problems, and mental hospital referrals at follow-up 40 years later.[4] [5] In a sub-group comparison, boys sent to summer school more than once had particularly bad outcomes. This is consistent with the theory that mutual interaction reinforces behaviour problems among susceptible adolescent boys.

On the basis of this RCT and other randomised studies, one of which was cited in the previous post, “there is reason to be cautious and to avoid aggregating young high-risk adolescents into intervention groups.” Apparently interventions targeted at parents are more positive in their effects. CLAHRC WM has a large theme of work on adolescent health and the Director invites comments from inside and outside our organisation.

— Richard Lilford, CLAHRC WM Director


  1. Dishion TJ, McCord J, Poulin F. When Interventions Harm. Peer Groups and Problem Behaviour. Am Psychol. 1999; 54(9): 755-64.
  2. Healy W, & Bronner AF. New Light on Delinquency and its Treatment. New Haven, CT: Yale University Press. 1936.
  3. Powers E, & Witmer H. An Experiment in the Prevention of Delinquency: The Cambridge-Somerville Youth Study. New York: Columbia University Press. 1951.
  4. McCord J. A Thirty-Year Follow-Up of Treatment Effects. Am Psychol. 1978; 33: 284-9.
  5. McCord J. Consideration of Some Effects of a Counseling Program. In: Martin SE, Sechrest LB, Redner R (Eds.) New Directions in the Rehabilitation of Criminal Offenders. Washington, D.C.; The National Academy of Sciences. 1981. p.394-405.