Tag Archives: RCTs

An Argument with Michael Marmot

About two decades ago I went head-to-head in an argument with the great Michael Marmot at the Medical Research Council. The topic of conversation was the information that should be routinely collected in randomised trials. Marmot was arguing that social class and economic information should be collected. He made a valid point that these things are correlated with outcomes. I pointed out that although they may be correlated with outcomes, they were not necessarily correlated with treatment effects. Then came Marmot’s killer argument. Marmot asked whether I thought that sex and ethnic group should be collected. When I admitted that they should be, he rounded on me, saying that this proved his point. We met only recently and he remembered the argument and stood by his point. However, it turns out that it is not really important to collect information on sex after all. Wallach and colleagues, writing in the BMJ,[1] cite evidence from meta-analyses of RCTs to show that sex makes no difference to treatment effects when averaged across all studies. So there we have it: a parsimonious data set is optimal for trial purposes, since it increases the likelihood of collecting essential information to measure the parameter of interest.

— Richard Lilford, CLAHRC WM Director


  1. Wallach JD, Sullivan PG, Trepanowski JF, Steyerberg EW, Ioannidis JPA. Sex based subgroup differences in randomized controlled trials: empirical evidence from Cochrane meta-analyses. BMJ. 2016; 355: i5826.


An Extremely Interesting Three Way Experiment

News Blog readers know that the CLAHRC WM Director is always on the look-out for interesting randomised trials in health care and elsewhere. He has David Satterthwaite to thank for this one – an RCT carried out among applicants for low-level jobs in five industries in Ethiopia.[1] The applicants (n=1,000), all of whom qualified for the job on paper, were randomised to three conditions:

  1. Control;
  2. Accepted into the industrial job;
  3. Given training in entrepreneurship and about $1,000 (at purchasing power parity).

Surprisingly, the industrial jobs, while producing more secure incomes, did not yield higher incomes than the control group and incomes were highest in the entrepreneur group. On intention-to-treat analysis the industrial jobs resulted in worse mental health than experienced in the entrepreneurial group, and physical health was also slightly worse. Many left the jobs in firms during the one year follow-up period. In qualitative interviews many said that they accepted industrial jobs only as a form of security while looking for other opportunities.

The authors, aware that rising minimum wages or increasing regulations have costs to society, are cautious in their conclusions. The paper is interesting nevertheless. The CLAHRC WM Director would like to do an RCT of paying a minimum wage vs. a slightly higher wage threshold to determine effects on productivity and wellbeing, positing an effect like this:

[Figure not reproduced.]


— Richard Lilford, CLAHRC WM Director


  1. Blattman C & Dercon S. Occupational Choice in Early Industrializing Societies: Experimental Evidence on the Income and Health Effects of Industrial and Entrepreneurial Work. SSRN. 2016.

History of Controlled Trials in Medicine

Rankin and Rivest recently published a piece looking at the use of clinical trials more than 400 years ago,[1] while Bothwell and Podolsky have produced a highly readable historical account of controlled trials.[2] Alternate treatment designs became quite popular in the late eighteenth century, but Austin Bradford Hill was concerned with the risk of ‘cheating’ and carried out an iconic RCT to overcome the problem.[3] But what next for the RCT? It is time to move to a Bayesian approach,[4] automate trials in medical record systems, and widen credible limits to include the risk of bias when follow-up is incomplete, therapist is not masked, or subjective outcomes are not effectively blinded.

— Richard Lilford, CLAHRC WM Director


  1. Rankin A & Rivest J. Medicine, Monopoly, and the Premodern State – Early Clinical Trials. N Engl J Med. 2016; 375(2): 106-9.
  2. Bothwell LE & Podolsky SH. The Emergence of the Randomized Controlled Trial. N Engl J Med. 2016; 375(6): 501-4.
  3. Hill AB. The environment and disease: Association or causation? Proc R Soc Med. 1965; 58(5): 295-300.
  4. Lilford RJ & Edwards SJL. Why Underpowered Trials are Not Necessarily Unethical. Lancet. 1997; 350(9080): 804-7.

I Agree with Fiona

Dr Fiona Godlee, Editor-in-Chief of the BMJ, recently published a piece arguing that ‘data transparency is the only way’.[1] This News Blog has featured a number of posts where many large RCTs have left a matter in contention – deworming children, clot busters for stroke, and vitamin A prophylaxis in children. When this happens, a dispute typically opens up about nuances in the handling of the data that might have introduced bias; bias so small that it is only material when the trials are large and hence the confidence limits narrow. The right policy is to stick the anonymised data in the public domain so that everyone can have a go at it. What is not okay is to assume that one lot have the moral high ground – not industry, nor academics, nor editors, nor even CLAHRC Directors!

— Richard Lilford, CLAHRC WM Director


  1. Godlee F. Data transparency is the only way. BMJ. 2016; 352: i1261.

Another Study of Studies: Effectiveness using Routinely Collected Health Data vs. RCT Data

News Blog readers will be familiar with previous meta-epidemiological studies comparing effectiveness of the same treatment when evaluated using Routinely Collected Data (RCD) vs. prospective experiments (RCTs).[1] [2] Here is another such study from John Ioannidis, the world’s number one clinical epidemiologist – masterful.[3]
The RCD studies all:

  1. Preceded their complementary RCTs
  2. Used propensity score modelling, and
  3. Had mortality as their main outcome.

Sixteen primary routine database studies were complemented by a mean of just over two subsequent RCTs examining the same clinical question. The findings here are not as sanguine regarding database studies as those cited in previous posts. The direction of effect was different in five (31%) of the 16 studies; confidence intervals in nine (56%) of the database studies did not include the observed effect in complementary RCTs and, where they differed, the database studies tended to over-estimate treatment effects relative to the RCT estimate by a substantial 30%. This reinforces the received wisdom – experimental studies are the gold standard and database studies should not supplant them.
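
The two agreement measures described here, whether the direction of effect differs and whether the database study's confidence interval includes the RCT estimate, are simple to compute. The following is a minimal sketch in Python; the `Comparison` class, the function names, and all four sets of odds ratios are hypothetical illustrations, not data from the Hemkens study.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One clinical question: a database (RCD) estimate and its later RCT estimate."""
    rcd_estimate: float  # odds ratio for mortality from the database study
    rcd_ci_lower: float  # lower 95% confidence limit of the database estimate
    rcd_ci_upper: float  # upper 95% confidence limit of the database estimate
    rct_estimate: float  # pooled odds ratio from the subsequent RCTs


def direction_differs(c: Comparison) -> bool:
    """True when the two estimates fall on opposite sides of the null (OR = 1).

    An estimate exactly equal to 1 is treated as no disagreement.
    """
    return (c.rcd_estimate - 1.0) * (c.rct_estimate - 1.0) < 0


def ci_excludes_rct(c: Comparison) -> bool:
    """True when the database study's interval does not contain the RCT estimate."""
    return not (c.rcd_ci_lower <= c.rct_estimate <= c.rcd_ci_upper)


# Hypothetical comparisons, invented purely for illustration.
comparisons = [
    Comparison(0.70, 0.55, 0.90, 0.95),
    Comparison(0.80, 0.60, 1.05, 1.10),
    Comparison(0.85, 0.70, 1.02, 0.90),
    Comparison(1.20, 0.95, 1.50, 1.05),
]

n_direction_differs = sum(direction_differs(c) for c in comparisons)
n_ci_excludes = sum(ci_excludes_rct(c) for c in comparisons)

print(f"Direction differs in {n_direction_differs} of {len(comparisons)}")
print(f"RCD interval excludes RCT estimate in {n_ci_excludes} of {len(comparisons)}")
```

Note that the second measure is a deliberately strict test: it ignores the uncertainty in the RCT estimate itself, so it will flag some disagreements that a formal test of the difference would not.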

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Yet Again RCTs Fail to Confirm Promising Effects of Dietary Factors in Observational Studies. CLAHRC WM News Blog. 25 September 2015.
  2. Lilford RJ. Very Different Results from RCT and Observational Studies? CLAHRC WM News Blog. 25 September 2015.
  3. Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JPA. Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey. BMJ. 2016; 352: i493.

Caution should be Exercised when Synthesising Evidence for Policy

Policy should be formulated from all the available evidence. For this reason systematic reviews and meta-analyses are undertaken. However, they are often not conclusive. Indeed, there have been notable articles published in the BMJ over the last two years which are critical of the evidence or conclusions of reviews that have been conducted to inform important contemporary public health decisions.

A key theme that often emerges from articles critical of reviews is that only evidence from randomised controlled trials (RCTs) is strong enough to support policy decisions. For example, Teicholz [1] claimed that a number of important RCTs were ignored by a recent report explaining changes in dietary guidance in the US. This claim has since been refuted by a large number of prominent researchers.[2] Kmietowicz [3] argued that there were flaws in a meta-analysis of observational patient data that supported the stockpiling of anti-flu medication for pandemic influenza, casting doubt on the decision to stockpile. An upcoming analysis of clinical trial data was instead alluded to, despite these trials only examining seasonal flu. Recently, McKee and Capewell,[4] and later Gornall,[5] criticised the evidence underpinning a comprehensive review from Public Health England [6] on the relative harms of e-cigarettes. They noted that it “included only two randomised controlled trials” and that there were methodological weaknesses and potential conflicts of interest in the other available evidence. McKee and Capewell make the claim that “the burden of proof that it is not harmful falls on those taking an action.” However, this is illogical because any policy choice, even doing nothing, can be considered an action and can cause harm. This claim therefore merely translates to saying that the policy chosen should be that best supported by the evidence of its overall effects.

Public health decisions should be made on the basis of all the currently available evidence. What then are reasons one might write off a piece of evidence entirely? One might object to the conclusions reached from the evidence on an ideological basis, or one might view the evidence as useless. In the latter case, this opinion could be reached by taking a rigid interpretation of the ‘hierarchy of evidence’. RCTs may be the only way of knowing for sure what the effects are, but this is not tantamount to concluding that other evidence should be rejected. RCTs are often, correctly in our view, regarded as an antidote to ideology. However, it is important not to let matters get out of hand so that RCTs themselves become the ideology.

In a recent paper, Walach and Loef [7] argue that the hierarchy of evidence model, which places RCTs at the top of a hierarchy of study designs, is based on false assumptions. They argue that this model only represents degrees of internal validity. They go on to argue that as internal validity increases, external validity decreases. We don’t strictly agree: there is no necessary decoupling between internal and external validity. However, we do agree that in many cases, by virtue of the study designs, RCTs may provide greater internal validity and other designs greater external validity. Then how could we know, in the case of a discrepancy between RCTs and observational studies, which results to rely on? The answer is that one would have to look outside the studies and piece together a story, i.e. a theory, and not ignore the observational evidence, as recognised by Bradford Hill’s famous criteria.

The case of chorionic villus sampling, a test to detect foetal genetic abnormalities, serves as a good example of how different forms of evidence can provide different insights and be synthesised. Observational studies found evidence that chorionic villus sampling increased the risk of transverse limb deformities, which was not detected in any of the RCTs at the time. To make sense of the evidence and to understand whether the findings from the observational evidence were a result of random variation in the population or perhaps poor study design, knowledge of developmental biology, teratology, and epidemiology was required. It turned out that the level of the transverse abnormality – fingers, hands, forearm, or upper arm – corresponded to the embryonic age at which the sampling was conducted and also to the development of the limb at that point. This finding enabled a cause and effect conclusion to be drawn that explained all the evidence and resulted in recommendations for safer practice.[8] [9]

Knowledge gained from the scientific process can inform us of the possible consequences of different policy choices. The desirability of these actions or their consequences can then be assessed in a normative or political framework. The challenge for scientists is to understand and synthesise the available evidence independently of their ideological stance. There often remains great uncertainty about the consequences of different policies. In some cases, such as with electronic cigarettes, there may be reason to maintain the current policy if, by doing so, the likelihood of collecting further and better evidence is enhanced. However, in other cases, like stockpiling for pandemic influenza, such evidence depends on there being a pandemic and by then it is too late. Accepting only RCT evidence or adopting an ideological stance in reporting may distort what is reported to both key policy decision makers and individuals wishing to make an informed choice. It may even be potentially harmful.

— Richard Lilford, CLAHRC WM Director
— Sam Watson, Research Fellow


  1. Teicholz N. The scientific report guiding the US dietary guidelines: is it scientific? BMJ. 2015; 351: h4962.
  2. Center for Science in the Public Interest. Letter Requesting BMJ to Retract “Investigation”. Nov 5 2015.
  3. Kmietowicz Z. Study claiming Tamiflu saved lives was based on “flawed” analysis. BMJ. 2014; 348: g2228.
  4. McKee M, Capewell S. Evidence about electronic cigarettes: a foundation built on rock or sand? BMJ. 2015; 351: h4863.
  5. Gornall J. Public Health England’s troubled trail. BMJ. 2015; 351: h5826.
  6. McNeill A, Brose LS, Calder R, et al. E-cigarettes: an evidence update: a report commissioned by Public Health England. London: Public Health England, 2015.
  7. Walach H & Loef M. Using a matrix-analytical approach to synthesizing evidence solved incompatibility problem in the hierarchy of evidence. J Clin Epidemiol. 2015; 68(11): 1251-60.
  8. Olney RS. Congenital limb reduction defects: clues from developmental biology, teratology and epidemiology. Paediatr Perinat Epidemiol. 1998; 12: 358–9.
  9. Mowatt G, Bower DJ, Brebner JA, et al. When and how to assess fast-changing technologies: a comparative study of medical applications of four generic technologies. Health Technol Assess. 1996; 1: 1–149.


Adverse Effects of Well-Intentioned Interventions to Improve Outcomes in Adolescence

We recently reported on the evaluation of a study to reduce aberrant teenage behaviour that had a negative effect – i.e. it actually increased the behaviour it was designed to prevent. On further enquiry, it turns out that this is but one of a series of studies targeting adolescent behaviour that showed effects opposite to those intended.[1] The classic study, quoted by Dishion, McCord & Poulin, was the Cambridge-Somerville Youth Study.[2] [3] This was a randomised study of matched pairs of adolescent boys (irrespective of previous behaviour) in a run-down neighbourhood. The intervention consisted of visits (an average of twice a month) by counsellors who also took the boys to sporting events, gave them driving lessons, and helped them and their family members apply for jobs. The intervention had harmful effects on arrests, alcohol problems, and mental hospital referral on follow-up 40 years later.[4] [5] In a sub-group comparison, boys sent to summer school more than once had a particularly bad outcome. This is consistent with the theory that mutual interaction reinforces behaviour problems among susceptible adolescent boys.

On the basis of this RCT and other randomised studies, one of which was cited in the previous post, “there is reason to be cautious and to avoid aggregating young high-risk adolescents into intervention groups.” Apparently interventions targeted at parents are more positive in their effects. CLAHRC WM has a large theme of work on adolescent health and the Director invites comments from inside and outside our organisation.

— Richard Lilford, CLAHRC WM Director


  1. Dishion TJ, McCord J, Poulin F. When Interventions Harm. Peer Groups and Problem Behaviour. Am Psychol. 1999; 54(9): 755-64.
  2. Healy W & Bronner AF. New Light on Delinquency and its Treatment. New Haven, CT: Yale University Press. 1936.
  3. Powers E & Witmer H. An Experiment in the Prevention of Delinquency: The Cambridge-Somerville Youth Study. New York: Columbia University Press. 1951.
  4. McCord J. A Thirty-Year Follow-Up of Treatment Effects. Am Psychol. 1978; 33: 284-9.
  5. McCord J. Consideration of Some Effects of a Counseling Program. In: Martin SE, Sechrest LB, Redner R (Eds.) New Directions in the Rehabilitation of Criminal Offenders. Washington, D.C.; The National Academy of Sciences. 1981. p.394-405.


Guidelines are built in a solipsistic way around individual diseases and do not take into account the fact that most patients have more than one disease? Well, this is not quite true; statins have been tested over a range and mix of conditions. Nevertheless, most trials are based on a rather ‘clean’ population. This creates three theoretical problems:

  1. Drugs prescribed for one disease may interact with those prescribed for another. For example, non-steroidal agents administered for osteoarthritis may interact with warfarin given for atrial fibrillation.
  2. Drugs prescribed for one disease may interact directly with another disease. Beta-blockers prescribed for hypertension may aggravate asthma, for example.
  3. Drugs may be less effective in treating the target disease when lots of diseases are present.

The first problem is well known – polypharmacy, apart from being bothersome, can be dangerous, as in the example cited. However, e-prescribing can reduce this risk, not just theoretically, but empirically.[1]

The second problem is also well documented, and it too can be tackled by e-prescribing, given a sufficiently data-rich IT system.[2]

The third problem, that multi-morbidity may affect the effectiveness of treatment for the index condition, has received less attention. There are theoretical reasons why multi-morbidity may attenuate effectiveness – for example, by reducing adherence to treatment. There are also theoretical arguments for an augmented effect when diseases share a common pathway targeted by the drug – inflammatory mechanisms for instance. This issue has now been investigated empirically in a recent paper [3] and editorial [4] in the BMJ. Tinetti and colleagues studied nine drugs shown in RCTs to decrease risk of death in people with cardiovascular disease. They used a database of 8,578 Americans who all had at least one condition in addition to the index cardiovascular disease. The hypothesis that effectiveness is affected by the presence of multi-morbidity was testable because many (nearly half) of the patients in the database had not received the drug that had been shown to be effective for their condition in RCTs. So the outcomes of those where the drug was indicated and prescribed could be compared to those where it was indicated but not prescribed. The results anticipated from the RCTs were replicated among the database patients – in other words, multi-morbidity did not modify treatment effectiveness in terms of adjusted hazard ratios.

Of course, such an observational study is beset with selection biases, including ‘healthy user bias’, whereby those who take medicines have a better prognosis a priori, and ‘immortal time bias’, whereby some of those who would have received the intervention in an RCT are just not there to be counted in the database because they have died. In other words, receiving or not receiving the indicated drug may not be a good instrumental variable. Nevertheless, the results were adjusted for as many confounders as possible and this provides a measure of assurance that drugs produce anticipated effects, notwithstanding multi-morbidity. CLAHRC WM has an active theme of work in tailoring care according to patient preference, and this study is certainly highly relevant to this project. The CLAHRC WM Director makes the further point that if the relative risk reduction is the same in patients with multi-morbidity and those with single diseases, and if the multi-morbidity is associated with higher baseline morbidity/mortality, then the absolute risk reduction will be higher in multi-morbid than in uni-morbid patients.
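
The Director's closing point is a matter of simple arithmetic: the absolute risk reduction is the baseline risk multiplied by the relative risk reduction. The sketch below works this through in Python. The function names and all three numbers (a 25% relative risk reduction, and baseline mortality risks of 10% and 30%) are hypothetical and chosen only for illustration; none is taken from the study by Tinetti and colleagues.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Return ARR = baseline risk x RRR, assuming the RRR is constant across groups."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability between 0 and 1")
    if not 0.0 <= relative_risk_reduction <= 1.0:
        raise ValueError("relative_risk_reduction must be between 0 and 1")
    return baseline_risk * relative_risk_reduction


def number_needed_to_treat(arr: float) -> float:
    """Return NNT = 1 / ARR (how many patients must be treated to prevent one event)."""
    if arr <= 0.0:
        raise ValueError("arr must be positive")
    return 1.0 / arr


# Hypothetical inputs: the same relative effect, different baseline risks.
RRR = 0.25                  # assumed identical in both groups
UNI_MORBID_BASELINE = 0.10  # assumed risk of death without treatment
MULTI_MORBID_BASELINE = 0.30

arr_uni = absolute_risk_reduction(UNI_MORBID_BASELINE, RRR)
arr_multi = absolute_risk_reduction(MULTI_MORBID_BASELINE, RRR)

print(f"Uni-morbid:   ARR = {arr_uni:.3f}, NNT = {number_needed_to_treat(arr_uni):.1f}")
print(f"Multi-morbid: ARR = {arr_multi:.3f}, NNT = {number_needed_to_treat(arr_multi):.1f}")
```

Under these assumed figures the same drug prevents three times as many deaths per patient treated in the multi-morbid group, simply because there is more risk there to reduce.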

— Richard Lilford, CLAHRC WM Director


  1. Nuckols TK, Smith-Spangler C, Morton SC, et al. The effectiveness of computerized order entry at reducing preventable adverse drug events and medication errors in hospital settings: a systematic review and meta-analysis. Syst Rev. 2014; 3: 56.
  2. Avery AJ, Rodgers S, Cantrill JA, et al. A pharmacist-led information technology intervention for medication errors (PINCER): a multicentre, cluster randomised, controlled trial and cost-effectiveness analysis. Lancet. 2012; 379: 1310-9.
  3. Tinetti ME, McAvay G, Trentalange M, Cohen AB, Allore HG. Association between guideline recommended drugs and death in older adults with multiple chronic conditions: population based cohort study. BMJ. 2015; 351: h4984.
  4. Muth C & Glasziou PP. Guideline recommended treatments in patients with multimorbidity. BMJ. 2015; 351: h5145.

Very Different Results from RCT and Observational Studies?

The CLAHRC WM Director lives only a few doors down from the Edgbaston golf course. The club house is a fine Georgian building and was the home of William Withering – a member of the Lunar Society and the person who discovered the cardiac-stimulating drug digitalis (digoxin). Foxglove plants, the natural source of digitalis, still grow in profusion. The CLAHRC WM Director prescribed this medicine frequently when working as a junior doctor on the medical wards. The drug fell from grace, however, when many observational studies showed that use of the medicine was associated with an increased risk of death in patients with heart failure. Yet a recent meta-regression [1] from Birmingham, London and Melbourne showed that the more care that was taken to reduce the risk of bias, the smaller the estimated increase in mortality, ending up with RCTs showing a neutral effect. The source of bias is obvious – doctors prescribe the medicine for their sickest patients, and the observable prognostic factors pick up only a proportion of the increased risk. The residual prognostic factors are subtle clues that experienced doctors can sense in a tacit way.

The accompanying editorial [2] uses these data to rubbish observational evidence, rather spectacularly missing a more subtle point – the greater the care taken in observational evidence, the more the risk of bias can be mitigated. Further mitigation is possible by adjusting for bias, using the method of Turner et al.,[3] thereby reducing point estimates and widening confidence limits into credible limits. We are considering using digoxin as an exemplar of the method.
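
To show how adjusting for bias both shifts the point estimate and widens the interval, here is a minimal Python sketch of a simple additive bias adjustment on the log scale. It is a simplification in the spirit of Turner et al., not their full method. In practice the bias mean and standard deviation (`bias_mean` and `bias_sd` below) would be elicited from expert judgement. The function name is hypothetical and every number is invented for illustration; none comes from the digoxin meta-analysis.

```python
import math


def adjust_for_additive_bias(log_estimate, se, bias_mean, bias_sd, z=1.96):
    """Shift a log-scale estimate by the expected bias and inflate its variance.

    Assumes the bias is additive on the log scale, with the given mean and
    standard deviation, and is independent of the sampling error.
    """
    adjusted = log_estimate - bias_mean              # remove the expected bias
    adjusted_se = math.sqrt(se ** 2 + bias_sd ** 2)  # add uncertainty about the bias
    limits = (adjusted - z * adjusted_se, adjusted + z * adjusted_se)
    return adjusted, adjusted_se, limits


# Hypothetical observational result: hazard ratio 1.30 (95% CI 1.15 to 1.47).
log_hr = math.log(1.30)
se = (math.log(1.47) - math.log(1.15)) / (2 * 1.96)

# Hypothetical elicited bias: confounding by indication is judged to inflate
# the hazard ratio by a factor of about 1.15, with uncertainty sd = 0.10.
adj, adj_se, (lower, upper) = adjust_for_additive_bias(
    log_hr, se, bias_mean=math.log(1.15), bias_sd=0.10
)

adjusted_hr = math.exp(adj)
print(f"Unadjusted HR: 1.30 (se of log HR = {se:.3f})")
print(f"Adjusted HR:   {adjusted_hr:.2f} "
      f"({math.exp(lower):.2f} to {math.exp(upper):.2f}), se = {adj_se:.3f}")
```

The adjusted interval is always at least as wide as the unadjusted one, because the uncertainty about the bias is added to the sampling variance.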

As for William Withering’s medicine: well, it appears not to increase death rates after all, while both the observational and RCT evidence suggest that it reduces the need for admission. Two hundred and thirty years after his discovery, the scientific principles of the Enlightenment that Withering espoused continue to refine our understanding of the medical uses of digoxin.

— Richard Lilford, CLAHRC WM Director


  1. Ziff OJ, Lane DA, Samra M, et al. Safety and efficacy of digoxin: systematic review and meta-analysis of observational and controlled trial data. BMJ. 2015; 351: h4451.
  2. Cole GD & Francis DP. Trials are best, ignore the rest: safety and efficacy of digoxin. BMJ. 2015; 351: h4662.
  3. Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser A Stat Soc. 2009;172(1):21-47.

Yet Again RCTs Fail to Confirm Promising Effects of Dietary Factors in Observational Studies

This time the dietary factors were Omega-3 fatty acids and various trace elements, and the condition of interest was cognitive decline with age.[1] The interventions were tested in a factorial RCT and the mean age of patients was 71. To be sure, cognitive function declined on average with the passing of the years, but did so equally, irrespective of whether patients had omega-3, other nutritional supplements, both, or neither. Loss to follow-up was low, the study was large (n=3,741), and follow-up averaged five years. It is possible, at least in theory, that any effect of nutrition on cognition unfolds over decades not years. It is also possible that foods rich in the various nutrients tested contain other factors that, singly or in combination, are responsible for the beneficial effects seen in observational studies. How will we ever know? Here is a thought – RCTs of nutritional factors should be thought of as testing the hypothesis that they are harmful – what some call equivalence studies. Then, so long as the nutritional factor of interest is not harmful and is good to eat, it should be included in a balanced diet.

— Richard Lilford, CLAHRC WM Director


  1. Chew EY, Clemons TE, Agrón E, et al. Effect of Omega-3 Fatty Acids, Lutein/Zeaxanthin, or Other Nutrient Supplementation on Cognitive Function. The AREDS2 Randomized Clinical Trial. JAMA. 2015; 314(8):791-801.