Tag Archives: Computer

More on why AI Cannot Displace Your Doctor Anytime Soon

News blog readers will be familiar with my profound scepticism about the role of artificial intelligence (AI) in medicine.[1] I have consistently made the point that there is no clear outcome to much medical process. This is quite different to a game of Go where, in the end, you either win or lose. Moreover, AI can simply replicate human error by replicating faulty parts of human processes. I previously used racial bias in police work as an example.[2] Also, when you take a history, the questions you ask are informed by medical logic or intuition. And eliciting the correct answer is partly a matter of good empathetic approach, as pointed out beautifully in a recent article by Alastair Denniston and colleagues.[3] So comparing AI with a physician is really comparing a physician with physician plus AI.

A further important article on the limitations of AI has recently come out in the journal Science.[4] The article explains how AI can outperform human operators on a game of Space Invaders; but if the game is suddenly altered so that all but one alien is removed, the AI performance deteriorates. A human player can immediately spot the problem, whereas the AI system is flummoxed for many iterations. The article explains how AI is coming full circle. First, computer scientists tried to mimic expert performance at a task. Then, AI completely bypassed the expert by means of a self-learning neural network. They declared victory when ‘AlphaGo’ beat Go champion Ke Jie. That was the high water mark for AI, and although a few enthusiasts declared victory,[5] serious AI scientists have turned back to human intelligence to inform their algorithms. They are even starting to study how children learn and using this knowledge in AI systems.

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Update on AI. NIHR CLAHRC West Midlands News Blog. 1 June 2018.
  2. Lilford RJ. How Accurate Are Computer Algorithms Really? NIHR CLAHRC West Midlands News Blog. 26 January 2018.
  3. Liu X, Keane PA, Denniston AK. Time to regenerate: the doctor in the age of artificial intelligence. J Roy Soc Med. 2018; 111(4): 113-6.
  4. Hutson M. How researchers are teaching AI to learn like a child. Science. 24 May 2018.
  5. Lilford RJ. Computer Beats Champion Player at Go – What Does This Mean for Medical Diagnosis? NIHR CLAHRC West Midlands News Blog. 8 April 2016.

Update on AI

A recent article in Science [1] pointed out that scientists have to tweak their AI systems to get them to give the correct answer. But I have a different problem with AI – how do you know that the supposed right answer is actually right? In a game of Go this issue does not arise. You either win or you lose. But medicine is not like that. The machine may diagnose thyroid cancer. You take a biopsy and find thyroid cancer. But this is not the same thing as the cases of thyroid cancer found in clinical practice – the machine may be unmasking cases that would never have come to light.[2] Machine learning can also replicate human bias – for instance, if police are more likely to charge black male youths than equally offending elderly white women, then the machine will learn precisely the wrong lesson, as pointed out in a previous News Blog.[3]

— Richard Lilford, CLAHRC WM Director


  1. Hutson M. Has artificial intelligence become alchemy? Science. 2018; 360: 478.
  2. Lilford RJ. Thyroid Cancer: Another Indolent Tumour Prone to Massive Over Diagnosis. NIHR CLAHRC West Midlands News Blog. 24 March 2017.
  3. Lilford RJ. How Accurate are Computer Algorithms Really? NIHR CLAHRC West Midlands News Blog. 26 January 2018.

Machine Learning and the Demise of the Standard Clinical Trial!

An increasing proportion of evaluations are based on database studies. There are many good reasons for this. First, there simply is not enough capacity to do randomised comparisons of all possible treatment variables.[1] Second, some treatment variables, such as ovarian removal during hysterectomy, are directed by patient choice rather than experimental imperative.[2] Third, certain outcomes, especially those contingent on diagnostic tests,[3] are simply too rare to evaluate by randomised trial methodology. In such cases, it is appropriate to turn to database studies. And when conducting database studies it is becoming increasingly common to use machine learning rather than standard statistical methods, such as logistic regression. This article is concerned with strengths and limitations of machine learning when databases are used to look for evidence of effectiveness.

When conducting database studies, it is right and proper to adjust for confounders and look for interaction effects. However, there is always a risk that unknown or unmeasured confounders will result in residual selection bias. Note that two types of selection are in play:

  1. Selection into the study.
  2. Once in the study, selection into one arm of the study or another.

Here we argue that while machine learning has advantages over RCTs with respect to the former type of bias, it cannot (completely) solve the problem of selection to one type of treatment vs. another.

Selection into a Study and Induction Across Place and Time (External Validity)
A machine learning system based on accumulating data across a health system has advantages with respect to the representativeness of the sample and generalisations across time and space.

First, there are no exclusions by potential participant or clinician choice that can make the sample non-representative of the population as a whole. It is true that the selection is limited to people who have reached the point where their data become available (it cannot include people who did not seek care, for example), but this caveat aside, the problem of selection into the study is strongly mitigated. (There is also the problem of ‘survivor bias’, where people are ‘missing’ from the control group because they have died, become ineligible or withdrawn from care. We shall return to this issue.)
Second, the machine can track (any) change in treatment effect over time, thereby providing further information to aid induction. For example, as a higher proportion of patients/clinicians adopt a new treatment, so the intervention effect can be examined. Of course, the problem is not totally solved, because the possibility of different effects in other health systems (not included in the database) still exists.

Selection Once in a Study (Internal Validity)
However, the machine cannot do much about selection to intervention vs. control conditions (beyond, perhaps, enabling more confounding variables to be taken into account). This is because it cannot get around the cause-effect problem that randomisation neatly solves by ensuring that unknown variables are distributed at random (leaving only lack of precision to worry about). Thus, machine learning might create the impression that a new intervention is beneficial when it is not. If the new intervention has nasty side-effects or high costs, then many patients could end up getting treatment that does more harm than good, or which fails to maximise value for money. Stability of results across strata does not vitiate the concern.

It could be argued, however, that selection effects are likely to attenuate as the intervention is rolled out over an increasing proportion of the population. Let us try a thought experiment. Consider the finding that accident victims who receive a transfusion have worse outcomes than those who do not, even after risk-adjustment. Is this because transfusion is harmful, or because clinicians can spot those who need transfusion, net of variables captured in statistical models? Let us now suppose that, in response to the findings, clinicians subsequently reduce use of transfusion. It is then possible that changes in the control rate and in the treatment effect can provide evidence for or against cause and effect explanations. The problem here is that bias may change as the proportions receiving one treatment or the other change. There are thus two possible explanations for any set of results – a change in bias or a change in effectiveness, as a wider range of patients/clinicians receive the experimental intervention. It is difficult to come up with a convincing way to resolve the cause and effect problem. I must leave it to someone cleverer than myself to devise a theorem that might shed at least some light on the plausibility of the competing explanations – bias vs. cause and effect. But I am pessimistic for this general reason. As a treatment is rolled out (because it seems effective) or withdrawn (because it seems ineffective or harmful), so the beneficial or harmful effect (even in relative risk ratio terms) is likely to attenuate. But the bias is also likely to attenuate because less selection is taking place. Thus the two competing explanations may be confounded.
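The transfusion thought experiment can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes a hypothetical severity indicator that clinicians can see but the database does not record, made-up probabilities, and a transfusion that has no effect whatsoever on outcome. The function name `observed_risks` and every number passed to it are assumptions, not figures from any study.

```python
def observed_risks(p_severe, p_treat_severe, p_treat_mild,
                   risk_severe, risk_mild):
    """Risk of a bad outcome among treated and untreated patients
    when the treatment itself has NO effect (hypothetical toy model)."""
    p_mild = 1.0 - p_severe
    # Share of the whole population in each severity/treatment cell.
    treated_severe = p_severe * p_treat_severe
    treated_mild = p_mild * p_treat_mild
    untreated_severe = p_severe * (1.0 - p_treat_severe)
    untreated_mild = p_mild * (1.0 - p_treat_mild)
    # Outcome risk depends only on severity, never on treatment.
    risk_treated = ((treated_severe * risk_severe + treated_mild * risk_mild)
                    / (treated_severe + treated_mild))
    risk_untreated = ((untreated_severe * risk_severe + untreated_mild * risk_mild)
                      / (untreated_severe + untreated_mild))
    return risk_treated, risk_untreated


# Assumed scenario 1: clinicians spot unrecorded severity and treat accordingly.
selective = observed_risks(0.3, 0.8, 0.1, 0.30, 0.05)
# Assumed scenario 2: allocation unrelated to severity, as randomisation ensures.
randomised = observed_risks(0.3, 0.5, 0.5, 0.30, 0.05)

print("selective:  treated %.3f vs untreated %.3f" % selective)
print("randomised: treated %.3f vs untreated %.3f" % randomised)
```

Under these assumed numbers the treated group fares much worse even though the treatment does nothing, and the gap vanishes once allocation no longer depends on severity. The sketch does not resolve the problem raised above: in real data a shrinking gap could equally reflect a change in the treatment effect.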

There is also the question of whether database studies can mitigate ‘survivor bias’. When the process (of machine learning) starts, then survivor bias may exist. But, by tracking estimated treatment effect over time, the machine can recognise all subsequent ‘eligible’ cases as they arise. This suggests that the problem of survivor bias should be progressively mitigated over time.

So what do I recommend? Three suggestions:

  1. Use machine learning to provide a clue to things that you might not have suspected or thought of as high priority for a trial.
  2. Nest RCTs within database studies, so that cause and effect can be established at least under specified circumstances, and then compare the results with what you would have concluded by machine learning alone.
  3. Use machine learning on an open-ended basis with no fixed stopping point or stopping rule, and make data available regularly to mitigate the risk of over-interpreting a random high. This approach is very different to the standard ‘trial’ with fixed start and end dates, data-monitoring committees,[4] ‘data-lock’, and all manner of highly standardised procedures. Likewise, it is different to resource heavy statistical analysis, which must be done sparingly. Perhaps that is the real point – machine learning is inexpensive (has low marginal costs) once an ongoing database has been established, and so we can take a ‘working approach’, rather than a ‘fixed time point’ approach to analysis.
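To see why a ‘random high’ is a real risk when results are inspected repeatedly, consider the following minimal simulation. It is a sketch under stated assumptions, not a description of any particular monitoring scheme: there is no true effect, each look adds one standard-normal increment of evidence, and a conventional threshold of 1.96 is used. The function name `random_high_rates`, the number of looks and the seed are all hypothetical choices.

```python
import math
import random


def random_high_rates(n_runs=2000, n_looks=20, threshold=1.96, seed=1):
    """Simulate repeated looks at accumulating data with NO true effect.

    Each look adds one standard-normal increment to a running total;
    the test statistic at look k is total / sqrt(k).
    Returns (share of runs crossing the threshold at ANY look,
             share of runs crossing it at the FINAL look).
    """
    rng = random.Random(seed)
    crossed_any = 0
    crossed_final = 0
    for _ in range(n_runs):
        total = 0.0
        z = 0.0
        any_hit = False
        for k in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)
            z = total / math.sqrt(k)
            if abs(z) > threshold:
                any_hit = True
        crossed_any += any_hit
        crossed_final += abs(z) > threshold
    return crossed_any / n_runs, crossed_final / n_runs


any_look, final_look = random_high_rates()
print(f"crossed at some look: {any_look:.2f}; crossed at final look: {final_look:.2f}")
```

In this toy setting a spuriously ‘significant’ result turns up at some look far more often than at any single pre-specified look, which is the reason for treating an isolated high in an open-ended analysis with caution.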

— Richard Lilford, CLAHRC WM Director


    1. Lilford RJ. The End of the Hegemony of Randomised Trials. 30 Nov 2012. [Online].
    2. Mytton J, Evison F, Chilton PJ, Lilford RJ. Removal of all ovarian tissue versus conserving ovarian tissue at time of hysterectomy in premenopausal patients with benign disease: study using routine data and data linkage. BMJ. 2017; 356: j372.
    3. De Bono M, Fawdry RDS, Lilford RJ. Size of trials for evaluation of antenatal tests of fetal wellbeing in high risk pregnancy. J Perinat Med. 1990; 18(2): 77-87.
    4. Lilford RJ, Braunholtz D, Edwards S, Stevens A. Monitoring clinical trials—interim data should be publicly available. BMJ. 2001; 323: 441


The Rush Towards a Paperless Health Service: Stop the Music

I have written repeatedly on the harms that bringing IT into the consultation room can cause. I carried out a trial of computerised data entry during consultations showing that it undermined the clinician/patient relationship – this was over three decades ago.[1] This finding has been replicated many times since, with vivid accounts in Bob Wachter’s outstanding book.[2] It turns out that having to use IT during consultations is one of the main causes of ‘burn-out’ among doctors in the USA. A recent NIHR Programme study, in which CLAHRC WM collaborated, showed that IT is likely undermining patient safety in some aspects of practice (diagnosis; personalised care), even as it improves it in others (prescribing error).[3] Meanwhile, Fawdry has repeatedly argued [4] that the problems in integrating computers across institutions arise not because the IT systems themselves are ‘incompatible’ or because we do not have common ‘ontologies’, but because the underlying medical logic does not synchronise when you slap two systems together. Data in records can be shared (lab results, x-rays, etc.), but that is different to sharing records. So, do not force the pace – let a paperless system evolve. Apply the CLAHRC WM test: never implement an IT system module until you have examined how it affects the pattern of clinician/patient interactions in real world settings. We are sleep-walking into a digital future and completely ignoring the cautionary evidence that is becoming stronger by the year. Remember, nothing in health care – and I really do mean nothing – is as important as the relationships between clinician and patient.

— Richard Lilford, CLAHRC WM Director


  1. Brownbridge G, Lilford RJ, Tindale-Biscoe S. Use of a computer to take booking histories in a hospital antenatal clinic. Acceptability to midwives and patients and effects on the midwife-patient interaction. Med Care. 1988; 26(5): 474-87.
  2. Wachter R. The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age. New York, NY: McGraw-Hill Education. 2015.
  3. Lilford RJ. Introducing Hospital IT Systems – Two Cautionary Tales. NIHR CLAHRC West Midlands News Blog. 4 August 2017.
  4. Fawdry R. Paperless records are not in the best interest of every patient. BMJ. 2013; 346: f2064.

Introducing Hospital IT systems – Two Cautionary Tales

The beneficial effects of mature IT systems, such as at the Brigham and Women’s Hospital,[1] Intermountain Health Care,[2] and University Hospitals Birmingham NHS Foundation Trust,[3] have been well documented. But what happens when a commercial system is popped into a busy NHS general hospital? Lots of problems according to two detailed qualitative studies from Edinburgh.[4] [5] Cresswell and colleagues document problems with both stand-alone ePrescribing systems and with multi-modular systems.[4] The former drive staff crazy with multiple log-ins and duplicate data entry. Nor does their frustration lessen with time. Neither system type (stand-alone or multi-modular) presented a comprehensive overview of the patient record. This has obvious implications for patient safety. How is a doctor expected to detect a pattern in the data if they are not presented in a coherent format? In their second paper the authors examine how staff cope with the above problems.[5] To enable them to complete their tasks, ‘workarounds’ were deployed. These workarounds frequently involved recourse to paper intermediaries. Staff often became overloaded with work and often did not have the necessary clinical information at their fingertips. Some workarounds were sanctioned by the organisation, others not. What do I make of these disturbing, but thorough, pieces of research? I would say four things:

  1. Move slowly and carefully when introducing IT and never, never go for heroic ‘big bang’ solutions.
  2. Employ lots of IT specialists who can adapt systems to people – do not try to go the other way round. Eschew ‘business process engineering’, the risks of which are too high – be incremental.
  3. If you do not put the doctors in charge, make sure that they feel as if they are. More seriously – take your people with you.
  4. Forget integrating primary and secondary care, and social care and community nurses, and meals on wheels and whatever else. Leave that hubristic task to your hapless successor and introduce a patient held booklet made of paper – that’s WISDAM.[6]

— Richard Lilford, CLAHRC WM Director


  1. Weissman JS, Vogeli C, Fischer M, Ferris T, Kaushal R, Blumenthal B. E-prescribing Impact on Patient Safety, Use and Cost. Rockville, MD: Agency for Healthcare Research and Quality. 2007.
  2. Bohmer RMJ, Edmondson AC, Feldman L. Intermountain Health Care. Harvard Business School Case 603-066. 2002.
  3. Coleman JJ, Hodson J, Brooks HL, Rosser D. Missed medication doses in hospitalised patients: a descriptive account of quality improvement measures and time series analysis. Int J Qual Health Care. 2013; 25(5): 564-72.
  4. Cresswell KM, Mozaffar H, Lee L, Williams R, Sheikh A. Safety risks associated with the lack of integration and interfacing of hospital health information technologies: a qualitative study of hospital electronic prescribing systems in England. BMJ Qual Saf. 2017; 26: 530-41.
  5. Cresswell KM, Mozaffar H, Lee L, Williams R, Sheikh A. Workarounds to hospital electronic prescribing systems: a qualitative study in English hospitals. BMJ Qual Saf. 2017; 26: 542-51.
  6. Lilford RJ. The WISDAM* of Rupert Fawdry. NIHR CLAHRC West Midlands News Blog. 5 September 2014.

The Second Machine Age

I must thank Dr Sebastiaan Mastenbroek (AMC, Amsterdam) for giving me a copy of the Second Machine Age by Brynjolfsson and McAfee.[1] At first I thought it was just another of those books describing how computers were going to take over the world.[2] Indeed the first part of the book is repetitive and not particularly insightful when it comes to the marvels of modern computers – I recently debated this subject live with another auteur, Daniel Susskind, on BBC World Service. However, the economic consequences of the second machine age are much more adroitly handled. The authors make a case that the wide disparities in wealth that have arisen over the last few decades are not entirely a function of globalization. The coming of computers has also had a large effect by increasing demand for jobs with a high cognitive content while reducing demand at the other end of the intellectual scale. Fortunately the book does not fall into the Luddite error of trying to hold back the progress of technology. That would be like the ancient Ottoman Empire which tried to ban printing. No, progress must continue, but it must be managed. The authors consider a universal income, but argue that it is too early for this. I agree. They also argue for a negative income tax. Such a tax does not act as a disincentive to work and has a lot going for it. All in all, this is one of the more sure-footed accounts of the economic consequences of the second machine age.

— Richard Lilford, CLAHRC WM Director


  1. Brynjolfsson E & McAfee A. The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. New York, NY: W. W. Norton & Company; 2014.
  2. Lilford RJ. A Book for a Change. NIHR CLAHRC West Midlands News Blog. 29 January 2016.

Computer Interpretation of Foetal Heart Rates Does Not Help Distinguish Babies That Need a Caesarean from Those That Do Not

In an earlier life I was involved in obtaining treatment costs for a pilot trial of computerised foetal heart monitoring versus standard foetal heart monitoring (CTG). The full trial, funded by NIHR, has now been published in the Lancet,[1] featuring Sara Kenyon from our CLAHRC WM theme 1. With over 46,000 participants the trial found no difference in a composite measure of foetal outcome or intervention rates. Perinatal mortality was only 3 per 10,000 women across both arms and the incidence of hypoxic encephalopathy was less than 1 per 1,000. Of course, the possibility of an educational effect from the computer decision support (‘contamination’) may have reduced the observed effect, but this could only be tested by a cluster trial. However, such a design would create its own set of problems, such as loss of precision and bias through interaction between method used and baseline risk across intervention and control sites. Also, the control group was not care as usual, but the visual display IT system shorn of its decision support (artificial intelligence) module.[2] Some support for the idea that the control condition affected care in a positive direction, making any marginal effect of decision support hard to detect, comes from the low event rate across both study arms. Meanwhile, the lower than expected baseline event rates mean that any improvement in outcome will be hard to detect in future studies. So here is another topic that, like vitamin D given routinely to elderly people,[3] now sits below the “horizon of science” – the combination of low event rates and low plausible effect sizes means that we can move on from this subject – at least in a high-income context. If you want to use the computerised method, and its costs are immaterial, then there is no reason not to; economics aside there appear to be no trade-offs here, since both benefits and harms were null.
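The “horizon of science” point can be illustrated with a standard sample-size calculation for comparing two proportions. The sketch below is an approximation under stated assumptions: the baseline rate of 3 per 10,000 comes from the text, while the 25% relative reduction, the 5% two-sided significance level, the 80% power and the function name `n_per_arm` are hypothetical choices made purely for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate participants per arm needed to detect a difference
    between two proportions (normal approximation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_control - p_treated) ** 2)


baseline = 3 / 10_000        # perinatal mortality quoted in the text
reduced = baseline * 0.75    # ASSUMED 25% relative reduction, for illustration
needed = n_per_arm(baseline, reduced)
print(f"approximate participants needed per arm: {needed:,}")
```

Under these assumptions the required number per arm runs to several hundred thousand, an order of magnitude more than the 46,000 or so participants in the whole trial, which is why further trials on this outcome look impractical.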

— Richard Lilford, CLAHRC WM Director


  1. The INFANT Collaborative Group. Computerised interpretation of fetal heart rate during labour (INFANT): a randomised controlled trial. Lancet. 2017.
  2. Keith R. The INFANT study – a flawed design foreseen. Lancet. 2017.
  3. Lilford RJ. Effects of Vitamin D Supplements. NIHR CLAHRC West Midlands News Blog. 24 March 2017.

Digital Future of Systematic Reviews

A good friend and colleague, Kaveh Shojania, recently shared an article about bitcoin (a form of digital currency), which predicts the end of the finance industry as we know it.[1] The article argues that commercial banks, in particular, will no longer be needed. But what about our own industry of clinical epidemiology? Two thoughts occur:

  1. The current endeavour might not be sustainable.
  2. There might be another way to study prognosis, diagnosis and treatment.

We have argued in a previous post that traditional systematic reviews might soon become a victim of their own success. News blog readers will remember that we have argued that the size of the literature will soon become just too large to review in the normal way. In addition, we have posited the twin issues of “question inflation and effect size deflation”. That is to say the number of potential comparisons is already becoming unwieldy (some network meta-analyses include over 100 individual comparators [2]), and plausible effect sizes are getting smaller as the headroom for further improvements gets used up. Our colleague Norman Waugh tells us that his latest Cochrane review concerning glucagon-like peptides in diabetes runs to over 800 pages. Many have written about the role of automation to search and screen the relevant literature,[3-5] including ourselves in a previous post, but the task of analysing the shedload of retrieved articles will itself become almost insurmountable. At the rate things are going, this may happen sooner than you think![6]

What is to be done? One possibility is that the whole of clinical epidemiology will be largely automated. We have written before about electronic patient records as a potential source of data for clinical research. This ‘rich’ data will be available for analysis by standard statistical methods. However, machine learning is being taken increasingly seriously, and so it is possible to imagine a world in which the bulk of clinical epidemiological studies are largely automated under programme control. That is to say, machine learning algorithms will sit behind rapidly accumulating clinical databases, searching for signals and conducting replication studies autonomously, perhaps even across national borders. In previous posts we have waxed lukewarm about IT systems, which have the potential to disrupt doctor-patient relationships, and where greater precision may be achieved at the cost of increasing inaccuracy. However, it is also possible that these problems can be mitigated by collecting and adjusting for ever larger amounts of information, and perhaps by finding instrumental variables, including those afforded by Mendelian randomisation.

Will all this mean that the CLAHRC WM director will soon retire, while his young colleagues find themselves being made redundant? Almost certainly not. For as long as can be envisaged, human agency will be required to write and monitor computer algorithms, to apply judgement to the outputs, to work out what it all means, and to design and implement subsidiary studies. If anything, epidemiologists of the future will require deeper epistemological understanding, statistical ability and technical knowhow.

— Richard Lilford, CLAHRC WM Director
— Yen-Fu Chen, Senior Research Fellow


  1. Lanchester J. When bitcoin grows up. London Rev Books. 2016; 38(8): 3-12.
  2. Zintzaras E, Doxani C, Mprotsis T, Schmid CH, Hadjigeorgiou GM. Network analysis of randomized controlled trials in multiple sclerosis. Clin Ther. 2012; 34(4): 857-69.
  3. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015; 4: 5.
  4. Tsafnat G, Glasziou P, Choong MK, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014; 3: 74.
  5. Choong MK, Galgani F, Dunn AG, Tsafnat G. Automatic evidence retrieval for systematic reviews. J Med Internet Res. 2014; 16(10): e223.
  6. Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9): e1000326.

Going Digital – the Electronic Patient Record

Everyone wants to go digital; it’s good, it’s modern, we must all be paperless. Welcome then the Electronic Patient Record. Great moves are underway to help hospitals go paperless in England, the USA and elsewhere.

Well, if you think it’s such a great idea read the recent Lancet paper by Martin and Sinsky.[1] They provide a thoughtful and well-referenced account of the shortcomings of electronic records in hospital care. You will find it hard to think that clinical care is improved by such systems once you have read the article. On the contrary, the evidence points the other way – these things actually impede good quality clinical care. One (perhaps the) reason is that they have become subverted. Instead of providing an information system for clinical care in real time, they have been heavily adapted to serve another master – the quality control industry.

The problem arises when clinical records (patient’s history, physical exam and progress) are digitised along with the easy stuff (electronic prescribing, laboratory results, scheduling) to create the all-singing, all-dancing electronic health record. There is a big difference between isolated systems performing particular tasks, such as digitising x-ray images, and going wholesale paperless. The critical point here concerns the ‘cognitive space’ where clinicians can show how their thought processes unfold. I guess that having the notes built around medical reasoning, rather than the tick box appetite of quality control procedures, helps in two ways. First, recording thoughts assists cognitive processes, as in writing down a list of differential diagnoses. Second, it helps others latch onto the story so far. These needs are brought out beautifully in the article, which chronicles the near unmitigated disaster that current electronic notes have become. The authors cite studies documenting the harm that modern electronic records do, and back these up with powerful anecdotes.

We health care professionals promote evidence-based decision-making, yet we are allowing ourselves to be sleepwalked into a poorly evaluated but massive intervention. Such evidence as the article can reference suggests that, far from assisting good care, electronic records (in their current form anyway) are inimical to it. We do enormous and expensive trials to find out whether we can extend life by a few months in an uncommon disease, but we let this potential monster intrude in a near evaluation vacuum. Maybe the question is how, rather than whether, electronic records should be used. In that case it seems clear that our target should be to find out how, since we clearly do not know how.

We need much more development and evaluation work on the design of electronic notes, configuration of services and the interaction between them. A way must be found to resolve the tension between all the other (‘secondary’) functions the notes perform and the real-time clinical care functions that current electronic systems have been shown to subvert. The suggestions made in the article are all extremely sensible. They privilege the clinical, and that is convivial to the clinical heart that beats inside my breast. But the constituencies who want to use notes for various organisational, quality control, and research purposes have not gone away. It seems that we cannot redesign the notes without at least considering these other putative needs. Our task is not complete if we just define what is needed for good clinical care in real time because social pressures to monitor health care providers in general, and doctors in particular, are not going away any time soon. There are three broad possibilities:

  1. Design the IT system so that it can both serve as a seamless record for real-time clinical care, and capture information for secondary purposes. This is unlikely to succeed given the evidence led in the article; a trade-off is all but inevitable between clinical prerogatives and wider organisational and social needs.
  2. Jettison the wider functions of audit and so on, and privilege the real time clinical care need. But we are not going to get away with this unless, at the very least, it can be shown that the ‘costs’ of collecting ancillary information exceed its expected benefit.
  3. Change not just the structure of the electronic notes, but also work patterns and the personnel who enter different types of data. That is to say, it may be cost effective to re-engineer human resource and computer systems to separate, to a degree, entry and presentation of data for real time clinical care purposes and the data needed for ‘secondary’ purposes.

A great deal of research and development will be needed to achieve a near optimal system and the industry would need to be incentivised to engage in such a process. That said, I suspect that much development and evaluation could be done off-line under simulation conditions. What is absolutely clear is that when it comes to digitisation of health care records, we are embarking on one of the greatest socio-technical innovations ever undertaken. Information technology must interact with an extremely complex, subtle, and only partially understood healthcare environment. There is a clear role for CLAHRCs in this exercise, and our particular CLAHRC is collaborating with Prof Aziz Sheikh and colleagues in NIHR-sponsored work on introduction of IT systems in the NHS. In the meantime, people who implement IT systems should tread very gently – no place here for macho types who think they know it all. Careful, deliberate and patient R&D has produced, in the end, unimagined advances in medical care. Let the same sense of modesty guide our fledgling understanding of the information requirements of health care.

— Richard Lilford, CLAHRC WM Director


  1. Martin SA, Sinsky CA. The map is not the territory: medical records and 21st century practice. Lancet. 2016; [ePub].

Systematic Reviewing in the Digital Era

In the field of systematic reviewing it is easy (and often necessary) to dip yourself deep into the sea of the literature and forget about all the things that are going on in the outside world. Reflecting on this, I realised that I hadn’t actually attended a proper Cochrane meeting even though I’ve been doing reviews for more than a decade. Before rendering myself truly obsolete, I decided to seize the opportunity when the Cochrane UK and Ireland Symposium came to Birmingham earlier in March to catch up with the latest developments in the field. And I wasn’t disappointed.

A major challenge for people undertaking systematic reviews is to deal with the sheer number of potentially relevant papers against the timeline beyond which a review would be considered irrelevant. Indeed the issue is so prominent that we (colleagues in Warwick and Ottawa) have recently written and published a commentary to discuss ‘how to do a systematic review expeditiously’.[1] One of the most arduous processes in doing a systematic review is screening the large number of records retrieved from searches of bibliographical databases. Two years ago the bravest attempt that I heard of in a Campbell Collaboration Colloquium was sifting through over 40,000 records in a review. Two years on the number has gone up to over 70,000. While there is little sign that the number of published research papers is going to plateau in the future, I wonder how long reviewers’ stamina and patience can keep pace – even if they have the luxury of time to do it. Here the clever computer comes to the rescue. If Google’s AlphaGo can beat the human champion of Go games,[2] why cannot artificial intelligence save reviewers from the humble but tedious task of screening articles?

Back at the symposium, there was no shortage of signs of this digital revolution on the agenda. To begin with, the conference had no brochure or abstract book to pick up or print. All you got was a mobile phone app that told you what the sessions were and where to go. Several plenary and workshop sessions were related to automation, which I was eager to attend and from which I learned of a growing literature on the use of automation throughout the review process,[3] including article sifting,[4] data extraction,[5] quality assessment [6] and report generation. Although most attempts are still exploratory, the use of text mining, classification algorithms and machine learning to assist with citation screening appears to have matured sufficiently to be considered for practical application. Abstrackr, funded by AHRQ, is an example that is currently freely available (registration required) and has been subject to independent evaluation.[7] Overall, existing studies suggest that such software may reduce reviewers’ workload by 30-70% (by ruling out references that are unlikely to be relevant and hence do not need to be screened) while maintaining a fairly high level of recall (missing 5% or less of eligible articles).[4] However, this is likely to be subject-dependent, and more empirical evidence will be required to demonstrate its practicality and limitations.
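To make the trade-off between workload saved and recall concrete, here is a minimal Python sketch. It is not how Abstrackr or any of the evaluated tools work internally; the function name, the relevance scores and the eligibility labels are all made up for illustration. It assumes only that a classifier has given each record a relevance score and that records scoring below a chosen cut-off are not screened by hand.

```python
def screening_summary(scores, labels, threshold):
    """Summarise a score-based screening cut-off (illustrative only).

    scores:    hypothetical relevance scores, one per record (higher = more relevant)
    labels:    True if the record is genuinely eligible for the review
    threshold: records scoring below this value are not screened manually
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Records at or above the cut-off still go to a human reviewer.
    screened = [score >= threshold for score in scores]

    total = len(scores)
    eligible = sum(labels)
    found = sum(1 for keep, label in zip(screened, labels) if keep and label)

    return {
        # Proportion of records the reviewer no longer has to read.
        "workload_saved": 1 - sum(screened) / total,
        # Proportion of eligible records that survive the cut-off.
        "recall": found / eligible if eligible else 1.0,
    }


# Made-up example: ten records, three of which are truly eligible.
example_scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05]
example_labels = [True, True, False, True, False, False, False, False, False, False]

for cutoff in (0.5, 0.7):
    result = screening_summary(example_scores, example_labels, cutoff)
    print(f"cut-off {cutoff}: saved {result['workload_saved']:.0%}, "
          f"recall {result['recall']:.0%}")
```

With these made-up numbers, a cut-off of 0.5 spares the reviewer 60% of the records while keeping all three eligible ones, whereas raising the cut-off to 0.7 spares 70% but misses one eligible record. In a real review the true labels are of course unknown in advance, which is why independent evaluations against fully screened data sets matter.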

It is important to understand a bit more about what lies behind the “black box” when using such software, and so we were introduced to some online text mining and analysis tools during the workshop sessions. One example is “TerMine”, which allows you to paste in some plain text or specify a text file or a URL. Within a few seconds it will return the text with the most relevant terms highlighted (this can also be viewed as a table ranked by relevance). I did a quick experimental analysis of the CLAHRC WM’s Director and Co-Director’s Blog, and the results seem to be a fair reflection of the themes: community health workers, public health, organisational failure, Cochrane reviews and service delivery were among the highest ranking terms (besides other frequent terms such as CLAHRC WM and the Director’s name). The real challenge in using such tools, however, is how then to organise the identified terms in a sensible way (although there is other software around that is capable of doing things like semantic or cluster analysis), and perhaps more importantly, what important terms might be under-represented or absent.
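For readers curious about what a term-ranking tool does at its simplest, the following Python sketch counts two-word phrases that contain no common “stop words” and ranks them by frequency. This is a deliberately crude stand-in and not TerMine’s actual method, which I have not reproduced here; the function name, the stop-word list and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real tools use far larger ones).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "by", "are", "with"}


def rank_terms(text, top_n=5):
    """Rank two-word phrases by raw frequency, skipping any containing a stop word."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    # Slide over adjacent word pairs and count those made of content words only.
    for first, second in zip(words, words[1:]):
        if first in STOP_WORDS or second in STOP_WORDS:
            continue
        counts[f"{first} {second}"] += 1
    return counts.most_common(top_n)


# Made-up sample text, loosely echoing the themes mentioned above.
sample = (
    "Community health workers deliver public health services. "
    "Community health workers are trained by public health teams."
)

for term, frequency in rank_terms(sample, top_n=3):
    print(term, frequency)
```

Even this toy version shows the limitation noted above: it surfaces frequent phrases, but it says nothing about how the terms relate to one another, nor about important concepts that happen to be rare or missing from the text.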

Moving beyond systematic reviews, there are more ambitious developments such as “Contentmine”, which is trying to “liberate 100 million facts from the scientific literature” using data mining techniques. Pending the support of more permissive copyright regulations and open access practice in scientific publishing, the software will be capable of automatically extracting data from virtually all available literature and then re-organising and presenting the contents (including text and figures) in a format specified by the users.

Finally, with all these exciting developments around the world, Cochrane itself is certainly not lying idle. You might have seen its re-branded websites, but there is a lot more going on behind the scenes: people who have used Review Manager (RevMan) can expect to see a “RevMan Web version” in the near future; the Cochrane Central Register of Controlled Trials (CENTRAL) is being enhanced by the aforementioned automation techniques and will be complemented by a Cochrane Register of Study Data (CRS-D), which will make retrieval and use of data across reviews much easier (and thus facilitate further exploration of existing knowledge, such as undertaking the ‘multiple indication reviews’ advocated by the CLAHRC WM Director) [8]; there will also be a further enhanced Cochrane website with “PICO Annotator” and “PICOfinder” to help people locate relevant evidence more easily; and the Cochrane Colloquium will be replaced by an even larger conference, which will bring together key players in systematic reviewing both within and beyond health care from around the world. So watch this space!

— Yen-Fu Chen, Senior Research Fellow


  1. Tsertsvadze A, Chen Y-F, Moher D, Sutcliffe P, McCarthy N. How to conduct systematic reviews more expeditiously? Syst Rev. 2015; 4(1): 1-6.
  2. Gibney E. What Google’s Winning Go Algorithm Will Do Next. Nature. 2016; 531: 284-5.
  3. Tsafnat G, Glasziou P, Choong MK, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014; 3: 74.
  4. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015; 4: 5.
  5. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015; 4: 78.
  6. Millard LA, Flach PA, Higgins JP. Machine learning to assist risk-of-bias assessments in systematic reviews. Int J Epidemiol. 2016; 45(1): 266-77.
  7. Rathbone J, Hoffmann T, Glasziou P. Faster title and abstract screening? Evaluating Abstrackr, a semi-automated online screening program for systematic reviewers. Syst Rev. 2015; 4: 80.
  8. Chen Y-F, Hemming K, Chilton PJ, Gupta KK, Altman DG, Lilford RJ. Scientific hypotheses can be tested by comparing the effects of one treatment over many diseases in a systematic review. J Clin Epidemiol. 2014; 67: 1309-19.