Tag Archives: Diagnosis

Machine Learning

The CLAHRC WM Director has mused about machine learning before.[1] Obermeyer and Emanuel discuss this topic in the hallowed pages of the New England Journal of Medicine.[2] They point out that machine learning is already replacing radiologists, and will soon encroach on pathology. They have used machine learning in their own work to predict death in patients with metastatic cancer. They claim that machine learning will soon be used in diagnosis, but identify two reasons why this will take longer than for the other uses mentioned above. First, diagnosis does not present neat outcomes (dead or alive; malignant or benign). Second, the predictive variables are unstructured in terms of their availability and where they are located in a record. A third problem, not mentioned by the authors, is that data may be collected because (and only because) the clinician has suspected the diagnosis. The playing field is then tilted in favour of the machine in any comparative study. One other problem the CLAHRC WM Director has with machine learning is that, in these studies, the in silico neural network goes head-to-head with a human. In none of this work do the authors compare the accuracy of ‘machine learning’ against standard statistical methods, such as logistic regression.
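For readers unfamiliar with the baseline in question, here is a minimal sketch of logistic regression fitted by gradient descent – the kind of standard statistical method against which a machine learning model ought to be compared. The data, feature, and parameter values are entirely synthetic and illustrative, not drawn from any clinical study.

```python
import math
import random

random.seed(0)

# Entirely synthetic data: one "risk score" feature; the true log-odds are 2x.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.uniform(-3, 3)
        p = 1 / (1 + math.exp(-2 * x))
        y = 1 if random.random() < p else 0
        data.append((x, y))
    return data

def fit_logistic(data, lr=0.1, epochs=200):
    """Plain logistic regression fitted by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1 / (1 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def accuracy(data, w, b):
    # Predict the positive class whenever the linear predictor is non-negative.
    correct = sum(1 for x, y in data
                  if (w * x + b >= 0) == (y == 1))
    return correct / len(data)

train, held_out = make_data(2000), make_data(500)
w, b = fit_logistic(train)
print(f"held-out accuracy: {accuracy(held_out, w, b):.2f}")
```

Any published neural-network diagnostic model could, in principle, be benchmarked against exactly this sort of fit on the same data – which is the comparison the studies above do not report.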

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Digital Future of Systematic Reviews. NIHR CLAHRC West Midlands. 16 September 2016.
  2. Obermeyer Z & Emanuel EJ. Predicting the Future – Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016; 375(13): 1216-9.

Computer Beats Champion Player at Go – What Does This Mean for Medical Diagnosis?

A computer program has recently beaten one of the top players of the Chinese board game Go.[1] The reason that a computer’s success in Go is so important lies in the nature of the game. Draughts (or Checkers) can be solved completely by pre-specified algorithms. Similarly, chess can be solved by a pre-specified algorithm overlaid on a number of rules. But Go is different – while experienced players are better than novices, they cannot specify an algorithm for success that can be uploaded into a computer. This is because it is not possible to compute all possible combinations of moves in order to select the most propitious. There are two reasons for this. First, there are too many possible combinations – far more than in chess. Second, experts cannot explicate the knowledge that makes them expert. But the computer program can learn by accumulating experience. As it learns, it increases its ability to select moves that raise the probability of success – the neural network gradually recognises the most advantageous moves in response to the pattern of pieces on the board. So, in theory, a computer program could learn which patterns of symptoms, signs, and blood tests are most predictive of which diseases.

Why does the CLAHRC WM Director think this is a long way off? Well, it has nothing to do with the complexity of diagnosis, or intractability of the topic. No, it is a practical problem. For the computer program to become an expert Go player, it required access to hundreds of thousands of games, each with a clear win/lose outcome. In comparison, clinical diagnosis evolves over a long period in different places; the ‘diagnosis’ can be ephemeral (a person’s diagnosis may change as doctors struggle to pin it down); initial diagnosis is often wrong; and a person can have multiple diagnoses. Creating a self-learning program to make diagnoses is unlikely to succeed for the foreseeable future. The logistics of providing sufficient patterns of symptoms and signs over different time-scales, and the lack of clear outcomes, are serious barriers to success. However, a program to suggest possible diagnoses on the basis of current codifiable knowledge is a different matter altogether. It could be built using current rules, e.g. to consider malaria in someone returning from Africa, or giant-cell arteritis in an elderly person with sudden loss of vision.
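A rule-based suggestion system of the kind described could be sketched as follows. The rules, thresholds, and patient features here are hypothetical illustrations only – a toy encoding of the two examples above, not clinical guidance.

```python
# Hypothetical rule base: each rule pairs a condition on coded patient
# features with a diagnosis worth considering. Illustrative only.
RULES = [
    (lambda p: p.get("fever") and p.get("recent_travel") == "Africa",
     "consider malaria"),
    (lambda p: p.get("age", 0) >= 65 and p.get("sudden_vision_loss"),
     "consider giant-cell arteritis"),
]

def suggest(patient):
    """Return the diagnoses whose rules fire for this patient record."""
    return [dx for cond, dx in RULES if cond(patient)]

print(suggest({"fever": True, "recent_travel": "Africa"}))
print(suggest({"age": 80, "sudden_vision_loss": True}))
```

The point of the sketch is that such a system needs no self-learning at all – only codified knowledge that clinicians can already articulate.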

— Richard Lilford, CLAHRC WM Director


  1. BBC News. Artificial intelligence: Google’s AlphaGo beats Go master Lee Se-dol. 12 March 2016.

An Article We All Had Better Read

Oh dear – the CLAHRC WM Director would so like to think that disease-specific mortality is the appropriate outcome for cancer screening trials, rather than all-cause mortality. But Black and colleagues have published a very sobering article.[1] They found 12 trials of cancer screening (yes, only 12) in which both cancer-specific mortality and all-cause mortality were reported. The effect size (in relative risk terms) was bigger for cancer-specific than for all-cause mortality in seven trials, about the same in four, and the other way round in one. This suggests that the benefit is greater, even in relative terms, for cancer-specific deaths than for deaths overall. There are two explanations for this – one that the CLAHRC WM Director had thought of, and one that was new to him.

  1. Investigation and treatment of false positives (including cancers that would never have presented) may increase risk of death as a result of iatrogenesis and heightened anxiety. There is some evidence for this.
  2. According to the ‘sticky diagnosis theory’, once a diagnostic label has been assigned, then a subsequent death is systematically more likely to be attributed to that diagnosis than if that diagnosis had not been made. There is some evidence for this hypothesis too.

And here is the thing – in screening trials only a very small proportion of people in either arm of the study die from the index disease. The corollary is that even a small mortality increase among the majority not destined to die of that disease has a relatively large effect on all-cause mortality.
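A back-of-the-envelope calculation, with purely illustrative numbers, shows how this arithmetic plays out:

```python
# Purely illustrative numbers for one arm of a screening trial.
arm_size = 100_000
cancer_deaths_control = 300        # deaths from the index cancer without screening
relative_risk = 0.80               # a 20% relative reduction in cancer mortality

cancer_deaths_screened = cancer_deaths_control * relative_risk
lives_saved = cancer_deaths_control - cancer_deaths_screened   # 60 deaths averted

# Even a tiny excess mortality (here 0.1%) among everyone else in the arm
# outweighs that benefit:
others = arm_size - cancer_deaths_control
excess_deaths = others * 0.001

print(f"cancer deaths averted: {lives_saved:.0f}")
print(f"excess deaths from harms: {excess_deaths:.0f}")
```

With these (invented) figures, a seemingly impressive 20% relative reduction in cancer deaths saves 60 lives, while a harm of just one in a thousand among the rest of the cohort costs roughly 100 – and the net effect on all-cause mortality is negative.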

So we have done many expensive trials, and implemented large, expensive screening programmes, yet our effects might have been nugatory. And there is a reason why so few trials have all-cause mortality outcomes – the trials have to be long and potential effects on this end-point are small and liable to be lost in the noise. Somewhere there is a ‘horizon of science’ where precision is hard to find, and where tiny biases can swamp treatment effects. At the risk of sounding nihilistic, the CLAHRC WM Director wonders whether cancer screening is such a topic.

— Richard Lilford, CLAHRC WM Director


  1. Black WC, Haggstrom DA, Welch HG. All-Cause Mortality in Randomized Trials of Cancer Screening. J Natl Cancer Inst. 2002; 94(3): 167-73.

Diagnostic Errors – Extremely Important but How Can They be Measured?

Previous posts have emphasised the importance of diagnostic error.[1] [2] Prescient perhaps, since the influential Institute of Medicine has published a report on diagnostic error in association with the National Academies of Sciences, Engineering, and Medicine.[3] Subsequently, McGlynn et al. highlighted the importance of measuring diagnostic errors.[4] There is no single, encompassing method of measurement, and the three most propitious methods (autopsy reports, malpractice cases, and record review) all have strengths and weaknesses. One particular issue with record review, not brought out in the article, is that the most promising intervention to tackle the problem, computerised decision support, is likely also to affect the accuracy with which diagnostic errors are measured. So we are left with a big problem that is hard to quantify in a way that is unbiased with respect to the most promising remedy. Either we must evaluate IT-based interventions using simulations (arguably not generalisable to real practice), or through changes in rates among post-mortems or malpractice claims (arguably insensitive). There is yet another idea: design computer support systems so that the doctor must record a provisional diagnosis before the decision support is activated, and then see how often the clinician alters behaviour in a way that can be traced back to an additional diagnosis suggested by the computer.
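That last measurement idea could be sketched as follows. The encounter records, diagnoses, and field names are invented purely for illustration – the point is only the shape of the metric, not any real system.

```python
# Invented encounter records: the clinician commits a provisional diagnosis
# *before* the decision support fires, then records a final diagnosis.
encounters = [
    {"provisional": "tension headache",
     "suggested": ["giant-cell arteritis"],
     "final": "giant-cell arteritis"},
    {"provisional": "viral illness",
     "suggested": ["malaria"],
     "final": "viral illness"},
    {"provisional": "angina",
     "suggested": [],
     "final": "angina"},
]

def alteration_rate(records):
    """Fraction of encounters where the final diagnosis was a computer
    suggestion absent from the clinician's provisional diagnosis."""
    altered = sum(1 for r in records
                  if r["final"] != r["provisional"]
                  and r["final"] in r["suggested"])
    return altered / len(records)

print(f"alteration rate: {alteration_rate(encounters):.2f}")
```

Because the provisional diagnosis is locked in before the computer speaks, each alteration can be attributed to the decision support itself – sidestepping the circularity that bedevils record review.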

— Richard Lilford, CLAHRC WM Director


  1. Lilford RJ. Bad Apples vs. Bad Systems. NIHR CLAHRC West Midlands News Blog. 20 February 2015.
  2. Lilford RJ. Bring Back the Ward Round. NIHR CLAHRC West Midlands News Blog. 20 March 2015.
  3. Balogh EP, Miller BT, Ball JR (eds.). Improving Diagnosis in Health Care. National Academies of Sciences, Engineering, and Medicine. 2015.
  4. McGlynn EA, McDonald KM, Cassel CK. Measurement Is Essential for Improving Diagnosis and Reducing Diagnostic Error. JAMA. 2015; 314(23): 2501-2.

Bring Back the Ward Round

Diagnosis, diagnosis, diagnosis. Both this and a previous post have made the argument that diagnostic errors should receive more attention. An important and elegant paper from a previous CLAHRC WM collaborator, Wolfgang Gaissmaier,[1] shows that diagnostic accuracy is improved when medical students work in pairs. Of course, paired working is not possible most of the time, but it does suggest that opportunities for doctors to ‘put their heads together’ should be created whenever possible. The old-fashioned ward round had much to commend it.

— Richard Lilford, CLAHRC WM Director


  1. Hautz WE, Kämmer JE, Schauber SK, Spies CD, Gaissmaier W. Diagnostic Performance by Medical Students Working Individually or in Teams. JAMA. 2015; 313(3): 303-4.