Tag Archives: Quality

Measuring Quality of Care

Measuring quality of care is not a straightforward business:

  1. Routinely collected outcome data tend to be misleading because of very poor ratios of signal to noise.[1]
  2. Clinical process (criterion based) measures require case note review and miss important errors of omission, such as diagnostic errors.
  3. Adverse events also require case note review and are prone to measurement error.[2]

Adverse event review is widely practised, usually involving a two-stage process:

  1. A screening process (sometimes to look for warning features [triggers]).
  2. A definitive phase to drill down in more detail and refute or confirm (and classify) the event.

A recent HS&DR report [3] is important for two particular reasons:

  1. It shows that a one-stage process is as sensitive as the two-stage process. So triggers are not needed; just as many adverse events can be identified if notes are sampled at random.
  2. In contrast to (other) triggers, deaths really are associated with a high rate of adverse events (apart, of course, from the death itself). In fact not only are adverse events more common among patients who have died than among patients sampled at random (nearly 30% vs. 10%), but the preventability rates (probability that a detected adverse event was preventable) also appeared slightly higher (about 60% vs. 50%).

This paper has clear implications for policy and practice, because if we want a population ‘enriched’ for high adverse event rates (on the ‘canary in the mineshaft’ principle), then deaths provide that enrichment. The widely used trigger tool, however, serves no useful purpose – it does not identify a higher than average risk population, and it is more resource intensive. It should be consigned to history.
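The enrichment argument is simple arithmetic. Using the approximate rates quoted above (adverse event rates of roughly 30% vs. 10%, preventability of roughly 60% vs. 50%), the expected yield of preventable adverse events per 100 case notes reviewed can be sketched as follows (a toy calculation using rounded figures, not exact study data):

```python
# Toy calculation of the 'enrichment' effect of reviewing deaths rather than
# randomly sampled notes. Rates are the rounded figures quoted above.

def preventable_yield(ae_rate, preventability, notes_reviewed=100):
    """Expected number of preventable adverse events found per batch of notes."""
    return notes_reviewed * ae_rate * preventability

random_sample = preventable_yield(ae_rate=0.10, preventability=0.50)  # ~5 per 100
death_sample = preventable_yield(ae_rate=0.30, preventability=0.60)   # ~18 per 100

print(f"Random notes: ~{random_sample:.0f} preventable AEs per 100 reviews")
print(f"Death notes:  ~{death_sample:.0f} preventable AEs per 100 reviews")
print(f"Enrichment factor: ~{death_sample / random_sample:.1f}x")
```

On these rounded figures, reviewing deaths yields roughly three to four times as many preventable adverse events per 100 notes as random sampling does.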

Lastly, England and Wales have mandated a process of death review, and the adverse event rate among such cases is clearly of interest. A word of caution is in order here. The reliability (inter-observer agreement) in this study was quite high (kappa = 0.5), but not high enough for comparisons across institutions to be valid. If cross-institutional comparisons are required, then:

  1. A set of reviewers must review case notes across hospitals.
  2. At least three reviewers should examine each case note.
  3. Adjustment must be made for reviewer effects, as well as prognostic factors.

The statistical basis for these requirements is laid out in detail elsewhere.[4] It is clear that reviewers should not review notes from their own hospitals if any kind of comparison across institutions is required – the results will reflect the reviewers rather than the hospitals.
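Cohen's kappa, the agreement statistic quoted above, corrects raw agreement between two reviewers for the agreement expected by chance alone. A minimal sketch, using made-up ratings rather than data from the study:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters judging the same cases."""
    n = len(ratings_a)
    # Observed agreement: proportion of cases where the raters concur.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own marginal rates.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (observed - expected) / (1 - expected)

# Hypothetical adverse-event judgements (1 = AE present, 0 = absent) on 10 notes.
reviewer_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
reviewer_2 = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")  # kappa = 0.58
```

Note that these two hypothetical reviewers agree on 8 of 10 cases, yet kappa is only about 0.58 – chance agreement eats much of the raw concordance, which is why a kappa near 0.5 still leaves substantial reviewer-driven variation.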

— Richard Lilford, CLAHRC WM Director


  1. Girling AJ, Hofer TP, Wu J, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf. 2012; 21(12): 1052-6.
  2. Lilford R, Mohammed M, Braunholtz D, Hofer T. The measurement of active errors: methodological issues. Qual Saf Health Care. 2003; 12(s2): ii8-12.
  3. Mayor S, Baines E, Vincent C, et al. Measuring harm and informing quality improvement in the Welsh NHS: the longitudinal Welsh national adverse events study. Health Serv Deliv Res. 2017; 5(9).
  4. Manaseki-Holland S, Lilford RJ, Bishop JR, Girling AJ, Chen YF, Chilton PJ, Hofer TP; UK Case Note Review Group. Reviewing deaths in British and US hospitals: a study of two scales for assessing preventability. BMJ Qual Saf. 2016. [ePub].

An Interesting Report of Quality of Care Enhancement Strategies Across England, Germany, Sweden, the Netherlands, and the USA

An interesting paper from the Berlin University of Technology compares the quality enhancement systems across the above countries with respect to measuring, reporting and rewarding quality.[1] This paper is an excellent resource for policy and health service researchers. The US has the most developed system of quality-related payments (P4P) of the five countries. England wisely uses only process measures to reward performance, while the US and Germany include patient outcomes. The latter are unfair because of signal-to-noise issues,[2] and the risk-adjustment fallacy.[3] [4] Above all, remember Lilford’s axiom: never base rewards or sanctions on a measurement over which service providers do not feel they have control.[5]

It is true, as the paper argues, that rates of adherence to a single process seldom correlate with outcome. But this is a signal-to-noise problem. ‘Proving’ that a process is valid takes huge RCTs, even when the process is applied to 0% (control arm) vs. approaching 100% (intervention arm) of patients. So how could an improvement from, say, 40% to 60% adherence show up in routinely collected outcome data?[6] I have to keep on saying it: collect outcome data, but in rewarding or penalising institutions on the basis of comparative performance – process, process, process.
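The dilution argument can be made numerical with the standard two-proportion sample size formula. Suppose (illustratively – these numbers are not from the cited papers) a clinical process halves the risk of a bad outcome from 5% to 2.5%. A head-to-head trial (0% vs. ~100% adherence) must detect 5% vs. 2.5%; a QI programme lifting adherence from 40% to 60% only shifts the outcome rate from 4.0% to 3.5%:

```python
import math

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Patients per arm to distinguish proportions p1 vs p2
    (two-sided alpha = 0.05, 80% power; normal approximation)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Head-to-head process trial: 0% vs ~100% adherence, outcome rate 5% vs 2.5%.
trial_n = n_per_arm(0.050, 0.025)   # ~900 patients per arm
# QI programme: adherence rises from 40% to 60%, outcome rate 4.0% vs 3.5%.
qi_n = n_per_arm(0.040, 0.035)      # ~22,600 patients per arm
print(trial_n, qi_n)
```

Under these illustrative assumptions the QI evaluation needs roughly 25 times as many patients as the head-to-head trial – which is why modest improvements in adherence cannot be expected to show up in routinely collected outcome data.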

— Richard Lilford, CLAHRC WM Director


  1. Pross C, Geissler A, Busse R. Measuring, Reporting, and Rewarding Quality of Care in 5 Nations: 5 Policy Levers to Enhance Hospital Quality Accountability. Milbank Quart. 2017; 95(1): 136-83.
  2. Girling AJ, Hofer TP, Wu J, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf. 2012; 21(12): 1052-6.
  3. Mohammed MA, Deeks JJ, Girling A, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ. 2009; 338: b780.
  4. Lilford R, & Pronovost P. Using hospital mortality rates to judge hospital performance: a bad idea that just won’t go away. BMJ. 2010; 340: c2016.
  5. Lilford RJ. Important evidence on pay for performance. NIHR CLAHRC West Midlands News Blog. 20 November 2015.
  6. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.

A Proper Large-Scale Quality Improvement Study in a Middle-Income Country

The vast majority of studies testing an intervention to improve quality/safety of care are conducted in high-income countries. However, a cluster RCT of 118 Brazilian ICUs (6,761 patients) has recently been reported.[1] The intervention was compound (multi-component), involving goal setting, clinician prompting, and multi-disciplinary ward rounds. Although mortality and other patient outcomes were not improved, clinical processes (e.g. use of appropriate settings on the ventilator and avoidance of heavy sedation) did improve. The nub of my argument is that clinical outcomes are insensitive indicators of improved practice, and we should be content with showing improved adherence to proven care standards – the argument is laid out numerically elsewhere.[2] The safety and quality movement is doomed so long as we insist on showing improvements in patient level outcomes.

— Richard Lilford, CLAHRC WM Director


  1. Writing Group for the CHECKLIST-ICU Investigators and the Brazilian Research in Intensive Care Network (BRICNet). Effect of a Quality Improvement Intervention with Daily Round Checklists, Goal Setting, and Clinician Prompting on Mortality of Critically Ill Patients. JAMA. 2016;315(14):1480-90.
  2. Lamont T, Barber N, de Pury J, et al. New approaches to evaluating complex health and care systems. BMJ. 2016; 352: i154.

League tables – not always bad

Health services have become used to report cards on performance. These are valid when the signal-to-noise ratio is favourable (waiting times, vaccination rates, patient satisfaction), but invalid when the signal is overwhelmed by noise (standardised mortality ratios, readmission rates).[1] [2] School performance on national tests seems to have a reasonably good signal-to-noise ratio, especially if adjusted. So what is the effect of league tables on the performance of schools, and how does that differ between:

  1. Public vs private providers.
  2. Schools with good vs. bad relative baseline performance.
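The signal-to-noise caveat above can be illustrated with a small simulation. Here 20 providers with an identical true event rate are ranked on their observed rates; with modest denominators, the ‘best’ and ‘worst’ can differ substantially by chance alone (illustrative code, not data from any cited study):

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

def observed_rates(true_rate=0.05, n_providers=20, denominator=200):
    """Simulate observed event rates for providers with IDENTICAL true quality."""
    rates = []
    for _ in range(n_providers):
        events = sum(random.random() < true_rate for _ in range(denominator))
        rates.append(events / denominator)
    return rates

rates = observed_rates()
print(f"True rate is 5% everywhere; observed rates range from "
      f"{min(rates):.1%} to {max(rates):.1%}")
```

A league table built on these observed rates would confidently separate ‘good’ from ‘bad’ providers even though no real differences exist; only when denominators are large relative to the variation of interest does the ranking carry signal.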

Andrabi et al.[3] carried out an RCT across 112 Pakistani villages, all of which contained a mix of fee-paying and public schools. Test score results improved after the introduction of report cards showing consolidated results for each school. Parents were aware of the reports and took them seriously. School performance increased across the board, the costs of private schools fell on average (and the worst-performing closed down), and equity improved, since the schools that performed worst at baseline improved most.

Private schools in the above study may have responded to financial incentives, since their pupils could vote with their feet. But the interesting thing is that public providers also improved. This might reflect the pure power of comparative information. Alternatively, providers may have responded to the implicit threat to livelihoods from the availability of comparative data in the context of rapidly increasing provision of private education.

Another example of the availability of comparative data improving quality can be found in the UK catering industry, following the introduction of a scheme requiring restaurants and cafes to make their food hygiene inspection ratings publicly available and visible on doors and shop fronts. Customers preferred to buy food from outlets with higher ratings, and competition among food businesses drove an increase in the proportion of food premises complying with hygiene standards.[4]

What implications does this have for healthcare? Hospital performance seems to improve under the influence of league tables, even when reimbursement is not affected by the results. However, Fotaki argues that initiatives such as NHS Choices, designed to provide public access to comparative data on hospital performance, consultant outcomes, and user satisfaction, may have little effect: the impact of informed choice on efficiency and quality is limited at best, and may even have negative consequences for equity, since pre-existing inequalities of income and education influence patients’ access to information and ability to choose.[5] We would welcome comments on this enigma.

— Richard Lilford, CLAHRC WM Director


  1. Lilford R, Mohammed MA, Spiegelhalter D, Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet. 2004;363(9415):1147-54.
  2. Girling AJ, Hofer TP, Wu J, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf. 2012; 21(12): 1052-6.
  3. Andrabi T, Das J, Khawaja AI. Report cards: the impact of providing school and child test scores on educational markets. Social Science Research Network. RWP14-052. June 2014.
  4. Salis S, Jabin N, Morris S. Evaluation of the impact of the Food Hygiene Rating Scheme and the Food Hygiene Information Scheme on food hygiene standards and food-borne illnesses. Food Standards Agency. March 2015.
  5. Fotaki M. What market-based patient choice can’t do for the NHS: The theory and evidence of how choice works in health care. 2014.