Measuring the Quality of Health Care in Low-Income Settings

Measuring the quality of health care in High-Income Countries (HIC) is deceptively difficult, as shown by work carried out by many research groups, including CLAHRC WM.[1-5] However, a large amount of information is collected routinely by health care facilities in HICs. These data include outcome data, such as Standardised Mortality Ratios (SMRs), death rates from ‘causes amenable to health care’, readmission rates, morbidity rates (such as pressure damage), and patient satisfaction, along with process data, such as waiting times, prescribing errors, and antibiotic use. There is controversy over many of these endpoints, and some are much better barometers of safety than others. While incident reporting systems provide a very poor basis for epidemiological studies (that is not their purpose), case-note review provides arguably the best and most widely used method for formal study of care quality – at least in hospitals.[3] [6] [7] Measuring safety in primary care is inhibited by the less comprehensive case-notes found in primary care settings as compared to hospital case-notes. Nevertheless, an increasing amount of process information is now available from general practices, particularly in countries (such as the UK) that collect this information routinely in electronic systems. It is possible, for example, to measure rates of statin prescriptions for people with high cardiovascular risk, and anticoagulants for people with atrial fibrillation, as our CLAHRC has shown.[8] [9] HICs also conduct frequent audits of specific aspects of care – essentially by asking clinicians to fill in detailed pro formas for patients in various categories. For instance, National Audits in the UK have been carried out into all patients experiencing a myocardial infarction.[10] Direct observation of care has been used most often to understand barriers and facilitators to good practice, rather than to measure quality / safety in a quantitative way.
However, routine data collection systems provide a measure of patient satisfaction with care – in the UK people who were admitted to hospital are surveyed on a regular basis [11] and general practices are required to arrange for anonymous patient feedback every year.[12] Mystery shoppers (simulated patients) have also been used from time to time, albeit not as a comparative epidemiological tool.[13]

This picture is very different in Low- and Middle-Income Countries (LMICs) and, again, it is yet more difficult to assess quality of out of hospital care than of hospital care.[14] Even in hospitals, routine mortality data may not be available, let alone process data. An exception is the network of paediatric centres established in Kenya by Prof Michael English.[15] Occasionally large scale bespoke studies are carried out in LMICs – for example, a recent study, in which CLAHRC WM participated, measured 30-day post-operative mortality rates in over 60 hospitals across low-, middle- and high-income countries.[16]

The quality and outcomes of care in community settings in LMICs are woefully understudied. We are attempting to correct this ‘dearth’ of information in a study in nine slums spread across four African and Asian countries. One of the largest obstacles to such a study is the very fragmented nature of health care provision in community settings in LMICs – a finding confirmed by a recent Lancet commission.[17] There are no routine data collection systems, and even deaths are not registered routinely. Where to start?

In this blog post I lay out a framework for measurement of quality from largely isolated providers, many of whom are unregulated, in a system where there is no routine system of data and no archive of case-notes. In such a constrained situation I can think of three (non-exclusive) types of study:

  1. Direct observation of the facilities where care is provided without actually observing care or its effects. Such observation is limited to some of the basic building blocks of a health care system – what services are present (e.g. number of pharmacies per 1,000 population) and availability (how often the pharmacy is open; how often a doctor / nurse / medical officer is available for consultation in a clinic). Such a ‘mapping’ exercise does not capture all care provided – e.g. it will miss hospital care and municipal / hospital-based outreach care, such as vaccination provided by Community Health Workers. It will also miss any IT based care using apps or online consultations.
  2. Direct observation of the care process by external observers. Researchers can observe care from close up, for example during consultations. Such observations can cover the humanity of care (which could be scored) and/or technical quality (which again could be scored against explicit standards and/or on a holistic (implicit) basis).[6] [7] An explicit standard would have to be based mainly on ‘if-then’ rules – e.g. if a patient complained of weight loss, excessive thirst, or recurrent boils, did the clinician test their urine for sugar; if the patient complained of persistent productive cough and night sweats, was a test for TB arranged? Implicit standards suffer from low reliability (high inter-observer variation).[18] Moreover, community providers in LMICs are arguably likely to be resistant to what they might perceive as an intrusive or even threatening form of observation. Those who permit such scrutiny are unlikely to constitute a random sample. More vicarious observations – say of the length of consultations – would have some value, but might still be seen as intrusive. Provided some providers would permit direct observation, their results may represent an ‘upper bound’ on performance.
  3. Quality as assessed through the eyes of the patient / members of the public. Given the limitations of independent observation, the lack of anamnestic records of clinical encounters in the form of case-notes, absence of routine data, and likely limitations on access by independent direct observers, most information may need to be collected from patients themselves or, as discussed below, from people masquerading as patients (simulated patients / mystery shoppers). The following types of data collection methods can be considered:
    1. Questions directed at members of the public regarding preventive services. So, households could be asked about vaccinations, surveillance (say for malnutrition), and their knowledge of screening services offered on a routine basis. This is likely to provide a fairly accurate measure of the quality of preventive services (provided the sampling strategy was carefully designed to yield a representative sample). This method could also provide information on advice and care provided through IT resources. This is a situation where some anamnestic data collection would be possible (with the permission of the respondent) since it would be possible to scroll back through the electronic ‘record’.
    2. Opinion surveys / debriefing following consultations. This method offers a viable alternative to observation of consultations and would be less expensive (though still not inexpensive). Information on the kindness / humanity of services could be easily obtained and quantified, along with ease of access to ambulatory and emergency care.[19] Measuring clinical quality would again rely on observations against a gold standard,[20] but given the large number of possible clinical scenarios standardising quality assessment would be tricky. However, a coarse-grained assessment would be possible and, given the low quality levels reported anecdotally, failure to achieve a high degree of standardisation might not vitiate collection of important information. Such a method might provide insights into the relative merits and demerits of traditional vs. modern health care, private vs. public, etc., provided that these differences were large.
    3. Simulated patients offering standardised clinical scenarios. This is arguably the optimal method of technical quality assessment in settings where case-notes are perfunctory or not available. Again, consultations could be scored for humanity of care and clinical / technical competence, and again explicit and/or implicit standards could be used. However, we do not believe it would be ethical to use this method without obtaining assent from providers. There are some examples of successful use of the method in LMICs.[21] [22] However, if my premise is accepted that providers must assent to use of simulated patients, then it is necessary to first establish trust between providers and academic teams, and this takes time. Again, there is a high probability that only the better providers will provide assent, in which case observations would likely represent ‘upper bounds’ on quality.
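
The explicit ‘if-then’ standard described under direct observation (and reusable for simulated patients) can be sketched in a few lines of Python. This is only an illustration: the two rules are the examples given above, while the names `Rule`, `RULES` and `score_consultation`, and the exact wording of the complaints and actions, are hypothetical choices made for the sketch rather than a validated instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    triggers: frozenset      # complaints that make the rule applicable
    require_all: bool        # True: every trigger needed; False: any one suffices
    expected_action: str     # what the clinician should then have done


# Hypothetical encoding of the two example rules from the text.
RULES = [
    Rule("diabetes screen",
         frozenset({"weight loss", "excessive thirst", "recurrent boils"}),
         False, "urine sugar test"),
    Rule("TB screen",
         frozenset({"persistent productive cough", "night sweats"}),
         True, "TB test arranged"),
]


def score_consultation(complaints, actions, rules=RULES):
    """Return (rules met, rules applicable) for one observed consultation."""
    complaints, actions = set(complaints), set(actions)
    met = applicable = 0
    for rule in rules:
        hits = rule.triggers & complaints
        triggered = hits == rule.triggers if rule.require_all else bool(hits)
        if triggered:
            applicable += 1
            if rule.expected_action in actions:
                met += 1
    return met, applicable


# A patient with thirst, cough and night sweats who only had a urine test:
print(score_consultation(
    {"excessive thirst", "persistent productive cough", "night sweats"},
    {"urine sugar test"},
))
```

Reporting the score as "met out of applicable" keeps consultations where no rule was triggered out of the denominator, which matters when the case mix differs between providers.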

In conclusion, I think that the basic tools of quality assessment, in the current situation where direct observation and/or simulated patients are not acceptable, are a combination of:

  1. Direct observation of facilities that exist, along with ease of access to them, and
  2. Debriefing of people who have recently used the health facilities, or who might have received preventive services that are not based in these facilities.

We do not think that the above-mentioned shortcomings of these methods are a reason to eschew assessment of service quality in community settings (such as slums) in LMICs – after all, one of the most powerful levers to improvement is quantitative evidence of current care quality.[23] [24] The perfect should not be the enemy of the good. Moreover, if the anecdotes I have heard regarding care quality (providers who hand out only three types of pill – red, yellow and blue; doctors and nurses who do not turn up for work; prescription of antibiotics for clearly non-infectious conditions) are even partly true, then these methods would be more than sufficient to document standards and compare them across types of provider and different settings.

— Richard Lilford, CLAHRC WM Director


  1. Brown C, Hofer T, Johal A, Thomson R, Nicholl J, Franklin BD, Lilford RJ. An epistemology of patient safety research: a framework for study design and interpretation. Part 1. Conceptualising and developing interventions. Qual Saf Health Care. 2008; 17(3): 158-62.
  2. Brown C, Hofer T, Johal A, Thomson R, Nicholl J, Franklin BD, Lilford RJ. An epistemology of patient safety research: a framework for study design and interpretation. Part 2. Study design. Qual Saf Health Care. 2008; 17(3): 163-9.
  3. Brown C, Hofer T, Johal A, Thomson R, Nicholl J, Franklin BD, Lilford RJ. An epistemology of patient safety research: a framework for study design and interpretation. Part 3. End points and measurement. Qual Saf Health Care. 2008; 17(3): 170-7.
  4. Brown C, Hofer T, Johal A, Thomson R, Nicholl J, Franklin BD, Lilford RJ. An epistemology of patient safety research: a framework for study design and interpretation. Part 4. One size does not fit all. Qual Saf Health Care. 2008; 17(3): 178-81.
  5. Brown C, Lilford R. Evaluating service delivery interventions to enhance patient safety. BMJ. 2008; 337: a2764.
  6. Benning A, Ghaleb M, Suokas A, Dixon-Woods M, Dawson J, Barber N, et al. Large scale organisational intervention to improve patient safety in four UK hospitals: mixed method evaluation. BMJ. 2011; 342: d195.
  7. Benning A, Dixon-Woods M, Nwulu U, Ghaleb M, Dawson J, Barber N, et al. Multiple component patient safety intervention in English hospitals: controlled evaluation of second phase. BMJ. 2011; 342: d199.
  8. Finnikin S, Ryan R, Marshall T. Cohort study investigating the relationship between cholesterol, cardiovascular risk score and the prescribing of statins in UK primary care: study protocol. BMJ Open. 2016; 6(11): e013120.
  9. Adderley N, Ryan R, Marshall T. The role of contraindications in prescribing anticoagulants to patients with atrial fibrillation: a cross-sectional analysis of primary care data in the UK. Br J Gen Pract. 2017. [ePub].
  10. Herrett E, Smeeth L, Walker L, Weston C, on behalf of the MINAP Academic Group. The Myocardial Ischaemia National Audit Project (MINAP). Heart. 2010; 96: 1264-7.
  11. Care Quality Commission. Adult inpatient survey 2016. Newcastle-upon-Tyne, UK: Care Quality Commission, 2017.
  12. Ipsos MORI. GP Patient Survey. National Report. July 2017 Publication. London: NHS England, 2017.
  13. Grant C, Nicholas R, Moore L, Salisbury C. An observational study comparing quality of care in walk-in centres with general practice and NHS Direct using standardised patients. BMJ. 2002; 324: 1556.
  14. Nolte E & McKee M. Measuring and evaluating performance. In: Smith RD & Hanson K (eds). Health Systems in Low- and Middle-Income Countries: An economic and policy perspective. Oxford: Oxford University Press; 2011.
  15. Tuti T, Bitok M, Malla L, Paton C, Muinga N, Gathara D, et al. Improving documentation of clinical care within a clinical information network: an essential initial step in efforts to understand and improve care in Kenyan hospitals. BMJ Global Health. 2016; 1(1): e000028.
  16. GlobalSurg Collaborative. Mortality of emergency abdominal surgery in high-, middle- and low-income countries. Br J Surg. 2016; 103(8): 971-88.
  17. McPake B, Hanson K. Managing the public-private mix to achieve universal health coverage. Lancet. 2016; 388: 622-30.
  18. Lilford R, Edwards A, Girling A, Hofer T, Di Tanna GL, Petty J, Nicholl J. Inter-rater reliability of case-note audit: a systematic review. J Health Serv Res Policy. 2007; 12(3): 173-80.
  19. Schoen C, Osborn R, Huynh PT, Doty M, Davis K, Zapert K, Peugh J. Primary Care and Health System Performance: Adults’ Experiences in Five Countries. Health Aff. 2004.
  20. Kruk ME & Freedman LP. Assessing health system performance in developing countries: A review of the literature. Health Policy. 2008; 85: 263-76.
  21. Smith F. Private local pharmacies in low- and middle-income countries: a review of interventions to enhance their role in public health. Trop Med Int Health. 2009; 14(3): 362-72.
  22. Satyanarayana S, Kwan A, Daniels B, Subbaraman R, McDowell A, Bergkvist S, et al. Use of standardised patients to assess antibiotic dispensing for tuberculosis by pharmacies in urban India: a cross-sectional study. Lancet Infect Dis. 2016; 16(11): 1261-8.
  23. Kudzma E C. Florence Nightingale and healthcare reform. Nurs Sci Q. 2006; 19(1): 61-4.
  24. Donabedian A. The end results of health care: Ernest Codman’s contribution to quality assessment and beyond. Milbank Q. 1989; 67(2): 233-56.

Measuring Quality of Care

Measuring quality of care is not a straightforward business:

  1. Routinely collected outcome data tend to be misleading because of very poor ratios of signal to noise.[1]
  2. Clinical process (criterion based) measures require case note review and miss important errors of omission, such as diagnostic errors.
  3. Adverse events also require case note review and are prone to measurement error.[2]

Adverse event review is widely practised, usually involving a two-stage process:

  1. A screening process (sometimes to look for warning features [triggers]).
  2. A definitive phase to drill down in more detail and refute or confirm (and classify) the event.

A recent HS&DR report [3] is important for two particular reasons:

  1. It shows that a one-stage process is as sensitive as the two-stage process. So triggers are not needed; just as many adverse events can be identified if notes are sampled at random.
  2. In contrast to (other) triggers, deaths really are associated with a high rate of adverse events (apart, of course, from the death itself). In fact not only are adverse events more common among patients who have died than among patients sampled at random (nearly 30% vs. 10%), but the preventability rates (probability that a detected adverse event was preventable) also appeared slightly higher (about 60% vs. 50%).
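
To see how much ‘enrichment’ the figures above imply, the two percentages can be multiplied to give a rough preventable adverse event rate in each group. This is a back-of-envelope sketch: the inputs are the rounded values quoted above (‘nearly 30%’, ‘about 60%’ and so on), and the function name is a hypothetical label rather than anything from the report.

```python
def preventable_ae_rate(ae_rate, preventability):
    """Share of reviewed cases with an adverse event judged preventable."""
    return ae_rate * preventability


# Rounded figures quoted in the text, not exact values from the report.
deaths = preventable_ae_rate(0.30, 0.60)         # patients who died
random_sample = preventable_ae_rate(0.10, 0.50)  # notes sampled at random

print(f"deaths: {deaths:.0%}, random sample: {random_sample:.0%}, "
      f"ratio: {deaths / random_sample:.1f}")
```

On these rounded inputs, roughly 18% of reviewed deaths would contain a preventable adverse event against roughly 5% of randomly sampled notes, so a reviewer reading notes of patients who died would find between three and four times as many per case note.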

This paper has clear implications for policy and practice, because if we want a population ‘enriched’ for high adverse event rates (on the ‘canary in the mineshaft’ principle), then deaths provide that enrichment. The widely used trigger tool, however, serves no useful purpose – it does not identify a higher than average risk population, and it is more resource intensive. It should be consigned to history.

Lastly, England and Wales have mandated a process of death review, and the adverse event rate among such cases is clearly of interest. A word of caution is in order here. The reliability (inter-observer agreement) in this study was quite high (Kappa 0.5), but not high enough for comparisons across institutions to be valid. If cross-institutional comparisons are required, then:

  1. A set of reviewers must review case notes across hospitals.
  2. At least three reviewers should examine each case note.
  3. Adjustment must be made for reviewer effects, as well as prognostic factors.

The statistical basis for these requirements is laid out in detail elsewhere.[4] It is clear that reviewers should not review notes from their own hospitals, if any kind of comparison across institutions is required – the results will reflect the reviewers rather than the hospitals.
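
A rough feel for why several reviewers per case note help can be had from the Spearman-Brown formula, which gives the reliability of an average of several ratings from the reliability of a single rating. This is a swapped-in textbook approximation, not the analysis in the cited paper, and it assumes that the Kappa of 0.5 can be treated as a single-reviewer reliability coefficient and that reviewers are interchangeable. The function name is hypothetical.

```python
def pooled_reliability(single_rater, n_raters):
    """Spearman-Brown: reliability of the mean of n_raters independent ratings."""
    return n_raters * single_rater / (1 + (n_raters - 1) * single_rater)


# Assumption: single-reviewer reliability of 0.5, as a stand-in for Kappa 0.5.
for k in (1, 2, 3, 5):
    print(k, round(pooled_reliability(0.5, k), 2))
```

Under these assumptions, averaging three reviewers lifts reliability from 0.5 to 0.75, with diminishing returns for each reviewer added after that.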

Richard Lilford, CLAHRC WM Director


  1. Girling AJ, Hofer TP, Wu J, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf. 2012; 21(12): 1052-6.
  2. Lilford R, Mohammed M, Braunholtz D, Hofer T. The measurement of active errors: methodological issues. Qual Saf Health Care. 2003; 12(s2): ii8-12.
  3. Mayor S, Baines E, Vincent C, et al. Measuring harm and informing quality improvement in the Welsh NHS: the longitudinal Welsh national adverse events study. Health Serv Deliv Res. 2017; 5(9).
  4. Manaseki-Holland S, Lilford RJ, Bishop JR, Girling AJ, Chen YF, Chilton PJ, Hofer TP; UK Case Note Review Group. Reviewing deaths in British and US hospitals: a study of two scales for assessing preventability. BMJ Qual Saf. 2016. [ePub].

An Interesting Report of Quality of Care Enhancement Strategies Across England, Germany, Sweden, the Netherlands, and the USA

An interesting paper from the Berlin University of Technology compares the quality enhancement systems across the above countries with respect to measuring, reporting and rewarding quality.[1] This paper is an excellent resource for policy and health service researchers. The US has the most developed system of quality-related payments (P4P) of the five countries. England wisely uses only process measures to reward performance, while the US and Germany include patient outcomes. The latter are unfair because of signal to noise issues,[2] and the risk-adjustment fallacy.[3] [4] Above all, remember Lilford’s axiom – never base rewards or sanctions on a measurement over which service providers do not feel they have control.[5] It is true, as the paper argues, that rates of adherence to a single process seldom correlate with outcome. But this is a signal to noise problem. ‘Proving’ that processes are valid takes huge RCTs, even when the process is applied to 0% (control arm) vs. approaching 100% (intervention arm) of patients. So how could an improvement from say 40% to 60% in adherence to clinical process show up in routinely collected data?[6] I have to keep on saying it – collect outcome data, but in rewarding or penalising institutions on the basis of comparative performance – process, process, process.
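
The 40% to 60% question can be made concrete with a standard sample size calculation for comparing two proportions. Everything numerical here is an illustrative assumption rather than a figure from the paper: mortality of 10% without the process and 8% with it, a two-sided alpha of 0.05, 80% power, and mortality that falls linearly with adherence. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per group to detect p1 vs p2 (normal approximation)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_total ** 2 * variance / (p1 - p2) ** 2)


def mortality_at(adherence, risk_without=0.10, risk_with=0.08):
    """Assumed mortality when a given share of patients receive the process."""
    return risk_without - adherence * (risk_without - risk_with)


# RCT-like contrast: 0% vs 100% of patients receive the process.
n_trial = n_per_arm(mortality_at(0.0), mortality_at(1.0))
# Routine-data contrast: adherence rises from 40% to 60%.
n_routine = n_per_arm(mortality_at(0.4), mortality_at(0.6))

print(n_trial, n_routine, round(n_routine / n_trial, 1))
```

Because the difference in mortality shrinks to a fifth of its trial size, the number of patients needed grows about 25-fold, from a few thousand per group to tens of thousands, and that is before allowing for case-mix differences between institutions.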

— Richard Lilford, CLAHRC WM Director


  1. Pross C, Geissler A, Busse R. Measuring, Reporting, and Rewarding Quality of Care in 5 Nations: 5 Policy Levers to Enhance Hospital Quality Accountability. Milbank Quart. 2017; 95(1): 136-83.
  2. Girling AJ, Hofer TP, Wu J, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf. 2012; 21: 1052-6.
  3. Mohammed MA, Deeks JJ, Girling A, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ. 2009; 338: b780.
  4. Lilford R, & Pronovost P. Using hospital mortality rates to judge hospital performance: a bad idea that just won’t go away. BMJ. 2010; 340: c2016.
  5. Lilford RJ. Important evidence on pay for performance. NIHR CLAHRC West Midlands News Blog. 20 November 2015.
  6. Lilford RJ, Chilton PJ, Hemming K, Girling AJ, Taylor CA, Barach P. Evaluating policy and service interventions: framework to guide selection and interpretation of study end points. BMJ. 2010; 341: c4413.

A Proper Large-Scale Quality Improvement Study in a Middle-Income Country

The vast majority of studies testing an intervention to improve quality/safety of care are conducted in high-income countries. However, a cluster RCT of 118 Brazilian ICUs (6,761 patients) has recently been reported.[1] The intervention was compound (multi-component), involving goal setting, clinician prompting, and multi-disciplinary ward rounds. Although mortality and other patient outcomes were not improved, clinical processes (e.g. use of appropriate settings on the ventilator and avoidance of heavy sedation) did improve. The nub of my argument is that clinical outcomes are insensitive indicators of improved practice, and we should be content with showing improved adherence to proven care standards – the argument is laid out numerically elsewhere.[2] The safety and quality movement is doomed so long as we insist on showing improvements in patient level outcomes.

— Richard Lilford, CLAHRC WM Director


  1. Writing Group for the CHECKLIST-ICU Investigators and the Brazilian Research in Intensive Care Network (BRICNet). Effect of a Quality Improvement Intervention with Daily Round Checklists, Goal Setting, and Clinician Prompting on Mortality of Critically Ill Patients. JAMA. 2016;315(14):1480-90.
  2. Lamont T, Barber N, de Pury J, et al. New approaches to evaluating complex health and care systems. BMJ. 2016; 352: i154.

League tables – not always bad

Health services have become used to report cards on performance. These are valid if the signal to noise ratio is favourable (waiting times, vaccination rates, patient satisfaction), but invalid when the signal is overwhelmed by noise (standardised mortality ratios, readmission rates).[1] [2] School performance on national tests seems to have a reasonably good signal to noise ratio, especially if adjusted. So what is the effect of league tables on the performance of schools, and how does that differ between:

  1. Public vs private providers.
  2. Schools with relatively good vs bad baseline performance.

Andrabi et al. [3] carried out an RCT of 112 Pakistani villages, all of which contained a mix of fee-paying and public schools. Test score results improved after the introduction of report cards showing consolidated results across each school. Parents were aware of the reports and took them seriously. School performance increased across the board, costs of private schools fell on average (and the worst performing closed down), and equity improved since the schools performing worst at baseline improved most.

Private schools in the above study may have responded to financial incentives since their pupils could vote with their feet. But the interesting thing is that public providers also improved. This might reflect the pure power of comparative information. Alternatively, providers may have responded to the implicit threat to livelihoods from the availability of comparative data in the context of rapidly increasing provision of private education.

Another example of the availability of comparative data improving quality can be found in the UK catering industry, following the introduction of a scheme requiring restaurants and cafes to make their food hygiene inspection ratings publicly available, and visible on doors and shop fronts. Customers preferred to buy food from outlets with higher ratings, and competition among food businesses on hygiene standards resulted in an increase in the proportion of food premises that complied with hygiene standards.[4]

What implications does this have for healthcare? Hospital performance seems to improve under the influence of league tables, even when reimbursement is not affected by the results. However, Fotaki suggests that initiatives such as NHS Choices, designed to provide public access to comparative data on hospital performance, consultant outcomes, and user satisfaction, may have little effect. She argues that the impact of informed choice on efficiency and quality is limited at best, and may even have negative consequences for equity: pre-existing inequalities of income and education influence patients’ access to information and ability to choose.[5] We would welcome comments on this enigma.

— Richard Lilford, CLAHRC WM Director


  1. Lilford R, Mohammed MA, Spiegelhalter D, Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet. 2004;363(9415):1147-54.
  2. Girling AJ, Hofer TP, Wu J, Chilton PJ, Nicholl JP, Mohammed MA, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Quality & Safety. 2012;21:1052-6.
  3. Andrabi T, Das J, Khawaja AI. Report cards: the impact of providing school and child test scores on educational markets. Social Science Research Network. RWP14-052. June 2014.
  4. Salis S, Jabin N, Morris S. Evaluation of the impact of the Food Hygiene Rating Scheme and the Food Hygiene Information Scheme on food hygiene standards and food-borne illnesses. Food Standards Agency. March 2015.
  5. Fotaki M. What market-based patient choice can’t do for the NHS: The theory and evidence of how choice works in health care. 2014.