
Patient’s experience of hospital care at weekends

The “weekend effect”, whereby patients admitted to hospital at weekends appear to experience higher mortality than patients admitted on weekdays, has received substantial attention from the health service community and the general public alike.[1] Evidence of the weekend effect was used to support the NHS’s introduction of the ‘7-day Service’ policy and the associated changes to junior doctors’ contracts,[2-4] which have further propelled debates surrounding the nature and causes of the weekend effect.

Members of CLAHRC West Midlands are closely involved in the HiSLAC project,[5] an NIHR HS&DR Programme funded project led by Professor Julian Bion (University of Birmingham) to evaluate the impact of introducing 7-day consultant-led acute medical services. We are undertaking a systematic review of the weekend effect as part of the project,[6] and one of our challenges is keeping up with the rapidly growing literature fuelled by the public and political attention. Although hundreds of papers on this topic have been published, there has been a distinct gap in the academic literature: most published papers compare hospital mortality rates between weekends and weekdays, but virtually no studies have quantitatively compared the experience and satisfaction of patients between weekends and weekdays. This was the case until we found a study recently published by Chris Graham of the Picker Institute, who had unique access to data not in the public domain, namely the dates of admission to hospital given by the respondents.[7]

This interesting study examined data from two nationwide surveys of acute hospitals in England in 2014: the A&E department patient survey (39,320 respondents; 34% response rate) and the adult inpatient survey (59,083 respondents; 47% response rate). Patients admitted at weekends were less likely to respond than those admitted on weekdays, but this was accounted for by patient and admission characteristics (e.g. age group). Contrary to the inference about care quality that would be drawn from hospital mortality rates, respondents attending A&E departments at weekends actually reported better experiences with regard to ‘doctors and nurses’ and ‘care and treatment’ than those attending on weekdays. Patients admitted to hospital through A&E at weekends also rated the information given to them in A&E more favourably. No other significant differences in reported patient experience were observed between weekend and weekday A&E visits and hospital admissions.[7]

As always, some caution is needed when interpreting these intriguing findings. First, as the author acknowledged, patients who died following their A&E visits/admissions were excluded from the surveys, and therefore their experiences were not captured. Second, although potential differences in case mix, including age, sex, urgency of admission (elective or not), the need for a proxy to complete the survey, and the presence of long-term conditions, were taken into account in the aforementioned findings, the statistical adjustment did not include important factors such as main diagnosis and disease severity, which could confound patient experience. Readers may wonder whether these factors could overturn the findings; even if not, the mechanisms by which weekend admission might lead to improved satisfaction are unclear. It is possible that patients have different expectations of the hospital care they receive by day of the week, and consequently may rate the same level of care differently. The findings from this study are certainly a valuable addition to the growing literature that is starting to unravel the complexity behind the weekend effect, and a further testament that measuring care quality based on mortality rates alone is unreliable and certainly insufficient, a point long highlighted by the Director of CLAHRC West Midlands and other colleagues.[8] [9] Our HiSLAC project continues to collect and examine qualitative,[10] quantitative,[5] [6] and economic [11] evidence related to this topic, so watch this space!

— Yen-Fu Chen, Principal Research Fellow


  1. Lilford RJ, Chen YF. The ubiquitous weekend effect: moving past proving it exists to clarifying what causes it. BMJ Qual Saf 2015;24(8):480-2.
  2. House of Commons. Oral answers to questions: Health. 2015. House of Commons, London.
  3. McKee M. The weekend effect: now you see it, now you don’t. BMJ 2016;353:i2750.
  4. NHS England. Seven day hospital services: the clinical case. 2017.
  5. Bion J, Aldridge CP, Girling A, et al. Two-epoch cross-sectional case record review protocol comparing quality of care of hospital emergency admissions at weekends versus weekdays. BMJ Open 2017;7:e018747.
  6. Chen YF, Boyal A, Sutton E, et al. The magnitude and mechanisms of the weekend effect in hospital admissions: A protocol for a mixed methods review incorporating a systematic review and framework synthesis. Syst Rev 2016;5:84.
  7. Graham C. People’s experiences of hospital care on the weekend: secondary analysis of data from two national patient surveys. BMJ Qual Saf 2017;29:29.
  8. Girling AJ, Hofer TP, Wu J, et al. Case-mix adjusted hospital mortality is a poor proxy for preventable mortality: a modelling study. BMJ Qual Saf 2012;21(12):1052-56.
  9. Lilford R, Pronovost P. Using hospital mortality rates to judge hospital performance: a bad idea that just won’t go away. BMJ 2010;340:c2016.
  10. Tarrant C, Sutton E, Angell E, Aldridge CP, Boyal A, Bion J. The ‘weekend effect’ in acute medicine: a protocol for a team-based ethnography of weekend care for medical patients in acute hospital settings. BMJ Open 2017;7: e016755.
  11. Watson SI, Chen YF, Bion JF, Aldridge CP, Girling A, Lilford RJ. Protocol for the health economic evaluation of increasing the weekend specialist to patient ratio in hospitals in England. BMJ Open 2018:In press.

Systematic Reviewing in the Digital Era

In the field of systematic reviewing it is easy (and often necessary) to immerse yourself deep in the sea of literature and forget about everything going on in the outside world. Reflecting on this, I realised that I hadn’t actually attended a proper Cochrane meeting even though I’ve been doing reviews for more than a decade. Before rendering myself truly obsolete, I decided to seize the opportunity when the Cochrane UK and Ireland Symposium came to Birmingham earlier in March to catch up with the latest developments in the field. And I wasn’t disappointed.

A major challenge for people undertaking systematic reviews is dealing with the sheer number of potentially relevant papers within the timeline beyond which a review would be considered out of date. Indeed, the issue is so prominent that we (colleagues in Warwick and Ottawa) recently wrote and published a commentary discussing ‘how to do a systematic review expeditiously’.[1] One of the most arduous steps in a systematic review is screening the large number of records retrieved from searches of bibliographic databases. Two years ago the bravest attempt I had heard of at a Campbell Collaboration Colloquium was sifting through over 40,000 records for a review. Two years on, the number has risen to over 70,000. With little sign that the number of published research papers will plateau, I wonder whether reviewers’ stamina and patience can keep pace, even if they have the luxury of time. Here clever computers come to the rescue: if Google’s AlphaGo can beat the human champion of Go,[2] why can’t artificial intelligence save reviewers from the humble but tedious task of screening articles?

Back at the symposium, there was no shortage of signs of this digital revolution on the agenda. To begin with, the conference had no brochure or abstract book to pick up or print; all you got was a mobile phone app telling you what the sessions were and where to go. Several plenary and workshop sessions were related to automation, which I was eager to attend and from which I learned of a growing literature on the use of automation throughout the review process,[3] including article sifting,[4] data extraction,[5] quality assessment,[6] and report generation. Although most attempts are still exploratory, the use of text mining, classification algorithms and machine learning to assist with citation screening appears to have matured sufficiently to be considered for practical application. Abstrackr, funded by the AHRQ, is one example that is currently freely available (registration required) and has been subject to independent evaluation.[7] Overall, existing studies suggest such software may reduce reviewers’ screening workload by 30-70% (by ruling out references unlikely to be relevant, which therefore need not be screened manually) while maintaining a fairly high level of recall (missing 5% or fewer of eligible articles).[4] However, this is likely to be subject-dependent, and more empirical evidence will be required to demonstrate its practicality and limitations.
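To make the idea concrete, here is a minimal sketch of how machine learning can rank unscreened records by likely relevance: a tiny Bernoulli naive Bayes classifier trained on a handful of hand-screened titles. This is not the algorithm Abstrackr actually uses (its internals are more sophisticated), and all titles and labels below are invented for illustration.

```python
import math
from collections import Counter

def tokenise(text):
    return [w for w in text.lower().split() if w.isalpha()]

def train_nb(labelled):
    # labelled: list of (title, is_relevant) pairs screened by a human.
    # Count, per class, how many documents contain each word.
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, rel in labelled:
        docs[rel] += 1
        counts[rel].update(set(tokenise(text)))
    return counts, docs

def relevance_score(text, counts, docs):
    # Log-odds of relevance under Bernoulli naive Bayes with Laplace smoothing;
    # higher score = more likely to be relevant.
    score = math.log((docs[True] + 1) / (docs[False] + 1))
    vocab = set(counts[True]) | set(counts[False])
    for w in set(tokenise(text)) & vocab:
        p_rel = (counts[True][w] + 1) / (docs[True] + 2)
        p_irr = (counts[False][w] + 1) / (docs[False] + 2)
        score += math.log(p_rel / p_irr)
    return score

# Toy training set: titles already screened by a reviewer (invented)
screened = [
    ("weekend admission and hospital mortality in acute care", True),
    ("mortality of patients admitted at weekends a cohort study", True),
    ("effect of aspirin on headache a randomised trial", False),
    ("knee replacement surgical technique case series", False),
]
unscreened = [
    "weekend effect on mortality in emergency admissions",
    "gardening tips for healthy tomatoes",
]
counts, docs = train_nb(screened)
ranked = sorted(unscreened, key=lambda t: relevance_score(t, counts, docs), reverse=True)
print(ranked[0])
```

In practice such a model is retrained as the reviewer screens more records, so the most promising references keep floating to the top and the obviously irrelevant tail can be cut off once recall is judged acceptable.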

It is important to understand a bit of what goes on inside the “black box” when using such software, and so we were introduced to some online text mining and analysis tools during the workshop sessions. One example is “TerMine”, which lets you paste in plain text or specify a text file or a URL. Within a few seconds it returns the text with the most relevant terms highlighted (these can also be viewed as a table ranked by relevance). I ran a quick experimental analysis of the CLAHRC WM Director and Co-Director’s blog, and the results seem a fair reflection of its themes: community health workers, public health, organisational failure, Cochrane reviews and service delivery were among the highest-ranking terms (besides frequent terms such as CLAHRC WM and the Director’s name). The real challenge in using such tools, however, is how to organise the identified terms in a sensible way (although other software is capable of things like semantic or cluster analysis), and perhaps more importantly, how to spot important terms that are under-represented or absent.
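As a crude illustration of the kind of term extraction such tools perform (TerMine itself is based on the more sophisticated C-value method), the sketch below simply ranks word bigrams by frequency, skipping bigrams that begin or end with a common stopword. The sample text is invented.

```python
import re
from collections import Counter

def frequent_terms(text, n=2, top=5):
    # Rank word n-grams by raw frequency, ignoring n-grams that
    # start or end with a stopword (a rough proxy for "termhood").
    stop = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}
    words = re.findall(r"[a-z]+", text.lower())
    grams = Counter(
        " ".join(words[i:i + n])
        for i in range(len(words) - n + 1)
        if words[i] not in stop and words[i + n - 1] not in stop
    )
    return [term for term, _ in grams.most_common(top)]

sample = (
    "Community health workers support public health programmes. "
    "Training community health workers improves public health outcomes."
)
print(frequent_terms(sample))
```

Even this naive version surfaces the recurring multi-word terms (“community health”, “public health”); proper term extraction additionally weights nested terms and term length, which is where methods like C-value come in.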

Moving beyond systematic reviews, there are more ambitious developments such as “Contentmine”, which aims to “liberate 100 million facts from the scientific literature” using data mining techniques. Pending more permissive copyright regulations and wider open access practice in scientific publishing, the software will be capable of automatically extracting data from virtually all available literature and then re-organising and presenting the content (texts, figures, etc.) in a format specified by the user.

Finally, amid all this exciting progress around the world, Cochrane itself is certainly not lying idle. You might have seen its re-branded websites, but there is a lot more going on behind the scenes: people who have used Review Manager (RevMan) can expect to see a “RevMan Web version” in the near future; the Cochrane Central Register of Controlled Trials (CENTRAL) is being enhanced by the aforementioned automation techniques and will be complemented by a Cochrane Register of Study Data (CRS-D), which will make retrieval and use of data across reviews much easier (and thus facilitate further exploitation of existing knowledge, such as the ‘multiple indication reviews’ advocated by the CLAHRC WM Director);[8] a further enhanced Cochrane website with a “PICO Annotator” and “PICOfinder” will help people locate relevant evidence more easily; and the Cochrane Colloquium will be replaced by an even larger conference bringing together key players in systematic reviewing, both within and beyond health care, from around the world. So watch this space!

— Yen-Fu Chen, Senior Research Fellow


  1. Tsertsvadze A, Chen Y-F, Moher D, Sutcliffe P, McCarthy N. How to conduct systematic reviews more expeditiously? Syst Rev. 2015; 4(1):1-6.
  2. Gibney E. What Google’s Winning Go Algorithm Will Do Next. Nature. 2016; 531: 284-5.
  3. Tsafnat G, Glasziou P, Choong MK, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014; 3:74.
  4. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015; 4: 5.
  5. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015; 4: 78.
  6. Millard LA, Flach PA, Higgins JP. Machine learning to assist risk-of-bias assessments in systematic reviews. Int J Epidemiol. 2016; 45(1): 266-77.
  7. Rathbone J, Hoffmann T, Glasziou P. Faster title and abstract screening? Evaluating Abstrackr, a semi-automated online screening program for systematic reviewers. Syst Rev. 2015; 4: 80.
  8. Chen Y-F, Hemming K, Chilton PJ, Gupta KK, Altman DG, Lilford RJ. Scientific hypotheses can be tested by comparing the effects of one treatment over many diseases in a systematic review. J Clin Epidemiol. 2014; 67: 1309-19.



International comparison of trial results

Nearly two decades ago, colleagues from the Cochrane Complementary and Alternative Medicine (CAM) Field found that a high proportion of trials originating from Eastern Asia or Eastern Europe tended to report positive results – nearly 100% in some cases.[1] This was true not only for acupuncture trials but also for trials on other topics. More recently, a team led by Professor John Ioannidis, a renowned epidemiologist, conducted a meta-epidemiological study (a methodological study that examines and analyses data obtained from many systematic reviews/meta-analyses) and showed that treatment effects reported in trials conducted in less developed countries are generally larger than those reported in trials undertaken in more developed countries.[2] Many factors could have contributed to these observations: for example, publication bias, reporting bias, the rigour of scientific conduct, differences in patient populations and disease characteristics, and genuine differences in intervention efficacy. While it is almost certain that the observation is not attributable to genuine differences in intervention efficacy alone, teasing out the influence of the various factors is not an easy task. Lately, colleagues from CLAHRC WM have compared results from cardiovascular trials conducted in Europe with those conducted in North America, and did not find a convincing difference between them.[3] Perhaps more interesting findings will come from the comparison between trials from Europe/America and those from Asia. The results? The paper is currently in press, so watch this space!
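The core calculation behind such meta-epidemiological comparisons can be sketched as follows: pool the log odds ratios within each group of trials by inverse-variance weighting, then compare the two pooled estimates via a ratio of odds ratios (ROR) and a z statistic. This is a simplified fixed-effect sketch, not the exact model used in the cited studies, and the trial data below are entirely hypothetical.

```python
import math

def pool(log_ors, ses):
    # Fixed-effect inverse-variance pooling of log odds ratios.
    weights = [1 / se ** 2 for se in ses]
    est = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, se

def compare_regions(region_a, region_b):
    # region_*: lists of (log odds ratio, standard error) from trials.
    # Returns the ratio of pooled odds ratios (ROR, a vs b) and a
    # z statistic for the difference between the pooled estimates.
    est_a, se_a = pool(*zip(*region_a))
    est_b, se_b = pool(*zip(*region_b))
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return math.exp(diff), diff / se_diff

# Hypothetical trial results (log odds ratio, standard error)
less_developed = [(-0.60, 0.25), (-0.75, 0.30)]
more_developed = [(-0.20, 0.15), (-0.30, 0.20)]
ror, z = compare_regions(less_developed, more_developed)
print(round(ror, 2), round(z, 2))
```

An ROR below 1 here means the first group of trials reports larger (more beneficial) effects than the second; in real meta-epidemiological studies this comparison is made within each meta-analysis and the RORs are then pooled across many meta-analyses, usually with random effects.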

— Yen-Fu Chen, Senior Research Fellow


  1. Vickers A, Goyal N, Harland R, Rees R. Do certain countries produce only positive results? A systematic review of controlled trials. Controlled Clin Trials. 1998; 19(2): 159-66.
  2. Panagiotou OA, Contopoulos-Ioannidis DG, Ioannidis, JPA, Rehnborg CF. Comparative effect sizes in randomised trials from less developed and more developed countries: meta-epidemiological assessment. BMJ. 2013; 346: f707.
  3. Bowater RJ, Hartley LC, Lilford RJ. Are cardiovascular trial results systematically different between North America and Europe? A study based on intra-meta-analysis comparisons. Arch Cardiovasc Dis. 2015; 108(1):23-38.

From CAHRD to Campbell

After two days of an intensive consultation meeting at the Collaboration for Applied Health Research and Delivery (CAHRD), where the focus was on learning from stakeholders about the future direction of applied health research in low- and middle-income countries, I set off to Belfast to attend the 2014 Campbell Collaboration Colloquium. Having been a traditional ‘Cochraner’ for some time, it was a bizarre experience for me to meet so many people who, while doing the same type of work (systematic reviews) for the same purpose (informing policy and practice), are doing it in quite different contexts (education, crime and justice, and international development). It is somewhat like travelling to a different country in the modern world – you see people doing the same thing, such as going to a restaurant, but they have quite different menus and speak a different language.

Talking of language, one common issue that emerged from both meetings is terminology. In the CAHRD meeting we talked about the need for a standardised terminology in health service delivery research. As an example, the term “health system” means different things to different people and is often used when people want to describe something about health care but know relatively little about it. At the Campbell conference I joined a session of the Knowledge Translation and Implementation (KTI) group where we were tasked with consolidating the definition of ‘knowledge translation’. The group leaders presented no fewer than 15 related terms (such as knowledge mobilisation and technical assistance) and identified 61 different frameworks or models of KTI through preliminary research. The tasks of resolving differences and reaching a consensus seem daunting.

While differences appear to be ubiquitous, many of them need not be a cause of concern so long as they do not lead to misunderstanding and ignorance. In the world of Campbell I soon got used to the term “moderator analysis,” which had only been known to me in the context of subgroup analysis and meta-regression for exploring potential sources of heterogeneity; and “impact evaluation for a development programme,” which appears somewhat similar to health technology assessments for new drugs, with which I am more familiar. I realised that although the names may be different and the techniques and emphasis may (quite rightly) vary to some extent to suit a different context, the principles are the same.

With my unease dissipated, I quickly started to enjoy exploring the new territory – as expected at such a conference, there were many interesting things to be uncovered. For example, Professor Paul Connolly talked about how randomised controlled trials (RCTs) are depicted negatively in research methods textbooks as an unrealistic method advocated by positivists ignorant of the complex world of teaching and learning. He also detailed how the team at the Centre for Effective Education, based at Queen’s University Belfast, have managed to conduct more than 30 RCTs in education settings since 2007. My recent task of sifting through nearly 10,000 records for a systematic review is easily dwarfed by the efforts of international colleagues who have trawled through over 60,000 records for a review of youth crime and violence. However, against the rather gloomy prospect of soon getting lost in the ever-expanding sea of information comes the welcome news that the Evidence for Policy and Practice Information and Co-ordinating (EPPI) Centre (a major player in the field of evidence synthesis in education and social policy) has developed smart software that uses text mining and machine learning to automatically ‘prioritise’ references that are most likely to be relevant for a review based on the input of a few key words.

One of the most inspiring talks was delivered by Dr Howard White, who illustrated that the lack of permanent changes backed up by solid evidence has rendered education and social policy vulnerable to the influence of short-term political cycles. The example he quoted is the resurfacing of the debate on the merit of pay-for-performance based on exam results in school settings – an issue that was claimed to be resolved in a book concerning the education system in West Africa in the 1920s.

For people like me who have mainly been involved in evidence synthesis and evaluation in health care, but are curious about their application in the wider world, the International Initiative for Impact Evaluation (3ie), of which Dr White is the Executive Director, is well worth looking into. They are a US-based, not-for-profit organisation that commissions and carries out in-house systematic reviews and impact evaluations of development programmes for developing countries. They have offices in Washington, New Delhi and London and have commissioned or carried out more than 130 impact evaluations and 30 systematic reviews since 2009. Topics have been diverse, ranging from the more familiar, such as a systematic review of community-based intervention packages for reducing maternal morbidity and mortality and improving neonatal outcomes, to the less familiar, for example, impact evaluation of export processing zones on employment, wages and labour conditions in developing countries. All reports are available from their website, which also includes a wealth of other resources such as evidence ‘Gap Maps’, methodological working papers, a prospective registry for international development impact evaluations, and a searchable database of evaluation experts.

My final reflection upon the journey through both meetings is that, to achieve the common aspiration of evidence-informed policy and practice, we need to break down the boundaries between disciplines and ideologies, and to understand and embrace differences rather than exclude or ignore them, so that the diverse strengths of individual people and organisations can be harnessed to the greatest extent to expedite progress. Perhaps science has its own cycles, just like politics, and after a period of phenomenal advances in increasingly divided subject areas, the time has come to focus on how to integrate and synergise specialised knowledge.

Yen-Fu Chen with Martina Vojtkova, Evaluation Specialist from the 3ie, at the Campbell Collaboration Colloquium 2014.

— Yen-Fu Chen, Senior Research Fellow