Tag Archives: Guest blog

Publishing Health Economic Models

It has increasingly become de rigueur – if not necessary – to publish the primary data collected as part of clinical trials and other research endeavours. In 2015, for example, the British Medical Journal stipulated that a pre-condition of publication of all clinical trials was the guarantee to make anonymised patient-level data available on reasonable request.[1] Data repositories, whether those from which data can be requested (such as the Yoda Project) or those from which data can be directly downloaded (such as Data Dryad), provide a critical service for researchers wanting to make their data available and transparent. The UK Data Service also provides access to an extensive range of quantitative and, more recently, qualitative data from studies focusing on matters relating to society, economics and populations. Publishing data enables others to replicate and verify (or otherwise) original findings and, potentially, to answer additional research questions and add to knowledge in a particularly cost-effective manner.

At present, there is no requirement for health economic models to be published. The ISPOR-SMDM Good Research Practices Statement advocates publishing sufficient information to meet its goals of transparency and validation.[2] In terms of transparency, the Statement notes that this should include sufficiently detailed documentation “to enable those with the necessary expertise and resources to reproduce the model”. The need to publish the model itself is specifically rejected, using the following justification: “Building a model can require a significant investment in time and money; if those who make such investments had to give their models away without restriction, the incentives and resources to build and maintain complex models could disappear”. This justification may be relatively hard to defend for “single-use” models that are not intended to be reused. Although the benefits are limited, publishing such models would still be useful if a decision-maker facing a different cost structure wanted to evaluate the cost-effectiveness of a specific intervention in their own context. The publication of any economic model would also allow for external validation, which would likely be stronger than internal validation (which could be considered marking one’s own homework).
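To make the point about differing cost structures concrete, here is a minimal sketch of how a decision-maker might re-run a published model with local costs. It uses the standard incremental cost-effectiveness ratio (ICER); the function name and every figure are hypothetical illustrations and are not taken from any model mentioned in this post.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("No difference in effect; the ICER is undefined.")
    return (cost_new - cost_old) / delta_qaly

# Hypothetical health effects, assumed to transfer unchanged between settings.
effects = {"qaly_new": 6.5, "qaly_old": 6.0}

# Hypothetical costs in the original setting and in a second, cheaper setting.
original_costs = {"cost_new": 9000.0, "cost_old": 5000.0}
local_costs = {"cost_new": 6500.0, "cost_old": 5000.0}

original_icer = icer(**original_costs, **effects)
local_icer = icer(**local_costs, **effects)

print(f"Original setting: {original_icer:.0f} per QALY gained")
print(f"Local setting:    {local_icer:.0f} per QALY gained")
```

With the model in hand, only the cost inputs need to change; without it, the whole calculation would have to be rebuilt from the documentation.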

The most significant benefits of publication are most likely to arise from the publication of “general” or “multi-application” models because those seeking to adapt, expand or develop the original model would not have to build it from scratch, saving time and money (recognising this process would be facilitated by the publication of the technical documentation from the original model). Yet it is for these models that not publishing gives developers a competitive advantage in any further funding bids in which a similar model is required. This confers partial monopoly status in a world where winning grant income is becoming ever more critical. However, I like to believe most researchers also want to maximise the health and wellbeing of society: an aim rarely achieved by monopolies. The argument for publication gets stronger when society has paid (via taxation) for the development of the original model. It is also possible that the development team benefits from publication through increased citations and even the now much sought-after impact. For example, the QRISK2 calculator used to predict cardiovascular risk is available online and its companion paper [3] has earned Julia Hippisley-Cox and colleagues almost 700 citations.

Some examples of published economic models exist, such as a costing model for selection processes for speciality training in the UK. While publication of more – if not all – economic models is not an unrealistic aim, it is also necessary to respect intellectual property rights. We welcome your views on whether existing good practice for transparency in health economic modelling should be extended to include the model itself.

— Celia Taylor, Associate Professor


  1. Loder E, & Groves T. The BMJ requires data sharing on request for all trials. BMJ. 2015; 350: h2373.
  2. Eddy DM, Hollingworth W, Caro JJ, et al. Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force–7. Med Decis Making. 2012; 32(5): 733-43.
  3. Hippisley-Cox J, Coupland C, Vinogradova Y, et al. Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ. 2008; 336(7659): 1475-82.

Sustainability and Transformation Plans in the English NHS

Sustainability and Transformation Plans (STPs) are the latest in a long line of approaches to strategic health care planning over a large population footprint. These latest iterations were based on a one million plus population, looked at a five year timescale, were led by local partners (often acute trusts, but sometimes, as in Birmingham and Solihull, by the Local Authority), and focused inevitably on financial pressures. The plans were published in December 2016 and now the challenge to the STP communities is further refinement of the plans and, of course, implementation.

The Health Service Journal (HSJ) reviewed the content of the STPs in November 2016 and highlighted three common and unsurprising areas of focus: further development of community based approaches to care (notably aligned to the New Models of Care discussed in the CLAHRC WM News Blog of 27 January; see also https://www.england.nhs.uk/ourwork/new-care-models/); reconfiguration of secondary and tertiary services; and sharing of back office and clinical support functions. More interestingly, the HSJ noted an absence of focus on social care, patient/ clinical/ wider stakeholder engagement and on prevention and wellbeing.

The King’s Fund has produced two reviews of how STPs have developed, in November 2016 and February 2017. These have been based on interviews with the same subset of leaders, as well as other analyses. Both have reached similar conclusions. Recommendations have included the need to: increase involvement of wider stakeholders; strengthen governance and accountability arrangements and leadership (including full-time teams) to support implementation; support longer-term transformation with money, e.g. new models of care, not just short-term financial sustainability; stress-test assumptions and timescales to ensure they are credible and deliverable, then communicate honestly with local populations about their implementation; and finally, align national support behind their delivery, e.g. support, regulation, performance management and procurement guidance.

A specific recommendation relates to the need to ensure robust community alternatives are in place before hospital bed numbers are reduced. The service has received strong guidance about this latter point from NHS England in the last few weeks. Various other think tanks, such as Reform, The Centre for Health and Public Interest and the IPPR, have also produced more or less hopeful commentaries on STPs; all agree that STPs cannot be ignored.

Already, in March 2017, the context is shifting: yet again, ‘winter pressures’ have been high profile and require an NHS response; the scale of the social care crisis has become even more prominent; there is a national push to accelerate and support change in primary care provision.

Furthermore, the role of CCGs is changing in response: some are merging to create bigger population bases which may or may not be the same as STP geography; some GP leaders are moving into the new primary care provider organisations; the majority of CCGs will be ‘doing their own’ primary care commissioning for the first time just as the pace of primary care change is increasing; some commissioning functions may shift to new care models such as accountable care arrangements. It is clear that for some geographies and services the STP approach could work, but more local and more national responses to specific services and in specific places will continue to be needed. All these issues will influence how the STPs play out in the local context.

— Denise McLellan

The Evolving Role of the CLAHRC in the Use of Evidence to Improve Care and Outcomes in Service Settings

If we are to use public funds to support research, there is an assumption that the outcome of that research will be improvements to the service. This exchange, however, is problematic. CLAHRCs are set up to address this interface in a particular way, namely to evaluate service interventions. As well as generating new knowledge for the system, there is a wider aspiration of building a system-wide ‘habit’ of using evidence to drive service change and evaluating the output.

As part of the consideration of how CLAHRC West Midlands evolves, we would like to hear readers’ views as to how well it has done and what it should do in the future.

The use of evidence to improve practice in service settings has demand and supply side factors. The service has to want to use evidence, be supported to use evidence, and have the capacity to make changes in response. On the research ‘supply’ side, there has to be a suitable body of existing evidence, researchers have to have the skills and capacity to develop suitable research methods and to convey the outcomes in a usable form.

Even if all these factors co-exist, barriers, such as changed external environments, resistance to change, and timing issues, can thwart the exchange.

CLAHRC WM has tried to address this in a number of ways. It has created new roles:

  • Embedded posts: academic researchers jointly funded by service and research institutions, working on agreed projects within a service setting
  • Diffusion fellows: experienced practitioners supported to undertake research in a service area.

Patients and the public are central to driving the direction of research: their involvement at all stages of the research cycle means that topics are relevant to them and meet their needs. In addition, CLAHRC WM has employed a range of dissemination methods, both traditional and innovative, to share research findings. These include publishing summaries of completed evaluations, running workshops and, indeed, regular publication of articles in this blog.

Service evaluation is not the only form of research being undertaken within service institutions, nor is CLAHRC WM the only source of evaluation support. With the current focus on integration, there is a question as to how CLAHRC WM could be better integrated within the service’s own research and development strategies. However, one has to be mindful that the budget for CLAHRC WM is tiny compared to the billions spent on health care in the West Midlands each year and therefore it has to take care to target its resources.

In future blogs we will look more closely at some of these issues, with interviews with those occupying embedded/ diffusion roles. Meanwhile, we would welcome your views and thoughts as to how CLAHRC WM should evolve in this regard, so please comment or get in touch; it would be much appreciated.

— Denise McLellan

It’s Never Too Early: Policy Implications From Early Intervention in Youth Mental Health

Two pieces of news may have escaped your attention in recent months: the first was that in the post-Brexit cabinet re-organisations, the Secretary of State for Health, Jeremy Hunt, picked up the responsibility for mental health, which had previously been separated from the health portfolio. This resulted in barely a mention in the mainstream media and has not resulted in any perceptible changes in policy… yet.

The second piece of news came last week and featured prominently in the Health Service Journal, but to my surprise seemed to make very little impact in the national news. Jeremy Hunt described children’s mental health services as “the biggest single area of weakness in NHS provision at present”. When you stop to consider the breadth and depth of challenges facing the NHS at present, to single out this oft-overlooked area so starkly came as a surprise, albeit a welcome one.

Of course, bold statements are one thing and actions another, but there seemed to already be early seeds of policy initiatives creeping in to the detail of the statement, along with the suggestion that this was a particular area of concern for the Prime Minister Theresa May. The statement highlighted the need for early intervention for children with mental health problems and suggested closer working between Child and Adolescent Mental Health Services (CAMHS) and schools, as well as the challenges that exist within the 16-24 year old age group and the need to address this gap in service for particular conditions. Interestingly, some of these issues have also been brought to the fore in policy documents issued by the Clinton Presidential campaign in the United States.

All this bodes well for the Youth Mental Health theme of CLAHRC West Midlands. CLAHRC researchers, through both this CLAHRC and the previous incarnation of CLAHRC Birmingham and Black Country, have worked on a variety of projects whose research could help provide an evidence base for policy formulation. These include the redesign of youth mental health services to improve access; early intervention in first episode psychosis; the impact of schools on mental health (see also youthspace.me); and interventions within the 0-25 age range.

Professor Max Birchwood, Theme lead for the Youth Mental Health theme, commented: “It’s great to see this important area of health receiving national attention and mirroring many elements of the research undertaken by CLAHRC BBC and now CLAHRC WM. We look forward to playing an active role in contributing to the discussion and helping to shape future guidance and policy in this area”.

— Paul Bird, CLAHRC WM Head of Programme Delivery (Engagement)

Researchers Continue to Consistently Misinterpret p-values

For as long as there have been p-values there have been people misunderstanding p-values. Their nuanced definition eludes many researchers, statisticians included, and so they end up being misused and misinterpreted. The situation recently prompted the American Statistical Association (ASA) to produce a statement on p-values.[1] Yet they are still widely viewed as the most important bit of information in an empirical study, and careers are still built on ‘statistically significant’ findings. A paper in Management Science,[2] recently discussed on Andrew Gelman’s blog,[3] reports the results of a number of surveys of top academics about their interpretations of the results of hypothetical studies. The surveys show that these researchers, who include authors in the New England Journal of Medicine and American Economic Review, generally only consider whether the p-value is above or below 0.05; they consider p-values even when they are not relevant; they ignore the actual magnitude of an effect; and they use p-values to make inferences about the effect of an intervention on future subjects. Interestingly, the statistically untrained were less likely to make the same errors of judgement.

As the ASA statement and many, many other reports emphasise, p-values do not indicate the ‘truth’ of a result, nor do they imply clinical or economic significance; they are often presented for tests that are completely pointless; and they cannot be interpreted in isolation from all the other information about the statistical model and possible data analyses. It is possible that in the future the p-value will be relegated to a subsidiary statistic, where it belongs, rather than being the main result, but until that time statistical education clearly needs to improve.
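The point that a p-value says nothing about the magnitude of an effect can be illustrated with a minimal sketch. It assumes a simple two-arm comparison with a known standard deviation (a z-test); the function name and all numbers are hypothetical and are not drawn from the paper discussed above.

```python
import math

def two_sided_p(effect, sd, n_per_arm):
    """Two-sided p-value from a z-test comparing two equal-sized arms."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    z = effect / standard_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))

# A trivially small effect measured in a very large (hypothetical) trial.
p_small_effect = two_sided_p(effect=0.5, sd=10.0, n_per_arm=20_000)

# A ten-times larger effect measured in a very small (hypothetical) trial.
p_large_effect = two_sided_p(effect=5.0, sd=10.0, n_per_arm=10)

print(f"Small effect, large trial: p = {p_small_effect:.2g}")
print(f"Large effect, small trial: p = {p_large_effect:.2g}")
```

Under these assumptions the tiny effect clears the 0.05 threshold comfortably while the much larger one does not, which is why sorting results only by whether p falls above or below 0.05 can mislead.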

— Sam Watson, Research Fellow


  1. Wasserstein RL & Lazar NA. The ASA’s Statement on p-Values: Context, Process, and Purpose. Am Stat. 2016; 70(2). [ePub].
  2. McShane BM & Gal D. Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence. Manage Sci. 2015; 62(6): 1707-18.
  3. Gelman A. More evidence that even top researchers routinely misinterpret p-values. Statistical Modeling, Causal Inference, and Social Science. 26 July 2016.


Bias is something that affects us all; its all-pervasive nature means it influences everything we do from supermarket choices through to study design. But what of perception? Bias might lead the result of our study to be skewed or flawed, but if our perception of where the issue lies is incorrect we may select the wrong thing to study. So, how wrong can our perception be? Well, very, according to IPSOS MORI and their annual Perils of Perception report for 2015.[1]

This report polls members of the public in 33 countries on their understanding of key issues affecting their nation. The results show a significant gap between perception and reality across a number of issues that specifically relate to society and health in Great Britain.

Our perception of the distribution of wealth was one of the most distorted views. When asked what proportion of wealth the top 1% of the population own, the average guess was 59%, more than twice the true figure, which is 23%.

On immigration the perception of Britons is that 25% of the population are immigrants, nearly double the actual figure of 13%.

We also know we have an ageing population, but perhaps not to the extent we believe. The estimate of the average age of the population was 51 years old, when it is in fact 40.

With regard to obesity we may be complacent; the average estimate of the proportion of people over the age of 20 who are overweight or obese was 44% when it is in fact 62%.
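The size and direction of each gap can be checked with a short calculation. This is only a sketch using the four pairs of figures quoted in this post; the variable and function names are illustrative.

```python
# (public guess, actual figure) pairs as quoted in this post.
findings = {
    "wealth share of top 1% (%)": (59, 23),
    "immigrants in population (%)": (25, 13),
    "average age (years)": (51, 40),
    "overweight or obese, age 20+ (%)": (44, 62),
}

def gap(guess, actual):
    """Return the absolute difference and the ratio of guess to actual."""
    return guess - actual, guess / actual

gaps = {topic: gap(guess, actual) for topic, (guess, actual) in findings.items()}

for topic, (difference, ratio) in gaps.items():
    direction = "over" if difference > 0 else "under"
    print(f"{topic}: {direction}estimated by {abs(difference)} "
          f"({ratio:.2f} times the actual figure)")
```

Three of the four figures are overestimates; obesity is the only one the public underestimates.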

And before you think that the Great British Public are better or worse than elsewhere, we are not. Ranked 16th of 33 countries in IPSOS MORI’s provocatively titled “Index of Ignorance”, we are firmly in mid-table. If you are reading the blog from either Ireland or South Korea (ranked 27th and 28th) you potentially have a better perception of issues affecting your nation than if you are in Mexico, India or Brazil, which occupy the top three positions. But this is not an issue that is delineated along boundaries of low-, middle- or high-income countries, in case you were to infer that from the results: New Zealand is ranked at number 5 and Belgium at number 7.

So this is all good fun, interesting stuff, but what does it mean? Well certainly not that we should quietly reassure ourselves that we would have been much closer to the real figure than most of the population. Bias and perception issues are at their most insidious when we fail to acknowledge that we may be subject to them.

These findings are in fact an endorsement of the way, as CLAHRCs, we structure what we do. By bringing together academics, patients and those involved in delivering care, we challenge each other’s perceptions of the issues related to service delivery. That way we can work collaboratively to solve the issues that are in fact real issues, rather than those which we perceive to be the issue.

— Paul Bird, CLAHRC WM Head of Programme Delivery (Engagement)


  1. Ipsos MORI. Perils of Perception 2015. 2015.

Do they think we’re stupid? The rise of statistical manipulitis and preventable measures

If there is one thing that the campaigns on the EU Referendum have taught us, it’s how the same set of data can be used to generate statistics that support two completely opposing points of view. This is beautifully illustrated in a report in the Guardian newspaper.[1] While the research community (amongst others) might accuse the campaigners of misleading the public and lament the journalists who sensationalise our findings, we are not immune from statistical manipulitis. To help control the susceptibility of researchers to statistical manipulitis, compulsory registration of trial protocols had to be instigated,[2] but five years later the majority of studies had failed to comply, even among registered trials for which reporting results within one year of trial completion was mandated.[3] Furthermore, reporting alone provides insufficient public protection against the symptoms of statistical manipulitis. As highlighted in a previous blog, and one of Ben Goldacre’s Bad Science blogs,[4] researchers have been known to change primary endpoints, or select which endpoints to report. To provide a full aetiology for statistical manipulitis is beyond the scope of this blog, although Maslow’s belief that esteem (incorporating achievement, status, dominance and prestige) precedes self-actualisation (incorporating the realisation of one’s actual personal potential) provides an interesting starting point.[5] Whatever the causative mechanism, statistical manipulitis is not the only adverse consequence. For example, some professional athletes may stretch the principles underlying Therapeutic Use Exemptions to enable them to legally use substances on the World Anti-Doping Agency’s banned list, such as testosterone-based creams to treat saddle-soreness, when not all physicians would consider the athlete’s symptoms sufficiently severe to justify their use.[6]

We can also think of statistical manipulitis as pushing its victims across a balanced scale to the point at which the statistics presented become too contrived to be believed. Which side in the EU Referendum debate has travelled further from equilibrium is a moot point. While important gains could be had if those engaged with the debate knew the point at which the public’s scale is balanced, watching them succumb has injected some much-needed entertainment. The increased awareness of statistical manipulitis resulting from the debate has also provided an open door for those involved with public engagement with science to help move that tipping point and reduce the expected value of manipulation. To do so, the public need the tools and confidence to ask questions about political, scientific and other claims, as now being facilitated by the work of CLAHRC WM’s new PPIE Lead, Magdalena Skrybant, in her series entitled Method Matters. The first instalment, on regression to the mean, is featured in this blog.

Method Matters are ‘bite size’ explanations to help anyone without a degree in statistics or experience in research methods make sense of the numbers and claims that are bandied about in the media, using examples taken from real life. Certainly, we would hope that through Method Matters, more people would be able to accurately diagnose any cases of statistical manipulitis and take relevant precautions.

Writing Method Matters is not an easy task: if each student in my maths class had rated my explanation of each topic, those ratings would vary both within and between students. My challenge was how to maximise the number of students leaving the class uttering those five golden words: “I get it now Miss!” Magdalena faces a tougher challenge – one size does not fit all and, unlike a “live” lesson, she cannot offer multiple explanations or answer questions in real time. However, while I had to convince 30 14-year-olds of the value of trigonometry on a windy Friday afternoon, the epidemic of statistical manipulitis highlighted by the EU Referendum debate has provided fertile ground for Method Matters. Please let us know what you think.

— Celia Taylor, Associate Professor


  1. Duncan P, Gutiérrez P, Clarke S. Brexit: how can the same statistics be read so differently? The Guardian. 3 June 2016.
  2. Abbasi K. Compulsory registration of clinical trials. BMJ. 2004; 329: 637.
  3. Prayle AP, Hurley MN, Smyth AR. Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study. BMJ. 2012; 344: d7373.
  4. Goldacre B. The data belongs to the patients who gave it to you. Bad Science. 2008.
  5. McLeod S. Maslow’s Hierarchy of Needs. Simply Psychology. 2007.
  6. Bassindale T. TUE – Therapeutic Use Exemptions or legitimised drug taking? We Are Forensic. 2014.

Systematic Reviewing in the Digital Era

In the field of systematic reviewing it is easy (and often necessary) to dip yourself deep into the sea of the literature and forget about all things that are going on in the outside world. Reflecting upon myself I realised that I hadn’t actually attended a proper Cochrane meeting even though I’ve been doing reviews for more than a decade. Before rendering myself truly obsolete, I decided to seize the opportunity when the Cochrane UK and Ireland Symposium came to Birmingham earlier in March to catch up with the latest developments in the field. And I wasn’t disappointed.

A major challenge for people undertaking systematic reviews is to deal with the sheer number of potentially relevant papers against the timeline beyond which a review would be considered irrelevant. Indeed the issue is so prominent that we (colleagues in Warwick and Ottawa) have recently written and published a commentary to discuss ‘how to do a systematic review expeditiously’.[1] One of the most arduous processes in doing a systematic review is screening through the large number of records retrieved from searches of bibliographical databases. Two years ago the bravest attempt that I heard of in a Campbell Collaboration Colloquium was sifting through over 40,000 records in a review. Two years on the number has gone up to over 70,000. While there is little sign that the number of published research papers is going to plateau in the future, I wonder how much reviewers’ stamina and patience can keep pace – even if they have the luxury of time to do it. Here the clever computer comes to the rescue. If Google’s AlphaGo can beat the human champion of Go games,[2] why can’t artificial intelligence save reviewers from the humble but tedious task of screening articles?

Back to the symposium: there was no shortage of signs of this digital revolution on the agenda. To begin with, the conference had no brochure or abstract book to pick up or print. All you got was a mobile phone app which told you what the sessions were and where to go. Several plenary and workshop sessions were related to automation, which I was eager to attend and from which I learned of a growing literature on the use of automation throughout the review process,[3] including article sifting,[4] data extraction,[5] quality assessment [6] and report generation. Although most attempts were still exploratory, the use of text mining, classification algorithms and machine learning to assist with citation screening appears to have matured sufficiently to be considered for practical application. Abstrackr, funded by AHRQ, is an example that is currently freely available (registration required) and has been subject to independent evaluation.[7] Overall, existing studies suggest such software may potentially reduce reviewers’ workload by 30-70% (by ruling out references that are unlikely to be relevant and hence do not need to be screened) with a fairly high level of recall (missing 5% or less of eligible articles).[4] However, this is likely to be subject-dependent and more empirical evidence will be required to demonstrate its practicality and limitations.
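To show how the two headline measures relate to each other, here is a minimal sketch of how workload saving and recall might be calculated for a screening tool. Only the 40,000-record total echoes a figure mentioned earlier in this post; the other counts and the function name are hypothetical and do not come from any of the cited evaluations.

```python
def screening_metrics(total_records, eligible_records, auto_excluded, eligible_missed):
    """Return (workload saved, recall) for a semi-automated screening run.

    workload saved: share of records the reviewer no longer screens by hand
    recall: share of truly eligible records that survive automatic exclusion
    """
    workload_saved = auto_excluded / total_records
    recall = (eligible_records - eligible_missed) / eligible_records
    return workload_saved, recall

# Hypothetical review: 40,000 records retrieved, 200 of them truly eligible.
# The tool rules out 20,000 records, 8 of which were in fact eligible.
workload_saved, recall = screening_metrics(
    total_records=40_000,
    eligible_records=200,
    auto_excluded=20_000,
    eligible_missed=8,
)

print(f"Workload saved: {workload_saved:.0%}")
print(f"Recall:         {recall:.0%}")
```

The two measures pull against each other: ruling out more records saves more screening time but raises the chance that an eligible study is among those excluded.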

It is important to understand a bit more about what lies behind the “black box” when using such software, and so we were introduced to some online text mining and analysis tools during the workshop sessions. One example is “TerMine”, which allows you to put in some plain text or specify a text file or a URL. Within a few seconds or so it will return the text with the most relevant terms highlighted (this can also be viewed as a table ranked by relevance). I did a quick experimental analysis of the CLAHRC WM’s Director and Co-Director’s Blog, and the results seem to be a fair reflection of the themes: community health workers, public health, organisational failure, Cochrane reviews and service delivery were among the highest ranking terms (besides other frequent terms such as CLAHRC WM and the Director’s name). The real challenge in using such tools, however, is how then to organise the identified terms in a sensible way (although there is other software around that is capable of doing things like semantic or cluster analysis), and perhaps more importantly, what important terms might be under-represented or absent.

Moving beyond systematic reviews, there are more ambitious developments such as the “Contentmine”, which is trying to “liberate 100 million facts from the scientific literature” using data mining techniques. Pending the support of more permissive copyright regulations and open access practice in scientific publishing, the software will be capable of automatically extracting data from virtually all available literature and then re-organising and presenting the contents (including texts and figures etc.) in a format specified by the users.

Finally, with all this exciting progress around the world, Cochrane itself is certainly not lying idle. You might have seen its re-branded websites, but there is a lot more going on behind the scenes: people who have used Review Manager (RevMan) can expect to see a “RevMan Web version” in the near future; the Cochrane Central Register of Controlled Trials (CENTRAL) is being enhanced by the aforementioned automation techniques and will be complemented by a Cochrane Register of Study Data (CRS-D), which will make retrieval and use of data across reviews much easier (and thus facilitate further exploration of existing knowledge such as undertaking ‘multiple indication reviews’ advocated by the CLAHRC WM Director) [8]; there will also be a further enhanced Cochrane website with “PICO Annotator” and “PICOfinder” to help people locate relevant evidence more easily; and the Cochrane Colloquium will be replaced by an even larger conference which will bring together key players of systematic reviewing both within and beyond health care around the world. So watch this space!

— Yen-Fu Chen, Senior Research Fellow


  1. Tsertsvadze A, Chen Y-F, Moher D, Sutcliffe P, McCarthy N. How to conduct systematic reviews more expeditiously? Syst Rev. 2015; 4(1):1-6.
  2. Gibney E. What Google’s Winning Go Algorithm Will Do Next. Nature. 2016; 531: 284-5.
  3. Tsafnat G, Glasziou P, Choong MK, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014; 3:74.
  4. O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015; 4: 5.
  5. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015; 4: 78.
  6. Millard LA, Flach PA, Higgins JP. Machine learning to assist risk-of-bias assessments in systematic reviews. Int J Epidemiol. 2016; 45(1): 266-77.
  7. Rathbone J, Hoffmann T, Glasziou P. Faster title and abstract screening? Evaluating Abstrackr, a semi-automated online screening program for systematic reviewers. Syst Rev. 2015; 4: 80.
  8. Chen Y-F, Hemming K, Chilton PJ, Gupta KK, Altman DG, Lilford RJ. Scientific hypotheses can be tested by comparing the effects of one treatment over many diseases in a systematic review. J Clin Epidemiol. 2014; 67: 1309-19.



Do we Need ‘Situations’ to Make a Situational Judgement Test?

Rank the following options in order of their likely effectiveness or the extent to which they reflect ideal behaviour in a work situation.

  1. Make a list of the patients under your care on the acute assessment unit, detailing their outstanding issues, leaving this on the doctor’s office notice board when your shift ends and then leave at the end of your shift.
  2. Quickly go around each of the patients on the acute assessment unit, leaving an entry in the notes highlighting the major outstanding issues relating to each patient and then leave at the end of your shift.
  3. Make a list of patients and outstanding investigations to give to your colleague as soon as she arrives.
  4. Ask your registrar if you can leave a list of your patients and their outstanding issues with him to give to your colleague when she arrives and then leave at the end of your shift.
  5. Leave a message for your partner explaining that you will be 30 minutes late.


How would your ranking change if you knew the following about the situation?

You are just finishing a busy shift on the Acute Assessment Unit (AAU). Your FY1 colleague who is due to replace you for the evening shift leaves a message with the nurse in charge that she will be 15 to 30 minutes late. There is only a 30 minute overlap between your timetables to handover to your colleague. You need to leave on time as you have a social engagement to attend with your partner.

(Example from UKFPO SJT Practice Paper © MSC Assessment 2014, reproduced with permission.)

The use of situational judgement tests (SJTs) for selection into education, training and employment has proliferated in recent years, but there remains an absence of theory to explain why they may be predictive of subsequent performance.[1] The name suggests that the tests assess a candidate’s ability to judge the most appropriate action in challenging work-related situations, which implies that the tests must include descriptions of such situations. But your ranking of the possible actions listed above probably did not change much (if at all) once you knew the exact details of the situation, compared to when these had to be deduced from the possible actions listed. A similar finding was recently reported in a fascinating experiment conducted by Krumm and colleagues,[2] with volunteers randomised to complete a teamwork SJT with or without situation descriptions. Those given the situation descriptions scored, on average, just 8.5% higher than those not given the descriptions. Of course, consideration of the need for a situation description is only possible for SJTs in a format where possible actions are presented to candidates (commonly known as multiple choice), but this format is generally used in practice as it facilitates marking and scoring.

Krumm et al.’s findings clearly raise doubts as to the intended construct of the test (i.e. the candidate’s judgement of specific situations); yet SJTs are predictive of workplace performance, with correlations of around 0.30 reported in meta-analyses (see for example McDaniel et al.).[3] So if an SJT doesn’t actually require a “situation” to enable a useful assessment of a candidate’s likely future performance, then what exactly is being assessed? Lievens and Motowidlo [4] suggest that it is general domain knowledge regarding the utility of expressing certain traits, such as agreeableness, based on the knowledge that such traits help to ensure effective workplace performance. The implication of this theory for practice is that SJTs may not need to be particularly specific and could therefore be shared across professions and geographical boundaries, making them a particularly cost-effective selection tool. The implication for research is that we need more evidence on the antecedents of general domain knowledge, such as family background, both as part of theoretical development and to evaluate the fairness of SJTs for selection.

And what if one does actually desire an assessment of situational judgement as opposed to general domain knowledge, since both have independent predictive validity for job performance? Rockstuhl and colleagues suggest that candidates need to be asked for an explicit, open-ended judgement of the situation (e.g. “what are the thoughts, feelings and ideas of the people in the situation?”) rather than what they think is the most appropriate response to it.[5] The nub here is whether including open-ended assessments to enable measurement of situational judgement is cost-effective given their incremental validity over general domain knowledge and the cost of marking responses (with at least two markers required). For the moment we simply note that a rather large envelope would be required for even a rapid assessment of selection utility!

— Celia Taylor, Senior Lecturer


  1. Campion MC, Ployhart RE, MacKenzie Jr WI. The state of research on situational judgment tests: a content analysis and directions for future research. Hum Perform. 2014; 27(4): 283-310.
  2. Krumm S, Lievens F, Hüffmeier J, et al. How “situational” is judgment in situational judgment tests? J Appl Psychol. 2015; 100(2): 399-416.
  3. McDaniel MA, Hartman NS, Whetzel DL, Grubb III WL. Situational judgment tests, response instructions, and validity: a meta‐analysis. Pers Psychol. 2007; 60(1): 63-91.
  4. Lievens F, Motowidlo SJ. Situational judgment tests: From measures of situational judgment to measures of general domain knowledge. Ind Organ Psychol. 2016; 9(1): 3-22.
  5. Rockstuhl T, Ang S, Ng KY, Lievens F, Van Dyne L. Putting judging situations into situational judgment tests: Evidence from intercultural multimedia SJTs. J Appl Psychol. 2015; 100(2): 464-80.


Watching NoCounter interact with “Aunty” Martha (not their real names) in Mahwaqe, South Africa, and learning about NoCounter’s roles as Martha’s health advocate, personal trainer and medication manager was anything but dismal. So as a dismal scientist, I was fascinated by how Community Health Workers (CHWs) seem to contradict one of our most famous founders, Adam Smith. To help explain one of the concepts for which he would become famous, “the invisible hand”, Smith wrote: “I have never known much good done by those who affected to trade for the public good”.[1]

To consider whether NoCounter and other CHWs are an exception to this statement, there are three questions that need to be considered:

Is the CHW doing good?
Almost all of the available research evidence suggests that CHWs are effective in enhancing the health of their communities,[2] and since the World Health Organization also sees CHWs as playing a pivotal role in helping countries achieve health-related Millennium Development Goals,[3] it is most likely that CHWs are “doing good”. In Mahwaqe, we saw how NoCounter helped Martha do the chair yoga exercises that now mean she can walk, and how she explained Martha’s medications, which helped Martha understand the importance of adherence.

Is the CHW trading?
NoCounter is giving up her time (working around 50% FTE) and in return, receives a stipend from an NGO of around R800 (~£36) per month and as such, is trading. However, as a maid in South Africa, she could earn around R1,200 (~£54) per month for the same hours, so NoCounter does not seem to be receiving the full monetary value of her time. If approximate role equivalence can be assumed, compared to a CHW in the US, NoCounter’s time is undervalued by a factor of around 8.5: a US CHW working for an hour could buy 3.3 McDonald’s Big Macs; NoCounter could buy 0.4.[4] [5] NoCounter is also using her skills and experience to provide care, but economics would describe these as “non-rivalrous” and thus not directly tradable.
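The comparison above can be checked as a back-of-envelope calculation. The sketch below uses only the rounded figures quoted in the text; the function and variable names are illustrative. Because the inputs are rounded, the ratio comes out slightly below the “around 8.5” quoted above, which presumably rests on unrounded figures.

```python
# Rounded figures quoted in the text (Big Macs purchasable per hour worked,
# and monthly pay in rand). Names are illustrative only.
US_BIG_MACS_PER_HOUR = 3.3
SA_BIG_MACS_PER_HOUR = 0.4
CHW_STIPEND_RAND = 800
MAID_WAGE_RAND = 1200


def undervaluation_factor(us_per_hour, sa_per_hour):
    """Ratio of purchasing power per hour worked (US CHW vs. NoCounter)."""
    return us_per_hour / sa_per_hour


def stipend_shortfall(stipend, alternative_wage):
    """Share of the alternative wage forgone by taking the stipend instead."""
    return 1 - stipend / alternative_wage


factor = undervaluation_factor(US_BIG_MACS_PER_HOUR, SA_BIG_MACS_PER_HOUR)
shortfall = stipend_shortfall(CHW_STIPEND_RAND, MAID_WAGE_RAND)

print(f"Purchasing-power ratio from rounded inputs: {factor:.2f}")
print(f"Stipend shortfall relative to a maid's wage: {shortfall:.0%}")
```

On these figures, NoCounter forgoes about a third of what she could earn as a maid for the same hours.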

Is the CHW doing so for the public good or her own self-interest?
Adam Smith might be confused by NoCounter, because her aim doesn’t seem to be wealth maximisation. However, a “utility maximising” economist would argue that NoCounter is making up for not being paid the full monetary value of her time by obtaining utility either from substitutes for money or from directly helping her community.[6] Even if NoCounter obtains utility from the latter, her motivation would still be to do public good. With regard to money substitutes, CHWs may receive non-monetary incentives such as community respect, housing and access to health care, and/or be motivated in their roles by the support of their families.[6] [7] Furthermore, the CHW role is particularly desirable in areas where residents have a high marginal rate of substitution for leisure over consumption, since CHWs do not have to commute to their place of work. Finally, a by-product of NoCounter’s work as a CHW from which she benefits directly is that she lives in a healthier community: by encouraging vaccination of new-borns, for example, she is lowering her own risk of TB.

On this last question, namely the relative importance of the different reasons why CHWs undertake their role for a wage lower than they appear to be worth, we cannot be certain of the answer. Research in this area is critical given the push to eliminate the under-supply of CHWs.[8] There are additional pre-conditions – the organisational structure required to implement a successful CHW programme [9] – that must also be met before the demand for CHWs can be realised (made “effective”) in practice. Nevertheless, it is critical to determine whether all of the additional CHWs required to meet demand would also offer their labour at a low relative price. This was assumed in a costing exercise of a CHW roll-out programme,[10] but the assumption prima facie contradicts the basic economic theory of supply and demand.

Fortunately for me, economics provides one approach to studying the interaction between monetary and non-monetary incentives with respect to the supply of labour. In a discrete choice experiment, for example, CHWs would be asked to choose within each of a series of pairs of packages, each combining a stipend/salary, a level of health produced, and non-monetary incentives (see [11] for an example). Such experiments would need to be repeated in (and possibly also within) different countries, since the relative value of “doing good” by volunteering may well differ according to a country’s stage in economic development. Such work would help to provide evidence regarding the sustainability of CHWs as a cadre of health care providers. Here, we hypothesise a U-shaped curve if propensity to volunteer is plotted against GDP per capita.
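To make the discrete choice experiment described above more concrete, the choice pairs could be generated along the following lines. This is only a sketch: the attribute levels, their ordering from least to most attractive, and the function names are all assumptions made for illustration, and a real study would normally use a statistically efficient design rather than a random draw of pairs.

```python
import itertools
import random

# Hypothetical attributes and levels, each ordered from least to most
# attractive (the ordering itself is an assumption for this sketch).
ATTRIBUTES = {
    "monthly_stipend_rand": [0, 800, 1200],
    "health_produced": ["low", "high"],
    "non_monetary_incentive": ["none", "housing", "access to health care"],
}


def build_profiles(attributes):
    """Full factorial of packages, each stored as a tuple of level indices."""
    index_ranges = [range(len(levels)) for levels in attributes.values()]
    return list(itertools.product(*index_ranges))


def dominates(a, b):
    """True if package a is at least as attractive as b on every attribute."""
    return all(x >= y for x, y in zip(a, b))


def build_choice_sets(attributes, n_sets, seed=0):
    """Draw n_sets pairs in which neither package dominates the other,
    so that every choice involves a genuine trade-off."""
    profiles = build_profiles(attributes)
    candidates = [
        (a, b)
        for a, b in itertools.combinations(profiles, 2)
        if not dominates(a, b) and not dominates(b, a)
    ]
    return random.Random(seed).sample(candidates, n_sets)


def describe(profile, attributes):
    """Translate a tuple of level indices back into readable levels."""
    return {
        name: levels[i] for (name, levels), i in zip(attributes.items(), profile)
    }


choice_sets = build_choice_sets(ATTRIBUTES, n_sets=8)
for number, (a, b) in enumerate(choice_sets, start=1):
    print(number, describe(a, ATTRIBUTES), "vs", describe(b, ATTRIBUTES))
```

Each CHW would see every pair and indicate which package they would accept; the pattern of choices then allows the trade-off between stipend and the other attributes to be estimated.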

— Celia Taylor, Senior Lecturer


  1. Smith A. An Inquiry into the Nature and Causes of the Wealth of Nations. London: Strahan and Cadell, 1776.
  2. Perry H, Zulliger R. How Effective are Community Health Workers? An Overview of Current Evidence with Recommendations for Strengthening Community Health Worker Programs to Accelerate Progress in Achieving the Health-related Millennium Development Goals. Baltimore, MD: Johns Hopkins Bloomberg School of Public Health, 2012.
  3. World Health Organization and Global Health Workforce Alliance. Global Consultation on Community Health Workers. Geneva, Switzerland: World Health Organization, 2010.
  4. Payscale Homepage. 2015.
  5. The Economist. The Big Mac Index. 2015.
  6. Greenspan JA, McMahon SA, Chebet JJ, Mpunga M, Urassa DP, Winch PJ. Sources of community health worker motivation: a qualitative study in Morogoro Region, Tanzania. Hum Resour Health. 2013; 11: 52.
  7. Dambisya YM. A review of non-financial incentives for health worker retention in east and southern Africa. In: EQUINET Discussion Paper Number 44 with ESCA-HC. Loewenson R (Editor). Harare, Zimbabwe: EQUINET, 2007.
  8. One Million Community Health Workers Campaign. One Million Community Health Workers Campaign. 2015.
  9. World Health Organization, Policy Brief. Community health workers: What do we know about them? Geneva, Switzerland: World Health Organization, 2007.
  10. McCord GC, Liu A, Singh P. Deployment of community health workers across rural sub-Saharan Africa: financial considerations and operational assumptions. Bull World Health Organ. 2012; 91(4): 244-53B.
  11. Kasteng F, Settumba S, Källander K, Vassall A, inSCALE Study Group. Valuing the work of unpaid community health workers and exploring the incentives to volunteering in rural Africa. Health Policy Plan. 2016; 31(2): 205-16.