Interim Guidelines for Studies of the Uptake of New Knowledge Based on Routinely Collected Data

CLAHRC West Midlands and CLAHRC East Midlands use Hospital Episode Statistics (HES) to track the effect of new knowledge from effectiveness studies on the implementation of their findings. Acting on behalf of the CLAHRCs, we have studied uptake of findings from the Health Technology Assessment (HTA) programme over a five-year period (2011-15). We use the HES database to track uptake of study treatments where use of the treatment is recorded in HES; most often these are studies of surgical procedures. We conduct time series analyses to examine the relationship between publication of apparently clear-cut findings and the implementation (or not) of those findings. We have encountered some bear traps in this apparently simple task, which must be carried out with an eye to detail. Our work is ongoing, but here we alert practitioners to some things to look out for, based on the literature and our experience. First, note that the use of time series to study clinical practice based on routine data is both similar to and different from the use of control charts in statistical process control. For the latter purpose, News Blog readers are referred to the ASTM International standard practice (2018).[1] Here are some bear traps to consider when using databases for the former purpose, namely to scrutinise them for changes in treatment for a given condition:

  1. Codes. By a long way, the biggest problem you will encounter is the selection of codes. The HTA RCT on treatment of ankle fractures [2] described the type of fracture in completely different language to that used in the HES data. We did the best we could, seeking expert help from an orthopaedic surgeon specialising in the lower limb. Some thoughts:
    1. State the codes or code combinations used. In a recent paper, Costa and colleagues did not state all the codes used in the denominator for their statistics on uptake of treatment for fractures of the lower radius.[3] This makes it impossible to replicate their findings.
    2. Give the reader a comprehensive list of relevant codes highlighting those that you selected. This increases transparency and comparability, and can be included as an appendix.
    3. When uncertain, start with a narrow set of codes that seem to correspond most closely to indications for treatment in the research studies, but also provide results for a wider range – these may reflect ‘spill-over’ effects of study findings or miscoding. Again, the wider search can be included as an appendix, and serves as a kind of sensitivity analysis.
    4. If possible, examine coding practice by comparing local databases that contain detailed clinical information with the routine codes generated by the same institution. This provides empirical information on coding accuracy. We did this for the use of tight-fitting casts to treat unstable ankle fracture (found to be non-inferior to more invasive surgical plates [4]) and found that the procedure was coded in three different ways. We combined these three codes in our study, although, to the extent that the codes are not specific to the procedure, this increases measurement error (diluting the signal).
  2. Denominators.
    1. In some cases denominators cannot be ascertained. We encountered this problem in our analysis of surgery for gastro-oesophageal reflux, where surgery was found to be more effective than medical treatment.[5] The counterfactual here is medical therapy, which can be delivered in various settings and is not specific to the index condition. Here we simply had to examine the effect of the trial results on the number of operations carried out nationwide. Seasonal effects are a potential problem with denominator-free data, because seasonal swings in overall activity cannot be separated from changes in practice.
    2. For surgical procedures, the index procedure should be combined with the counterfactual procedure from the trial to create a denominator. The denominator can also be expanded to include other procedures for the same condition if this makes sense clinically.
  3. Data interval. The more frequent the index procedure, the shorter the interval that can be used. If the count in an interval falls below the disclosure threshold, the data cannot be reported (to protect patient privacy) and a wider interval must be used. A six-month interval seemed suitable for many surgical procedures.
  4. Of protocols and hypotheses. We have found that the detailed protocol must emerge through an iterative process that includes discussion with clinical experts. But we think there should be a ‘general’ prior hypothesis for this kind of work, so we specified the date of publication of each HTA report as our pre-set time point, the equivalent of the primary hypothesis. We applied this date line to all of the procedures examined. However, a solipsistic focus on this date line would obviously lead to an impoverished understanding, so we follow a three-phase process inspired by Fichte’s thesis-antithesis-synthesis-thesis model [6]:
    1. We test the hypothesis that a linear model fits the data using a CUSUM (cumulative sum) test. The null hypothesis is that the cumulative sum of recursive residuals has an expected value of 0. If it wanders outside the 95% confidence band at any point in time, this indicates that the coefficients have changed and a single linear model does not fit the data.
    2. If the above test indicates a change in the coefficients, we use a Wald test to identify the point at which the model breaks. We then estimate two separate models, before and after the break date, and compare their slopes and intercepts.
    3. Finally, we ‘check by members’: we discuss the results with experts who can tell us when guidelines emerged and when other relevant trials were published. Ideally, a literature review would complement this process.
  5. Interpretation. In the absence of contemporaneous controls, causal inferences must be drawn cautiously.
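The Codes, Denominators and Data interval points above together describe how an uptake series is built. The following is a minimal Python sketch under stated assumptions: episode-level records are reduced to (year, month, procedure code) tuples, and the code sets, function names and suppression threshold are hypothetical placeholders, not real OPCS-4 codes or the current HES disclosure rule. It expresses the index procedure as a proportion of index plus counterfactual procedures for each six-month interval and suppresses intervals with small counts.

```python
from collections import defaultdict

# Hypothetical placeholder code sets: NOT real OPCS-4 or ICD-10 codes.
INDEX_NARROW = {"IDX1"}                # codes closest to the trial indication
INDEX_WIDE = {"IDX1", "IDX2", "IDX3"}  # wider set for the sensitivity analysis
COUNTERFACTUAL = {"CF1"}               # the trial's comparator procedure
SUPPRESS_BELOW = 8  # assumed small-number threshold; check current HES rules


def half_year(year, month):
    """Label a month with its six-month interval, e.g. (2013, 8) -> '2013-H2'."""
    return f"{year}-H{1 if month <= 6 else 2}"


def uptake_series(episodes, index_codes, comparator_codes, threshold=SUPPRESS_BELOW):
    """Proportion index / (index + comparator) for each six-month interval.

    episodes: iterable of (year, month, procedure_code) tuples.
    Codes in neither set are ignored. An interval is returned as None
    (suppressed) if the index or comparator count is between 1 and threshold - 1.
    """
    index = defaultdict(int)
    total = defaultdict(int)
    for year, month, code in episodes:
        key = half_year(year, month)
        if code in index_codes:
            index[key] += 1
            total[key] += 1
        elif code in comparator_codes:
            total[key] += 1
    series = {}
    for key in sorted(total):
        n_index = index[key]
        n_comparator = total[key] - n_index
        if any(0 < c < threshold for c in (n_index, n_comparator)):
            series[key] = None  # small count: withhold to protect privacy
        else:
            series[key] = n_index / total[key]
    return series
```

Running `uptake_series` once with `INDEX_NARROW` and once with `INDEX_WIDE` gives the narrow primary series and the wider sensitivity series recommended under Codes, which can then be reported side by side.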
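Phase 1 of the three-phase process (the CUSUM test) can be sketched in plain Python. This is a minimal sketch under stated assumptions rather than a definitive implementation: it assumes a simple linear trend y_t = a + b·t fitted to equally spaced intervals, and uses the recursive-residual CUSUM of Brown, Durbin and Evans with their 5% boundary constant a = 0.948. The function names are hypothetical; in practice an established implementation such as `recursive_olsresiduals` in `statsmodels.stats.diagnostic` would be preferable.

```python
import math


def recursive_residuals(y):
    """Standardised recursive residuals for the trend model y_t = a + b*t.

    For each t >= 2, OLS is fitted to y[0..t-1], y[t] is forecast, and the
    forecast error is scaled by sqrt(1 + x_t' (X'X)^-1 x_t), where x_t = (1, t).
    """
    w = []
    for t in range(2, len(y)):
        n = t  # number of observations used in the fit
        sx = sum(range(t))
        sxx = sum(s * s for s in range(t))
        sy = sum(y[:t])
        sxy = sum(s * v for s, v in enumerate(y[:t]))
        det = n * sxx - sx * sx
        b = (n * sxy - sx * sy) / det
        a = (sy - b * sx) / n
        # x_t' (X'X)^-1 x_t written out for the 2x2 case
        leverage = (sxx - 2 * sx * t + n * t * t) / det
        w.append((y[t] - (a + b * t)) / math.sqrt(1 + leverage))
    return w


def cusum_test(y, a=0.948):
    """CUSUM of recursive residuals with the 5% boundary (a = 0.948).

    Returns (W, bounds, crossed): W[i] is the scaled cumulative sum after
    i + 1 recursive residuals and bounds[i] the matching boundary half-width.
    """
    w = recursive_residuals(y)
    m = len(w)  # n - k, with k = 2 trend coefficients
    # sum of squared recursive residuals equals the full-sample OLS RSS
    sigma = math.sqrt(sum(v * v for v in w) / m)
    W, bounds, total = [], [], 0.0
    for i, v in enumerate(w, start=1):
        total += v
        W.append(total / sigma)
        bounds.append(a * (math.sqrt(m) + 2 * i / math.sqrt(m)))
    crossed = any(abs(s) > c for s, c in zip(W, bounds))
    return W, bounds, crossed
```

A `True` value of `crossed` corresponds to the cumulative sum leaving the 95% band, i.e. evidence that a single linear model does not fit the series. The sketch assumes the series is not exactly linear (otherwise the estimated sigma is zero).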
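Phase 2 (locating the break) can be sketched as a sup-Wald search. This is one common way of implementing a Wald-based break test and is assumed here rather than taken from the protocol above: for each candidate break date within a trimmed range (15% at each end by default), separate lines are fitted before and after the date, the Wald statistic comparing the split and pooled fits is computed, and the date with the largest statistic is returned. The sketch assumes homoskedastic, uncorrelated errors; the maximum does not follow a chi-squared distribution, so it should be judged against Andrews' (1993) tabulated critical values. Function names are hypothetical.

```python
import math


def ols_line(ts, ys):
    """Fit y = a + b*t by ordinary least squares; return (a, b, rss)."""
    n = len(ts)
    tbar = sum(ts) / n
    ybar = sum(ys) / n
    stt = sum((t - tbar) ** 2 for t in ts)
    sty = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
    b = sty / stt
    a = ybar - b * tbar
    rss = sum((y - a - b * t) ** 2 for t, y in zip(ts, ys))
    return a, b, rss


def sup_wald_break(y, trim=0.15):
    """Search for a single break in the intercept and slope of y_t = a + b*t.

    For each candidate tau (index of the first post-break interval) in the
    trimmed range, fit separate lines to y[:tau] and y[tau:] and compute
    (RSS_pooled - RSS_split) / (RSS_split / (n - 2k)) with k = 2.
    Returns (tau, wald, (a_pre, b_pre), (a_post, b_post)) for the maximising
    tau, or None if the series is too short to search.
    """
    n, k = len(y), 2
    ts = list(range(n))
    rss_pooled = ols_line(ts, y)[2]
    lo = max(k + 1, math.ceil(trim * n))  # each segment needs more than k points
    hi = min(n - k - 1, math.floor((1 - trim) * n))
    best = None
    for tau in range(lo, hi + 1):
        a1, b1, rss1 = ols_line(ts[:tau], y[:tau])
        a2, b2, rss2 = ols_line(ts[tau:], y[tau:])
        rss_split = rss1 + rss2
        wald = (rss_pooled - rss_split) / (rss_split / (n - 2 * k))
        if best is None or wald > best[1]:
            best = (tau, wald, (a1, b1), (a2, b2))
    return best
```

The intercepts returned are on the original time axis, so the estimated jump in level at the break is (a_post + b_post × tau) - (a_pre + b_pre × tau), and the two slopes can be compared directly.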

This is an initial iteration of our thoughts on this topic. Increasing amounts of data are being captured in routine systems, and databases are increasingly constructed in real time because they are used primarily as clinical tools. So we thought it would be helpful to start laying down some procedural rules for the retrospective use of such data to determine long-term trends. We invite readers to comment on, enhance and extend this analysis.

— Richard Lilford, CLAHRC WM Director

— Katherine Reeves, Statistical Intelligence Analyst at UHBFT Health Informatics Centre


  1. ASTM International. Standard Practice for Use of Control Charts in Statistical Process Control. Active Standard ASTM E2587. West Conshohocken, PA: ASTM International; 2018.
  2. Keene DJ, Mistry D, Nam J, et al. The Ankle Injury Management (AIM) trial: a pragmatic, multicentre, equivalence randomised controlled trial and economic evaluation comparing close contact casting with open surgical reduction and internal fixation in the treatment of unstable ankle fractures in patients aged over 60 years. Health Technol Assess. 2016; 20(75): 1-158.
  3. Costa ML, Jameson SS, Reed MR. Do large pragmatic randomised trials change clinical practice? Assessing the impact of the Distal Radius Acute Fracture Fixation Trial (DRAFFT). Bone Joint J. 2016; 98-B: 410-3.
  4. Willett K, Keene DJ, Mistry D, et al. Close Contact Casting vs Surgery for Initial Treatment of Unstable Ankle Fractures in Older Adults: A Randomized Clinical Trial. JAMA. 2016; 316(14): 1455-63.
  5. Grant A, Wileman S, Ramsay C, et al. The effectiveness and cost-effectiveness of minimal access surgery amongst people with gastro-oesophageal reflux disease – a UK collaborative study. The REFLUX trial. Health Technol Assess. 2008; 12(31): 1-214.
  6. Fichte J. Early Philosophical Writings. Trans. and ed. Breazeale D. Ithaca, NY: Cornell University Press; 1988.

3 thoughts on “Interim Guidelines for Studies of the Uptake of New Knowledge Based on Routinely Collected Data”

  1. The fundamental problem with clinical data analysis: as a mortality reviewer I am beset by the problems of poorly written clinical records and incorrect clinical coding. Why does it seem like a good idea to employ non-clinicians to translate one complex language into another and then to look at subtle changes in performance and outcome? When I worked in the USA every chart had to be signed off by an attending (consultant) before it could be billed. The boss was very keen that we learnt how to get it right, and our pay depended on it. In the state-run health care system, doctors are taught to code at medical school for similar reasons. So whose data do you feel confident in?

  2. Very reasonable comment. Thank you.
  3. The piece will not be complete unless you add terms such as “real-world data” and “big data”. Filtering out the buzz is a serious matter: I am afraid that some less experienced readers will not recognise that you are writing about the beloved “real-world data”.
