Causal Models Should Inform Statistical Models

CLAHRC WM News Blog readers will have read about the study in which 30 different statistical teams analysed data from the same association study (concerning the link between a soccer player’s skin colour and the player’s predisposition to receive a red card).[1] At the end of that News Blog article Richard argued that the very different results across the statistical teams would have been greatly attenuated had the teams first agreed on a causal model.[2] Even in the absence of agreement on a specific analysis protocol that operationalised this model, a shared conceptual model describing the underlying causal mechanisms linking skin colour to red card totals could have led to greater consensus on exactly which covariates to include in the model, whether to treat them as mediating or confounding variables, and which of the very large number of possible first-order interactions to examine.

In this article we provide a topical example to underscore the purpose of creating a conceptual causal model to inform building a statistical model. As our example, we use the putative correlation between the supportiveness of nurses’ workplace (henceforth called ‘workplace’) and clinical outcomes, such as pressure ulcers and patient satisfaction (henceforth called ‘outcome’).  A recent meta-analysis covering 21 studies and 22 countries found that a scale of workplace correlates with outcome, and that there was no statistical evidence of publication (small study) bias.[3]

But what should be controlled for in such association studies, and how can the underlying mechanisms (the essence of realist evaluations) be better understood?

We start with a simple model:

[Figure 1: Causal model]

Could the association be confounded? Say by hospital size? In that case it might immediately seem right to control for hospital size:

[Figure 2: Causal model]

Not so fast, we say! A confounder is a variable that is associated with both the explanatory variable (in this case workplace) and the outcome variable, but which is not on the causal chain linking explanatory variable and outcome. A variable that does lie on that chain is a mediator, and controlling for it can adjust away part of the very effect we are trying to estimate. Economists call a variable linking explanatory and outcome variables ‘endogenous’.
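The distinction matters in practice, and a small simulation can make it concrete. The sketch below is purely illustrative: the variable names, effect sizes and sample size are our own assumptions, not estimates from any nursing study. It compares a crude and an adjusted regression slope in two invented scenarios, one where the third variable is a confounder and one where it sits on the causal chain.

```python
import random

def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """What is left of y after removing its linear dependence on z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    b = slope(z, y)
    return [(yi - my) - b * (zi - mz) for yi, zi in zip(y, z)]

def adjusted_slope(x, y, z):
    """Slope of y on x holding z fixed (residualise both on z first)."""
    return slope(residuals(x, z), residuals(y, z))

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000                # hypothetical sample size

# Scenario 1 (assumed): z is a confounder, z -> exposure and z -> outcome.
# The true effect of exposure on outcome is set to 0.5.
z = [rng.gauss(0, 1) for _ in range(n)]
exposure = [zi + rng.gauss(0, 1) for zi in z]
outcome = [0.5 * xi + zi + rng.gauss(0, 1) for xi, zi in zip(exposure, z)]
conf_crude = slope(exposure, outcome)            # biased, expected near 1.0
conf_adj = adjusted_slope(exposure, outcome, z)  # expected near the true 0.5

# Scenario 2 (assumed): m is on the causal chain, exposure -> m -> outcome.
# The whole effect (0.5) flows through m.
exposure2 = [rng.gauss(0, 1) for _ in range(n)]
m = [xi + rng.gauss(0, 1) for xi in exposure2]
outcome2 = [0.5 * mi + rng.gauss(0, 1) for mi in m]
med_crude = slope(exposure2, outcome2)              # expected near 0.5
med_adj = adjusted_slope(exposure2, outcome2, m)    # expected near 0.0

print(f"confounder: crude={conf_crude:.2f} adjusted={conf_adj:.2f}")
print(f"mediator:   crude={med_crude:.2f} adjusted={med_adj:.2f}")
```

Under these assumptions, adjusting for the confounder moves the estimate towards the true effect, whereas adjusting for the variable on the causal chain makes a real effect all but disappear. The same statistical operation helps in one case and misleads in the other, and only the causal model tells us which case we are in.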

Don’t large hospitals have economies of scale? And don’t economies of scale allow them to have better nurse-patient ratios (hereafter called nurse/patient)? So a causal model relating hospital size, workplace and outcome might look like this:

[Figure 3: Causal model]

Now that we have constructed a causal model we are stimulated to theorise further. What about ‘leadership’, do I hear you say? Leadership may not be randomly distributed: the larger hospitals are likely to get first pick. So we might agree on a model like this:

[Figure 4: Causal model]

It might also be necessary to control for possible confounders at the individual patient level that are not on the causal chain. Then our model may look like this:

[Figure 5: Causal model]

Presented like this, our model suggests additional variables to measure and include, as well as a need to account for variables that are likely to be on the causal chain, serving as mediating variables. Building causal conceptual models in this way can be formalised and extended using “directed acyclic graphs”, which hold out the promise “that a researcher who has scientific knowledge in the form of a structural equation model is able to predict patterns of independencies in the data, based solely on the structure of the model’s graph, without relying on any quantitative information carried by the equations or by the distributions of the errors.”[4] If the data collected bear out those independencies, this provides evidence for the model and the causal mechanisms it specifies. While there are challenges in building statistical models to analyse data using the more complex of these conceptual causal models, the resulting analyses are more likely to advance our theoretical understanding of the world.
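To show what “predicting independencies from the graph alone” can look like, here is a minimal sketch in plain Python. The edge list is our own assumption about how the final model might be drawn (the arrows are hypothetical and chosen only for illustration), and the helper names `ancestors` and `d_separated` are ours. The check uses the standard recipe for d-separation: keep only the ancestors of the variables involved, join parents that share a child, drop the arrow directions, delete the conditioning variables, and see whether the two variables are still connected.

```python
from collections import deque

# Hypothetical arrows (cause, effect); an assumption, not the published figure.
EDGES = [
    ("hospital_size", "leadership"),
    ("hospital_size", "nurse_patient"),
    ("leadership", "workplace"),
    ("nurse_patient", "workplace"),
    ("patient_confounders", "workplace"),
    ("patient_confounders", "outcome"),
    ("workplace", "outcome"),
]

def ancestors(nodes, edges):
    """Return the given nodes together with all of their ancestors."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    seen, stack = set(nodes), list(nodes)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(x, y, given, edges):
    """True if the graph implies x is independent of y given `given`."""
    given = set(given)
    keep = ancestors({x, y} | given, edges)
    adj = {n: set() for n in keep}
    parents = {n: set() for n in keep}
    for cause, effect in edges:
        if cause in keep and effect in keep:
            adj[cause].add(effect)
            adj[effect].add(cause)
            parents[effect].add(cause)
    # "Marry" every pair of parents that share a child.
    for ps in parents.values():
        for p in ps:
            for q in ps:
                if p != q:
                    adj[p].add(q)
    # Breadth-first search from x that never steps onto a conditioned node.
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return False
        for nb in adj[node]:
            if nb not in seen and nb not in given:
                seen.add(nb)
                queue.append(nb)
    return True

print(d_separated("hospital_size", "outcome",
                  ["workplace", "patient_confounders"], EDGES))  # True
print(d_separated("hospital_size", "outcome", ["workplace"], EDGES))  # False
print(d_separated("leadership", "nurse_patient", ["hospital_size"], EDGES))  # True
print(d_separated("leadership", "nurse_patient",
                  ["hospital_size", "workplace"], EDGES))  # False
```

Each `True` is a testable prediction: if this assumed graph were right, hospital size should tell us nothing further about outcome once workplace and patient confounders are known. The second line shows why the causal model matters. Conditioning on workplace alone leaves hospital size and outcome associated, because workplace is a common effect of hospital size (through leadership and nurse/patient) and of the patient confounders.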

— Richard Lilford, CLAHRC WM Director

— Timothy Hofer, Professor General Medicine

References:

  1. Silberzahn R, Uthman EL, Martin DP, et al. Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results. Adv Methods Pract Psychol Sci. 2018; 1(3): 337-56.
  2. Lilford RJ. The Same Data Set Analysed in Different Ways Yields Materially Different Parameter Estimates: The Most Important Paper I Have Read This Year. NIHR CLAHRC West Midlands News Blog. 16 November 2018.
  3. Lake ET, Sanders J, Duan R, Riman KA, Schoenauer KM, Chen Y. A Meta-Analysis of the Associations Between the Nurse Work Environment in Hospitals and 4 Sets of Outcomes. Med Care. 2019; 57(5): 353-61.
  4. Pearl J, Glymour M, Jewell NP. Causal Inference in Statistics: A Primer. Chichester: Wiley; 2016. p. 35.

One thought on “Causal Models Should Inform Statistical Models”

  1. Thank you Richard and Timothy, this is great. I wonder if I can make a brief comment.

    If “workplace” is truly the exposure of interest, I wonder about the statement on the need “to account for variables that are likely to be on the causal chain serving as mediating variables”.

    If this structure is correct, then examining the causal relationship between workplace and outcome only requires patient confounders to be included in the model, with hospital size included for potential effect modification. Leadership and nurse/patient are not relevant as they are not on the causal pathway.

    Your readers may find dagitty useful, with the code below for your model.

    http://www.dagitty.net
    Hospital_size 1 @0.181,0.451
    Leadership 1 @0.336,0.175
    Nurse-patient 1 @0.337,0.319
    Outcome O @0.619,0.316
    Patient_confounders 1 @0.556,0.450
    Workplace E @0.488,0.318
