Randomised controlled trials (RCTs) are getting larger. Increased sample sizes enable researchers to achieve greater statistical precision and to detect ever smaller effect sizes against diminishing baseline rates of the primary outcomes of interest. Over time, this has led to what may be termed mega-trials (>1,000 participants) and even ultra-trials (>10,000 participants). The figure below shows the minimum detectable effect size (in terms of difference from baseline) for a trial against its sample size (with α = 0.05, power 1 − β = 0.8, and a control group baseline of 0.1, i.e. 10%), along with a small selection of non-cluster RCTs published in the New England Journal of Medicine in the last three years. What this figure illustrates is that there are diminishing returns, in terms of statistical power, from larger sample sizes.
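To make the shape of that curve concrete, here is a minimal sketch of how the minimum detectable difference might be computed. It is not the calculation behind the figure, whose method is not stated here. It assumes a two-arm trial with equal allocation, a two-sided test, and a normal approximation in which both arms have the baseline variance. The function name `minimum_detectable_difference` and the per-arm sample sizes are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(n_per_arm, baseline=0.1, alpha=0.05, power=0.8):
    """Approximate smallest absolute risk difference detectable in a two-arm trial.

    Uses the normal approximation for comparing two proportions, with the
    variance in both arms taken at the baseline rate (a simplifying assumption).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # power = 1 - beta
    return (z_alpha + z_power) * sqrt(2 * baseline * (1 - baseline) / n_per_arm)


# Illustrative per-arm sample sizes, not taken from any particular trial.
for n in (500, 5_000, 50_000):
    print(f"{n:>6} per arm: {minimum_detectable_difference(n):.4f}")
```

Under this approximation the detectable difference shrinks with the square root of the sample size, so quadrupling the number of participants only halves it.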
Nevertheless, with great statistical power comes great responsibility. It is tempting to assume that, when the sample size is this large, a p-value greater than 0.05 is evidence that no (clinically significant) effect exists, but this assumption may lead to erroneous conclusions. For example, Fox et al. (2014) enrolled 19,102 participants to examine whether ivabradine improved clinical outcomes of patients with stable coronary artery disease. The estimated hazard ratio for death from cardiovascular causes or acute myocardial infarction with the treatment was 1.08, with a p-value of 0.2, and so it was concluded that ivabradine did not improve outcomes. However, we might instead see this as evidence that ivabradine worsens outcomes. A crude calculation suggests that the minimum detectable hazard ratio in this study was 1.14, so the trial could not reliably detect an effect of the size observed. Yet, for a sample of this size, the results suggest that almost 50 more patients in the treatment group experienced this outcome (against a baseline of 6.3%). One might therefore see this as clinically significant.
Similarly, Roe et al. (2012) enrolled 7,243 patients to compare prasugrel and clopidogrel for acute coronary syndromes without revascularisation. The hazard ratio for death with prasugrel was 0.91, with a p-value of 0.21. The authors concluded that prasugrel did not “significantly” reduce the risk of death. Yet, with the death rate in the clopidogrel group at 16%, a hazard ratio of 0.91 in a sample this large represents approximately 50 fewer deaths in the prasugrel group. Again, some may argue that this is clinically significant. Importantly, a quick calculation reveals that the minimum detectable hazard ratio in this study was 0.89, so the observed effect was too small for the trial to detect reliably.
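The "approximately 50" figures for the two trials above can be reproduced with some back-of-the-envelope arithmetic. The sketch below is one plausible way to do it, not necessarily the calculation used here: it assumes 1:1 allocation between arms and treats the hazard ratio as if it were a risk ratio applied to the control group's event rate. The function name `excess_events` is hypothetical.

```python
def excess_events(total_participants, baseline_risk, hazard_ratio):
    """Crude estimate of extra (or fewer) events in the treatment arm.

    Assumes equal allocation between the two arms and approximates the
    hazard ratio by a risk ratio, which is only reasonable as a rough guide.
    """
    n_treated = total_participants / 2  # assumed 1:1 allocation
    events_at_control_rate = n_treated * baseline_risk
    return events_at_control_rate * (hazard_ratio - 1)


# Inputs are the figures quoted in the text for each trial.
fox = excess_events(19_102, 0.063, 1.08)  # ivabradine vs placebo
roe = excess_events(7_243, 0.16, 0.91)    # prasugrel vs clopidogrel

print(f"Fox et al. (2014): {fox:+.0f} events in the treatment arm")
print(f"Roe et al. (2012): {roe:+.0f} events in the treatment arm")
```

A positive value means more events under treatment and a negative value means fewer. Both results come out at roughly 50 in magnitude, in line with the figures quoted.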
Many authors have warned against using p-values to decide whether or not an intervention has an effect. Mega- and ultra-trials do not reduce the folly of using p-values in this way and may even exacerbate the problem by providing a false sense of confidence.
— Samuel Watson, Research Fellow
- Fox K, Ford I, Steg PG, Tardif J-C, Tendera M, Ferrari R. Ivabradine in Stable Coronary Artery Disease without Clinical Heart Failure. New Engl J Med. 2014; 371(12): 1091-99.
- Roe MT, Armstrong PW, Fox KAA, et al. Prasugrel versus Clopidogrel for Acute Coronary Syndromes without Revascularization. New Engl J Med. 2012; 367: 1297-1309.