The graduate student made an eloquent research presentation that included detailed statistical analyses. The wizened professor asked one question: “So what?”
INTRODUCTION
Historically, the statistical approaches we use to assess the results from clinical studies have been based upon the concept of null hypothesis significance testing.1 The frequentist tests (t tests, chi-square tests, analysis of variance, and linear and logistic regression analyses) are used with the assumption that the null hypothesis is true (no difference between the parameters of interest). This assumption allows the null hypothesis to be rejected in favor of the alternative hypothesis once the P value calculated from the frequentist test statistic falls below a preassigned cut point, usually P <0.05.2 However, these tests can be misleading in suggesting a clinical effect, as frequentist tests do not provide 2 important pieces of information: the magnitude of the effect of the intervention and the precision of that effect.1,3
Clinicians want to apply the best information obtained from clinical studies. However, when medical researchers use only frequentist tests to investigate their results, statistically significant results may or may not have clinical importance.3 Rather than using frequentist analyses, researchers should examine the degree of clinical difference with measures of effect size (risk or proportion differences in bivariate analyses and adjusted or standardized risk differences in multivariable analyses) and then determine if those differences are clinically important.3-5 We demonstrate the difference between the 2 approaches with the dataset used in Chai et al,6 supplemented with additional data that were collected but not reported in the main article.
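To illustrate what is meant by the magnitude and the precision of an effect, the following minimal Python sketch computes a risk difference between 2 groups together with a Wald confidence interval. The function name, the counts, and the choice of a 95% interval (z=1.96) are assumptions made for illustration only; the counts are hypothetical and are not taken from Chai et al.

```python
import math

def risk_difference(events_a, total_a, events_b, total_b, z=1.96):
    """Risk difference (group A minus group B) with a Wald confidence interval.

    z=1.96 corresponds to a 95% interval; this level is an assumption.
    Returns (difference, lower bound, upper bound) as proportions.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    diff = risk_a - risk_b
    # Standard error of the difference between 2 independent proportions
    se = math.sqrt(risk_a * (1 - risk_a) / total_a
                   + risk_b * (1 - risk_b) / total_b)
    return diff, diff - z * se, diff + z * se

# HYPOTHETICAL counts, chosen only to show the idea (not study data).
# Both comparisons have the same magnitude (9% vs 6%) but different sample sizes.
small_study = risk_difference(9, 100, 6, 100)
large_study = risk_difference(900, 10000, 600, 10000)

for label, (diff, low, high) in (("small", small_study), ("large", large_study)):
    print(f"{label}: risk difference {diff:.1%} (CI {low:.1%} to {high:.1%})")
```

In this sketch both hypothetical studies show the same magnitude of effect, but only the larger one estimates it precisely enough to exclude zero. The risk difference and its interval carry both pieces of information, whereas a P value alone would report only that the larger study is statistically significant.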
EXAMPLE
Concerns have long been expressed that epidural analgesia may delay the process of maternal labor, leading to an increase in the incidence of instrumental delivery including the need for cesarean delivery.7-9 Over the years, a low-dose epidural analgesia technique10-13 has been developed and was used for patients included in the study by Chai et al.6 We examined the association of the duration of epidural labor analgesia (in hours) with the incidence of instrumental delivery, first using frequentist testing and then using risk differences to analyze the same data.4,5
Chi-square analysis showed that the duration of epidural labor analgesia was statistically associated with the incidence of instrumental delivery (chi-square=6.5, P=0.0110) at the classical statistical significance cut point of P <0.05.2 The blue incidence line in the Figure increases during the time period of interest, and because the P value is less than the traditional cut point of 0.05, we are led to suspect that a clinical association exists.2
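As a rough check of how a chi-square statistic maps to a P value, the sketch below converts the reported statistic of 6.5 into an upper-tail probability. The degrees of freedom are not stated in the text, so 1 degree of freedom is an assumption here, and the function name is ours.

```python
import math

def chi_square_p_value_1df(statistic):
    """Upper-tail P value for a chi-square statistic, ASSUMING 1 degree of freedom.

    With 1 degree of freedom the statistic is the square of a standard normal
    variable, so P(X > x) equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))

# Reported statistic from the text; the degrees of freedom are assumed.
p_value = chi_square_p_value_1df(6.5)
print(f"P = {p_value:.4f}")
```

Under that assumption the result is roughly 0.011, in line with the reported value. The calculation says nothing about how large the change in incidence is or how precisely it was estimated.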
Using the same dataset, we calculated the risk difference to estimate the magnitude and precision of the association between the duration of epidural labor analgesia and the incidence of instrumental delivery. We obtained a mean risk difference of 0.3% (CI 0.04%-0.6%) per hour of labor. As an example, the mean duration of labor was 8.5 hours, and the incidence of instrumental delivery increased from a baseline of 6.5% after the initial 2 hours of labor to 8.7% after 10.5 hours of labor. This absolute change of 2.2 percentage points allows clinicians to interpret the importance of this association.
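The arithmetic behind this example can be sketched in a few lines of Python using only the figures quoted above. The variable names are ours, and scaling the per-hour estimate and its confidence limits linearly across 8.5 hours is a simplifying assumption (a constant increase per hour), not a reanalysis of the data.

```python
# Figures quoted in the text (percentages and hours)
baseline_pct = 6.5            # incidence after the initial 2 hours of labor
later_pct = 8.7               # incidence after 10.5 hours of labor
hours_elapsed = 10.5 - 2.0    # 8.5 hours, the mean duration of labor

# Absolute change in incidence, in percentage points
absolute_change = later_pct - baseline_pct
# Average change per hour implied by those 2 incidences
implied_per_hour = absolute_change / hours_elapsed

# Reported risk difference per hour and its confidence interval
estimate, ci_low, ci_high = 0.3, 0.04, 0.6

# ASSUMPTION: a constant per-hour increase, so the estimate and its limits
# are simply multiplied by the elapsed time.
scaled = tuple(round(value * hours_elapsed, 2) for value in (estimate, ci_low, ci_high))

print(f"Absolute change: {absolute_change:.1f} percentage points")
print(f"Implied change per hour: {implied_per_hour:.2f} percentage points")
print(f"Reported estimate scaled to {hours_elapsed} hours (estimate, low, high): {scaled}")
```

The per-hour change implied by the 2 incidences falls inside the reported confidence interval, and the scaled limits show the range of cumulative increases that remain compatible with the data over a labor of mean duration.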
Additionally, we can investigate the interactions of additional clinical variables of interest by readjusting or standardizing their risk differences.14 We added delivery body mass index (BMI) to the analysis as it was the chief independent predictor of interest in Chai et al.6 The readjusted risk differences are shown in the Table. With BMI added to the model, the risk difference for instrumental delivery increased to 0.4% (CI 0.1%-0.7%) per hour of labor, a minimal additive effect. The determination of whether this calculated effect size is clinically relevant depends upon the experience and professional practice of the clinician, as it should be.4,5
CONCLUSION
Although P values obtained from frequentist tests may suggest a clinical effect, the value does not reveal the magnitude or the precision of that effect. The use of measures of effect size can quantify this clinical influence. Properly conducted research studies will improve our delivery of health care when they answer the clinically important question: “So what?”
ACKNOWLEDGMENTS
The author has no financial or proprietary interest in the subject matter of this article.
©2024 by the author(s); licensee Ochsner Journal, Ochsner Clinic Foundation, New Orleans, LA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (creativecommons.org/licenses/by/4.0/legalcode) that permits unrestricted use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.