Risk-Specific Training Cohorts to Address Class Imbalance in Surgical Risk Prediction
JAMA Surgery (IF 15.7), Pub Date: 2024-10-09, DOI: 10.1001/jamasurg.2024.4299
Jeremy A. Balch, Matthew M. Ruppert, Ziyuan Guan, Timothy R. Buchanan, Kenneth L. Abbott, Benjamin Shickel, Azra Bihorac, Muxuan Liang, Gilbert R. Upchurch, Christopher J. Tignanelli, Tyler J. Loftus
Importance: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications.

Objective: To evaluate risk-prediction model performance when trained on risk-specific cohorts.

Design, Setting, and Participants: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109 445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined.

Exposures: The model was trained de novo on separate cohorts for high-risk, medium-risk, and low-risk Current Procedural Terminology (CPT) codes defined empirically by incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, for which the cutoffs were set at 1% or less and greater than 3%, respectively.

Main Outcomes and Measures: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 scores, and accuracy for each model.

Results: A total of 109 445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77 921 procedures [71.2%]) and Jacksonville (31 524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). Among 109 445 operations, 55 646 patients (50.8%) were male, and 66 495 patients (60.8%) underwent a nonemergent, inpatient operation. Training on the high-risk cohort had variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40).

Conclusion and Relevance: In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications such as in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.
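
The cohort definition in the Exposures section can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each operation is a record with a procedure code and boolean outcome flags, that incidence is computed per procedure code, and that the lower-third and upper-third cutoffs are tertiles of those per-code incidences; the abstract does not spell out these details. The function names, field names, and the inclusive handling of non-mortality boundaries are hypothetical. Only the mortality cutoffs (1% or less, greater than 3%) and the nonoverlapping-interval criterion come from the abstract.

```python
from collections import defaultdict
from statistics import quantiles

# Mortality cutoffs stated in the abstract: low risk at 1% or less, high risk above 3%.
MORTALITY_LOW, MORTALITY_HIGH = 0.01, 0.03


def incidence_by_code(operations, outcome):
    """Return {procedure code: fraction of its operations with the outcome}."""
    counts = defaultdict(lambda: [0, 0])  # code -> [events, total]
    for op in operations:
        counts[op["code"]][0] += int(op[outcome])
        counts[op["code"]][1] += 1
    return {code: events / total for code, (events, total) in counts.items()}


def tertile_cutoffs(incidences):
    """Lower-third and upper-third cut points (assumed: tertiles across codes)."""
    low, high = quantiles(incidences, n=3, method="inclusive")
    return low, high


def assign_tier(incidence, low, high):
    """'low' at or below the lower cutoff, 'high' above the upper cutoff."""
    if incidence <= low:
        return "low"
    if incidence > high:
        return "high"
    return "medium"


def risk_tiers(operations, outcome):
    """Map each procedure code to a risk tier for one outcome."""
    rates = incidence_by_code(operations, outcome)
    if outcome == "mortality":
        low, high = MORTALITY_LOW, MORTALITY_HIGH
    else:
        low, high = tertile_cutoffs(list(rates.values()))
    return {code: assign_tier(rate, low, high) for code, rate in rates.items()}


def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

Under these assumptions, a separate model would then be trained on the operations whose codes fall in each tier. With the intervals reported for mortality AUPRC, `intervals_overlap((0.42, 0.65), (0.21, 0.40))` is false, which is the sense in which the abstract calls that improvement significant.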
Updated: 2024-10-09