npj Digital Medicine (IF 12.4), Pub Date: 2024-10-09, DOI: 10.1038/s41746-024-01278-3
Dong Hyun Choi, Min Hyuk Lim, Ki Jeong Hong, Young Gyun Kim, Jeong Ho Park, Kyoung Jun Song, Sang Do Shin, Sungwan Kim
On-scene resuscitation time is associated with out-of-hospital cardiac arrest (OHCA) outcomes. We developed and validated reinforcement learning models for individualized on-scene resuscitation times, leveraging nationwide Korean data. Adult OHCA patients with a medical cause of arrest were included (N = 73,905). The optimal policy was derived from conservative Q-learning to maximize survival. The on-scene return of spontaneous circulation hazard rates estimated from the Random Survival Forest were used as intermediate rewards to handle sparse rewards, while patients’ historical survival was reflected in the terminal rewards. The optimal policy increased the survival to hospital discharge rate from 9.6% to 12.5% (95% CI: 12.2–12.8) and the good neurological recovery rate from 5.4% to 7.5% (95% CI: 7.3–7.7). The recommended maximum on-scene resuscitation times for patients demonstrated a bimodal distribution, varying with patient, emergency medical services, and OHCA characteristics. Our survival analysis-based approach generates explainable rewards, reducing subjectivity in reinforcement learning.
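The reward design described in the abstract can be illustrated with a small tabular sketch. This is not the authors' model: the state space (elapsed on-scene time blocks), the two actions, the hazard and survival numbers, and all function names below are hypothetical and chosen only to show how hazard-based intermediate rewards and outcome-based terminal rewards might feed a conservative Q-learning update on logged data.

```python
import math
import random

# Hypothetical toy setup (not the paper's data, features, or state space):
# state  = number of elapsed on-scene time blocks (0 .. N_STEPS - 1)
# action = CONTINUE on-scene resuscitation, or TRANSPORT (ends the episode)
N_STEPS = 6
CONTINUE, TRANSPORT = 0, 1

# Assumed ROSC hazard per time block, standing in for a survival-model estimate.
ROSC_HAZARD = [0.10, 0.08, 0.05, 0.03, 0.01, 0.005]
# Assumed probability of survival when transport starts at each block.
SURVIVAL_AT_TRANSPORT = [0.05, 0.10, 0.12, 0.10, 0.06, 0.03]


def make_dataset(n_episodes, rng):
    """Simulate logged episodes as (state, action, reward, next_state, done)."""
    data = []
    for _ in range(n_episodes):
        for t in range(N_STEPS):
            last = t == N_STEPS - 1
            action = TRANSPORT if last or rng.random() < 0.3 else CONTINUE
            if action == TRANSPORT:
                # Terminal reward: the logged binary survival outcome.
                reward = 1.0 if rng.random() < SURVIVAL_AT_TRANSPORT[t] else 0.0
                data.append((t, action, reward, t, True))
                break
            # Intermediate reward: ROSC hazard during this time block.
            data.append((t, action, ROSC_HAZARD[t], t + 1, False))
    return data


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def conservative_q_learning(data, alpha=0.1, lr=0.05, gamma=1.0, epochs=50, seed=0):
    """Tabular Q-learning on logged data with a conservative penalty.

    The penalty is alpha * (logsumexp_a Q(s, a) - Q(s, a_logged)); its gradient
    lowers Q-values in proportion to softmax(Q(s, .)) and raises the logged action.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STEPS)]
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, a, r, s_next, done in data:
            target = r if done else r + gamma * max(q[s_next])
            td_step = lr * (target - q[s][a])
            probs = softmax(q[s])
            for b in (CONTINUE, TRANSPORT):
                q[s][b] -= lr * alpha * (probs[b] - (1.0 if b == a else 0.0))
            q[s][a] += td_step
    return q


def recommended_stop_block(q):
    """First time block at which transport is valued at least as high as continuing."""
    for t in range(N_STEPS):
        if q[t][TRANSPORT] >= q[t][CONTINUE]:
            return t
    return N_STEPS - 1


if __name__ == "__main__":
    logged = make_dataset(500, random.Random(0))
    q_table = conservative_q_learning(logged)
    print("Recommended maximum on-scene block:", recommended_stop_block(q_table))
```

In this sketch the conservative penalty keeps the value of actions that never appear in the logged data (for example, continuing past the final block) below the value of observed actions, which is the property that makes the approach suitable for learning from historical records only.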
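Title: Using reinforcement learning to make individualized decisions on on-scene resuscitation time for out-of-hospital cardiac arrest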