Simulating A/B testing versus SMART designs for LLM-driven patient engagement to close preventive care gaps
npj Digital Medicine (IF 12.4), Pub Date: 2024-11-18, DOI: 10.1038/s41746-024-01330-2
Sanjay Basu, Dean Schillinger, Sadiq Y. Patel, Joseph Rigdon

Population health initiatives often rely on cold outreach to close gaps in preventive care, such as overdue screenings or immunizations. Tailoring messages to diverse patient populations remains challenging, as traditional A/B testing requires large sample sizes to test only two alternative messages. With increasing availability of large language models (LLMs), programs can utilize tiered testing among both LLM and manual human agents, presenting the dilemma of identifying which patients need different levels of human support to cost-effectively engage large populations. Using microsimulations, we compared both the statistical power and false positive rates of A/B testing and Sequential Multiple Assignment Randomized Trials (SMART) for developing personalized communications across multiple effect sizes and sample sizes. SMART showed better cost-effectiveness and net benefit across all scenarios, but superior power for detecting heterogeneous treatment effects (HTEs) only in later randomization stages, when populations were more homogeneous and subtle differences drove engagement differences.
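As a rough illustration of what estimating power and false positive rates by simulation can look like, the sketch below repeatedly simulates a two-message A/B test with a binary engagement outcome and counts how often a pooled two-proportion z-test rejects at the 5% level. It is not the authors' microsimulation and does not model the SMART design; the function names, engagement rates, sample sizes, and the choice of test are all assumptions made for illustration.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # All outcomes identical in both arms: no evidence of a difference.
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(p_a, p_b, n_per_arm, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated A/B tests that reject the null hypothesis.

    With p_a != p_b this estimates statistical power; with p_a == p_b it
    estimates the false positive rate. All rates are hypothetical inputs.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Simulate engagement (1 = engaged) for each patient in each arm.
        x_a = sum(rng.random() < p_a for _ in range(n_per_arm))
        x_b = sum(rng.random() < p_b for _ in range(n_per_arm))
        if two_proportion_p_value(x_a, n_per_arm, x_b, n_per_arm) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Hypothetical scenario: message B lifts engagement from 20% to 25%.
    for n in (200, 500, 1000):
        power = rejection_rate(0.20, 0.25, n)
        false_pos = rejection_rate(0.20, 0.20, n)
        print(f"n per arm = {n}: power ~ {power:.2f}, "
              f"false positive rate ~ {false_pos:.2f}")
```

Running the script prints one line per sample size. Repeating the same loop over a grid of effect sizes and sample sizes is one simple way to build the kind of comparison across scenarios that the abstract describes.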




Updated: 2024-11-18