npj Digital Medicine (IF 12.4), Pub Date: 2024-11-18, DOI: 10.1038/s41746-024-01330-2. Sanjay Basu, Dean Schillinger, Sadiq Y. Patel, Joseph Rigdon
Population health initiatives often rely on cold outreach to close gaps in preventive care, such as overdue screenings or immunizations. Tailoring messages to diverse patient populations remains challenging, as traditional A/B testing requires large sample sizes to test only two alternative messages. With the increasing availability of large language models (LLMs), programs can use tiered testing across both LLM and human agents, which raises the question of which patients need which level of human support if large populations are to be engaged cost-effectively. Using microsimulations, we compared the statistical power and false positive rates of A/B testing and Sequential Multiple Assignment Randomized Trials (SMART) for developing personalized communications across multiple effect sizes and sample sizes. SMART showed better cost-effectiveness and net benefit across all scenarios, but superior power for detecting heterogeneous treatment effects (HTEs) only in later randomization stages, when populations were more homogeneous and subtle differences drove differences in engagement.
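As a rough illustration of how statistical power and false positive rates can be estimated by simulation, the sketch below repeatedly simulates a two-arm A/B test of outreach messages and counts how often a two-proportion z-test rejects the null hypothesis. It is not the authors' microsimulation and covers only the A/B side, not SMART. The function names, the engagement rates, the sample sizes, and the choice of test are all assumptions made for illustration.

```python
import math
import random


def two_proportion_p_value(x_a, n_a, x_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (x_a / n_a - x_b / n_b) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(p_a, p_b, n_per_arm, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated A/B tests that reject the null hypothesis.

    With p_a != p_b this estimates power; with p_a == p_b it estimates
    the false positive rate. All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Each patient engages independently with the arm's probability.
        x_a = sum(rng.random() < p_a for _ in range(n_per_arm))
        x_b = sum(rng.random() < p_b for _ in range(n_per_arm))
        if two_proportion_p_value(x_a, n_per_arm, x_b, n_per_arm) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Hypothetical engagement rates: 20% for message A, 35% for message B.
    print("estimated power:", rejection_rate(0.20, 0.35, n_per_arm=200))
    # Identical rates in both arms, so every rejection is a false positive.
    print("estimated false positive rate:", rejection_rate(0.20, 0.20, n_per_arm=200))
```

Repeating the calls over a grid of effect sizes and sample sizes gives the kind of power curve the abstract refers to. A SMART comparison would additionally need re-randomization of non-responders at later stages, which this sketch does not attempt.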
Simulating A/B testing versus SMART designs for LLM-driven patient engagement to close preventive care gaps