Fair evaluation of federated learning algorithms for automated breast density classification: The results of the 2022 ACR-NCI-NVIDIA federated learning challenge
Medical Image Analysis (IF 10.7), Pub Date: 2024-05-15, DOI: 10.1016/j.media.2024.103206
Kendall Schmidt, Benjamin Bearce, Ken Chang, Laura Coombs, Keyvan Farahani, Marawan Elbatel, Kaouther Mouheb, Robert Marti, Ruipeng Zhang, Yao Zhang, Yanfeng Wang, Yaojun Hu, Haochao Ying, Yuyang Xu, Conrad Testagrose, Mutlu Demirer, Vikash Gupta, Ünal Akünal, Markus Bujotzek, Klaus H Maier-Hein, Yi Qin, Xiaomeng Li, Jayashree Kalpathy-Cramer, Holger R Roth

Correct interpretation of breast density is important in assessing breast cancer risk. AI has been shown capable of accurately predicting breast density; however, because imaging characteristics differ across mammography systems, models built using data from one system do not generalize well to other systems. Although federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants submitted Docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.
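
For readers unfamiliar with the reported metric, the following is a minimal sketch of a linearly weighted Cohen's kappa, which penalizes a prediction in proportion to how many ordinal categories it is away from the reference label. It is not the challenge's evaluation code. The function name `linear_weighted_kappa`, the integer label encoding 0 to 3, and the default of four classes (chosen here on the assumption that labels follow the four BI-RADS density categories) are illustrative assumptions rather than details taken from the abstract.

```python
from collections import Counter


def linear_weighted_kappa(y_true, y_pred, num_classes=4):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    Labels are assumed to be integers in range(num_classes); the default of
    four classes is an assumption made for this sketch.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    k = num_classes
    joint_counts = Counter(zip(y_true, y_pred))  # observed (true, pred) pairs
    true_counts = Counter(y_true)                # marginal of reference labels
    pred_counts = Counter(y_pred)                # marginal of predictions

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * joint_counts[(i, j)] / n
            expected += weight * true_counts[i] * pred_counts[j] / (n * n)

    if expected == 0:
        # Happens when every label and every prediction is the same class.
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


if __name__ == "__main__":
    reference = [0, 1, 2, 3]
    predicted = [0, 1, 2, 2]  # one prediction is off by a single category
    print(linear_weighted_kappa(reference, predicted))
```

A score of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so under this definition the reported values of 0.653 and 0.413 fall between those two extremes.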

