npj Digital Medicine (IF 12.4), Pub Date: 2024-09-07, DOI: 10.1038/s41746-024-01226-1
Klavdiia Naumova, Arnout Devos, Sai Praneeth Karimireddy, Martin Jaggi, Mary-Anne Hartley
Distributed collaborative learning is a promising approach for building predictive models from privacy-sensitive biomedical images. In this setting, several data owners (clients) train a joint model without sharing their original data. However, hidden systematic biases in client data can compromise model performance and fairness. This study presents the MyThisYourThat (MyTH) approach, which adapts an interpretable prototypical part learning network to a distributed setting and enables each client to visualize, on their own images, the feature differences learned by the other clients: comparing one client’s ‘This’ with the others’ ‘That’. In our setting, four clients collaboratively train two diagnostic classifiers on a benchmark X-ray dataset. Without data bias, the global model reaches 74.14% balanced accuracy for cardiomegaly and 74.08% for pleural effusion. We show that when one client’s data carries a systematic visual bias, the performance of the global models drops to near-random. We demonstrate how differences between local and global prototypes reveal these biases and allow them to be visualized on each client’s data without compromising privacy.
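Two ideas in the abstract, balanced accuracy as the headline metric and comparing local against global prototypes to expose a biased client, can be sketched in a few lines of Python. This is not the authors' implementation. The ProtoPNet-style log-similarity, the size-weighted (FedAvg-style) averaging of prototype vectors, the drift score, and every function name (`balanced_accuracy`, `protopnet_similarity`, `most_activated_patch`, `average_prototypes`, `prototype_drift`) are hypothetical choices for illustration, and the 2-D prototypes and patches are toy values, not data from the study.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall; 0.5 is chance level for a binary classifier."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def protopnet_similarity(patch: Vector, prototype: Vector, eps: float = 1e-4) -> float:
    """ProtoPNet-style score (assumed form): high when the squared L2 distance is small."""
    d = sum((p - q) ** 2 for p, q in zip(patch, prototype))
    return math.log((d + 1.0) / (d + eps))


def most_activated_patch(patches: List[Vector], prototype: Vector) -> int:
    """Index of the image patch that a (local or global) prototype matches best."""
    return max(range(len(patches)),
               key=lambda i: protopnet_similarity(patches[i], prototype))


def average_prototypes(client_protos: Dict[str, List[Vector]],
                       client_sizes: Dict[str, int]) -> List[Vector]:
    """Global prototypes as a dataset-size-weighted mean over clients.

    Hypothetical simplification: averaging prototype vectors directly stands in
    for federated training of the whole network.
    """
    total = sum(client_sizes.values())
    first = next(iter(client_protos.values()))
    global_protos = [[0.0] * len(p) for p in first]
    for client, protos in client_protos.items():
        w = client_sizes[client] / total
        for j, proto in enumerate(protos):
            for k, v in enumerate(proto):
                global_protos[j][k] += w * v
    return global_protos


def prototype_drift(local: List[Vector], global_protos: List[Vector]) -> float:
    """Mean Euclidean distance between matching local and global prototypes."""
    dists = [math.dist(a, b) for a, b in zip(local, global_protos)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy 2-D prototypes (made up): client D learned a shifted first prototype,
    # mimicking a systematic visual bias in its images.
    protos = {
        "A": [[1.0, 0.0], [0.0, 1.0]],
        "B": [[1.0, 0.0], [0.0, 1.0]],
        "C": [[1.0, 0.0], [0.0, 1.0]],
        "D": [[5.0, 5.0], [0.0, 1.0]],
    }
    sizes = {c: 100 for c in protos}
    global_protos = average_prototypes(protos, sizes)
    drift = {c: prototype_drift(p, global_protos) for c, p in protos.items()}
    print(max(drift, key=drift.get))  # prints: D

    # Which patch of client A's own "image" does A's prototype match,
    # versus the global prototype that D's bias has pulled away?
    patches_a = [[1.0, 0.0], [2.0, 1.5], [0.0, 1.0]]
    print(most_activated_patch(patches_a, protos["A"][0]),
          most_activated_patch(patches_a, global_protos[0]))  # prints: 0 1

    # Always predicting class 0 gives 0.75 plain accuracy here,
    # but only 0.5 balanced accuracy, i.e. chance level.
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # prints: 0.5
```

In the toy run, client D's shifted prototype yields the largest drift, and the bias-pulled global prototype matches a different patch of client A's data than A's own prototype does, which is the kind of local-versus-global difference the abstract proposes to visualize. The last line shows why near-random performance is judged with balanced accuracy: a model that always predicts the majority class scores 0.5.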
Title: MyThisYourThat for interpretable identification of systematic bias in federated learning for biomedical images