Multimedia Systems (IF 3.5). Pub Date: 2022-09-07. DOI: 10.1007/s00530-022-00992-w. Authors: Lu Zhao, Liming Yuan, Kun Hao, Xianbin Wen
Attention-based deep multi-instance learning (MIL) is an effective and interpretable model whose interpretability stems from the learnability of its inner attention-based MIL pooling. Its main limitation is that it learns only a single instance-level target concept for weighting instances; a further implicit limitation is the assumption that the bag and instance concepts lie in the same semantic space. In this paper, we relax these constraints as follows: (i) there exist multiple instance concepts; (ii) the bag and instance concepts live in different semantic spaces. Building on these two relaxed constraints, we propose a two-level attention-based MIL pooling that first learns several instance concepts in a low-level semantic space and then captures the bag concept in a high-level semantic space. To effectively capture different types of instance concepts, we also present a new similarity-based loss. Experimental results show that our method achieves higher or closely comparable performance to state-of-the-art methods on benchmark data sets, and surpasses them in both performance and interpretability on a synthetic data set.
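The two-level pooling described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: all dimensions, weight matrices, and the exact attention parameterization (a tanh-gated attention MLP, as in standard attention-based MIL pooling) are illustrative assumptions. Level 1 applies K attention heads, one per hypothesized instance concept, to produce K concept embeddings; level 2 projects those embeddings into a separate, higher-level semantic space and attends over them to form the bag representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Hypothetical dimensions (illustrative only, not from the paper)
n, d = 5, 8      # instances per bag, instance feature dimension
K = 3            # number of instance-level concepts (level 1 heads)
h = 6            # hidden dimension of the attention MLP
d_hi = 4         # dimension of the high-level (bag) semantic space

X = rng.normal(size=(n, d))          # one bag of n instance embeddings

# --- Level 1: K attention heads, one per instance concept, in the
# --- low-level semantic space. Each head weights all n instances.
V = rng.normal(size=(h, d))          # shared hidden transform
W = rng.normal(size=(K, h))          # one scoring vector per concept
A = np.stack([softmax(W[k] @ np.tanh(V @ X.T)) for k in range(K)])  # (K, n)
Z = A @ X                            # (K, d): one embedding per concept

# --- Level 2: attention over the K concept embeddings, after projecting
# --- them into a different (higher-level) semantic space.
P = rng.normal(size=(d_hi, d))       # projection into the bag space
Zh = np.tanh(Z @ P.T)                # (K, d_hi) concept embeddings, high level
u = rng.normal(size=(d_hi,))         # bag-level scoring vector
b = softmax(Zh @ u)                  # (K,) attention weights over concepts
bag = b @ Zh                         # (d_hi,) final bag representation

print(bag.shape)
```

Because the level-2 projection `P` maps concept embeddings into a space of a different dimensionality, the sketch makes the paper's second relaxed constraint concrete: the bag concept is scored in a semantic space distinct from the one in which instances are weighted.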
Title (translated from the Chinese): Generalized Attention-Based Deep Multi-Instance Learning