Multi-Task Momentum Distillation for Multimodal Sentiment Analysis
IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2023-06-02, DOI: 10.1109/taffc.2023.3282410
Ronghao Lin, Haifeng Hu

In the field of Multimodal Sentiment Analysis (MSA), prevailing methods are devoted to developing intricate network architectures to capture intra- and inter-modal dynamics, which requires numerous parameters and complicates the interpretability of multimodal modeling. Moreover, the heterogeneous nature of the modalities (text, audio, and vision) introduces significant modality gaps, making multimodal representation learning an ongoing challenge. To address these issues, we treat the learning process of each modality as a subtask and propose a novel approach named Multi-Task Momentum Distillation (MTMD), which succeeds in reducing the gap among different modalities. Specifically, owing to their richer semantic information, we treat the subtasks of textual and multimodal representations as teacher networks and the subtasks of acoustic and visual representations as student networks, performing knowledge distillation that transfers sentiment-related knowledge guided by the regression and classification subtasks. Additionally, we adopt unimodal momentum models to explore modality-specific knowledge in depth and employ adaptive momentum fusion factors to learn a robust multimodal representation. Furthermore, we provide a theoretical perspective of mutual information maximization by interpreting MTMD as generating sentiment-related views in various ways. Extensive experiments illustrate the superiority of our approach over state-of-the-art MSA methods.
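The abstract's core mechanics — a momentum (exponential-moving-average) teacher and a temperature-softened distillation loss from teacher to student — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the EMA coefficient `m=0.99`, and the temperature `t=2.0` are illustrative assumptions, and the adaptive fusion factors of MTMD are not modeled here.

```python
import numpy as np

def ema_update(teacher, student, m=0.99):
    """Momentum update: teacher parameters track the student via EMA.
    `teacher` and `student` are dicts mapping parameter names to arrays."""
    return {k: m * teacher[k] + (1 - m) * student[k] for k in teacher}

def softmax(logits, t=1.0):
    """Temperature-softened softmax (higher t -> softer distribution)."""
    z = np.asarray(logits, dtype=float) / t
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, t=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the standard knowledge-distillation objective."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

For identical teacher and student logits the loss is zero, and the EMA step moves the teacher only a fraction `1 - m` toward the student, which is what keeps the momentum teacher a slowly evolving, stable target.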
