Resolving multimodal ambiguity via knowledge-injection and ambiguity learning for multimodal sentiment analysis
Information Fusion (IF 14.7) Pub Date: 2024-10-31, DOI: 10.1016/j.inffus.2024.102745
Xianbing Zhao, Xuejiao Li, Ronghuan Jiang, Buzhou Tang

Multimodal Sentiment Analysis (MSA) utilizes complementary multimodal features to predict sentiment polarity, mainly involving the language, vision, and audio modalities. Existing multimodal fusion methods primarily consider the complementarity of different modalities while neglecting the ambiguity caused by conflicts between modalities (e.g., the text modality predicts positive sentiment while the visual modality predicts negative sentiment). To mitigate these conflicts, we develop a novel multimodal ambiguity learning framework, RMA: Resolving Multimodal Ambiguity via Knowledge-Injection and Ambiguity Learning for Multimodal Sentiment Analysis. Specifically, we introduce and filter external knowledge to enhance the consistency of cross-modal sentiment polarity prediction. We then explicitly measure ambiguity and dynamically adjust the influence of the subordinate modalities on the dominant modality, so that both the complementarity and the conflicts of multiple modalities are taken into account during multimodal fusion. Experiments demonstrate the superiority of our proposed model on three public multimodal sentiment analysis datasets: CMU-MOSI, CMU-MOSEI, and MELD.
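The abstract does not specify how ambiguity is measured or how the modality weights are adjusted; the following is a minimal, hypothetical sketch of one way such ambiguity-gated fusion could look. The class name, the KL-divergence conflict measure, and the exponential gating are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of ambiguity-gated multimodal fusion (not the RMA code).
# Assumption: each modality is a pre-extracted feature vector, text is treated
# as the dominant modality, and a subordinate modality's contribution is damped
# in proportion to how strongly its sentiment prediction conflicts with text.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AmbiguityGatedFusion(nn.Module):
    def __init__(self, dim: int, num_classes: int = 3):
        super().__init__()
        # Per-modality sentiment heads, used only to estimate cross-modal conflict.
        self.text_head = nn.Linear(dim, num_classes)
        self.vision_head = nn.Linear(dim, num_classes)
        self.audio_head = nn.Linear(dim, num_classes)
        self.classifier = nn.Linear(dim * 3, num_classes)

    def forward(self, text, vision, audio):
        p_t = F.softmax(self.text_head(text), dim=-1)     # dominant modality
        p_v = F.softmax(self.vision_head(vision), dim=-1) # subordinate
        p_a = F.softmax(self.audio_head(audio), dim=-1)   # subordinate

        # Ambiguity score per subordinate modality: divergence from the dominant
        # prediction; larger divergence means larger conflict and a smaller gate.
        amb_v = F.kl_div(p_v.log(), p_t, reduction="none").sum(-1, keepdim=True)
        amb_a = F.kl_div(p_a.log(), p_t, reduction="none").sum(-1, keepdim=True)
        gate_v = torch.exp(-amb_v)  # in (0, 1]; conflicting modalities are damped
        gate_a = torch.exp(-amb_a)

        fused = torch.cat([text, gate_v * vision, gate_a * audio], dim=-1)
        return self.classifier(fused)

# Usage with dummy features (batch of 4 utterances, 128-d per modality).
model = AmbiguityGatedFusion(dim=128)
logits = model(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 3])
```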

Updated: 2024-10-31