


A multimodal data generation method for imbalanced classification with dual-discriminator constrained diffusion model and adaptive sample selection strategy
Information Fusion (IF 14.7), Pub Date: 2024-12-05, DOI: 10.1016/j.inffus.2024.102843
Qiangwei Li, Xin Gao, Heping Lu, Baofeng Li, Feng Zhai, Taizhi Wang, Zhihang Meng, Yu Hao

Data-level methods often suffer from mode collapse when the minority class has multiple distribution patterns. Some studies have tried to address this problem with similarity measurement or local dynamic information adjustment strategies, but their performance degrades when the minority class exhibits intra-class imbalance. This paper proposes a multimodal data generation method with a dual-discriminator constrained diffusion model and an adaptive sample selection strategy, consisting of two key modules. In the diffusion-based sample generation module, a dual-discriminator architecture is designed. The sample density discriminator keeps generated samples in low-density areas by comparing distances to neighbors, and the sample authenticity discriminator ensures that generated samples are realistic by competing with the denoising network. Under the joint constraint of the two discriminators, the generated samples can effectively alleviate intra-class imbalance in the minority class. In the adaptive sample selection module, density and probability weights are calculated for each original minority sample by analyzing how its neighborhood density and classification probability change before and after sample generation. The number of generated samples to retain is then calculated adaptively from the weight difference, and the nearest-neighbor principle is applied to select generated samples that improve classification and alleviate intra-class imbalance. On 40 imbalanced datasets, the proposed method improves the average rank by 39.31% compared with the second-ranked method; on the 8 datasets where the minority class exhibits intra-class imbalance, the improvement is more pronounced, reaching 61.53%.
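The abstract does not give formulas for the density weight, the probability weight, or the retained-sample count, so the sketch below is only a hypothetical illustration of two ideas that can be read off the text: favoring minority samples that sit in low-density regions, and keeping the generated samples nearest to each original minority sample. It assumes mean Euclidean distance to the k nearest minority neighbors as a stand-in density measure and a proportional, largest-remainder allocation of a fixed retention budget. The names `sparsity_scores`, `allocate_quotas`, and `select_nearest`, the parameters `k` and `budget`, and the allocation rule are all assumptions rather than the authors' method; the sketch ignores the classification-probability weight, the before/after comparison, and the diffusion model and discriminators entirely.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sparsity_scores(points: Sequence[Point], k: int) -> List[float]:
    """Mean Euclidean distance from each point to its k nearest other points.

    Larger values mean the point sits in a sparser (lower-density) region.
    Hypothetical stand-in for the paper's density measure.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def allocate_quotas(weights: Sequence[float], budget: int) -> List[int]:
    """Split `budget` retained samples in proportion to `weights`.

    Largest-remainder rounding makes the quotas sum exactly to `budget`.
    The proportional rule is an assumption, not the paper's formula.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    raw = [budget * w / total for w in weights]
    quotas = [math.floor(r) for r in raw]
    leftover = budget - sum(quotas)
    # Hand the remaining units to the largest fractional parts (stable on ties).
    order = sorted(range(len(raw)), key=lambda i: raw[i] - quotas[i], reverse=True)
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas


def select_nearest(
    originals: Sequence[Point], generated: Sequence[Point], quotas: Sequence[int]
) -> List[Point]:
    """For each original sample, keep its `quota` nearest unused generated samples.

    Greedy in the order of `originals`; each generated sample is kept at most once.
    """
    available = set(range(len(generated)))
    selected: List[Point] = []
    for anchor, quota in zip(originals, quotas):
        # Sort by (distance, index) so ties are broken deterministically.
        ranked = sorted(available, key=lambda j: (math.dist(anchor, generated[j]), j))
        chosen = ranked[:quota]
        available.difference_update(chosen)
        selected.extend(generated[j] for j in chosen)
    return selected


if __name__ == "__main__":
    # A dense minority cluster near the origin plus one sparse sub-cluster member.
    minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
    candidates = [(0.05, 0.05), (2.9, 3.1), (3.2, 2.8), (1.5, 1.5), (3.0, 3.3), (0.2, 0.2)]

    weights = sparsity_scores(minority, k=2)
    quotas = allocate_quotas(weights, budget=3)
    print(quotas)  # [0, 0, 0, 3]
    print(select_nearest(minority, candidates, quotas))
    # [(2.9, 3.1), (3.2, 2.8), (3.0, 3.3)]
```

In this toy run the isolated minority point at (3, 3) receives the whole budget, so the retained samples reinforce the under-represented sub-cluster rather than the dense one, which is the intra-class rebalancing effect the abstract describes. The greedy, order-dependent assignment is a simplification chosen for brevity.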


Updated: 2024-12-05
