Dialectic Feature-Based Fuzzy Graph Learning for Label Propagation Assisting Text Classification
IEEE Transactions on Fuzzy Systems (IF 10.7), Pub Date: 2024-07-02, DOI: 10.1109/tfuzz.2024.3421595
Cherukula Madhu, Sudhakar M S

The abundance of unstructured, sparsely labeled data on social networks makes text classification (TC) vital for structuring and extracting useful information. In addition, ignoring dialectal variation significantly hinders TC performance on international English (especially American and British) across numerous data domains. To address this multifaceted challenge, a comprehensive and adaptable framework termed dialectic feature-based fuzzy graph learning (DFFGL) is introduced that learns feature vectors by incorporating semantics and dialect variations from the input text. DFFGL extracts uniquely modified term frequency-inverse document frequency features and parts-of-speech-tagged $\mathcal{N}$-grams, together with dialect-specific dictionary features, in the fuzzy feature space to realize a novel language model. These fuzzified features are then related through a novel fuzzy distance measure to construct an interpretable fuzzy graph, which is optimized using a novel elastic net regularizer to characterize nodal relations, promising efficient classification through effective label propagation. Exhaustive $F_1$-score evaluations on 6 English corpora and 17 diverse datasets show DFFGL consistently scoring over 93% in dialect identification and over 80% in TC, even with just 10 labeled samples. Furthermore, DFFGL offers $F_1$-score improvements of 10.2% and 17.3% over its peers in the respective tasks, highlighting its extensibility to real-world data classification.
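To make the general idea of graph-based label propagation from a few labeled texts concrete, here is a minimal sketch in plain Python. It is not the DFFGL method: it assumes ordinary TF-IDF in place of the paper's modified variant, uses cosine similarity as a stand-in for the fuzzy distance measure, and skips the elastic net graph optimization entirely. The function names (`tfidf_vectors`, `similarity_graph`, `propagate_labels`) and the toy sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Plain TF-IDF vectors, L2-normalised (an assumption, not the paper's variant)."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(t for doc in tokenised for t in set(doc))
    vectors = []
    for doc in tokenised:
        tf = Counter(doc)
        vec = [
            tf[t] / len(doc) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t in vocab
        ]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def similarity_graph(vectors):
    """Dense edge weights in [0, 1]; cosine similarity stands in for a fuzzy measure."""
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = sum(a * b for a, b in zip(vectors[i], vectors[j]))
    return w


def propagate_labels(w, seeds, n_classes, iterations=50):
    """Iterative label propagation; labeled seed nodes are clamped every round."""
    n = len(w)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, c in seeds.items():
        scores[i][c] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds:
                updated.append(scores[i][:])
                continue
            total = sum(w[i]) or 1.0
            # Each unlabeled node takes the weighted average of its neighbours' scores.
            updated.append([
                sum(w[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        scores = updated
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]


# Toy example: class 0 = British spelling, class 1 = American spelling.
docs = [
    "the colour of the theatre programme",  # labeled seed, class 0
    "the color of the theater program",     # labeled seed, class 1
    "my favourite colour and flavour",      # unlabeled
    "my favorite color and flavor",         # unlabeled
]
graph = similarity_graph(tfidf_vectors(docs))
labels = propagate_labels(graph, seeds={0: 0, 1: 1}, n_classes=2)
print(labels)
```

With only two labeled sentences, the unlabeled ones inherit a class through the dialect-specific tokens they share with a seed, which is the low-label regime the abstract targets.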

Updated: 2024-07-02