Complex & Intelligent Systems (IF 5.0) Pub Date: 2024-11-26, DOI: 10.1007/s40747-024-01661-3 Bin Chen, Hongyi Li, Di Zhao, Yitang Yang, Chengwei Pan
Cyber threat intelligence knowledge graphs often contain erroneous, inconsistent, or missing triples, which makes it difficult for them to cope with complex and diverse application requirements. The predominant approach to knowledge graph quality assessment uses word embeddings to evaluate the plausibility of triples, and from that the quality of the graph. Recent studies have found that splicing (concatenating) different types of embeddings yields better word representations, and this has been applied to tasks such as named entity recognition (NER). However, as the number of embedding types grows, selecting the best embeddings to concatenate has become a pressing problem. In this paper, we propose an adaptive joining of embedding (AJE) model that automatically finds better word embedding representations for knowledge graph quality assessment. The AJE model operates through the coordinated interplay of a task model and a selector. The former samples word embeddings generated by various models, while the latter generates rewards based on feedback from the current task outcomes to decide whether or not to splice each embedding. Experiments were conducted on two generic datasets and one cybersecurity dataset. The results show that our model outperforms the baseline model on key metrics such as accuracy and F1, achieving accuracies of 95.8%, 95.6% and 91.3% on the generic datasets WN11 and FB13 and the cybersecurity dataset CS13K, respectively, which are increases of 1.0%, 0.2% and 0.5% over the AttTucker model.
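The reward-driven choice of which embeddings to splice can be illustrated with a small sketch. This is not the authors' AJE implementation: it assumes a REINFORCE-style update over one keep/drop logit per embedding source, and every name in it (`sample_mask`, `concatenate`, `update_logits`, `toy_task_score`, `run_search`) is hypothetical. The toy score stands in for the accuracy a real task model would report, and the division of roles between task model and selector in the paper may differ from this simplification.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_mask(logits, rng):
    """Sample one keep (1) / drop (0) decision per embedding source."""
    return [1 if rng.random() < sigmoid(l) else 0 for l in logits]


def concatenate(embeddings, mask):
    """Splice the selected embedding vectors into a single representation."""
    joined = []
    for vector, keep in zip(embeddings, mask):
        if keep:
            joined.extend(vector)
    return joined


def update_logits(logits, mask, reward, baseline, lr=0.5):
    """REINFORCE-style step: d log p(mask) / d logit = mask - sigmoid(logit)."""
    advantage = reward - baseline
    return [l + lr * advantage * (m - sigmoid(l)) for l, m in zip(logits, mask)]


def toy_task_score(mask):
    """Hypothetical stand-in for task accuracy: sources 0 and 2 help, source 1 hurts."""
    return 0.5 + 0.2 * mask[0] - 0.1 * mask[1] + 0.2 * mask[2]


def run_search(steps=3000, seed=0):
    """Alternate between sampling a selection and rewarding it with the task score."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]   # one logit per embedding source (assumed: three sources)
    baseline = 0.0             # moving average of past rewards
    for _ in range(steps):
        mask = sample_mask(logits, rng)
        reward = toy_task_score(mask)
        logits = update_logits(logits, mask, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return logits


if __name__ == "__main__":
    final_logits = run_search()
    print([round(sigmoid(l), 2) for l in final_logits])  # keep probability per source
```

In this toy setting the keep probabilities of the helpful sources should drift toward 1 and that of the harmful source toward 0. In a real system the reward would come from training and evaluating the triple-classification model on the concatenated representation, which is far more expensive than the closed-form score used here.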
Title: Quality assessment of cyber threat intelligence knowledge graphs based on adaptive joining of embedding models