Active in-context learning for cross-domain entity resolution, Information Fusion



Information Fusion (IF 14.7), Pub Date: 2024-12-05, DOI: 10.1016/j.inffus.2024.102816
Ziheng Zhang, Weixin Zeng, Jiuyang Tang, Hongbin Huang, Xiang Zhao

Entity resolution (ER) is the task of determining whether two entity descriptions refer to the same real-world entity. In traditional settings, the training and testing data come from the same domain, e.g., they share the same attribute structure. In practice, however, the training and testing data often span different domains, which calls for the study of the cross-domain ER problem. To tackle the domain shift in cross-domain ER, state-of-the-art solutions devise neural models that use information from entity pairs in the target domain to guide both feature modeling in the source domain and model training. However, these approaches require substantial computational resources and fine-tuning effort to achieve effective matching. To mitigate these issues, in this work we are the first to investigate the in-context learning (ICL) capabilities of large language models (LLMs) for cross-domain ER, and we introduce a new framework, CiDER. CiDER consists of three main modules, i.e., active candidate source data generation, in-context demonstration selection, and prompt generation, which together select optimal demonstrations from the source data to enhance LLM inference performance on ER in the target domain. Comprehensive experiments on multiple benchmarks demonstrate that CiDER offers significant improvements over existing methods on cross-domain ER.
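To make the last two modules concrete, here is a minimal, hypothetical sketch of demonstration selection followed by prompt generation for ER. It is not CiDER's actual algorithm: the token-level Jaccard similarity, the `EntityPair` type, the helper names `select_demonstrations` and `build_prompt`, the prompt wording, and the sample records are all illustrative assumptions. It does not attempt the active candidate source data generation step, and it makes no LLM call; the resulting string is what would be sent to a model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class EntityPair:
    """Hypothetical container: two serialized entity descriptions plus an optional label."""
    left: str
    right: str
    label: Optional[bool] = None  # True = match, False = non-match, None = unlabeled (target domain)


def tokens(pair: EntityPair) -> Set[str]:
    # Assumption: a simple lowercase whitespace tokenization of both descriptions.
    return set((pair.left + " " + pair.right).lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_demonstrations(query: EntityPair, source: List[EntityPair], k: int = 2) -> List[EntityPair]:
    """Illustrative selection: the k labeled source pairs most similar to the target query."""
    q = tokens(query)
    labeled = [p for p in source if p.label is not None]
    # sorted() is stable, so ties keep their original source order.
    ranked = sorted(labeled, key=lambda p: jaccard(q, tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: EntityPair, demos: List[EntityPair]) -> str:
    """Assemble a few-shot prompt: instruction, labeled demonstrations, then the unlabeled query."""
    lines = [
        "Decide whether the two entity descriptions refer to the same real-world entity. "
        "Answer Yes or No.",
        "",
    ]
    for d in demos:
        lines.append(f"Entity A: {d.left}")
        lines.append(f"Entity B: {d.right}")
        lines.append(f"Answer: {'Yes' if d.label else 'No'}")
        lines.append("")
    lines.append(f"Entity A: {query.left}")
    lines.append(f"Entity B: {query.right}")
    lines.append("Answer:")
    return "\n".join(lines)


# Usage with made-up records: labeled source-domain pairs and one unlabeled target-domain pair.
source = [
    EntityPair("iPhone 13 128GB black", "Apple iPhone 13 black 128 GB", True),
    EntityPair("iPhone 13 128GB black", "Samsung Galaxy S21 black", False),
    EntityPair("Dune by Frank Herbert", "Dune (novel) Frank Herbert", True),
]
query = EntityPair("Apple iPhone 14 256GB", "iPhone 14 256 GB Apple")
demos = select_demonstrations(query, source, k=2)
prompt = build_prompt(query, demos)
print(prompt)
```

Nearest-neighbour selection is only one common baseline for choosing in-context demonstrations. A framework that targets domain shift may rank, filter, or diversify candidates quite differently, so the similarity function here should be read as a placeholder.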


Updated: 2024-12-05
