Journal of Visualization ( IF 1.7 ) Pub Date : 2021-11-18 , DOI: 10.1007/s12650-021-00799-3 Pei-Shan Lo , Jian-Lin Wu , Syu-Ting Deng , Ko-Chih Wang
Abstract
Named entity recognition (NER) is a crucial initial task that identifies both spans and types of named entities to extract the specific information, such as organization, person, location, and time. Nowadays, the NER task achieves state-of-the-art performance by deep learning approaches for capturing contextual features. However, the complex structures of deep learning make a black-box problem and limit researchers’ ability to improve it. Unlike the Latin alphabet, Chinese (or other languages such as Korean and Japanese) do not have an explicit word boundary. Therefore, some preliminary works, such as word segmentation (WS) and part-of-speech tagging (POS), are needed before the Chinese NER task. The correctness of preliminary works importantly influences the final NER prediction. Thus, investigating the model behavior of the Chinese NER task becomes more complicated and challenging. In this paper, we present CNERVis, a visual analysis tool that allows users to interactively inspect the WS-POS-NER pipeline and understand how and why a NER prediction is made. Also, CNERVis allows users to load the numerous testing data and explores the critical instances to facilitate the analysis from large datasets. Our tool’s usability and effectiveness are demonstrated through case studies.
Graphic abstract
中文翻译:
CNERVis:中文命名实体识别的视觉诊断工具
摘要
命名实体识别 (NER) 是一项关键的初始任务,它识别命名实体的跨度和类型以提取特定信息,例如组织、人员、位置和时间。如今,NER 任务通过用于捕获上下文特征的深度学习方法实现了最先进的性能。然而,深度学习的复杂结构造成了一个黑盒问题,并限制了研究人员改进它的能力。与拉丁字母不同,中文(或其他语言,如韩语和日语)没有明确的词边界。因此,在中文 NER 任务之前需要一些初步工作,例如分词 (WS) 和词性标注 (POS)。前期工作的正确性对最终的NER预测有重要影响。因此,研究中文 NER 任务的模型行为变得更加复杂和具有挑战性。在本文中,我们介绍了 CNERVis,这是一种可视化分析工具,允许用户交互式检查 WS-POS-NER 管道并了解如何以及为何进行 NER 预测。此外,CNERVis 允许用户加载大量测试数据并探索关键实例,以方便从大型数据集进行分析。我们的工具的可用性和有效性通过案例研究得到证明。