Corpus based part-of-speech tagging
International Journal of Speech Technology, Pub Date: 2016-08-01, DOI: 10.1007/s10772-016-9356-2
Chengyao Lv, Huihua Liu, Yuanxing Dong, Yunliang Chen
In natural language processing, a part-of-speech (POS) tagger is a crucial subsystem in a wide range of applications. It labels (or classifies) unannotated words of natural language with POS labels corresponding to categories such as noun, verb, or adjective. Mainstream approaches are generally corpus-based: a POS tagger learns from a corpus of pre-annotated data how to correctly tag unlabeled data. This article presents a brief account of the state of the art in POS tagging. POS tagging approaches use a labeled corpus to train computational models. Several typical models from three kinds of tagging are introduced: rule-based tagging, statistical approaches, and evolutionary algorithms. The advantages and pitfalls of each are discussed and analyzed. Some rule-based and stochastic methods have achieved accuracies of 93–96%, while some evolutionary algorithms reach about 96–97%.
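To make the corpus-based idea concrete, here is a minimal sketch of a most-frequent-tag baseline: it counts how often each word carries each tag in a labeled corpus, then tags new words with their most common tag. This is an illustration only and is not one of the models surveyed in the paper; the toy corpus, the tag names (`DET`, `NOUN`, `VERB`, `ADJ`), and the function names `train_unigram_tagger` and `tag` are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn each word's most frequent tag from a pre-annotated corpus."""
    word_tag_counts = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[word.lower()][tag] += 1
            tag_totals[tag] += 1
    # Fallback for unseen words: the most frequent tag in the whole corpus.
    default_tag = tag_totals.most_common(1)[0][0]
    model = {w: c.most_common(1)[0][0] for w, c in word_tag_counts.items()}
    return model, default_tag


def tag(words, model, default_tag):
    """Tag unlabeled words; unknown words receive the default tag."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


# Hypothetical toy corpus, invented for this example.
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("cat", "NOUN"),
     ("chases", "VERB"), ("mice", "NOUN")],
]

model, default_tag = train_unigram_tagger(toy_corpus)
result = tag(["The", "cat", "barks", "loudly"], model, default_tag)
print(result)
```

In this sketch the unseen word "loudly" falls back to the corpus-wide most frequent tag, which is one of the weaknesses that context-aware rule-based and statistical taggers are meant to address.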
Updated: 2016-08-01