Comparing the Quality and Speed of Sentence Classification with Modern Language Models, Applied Sciences


Applied Sciences (IF 2.5), Pub Date: 2020-05-14, DOI: 10.3390/app10103386
Krzysztof Fiok, Waldemar Karwowski, Edgar Gutierrez, Mohammad Reza-Davahli


Comparing the Quality and Speed of Sentence Classification with Modern Language Models

After the advent of Glove and Word2vec, the dynamic development of language models (LMs) used to generate word embeddings has enabled the creation of better text classifier frameworks. With the vector representations of words generated by newer LMs, embeddings are no longer static but are context-aware. However, the quality of results provided by state-of-the-art LMs comes at the price of speed. Our goal was to present a benchmark to provide insight into the speed–quality trade-off of a sentence classifier framework based on word embeddings provided by selected LMs. We used a recurrent neural network with gated recurrent units to create sentence-level vector representations from word embeddings provided by an LM and a single fully connected layer for classification. Benchmarking was performed on two sentence classification data sets: the Sixth Text REtrieval Conference (TREC6) set and a 1000-sentence data set of our design. Our Monte Carlo cross-validated results based on these two data sources demonstrated that the newest deep learning LMs provided improvements over Glove and FastText in terms of weighted Matthews correlation coefficient (MCC) scores. We postulate that progress in LMs is more apparent when more difficult classification tasks are addressed.
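
The evaluation protocol named in the abstract (Monte Carlo cross-validation scored with MCC) can be illustrated with a minimal sketch. This is not the authors' code. The function names `multiclass_mcc` and `monte_carlo_splits`, the number of rounds, the test fraction, and the toy labels are all assumptions made for illustration. The abstract does not define how the MCC was weighted, so the sketch computes the standard unweighted multiclass MCC instead.

```python
import math
import random
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Standard multiclass Matthews correlation coefficient.

    Assumption: this is the unweighted multiclass form, not the
    "weighted" variant mentioned in the abstract, which is not
    defined there.
    """
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each class truly occurs
    pred_counts = Counter(y_pred)  # how often each class was predicted
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(v * v for v in true_counts.values())
    cov_pp = n * n - sum(v * v for v in pred_counts.values())
    denom = math.sqrt(cov_tt * cov_pp)
    # By convention, return 0 when either side has a single class only.
    return cov_tp / denom if denom else 0.0


def monte_carlo_splits(n_items, n_rounds, test_fraction, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, round(n_items * test_fraction))
    for _ in range(n_rounds):
        idx = list(range(n_items))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


if __name__ == "__main__":
    # Toy labels standing in for sentence classes (hypothetical data).
    labels = ["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"] * 10
    scores = []
    for train_idx, test_idx in monte_carlo_splits(len(labels), 5, 0.2):
        # Stand-in classifier: always predict the most common training label.
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        y_true = [labels[i] for i in test_idx]
        y_pred = [majority] * len(test_idx)
        scores.append(multiclass_mcc(y_true, y_pred))
    print("mean MCC over rounds:", sum(scores) / len(scores))
```

Unlike k-fold cross-validation, each round here draws a fresh random split, so test sets may overlap between rounds. In a real benchmark the majority-label stand-in would be replaced by the trained sentence classifier.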

Updated: 2020-05-14
