A Transformer Model for Manifesto Classification Using Cross-Context Training: An Ecuadorian Case Study
Social Science Computer Review (IF 3.0), Pub Date: 2024-07-24, DOI: 10.1177/08944393241266220
Fernanda Barzallo 1, Maria Baldeon-Calisto 1,2, Margorie Pérez 1, Maria Emilia Moscoso 1, Danny Navarrete 1, Daniel Riofrío 2, Pablo Medina-Peréz 3, Susana K Lai-Yuen 4, Diego Benítez 2, Noel Peréz 2, Ricardo Flores Moyano 2, Mateo Fierro 3

Content analysis of political manifestos is necessary to understand the policies and proposed actions of a party. However, manually labeling political texts is time-consuming and labor-intensive. Transformer networks have become essential tools for automating this task, but these models require extensive datasets to achieve good performance. This is a limitation for manifesto classification, where publicly available labeled datasets are scarce. To address this challenge, we developed a Transformer network for the classification of manifestos using a cross-context training strategy. Using the database of the Comparative Manifesto Project, we implemented a fractional factorial experimental design to determine which Spanish-language manifestos form the best training set for Ecuadorian manifesto labeling. Furthermore, we statistically analyzed which Transformer architecture and preprocessing operations improve model accuracy. The results indicate that creating a training set with manifestos from Spain and Uruguay, along with applying stemming and lemmatization preprocessing operations, produces the highest classification accuracy. In addition, we found that the DistilBERT and RoBERTa networks perform statistically similarly and consistently well in manifesto classification. Using the cross-context training strategy, DistilBERT and RoBERTa achieve 60.05% and 57.64% accuracy, respectively, on the Ecuadorian manifestos. Finally, we investigated the effect of the composition of the training set on performance. The experiments demonstrate that training DistilBERT solely with Ecuadorian manifestos achieves the highest accuracy and F1-score; in the absence of an Ecuadorian dataset, competitive performance is achieved by training the model with datasets from Spain and Uruguay.
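For readers who want a concrete picture of the pipeline the abstract describes, below is a minimal sketch in Python using NLTK for Spanish stemming and the Hugging Face transformers library for fine-tuning DistilBERT on a cross-context training set (Spain + Uruguay), evaluated on Ecuadorian manifestos. The CSV file names, label column, model checkpoint, and hyperparameters are illustrative assumptions, not the authors' exact setup; lemmatization (which the paper also applies) could be added in the same preprocessing step with a Spanish spaCy pipeline.

```python
# Hypothetical sketch of the cross-context manifesto-classification pipeline.
# Assumes CSV exports of (text, label) pairs derived from the Comparative
# Manifesto Project, with labels already integer-encoded as 0..num_labels-1.
import nltk
import numpy as np
import pandas as pd
from nltk.stem import SnowballStemmer
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

nltk.download("punkt")  # tokenizer models used by nltk.word_tokenize
stemmer = SnowballStemmer("spanish")

def preprocess(text: str) -> str:
    # Stem each Spanish token; a lemmatizer (e.g., spaCy es_core_news_sm)
    # could be chained here to mirror the paper's full preprocessing.
    tokens = nltk.word_tokenize(text, language="spanish")
    return " ".join(stemmer.stem(t) for t in tokens)

# Cross-context training set: Spanish + Uruguayan manifestos;
# Ecuadorian manifestos are held out for evaluation. File names are placeholders.
train_df = pd.concat([pd.read_csv("spain_manifestos.csv"),
                      pd.read_csv("uruguay_manifestos.csv")]).reset_index(drop=True)
test_df = pd.read_csv("ecuador_manifestos.csv")
for df in (train_df, test_df):
    df["text"] = df["text"].map(preprocess)

model_name = "distilbert-base-multilingual-cased"  # one plausible checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=train_df["label"].nunique())

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = Dataset.from_pandas(train_df).map(tokenize, batched=True)
test_ds = Dataset.from_pandas(test_df).map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Accuracy, the headline metric reported in the abstract.
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
    eval_dataset=test_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

Swapping `model_name` for a Spanish RoBERTa checkpoint would reproduce the paper's second architecture under the same training loop; the fractional factorial design would then vary the training-set composition and preprocessing flags across such runs.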

Updated: 2024-07-24