Moving to continuous classifications of bilingualism through machine learning trained on language production
Bilingualism: Language and Cognition ( IF 2.5 ) Pub Date : 2024-05-24 , DOI: 10.1017/s1366728924000361
M. I. Coco , G. Smith , R. Spelorzi , M. Garraffa

Recent conceptualisations of bilingualism are moving away from strict categorisations towards continuous approaches. This study supports this trend by combining empirical psycholinguistic data with machine learning classification modelling. Support vector classifiers were trained on two datasets of coded productions by Italian speakers to predict the class each speaker belonged to ("monolingual", "attriters" and "heritage"). All classes can be predicted above chance (>33%), even though the classifier's performance varies substantially: monolinguals are identified much better (f-score >70%) than attriters (f-score <50%), which are the most confusable class. Further analyses of the classification errors in the confusion matrices show that attriters are identified as heritage speakers nearly as often as they are correctly classified. Cluster clitics are the features that contribute most to classification performance. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than as a set of a priori established classes.
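
The abstract reports per-class F-scores against a 33% chance baseline and reads attriter errors off the confusion matrix. As a minimal sketch of how these quantities relate, the Python below computes a one-vs-rest F1 for each class from a 3×3 confusion matrix and finds the wrong class each true class is most often predicted as. The matrix counts and the helper names `per_class_f1` and `most_confused_with` are hypothetical, invented only to mirror the qualitative pattern described; they are not the study's data or code, and the 1/3 baseline assumes balanced classes.

```python
# Hypothetical 3x3 confusion matrix (rows = true class, columns = predicted class).
# These counts are invented for illustration; they are NOT the study's results.
CLASSES = ["monolingual", "attriter", "heritage"]
CONFUSION = [
    [40, 5, 5],    # true monolingual
    [8, 20, 18],   # true attriter
    [4, 10, 36],   # true heritage
]


def per_class_f1(matrix):
    """Return {class_index: F1}, computed one-vs-rest from a square confusion matrix."""
    n = len(matrix)
    scores = {}
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # true k, predicted as another class
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[k] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def most_confused_with(matrix, k):
    """Index of the wrong class that true class k is most often predicted as."""
    row = matrix[k]
    return max((j for j in range(len(row)) if j != k), key=lambda j: row[j])


# Uniform-guess baseline for 3 balanced classes (about 0.33).
chance = 1 / len(CLASSES)

for k, f1 in per_class_f1(CONFUSION).items():
    other = CLASSES[most_confused_with(CONFUSION, k)]
    print(f"{CLASSES[k]:12s} F1 = {f1:.2f} (chance = {chance:.2f}), most confused with {other}")
```

With these invented counts, attriters receive the lowest F1 and are most often mislabelled as heritage speakers (18 errors against 20 correct), which is the kind of pattern the abstract describes.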

Updated: 2024-05-24