Advancing Voice Biometrics for Dysarthria Speakers Using Multitaper LFCC and Voice Conversion Data Augmentation, IEEE Transactions on Information Forensics and Security


IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2024-10-23, DOI: 10.1109/tifs.2024.3484661
Shinimol Salim, Waquar Ahmad

Patients with dysarthria and physical impairments face challenges with traditional user interfaces. An Automatic Speaker Verification (ASV) system can improve accessibility for these patients by replacing complex authentication methods and enabling voice biometrics in a range of applications. This study proposes a novel low-variance feature, Multitaper Linear Frequency Cepstral Coefficients (MTLFCC), and implements an ASV system for patients with dysarthria using voice conversion data augmentation within a DNN framework. An extensive analysis compares various multitaper techniques and taper weight choices, using the Thomson multitaper method, specifically for verifying patients with dysarthria as speakers. The study also examines the impact of voice conversion through a cycle-consistent generative adversarial network (Cycle GAN), which modifies the acoustic attributes of control speech to make it perceptually similar to dysarthric speech, and its implications for dysarthria ASV. Furthermore, system performance is analyzed across different severity levels of dysarthria to gain insight into how the selected multitaper parameters influence the outcomes. This study pioneers the use of MTLFCC features for ASV in the context of dysarthria, offering a novel approach to improving accessibility for this group.
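To make the feature pipeline named in the abstract more concrete, the following is a minimal sketch of how a multitaper LFCC vector could be computed for one speech frame. It is not the authors' implementation. The abstract does not give the frame length, number of tapers, time-bandwidth product, filter count, or taper weights, so every default below (`n_tapers=6`, `nw=3.5`, `n_fft=512`, `n_filters=20`, `n_ceps=13`, uniform weights) is an assumption chosen for illustration, and the function names are hypothetical. The sketch uses Thomson (DPSS) tapers from SciPy, averages the per-taper periodograms, applies triangular filters spaced linearly in Hz, and takes the DCT of the log filterbank energies.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal.windows import dpss


def multitaper_power(frame, n_tapers=6, nw=3.5, n_fft=512):
    """Weighted average of the periodograms obtained with each DPSS taper."""
    tapers = dpss(len(frame), nw, Kmax=n_tapers)   # shape: (n_tapers, len(frame))
    weights = np.full(n_tapers, 1.0 / n_tapers)    # assumption: uniform taper weights
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    return weights @ eigenspectra                  # shape: (n_fft // 2 + 1,)


def linear_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced uniformly in Hz (linear, not mel)."""
    edges_hz = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)
    bins = np.floor((n_fft + 1) * edges_hz / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        # Rising slope from the left edge up to the center bin.
        bank[i, left:center] = (np.arange(left, center) - left) / (center - left)
        # Falling slope from the center bin down to the right edge.
        bank[i, center:right] = (right - np.arange(center, right)) / (right - center)
    return bank


def mtlfcc(frame, sample_rate, n_filters=20, n_ceps=13, n_fft=512, **taper_args):
    """Multitaper spectrum -> linear filterbank -> log -> DCT-II cepstra."""
    power = multitaper_power(frame, n_fft=n_fft, **taper_args)
    energies = linear_filterbank(n_filters, n_fft, sample_rate) @ power
    # Small offset guards against log(0) for silent frames.
    return dct(np.log(energies + 1e-10), type=2, norm="ortho")[:n_ceps]


if __name__ == "__main__":
    # Hypothetical input: one 25 ms frame of white noise at 16 kHz.
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)
    print(mtlfcc(frame, sample_rate=16000).shape)
```

Averaging several nearly uncorrelated eigenspectra is what lowers the variance of the spectral estimate relative to a single-window periodogram. Non-uniform weights, such as eigenvalue-based or adaptive weights, could be substituted by changing the `weights` line.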


Updated: 2024-10-23
