Development of a Phrase-Based Speech-Recognition Test Using Synthetic Speech
Trends in Hearing (IF 2.6), Pub Date: 2024-07-25, DOI: 10.1177/23312165241261490
Saskia Ibelings 1, 2, 3 , Thomas Brand 2, 3 , Esther Ruigendijk 3, 4 , Inga Holube 1, 3

Speech-recognition tests are widely used in both clinical and research audiology. The purpose of this study was the development of a novel speech-recognition test that combines concepts of different speech-recognition tests to reduce training effects and allows for a large set of speech material. The new test consists of four different words per trial in a meaningful construct with a fixed structure, the so-called phrases. Various free databases were used to select the words and to determine their frequency. Highly frequent nouns were grouped into thematic categories and combined with related adjectives and infinitives. After discarding inappropriate and unnatural combinations, and eliminating duplications of (sub-)phrases, a total number of 772 phrases remained. Subsequently, the phrases were synthesized using a text-to-speech system. The synthesis significantly reduces the effort compared to recordings with a real speaker. After excluding outliers, measured speech-recognition scores for the phrases with 31 normal-hearing participants at fixed signal-to-noise ratios (SNR) revealed speech-recognition thresholds (SRT) for each phrase varying up to 4 dB. The median SRT was −9.1 dB SNR and thus comparable to existing sentence tests. The psychometric function's slope of 15 percentage points per dB is also comparable and enables efficient use in audiology. Summarizing, the principle of creating speech material in a modular system has many potential applications.
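
The reported median SRT and slope can be turned into a rough psychometric curve. The following is a minimal sketch, assuming a logistic function parameterized by its 50% point (the SRT) and its slope at that point. The abstract does not state which model the authors fitted, so the functional form, the function name `recognition_probability`, and the constant names are illustrative assumptions; only the two numeric values come from the abstract.

```python
import math

# Values taken from the abstract.
SRT_DB = -9.1         # median speech-recognition threshold, dB SNR
SLOPE_PER_DB = 0.15   # slope at the SRT: 15 percentage points per dB

def recognition_probability(snr_db, srt_db=SRT_DB, slope=SLOPE_PER_DB):
    """Assumed logistic psychometric function.

    Returns the expected proportion of correctly recognized words at the
    given SNR. The factor 4 makes `slope` equal to the derivative of the
    curve at the SRT, because a logistic function has a maximum
    derivative of 1/4 per unit of its argument.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr_db - srt_db)))

if __name__ == "__main__":
    # Print the predicted recognition score around the median SRT.
    for snr in (-12.0, -10.1, -9.1, -8.1, -6.0):
        score = recognition_probability(snr)
        print(f"{snr:6.1f} dB SNR -> {100 * score:5.1f} % correct")
```

Under these assumptions the curve passes through 50% at −9.1 dB SNR and rises to roughly 65% one dB above it, which illustrates why a steep slope allows the SRT to be estimated efficiently from few trials.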

Updated: 2024-07-25