Reconstruction of articulatory movements during neutral speech from those during whispered speech. The Journal of the Acoustical Society of America



The Journal of the Acoustical Society of America (IF 2.1), Pub Date: 2018-07-02, DOI: 10.1121/1.5039750
Nisha Meenakshi G, Prasanta Kumar Ghosh



Reconstruction of articulatory movements during neutral speech from those during whispered speech.

A transformation function (TF) that reconstructs neutral speech articulatory trajectories (NATs) from whispered speech articulatory trajectories (WATs) is investigated, such that the dynamic time warped (DTW) distance between the transformed whispered and the original neutral articulatory movements is minimized. Three candidate TFs are considered: an affine function with a diagonal matrix (Ad), which reconstructs each NAT from only its corresponding WAT; an affine function with a full matrix (Af); and a deep neural network (DNN) based nonlinear function. The latter two reconstruct each NAT from all WATs. Experiments reveal that the transformation is approximated well by Af, which generalizes better across subjects and achieves the lowest average DTW distance of 5.20 (±1.27) mm, a relative improvement of 7.47%, 4.76%, and 7.64% over Ad, the DNN, and the best baseline scheme, respectively. Further analysis of the differences between neutral and whispered articulation reveals that the whispered articulators exhibit exaggerated movements in order to reconstruct the lip movements during neutral speech. It is also observed that, among the articulators considered in the study, the tongue exhibits higher precision and stability while whispering, implying that subjects control their tongue movements carefully in order to produce intelligible whispered speech.
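The difference between the diagonal and full affine TFs, and the DTW distance used to score them, can be illustrated with a small sketch. The abstract does not say how the TFs are estimated, so this example assumes the whispered and neutral frames are already time-aligned and fits both matrices by ordinary least squares. The function names (`dtw_distance`, `fit_affine_full`, `fit_affine_diag`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def dtw_distance(x, y):
    """Accumulated Euclidean cost along the optimal DTW path.

    x has shape (T1, D) and y has shape (T2, D).
    """
    t1, t2 = len(x), len(y)
    # Pairwise frame-to-frame Euclidean distances, shape (T1, T2).
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    acc = np.full((t1 + 1, t2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[t1, t2]


def fit_affine_full(w, n):
    """Least-squares fit of n ~ w @ A.T + b with a full matrix A.

    Assumption: rows of w and n are already time-aligned.
    """
    w1 = np.hstack([w, np.ones((len(w), 1))])
    coef, *_ = np.linalg.lstsq(w1, n, rcond=None)
    return coef[:-1].T, coef[-1]


def fit_affine_diag(w, n):
    """Least-squares fit of n ~ w @ A.T + b with a diagonal matrix A.

    Each output dimension depends only on the matching input dimension.
    """
    fits = np.array(
        [np.polyfit(w[:, d], n[:, d], 1) for d in range(w.shape[1])]
    )
    return np.diag(fits[:, 0]), fits[:, 1]


# Synthetic stand-in for aligned trajectories (not data from the paper).
rng = np.random.default_rng(0)
whispered = rng.normal(size=(200, 4))
a_true = np.eye(4) + 0.5 * rng.normal(size=(4, 4))  # cross-articulator coupling
b_true = rng.normal(size=4)
neutral = whispered @ a_true.T + b_true + 0.01 * rng.normal(size=(200, 4))

a_full, b_full = fit_affine_full(whispered, neutral)
a_diag, b_diag = fit_affine_diag(whispered, neutral)

print("DTW cost, full matrix:", dtw_distance(whispered @ a_full.T + b_full, neutral))
print("DTW cost, diagonal:   ", dtw_distance(whispered @ a_diag.T + b_diag, neutral))
```

In this toy setup the neutral trajectories are generated with coupling between dimensions, so the diagonal model cannot capture it. This only illustrates why a full matrix may fit better when each NAT depends on several WATs; it says nothing about the paper's measured results.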

Updated: 2019-11-01
