Speech synthesis from neural decoding of spoken sentences, Nature


Speech synthesis from neural decoding of spoken sentences
Nature ( IF 50.5 ) Pub Date : 2019-04-01 , DOI: 10.1038/s41586-019-1119-1
Gopala K Anumanchipalli_{1,2}, Josh Chartier_{1,2,3}, Edward F Chang_{1,2,3}


Technology that translates neural activity into speech would be transformative for people who are unable to communicate as a result of neurological impairments. Decoding speech from neural activity is challenging because speaking requires very precise and rapid multi-dimensional control of vocal tract articulators. Here we designed a neural decoder that explicitly leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. Intermediate articulatory dynamics enhanced performance even with limited data. Decoded articulatory representations were highly conserved across speakers, enabling a component of the decoder to be transferable across participants. Furthermore, the decoder could synthesize speech when a participant silently mimed sentences. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication.

A neural decoder uses kinematic and sound representations encoded in human cortical activity to synthesize audible sentences, which are readily identified and transcribed by listeners.
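The two-stage design described in the abstract (cortical activity to articulatory movement, then articulatory movement to speech acoustics) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the plain tanh recurrent layer stands in for the recurrent networks the abstract mentions without detail, the weights are random and untrained, and the feature sizes and the names `make_stage`, `run_stage`, and `decode` are hypothetical.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a row vector of length n by an n x m matrix (list of rows)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def random_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; a real decoder would learn these from data."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def make_stage(rng, in_dim, hidden_dim, out_dim):
    """One recurrent stage: input weights, recurrent weights, linear readout."""
    return {
        "w_in": random_matrix(rng, in_dim, hidden_dim),
        "w_rec": random_matrix(rng, hidden_dim, hidden_dim),
        "w_out": random_matrix(rng, hidden_dim, out_dim),
    }


def run_stage(stage, frames):
    """Run a simple tanh recurrent layer over a sequence of feature frames."""
    hidden = [0.0] * len(stage["w_rec"])
    outputs = []
    for frame in frames:
        drive = matvec(frame, stage["w_in"])
        recur = matvec(hidden, stage["w_rec"])
        hidden = [math.tanh(d + r) for d, r in zip(drive, recur)]
        outputs.append(matvec(hidden, stage["w_out"]))
    return outputs


def decode(neural_frames, stage1, stage2):
    """Stage 1: neural activity -> articulatory kinematics.
    Stage 2: articulatory kinematics -> acoustic features."""
    kinematics = run_stage(stage1, neural_frames)
    acoustics = run_stage(stage2, kinematics)
    return kinematics, acoustics


# Hypothetical sizes, chosen only to make the sketch concrete.
N_CHANNELS, N_KINEMATIC, N_ACOUSTIC, N_FRAMES = 16, 6, 8, 20

rng = random.Random(0)
stage1 = make_stage(rng, N_CHANNELS, 10, N_KINEMATIC)
stage2 = make_stage(rng, N_KINEMATIC, 10, N_ACOUSTIC)

# Stand-in for recorded cortical activity: random frames, not real data.
neural = [[rng.gauss(0.0, 1.0) for _ in range(N_CHANNELS)] for _ in range(N_FRAMES)]
kinematics, acoustics = decode(neural, stage1, stage2)
print(len(kinematics), len(kinematics[0]), len(acoustics), len(acoustics[0]))
```

Keeping the articulatory representation as an explicit intermediate output is what would allow the second stage to be reused across participants, as the abstract reports for one component of the decoder.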


Updated: 2019-04-01
