Toward Stealthy Backdoor Attacks Against Speech Recognition via Elements of Sound
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2024-05-23, DOI: 10.1109/tifs.2024.3404885
Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Stefanos Koffas, Yiming Li

Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various speech recognition applications. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against speech recognition. We reveal that existing methods are not stealthy, since their trigger patterns are perceptible to humans or detectable by machines. This limitation arises mostly because their trigger patterns are simple noises or separable, distinctive clips. Motivated by these findings, we propose to exploit elements of sound (e.g., pitch and timbre) to design more stealthy yet effective poison-only backdoor attacks. Specifically, for the stealthy pitch-based trigger, we insert a short-duration high-pitched signal as the trigger and increase the pitch of the remaining audio clip to 'mask' it. For the stealthy timbre-based attack, we manipulate the timbre features of the victim audio, and we design a voiceprint selection module to facilitate the multi-backdoor attack. Our attacks generate more 'natural' poisoned samples and are therefore more stealthy. Extensive experiments on benchmark datasets verify the effectiveness of our attacks under different settings (e.g., all-to-one, all-to-all, clean-label, physical, and multi-backdoor settings) and their stealthiness. Our methods achieve attack success rates of over 95% in most cases and are nearly undetectable. The code for reproducing the main experiments is available at https://github.com/HanboCai/BadSpeech_SoE .
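
The pitch-based trigger described in the abstract admits a compact illustration. Below is a minimal sketch of the idea of overlaying a short high-pitched tone and pitch-shifting the rest of the clip to mask it, assuming librosa for the pitch shift; the function name and all parameter values (tone frequency, duration, mask shift) are illustrative assumptions, not the authors' actual configuration. The timbre-based attack would analogously replace the pitch shift with a voice-conversion step.

    import numpy as np
    import librosa

    def poison_with_pitch_trigger(waveform, sr=16000, tone_hz=6000.0,
                                  tone_dur=0.05, mask_steps=2):
        """Sketch of a pitch-based backdoor trigger (all values assumed).

        waveform: mono float audio in [-1, 1], sampled at `sr` Hz.
        """
        # Raise the pitch of the whole clip so the high-pitched trigger
        # blends in with the (now higher-pitched) speech.
        shifted = librosa.effects.pitch_shift(waveform, sr=sr, n_steps=mask_steps)
        # Synthesize a short high-frequency sine tone as the trigger.
        t = np.arange(int(tone_dur * sr)) / sr
        tone = 0.1 * np.sin(2.0 * np.pi * tone_hz * t)
        # Overlay the tone at the beginning of the clip.
        poisoned = shifted.copy()
        n = min(len(tone), len(poisoned))
        poisoned[:n] += tone[:n]
        # Keep amplitudes in the valid [-1, 1] range.
        return np.clip(poisoned, -1.0, 1.0)

    # Hypothetical usage: poison a training utterance, then relabel it
    # with the attacker's target class before adding it to the training set.
    # y, sr = librosa.load("utterance.wav", sr=16000)
    # y_poisoned = poison_with_pitch_trigger(y, sr=sr)

The design intuition is that shifting the surrounding speech upward makes the high-pitched tone sound like part of the utterance rather than an obvious injected artifact, which is what makes the trigger hard to perceive or filter out.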
