Adversarial Perturbation Prediction for Real-Time Protection of Speech Privacy
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2024-09-23, DOI: 10.1109/tifs.2024.3463538
Zhaoyang Zhang, Shen Wang, Guopu Zhu, Dechen Zhan, Jiwu Huang
The widespread collection and analysis of private speech signals have become increasingly prevalent, raising significant privacy concerns. To protect speech signals from unauthorized analysis, adversarial attack methods for deceiving speaker recognition models have been proposed. While a few of these methods are specifically designed for real-time protection of speech signals, they introduce significant delays that can severely impact speech communication when applied to streaming speech data. In this paper, we present a novel approach that aims to offer real-time protection for speech signals without delays. By utilizing observed data only, we generate initial adversarial seed perturbations and refine them to obtain the necessary adversarial perturbations predicted for adjacent unobserved signals. This refinement process is conducted via a proposed model called PAPG. On the basis of perturbation prediction, we develop a streaming audio processing framework that generates perturbations in synchronization with the playback of the original signal, effectively eliminating delays. The experimental results demonstrate that under the proposed attack, the average Top-1 accuracy of various advanced speaker recognition methods is reduced by 89%, and the average equal error rate (EER) increases to 36%. Remarkably, these results are achieved without delays while maintaining superior perceptual quality.
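To make the zero-delay idea concrete, here is a minimal Python sketch of a streaming loop in which the perturbation for each chunk is computed before that chunk arrives, using only chunks that were already observed. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not describe the internals of PAPG, so `placeholder_predictor` is a hypothetical stand-in that carries no adversarial effect. The chunk representation, the bound `eps`, and the pass-through handling of the first chunk are also assumptions.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Chunk = List[float]
Predictor = Callable[[List[Chunk], float], Chunk]


def placeholder_predictor(observed: List[Chunk], eps: float) -> Chunk:
    """Hypothetical stand-in for a perturbation predictor (NOT PAPG).

    Returns a perturbation for the next, not-yet-observed chunk, computed
    only from chunks that were already played, and clipped to [-eps, eps].
    """
    last = observed[-1]
    return [max(-eps, min(eps, -x)) for x in last]


def protect_stream(
    chunks: Iterable[Chunk],
    predictor: Predictor,
    eps: float = 0.01,
) -> Iterator[Chunk]:
    """Yield perturbed chunks without waiting on the chunk being perturbed."""
    observed: List[Chunk] = []
    pending: Optional[Chunk] = None  # perturbation prepared for the next chunk
    for chunk in chunks:
        if pending is None:
            # Assumption: nothing has been observed yet, so pass through.
            out = list(chunk)
        else:
            # The perturbation was ready before this chunk arrived.
            out = [x + d for x, d in zip(chunk, pending)]
        yield out
        # Only now does the current chunk count as observed data.
        observed.append(list(chunk))
        pending = predictor(observed, eps)


if __name__ == "__main__":
    stream = [[0.5, -0.5], [0.2, 0.0], [0.0, 0.3]]
    for protected in protect_stream(stream, placeholder_predictor, eps=0.1):
        print(protected)
```

The point of the structure is that `predictor` never sees the chunk it perturbs, so the addition at playback time is a single element-wise sum. A real system would replace the placeholder with a trained model and would likely bound the history it keeps.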
Updated: 2024-09-23