Amplitude and Frequency Modulation-based features for detection of replay Spoof Speech
Replay attacks pose a serious threat to Automatic Speaker Verification (ASV) systems. This paper introduces Amplitude Modulation and Frequency Modulation-based features for the replay Spoof Speech Detection (SSD) task. In this context, we propose Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) features obtained with the Energy Separation Algorithm (ESA). Because speech is a combination of several monocomponent signals, the speech signal is passed through bandpass (subband) filters to obtain narrowband components. To obtain the narrowband filtered signals, we use linearly-spaced Butterworth and Gabor filterbanks. The instantaneous modulations help to characterize the local behavior of a non-stationary signal. The IA and IF components capture the information present in the slowly-varying amplitude envelope and the fast-varying frequency, respectively. For replay speech, the slowly-varying temporal modulations have a distorted amplitude envelope, and the fast-varying temporal modulations do not preserve the harmonic structure found in natural speech. In a replayed speech signal, the characteristics of the intermediate devices and the acoustic environment distort the spectral energy relative to that of natural speech. Experiments were performed on the ASVspoof 2017 challenge version 2.0 database with a Gaussian Mixture Model (GMM) as the classifier. When the ESA-IACC and ESA-IFCC feature sets are fused with the Constant Q Cepstral Coefficients (CQCC) feature set at the score level, the % EER further reduces to 11.93% and 10.12%, respectively, on the evaluation set. In addition, for the evaluation set, we have studied the performance of the proposed feature sets on different Replay Configurations (RC), namely, acoustic environments, playback devices, and recording devices. For all levels of threat condition to the ASV system (i.e., low, medium, and high), the proposed feature sets performed better than the existing state-of-the-art feature sets.
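The subband filtering and energy separation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which discrete ESA variant is used, so DESA-2 is assumed here, and the function names (`teager_energy`, `desa2`, `subband_ia_if`), the filter order, and the band edges are hypothetical choices for illustration. Only a single Butterworth subband is shown; the Gabor filterbank and the cepstral processing that yields ESA-IACC and ESA-IFCC are omitted.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, fs, eps=1e-12):
    """Estimate IA and IF (in Hz) of a narrowband signal with DESA-2 (assumed variant).

    Returns two arrays of length len(x) - 4. DESA-2 is only valid for
    frequencies below fs / 4.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]                  # symmetric difference y[n] = x[n+1] - x[n-1]
    psi_x = teager_energy(x)[1:-1]      # trimmed so it aligns with psi_y
    psi_y = teager_energy(y)
    psi_x = np.maximum(psi_x, eps)      # guard against division by zero
    psi_y = np.maximum(psi_y, eps)
    cos_arg = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_arg)    # instantaneous frequency in rad/sample
    inst_freq = omega * fs / (2.0 * np.pi)
    inst_amp = 2.0 * psi_x / np.sqrt(psi_y)
    return inst_amp, inst_freq


def subband_ia_if(signal, fs, low_hz, high_hz, order=4):
    """Bandpass one subband with a Butterworth filter, then apply DESA-2."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    narrowband = sosfiltfilt(sos, signal)
    return desa2(narrowband, fs)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    tone = 0.5 * np.cos(2 * np.pi * 1000 * t)
    ia, inst_f = subband_ia_if(tone, fs, 800, 1200)
    print(np.median(ia), np.median(inst_f))
```

For a pure cosine the Teager energy is constant, so DESA-2 recovers the amplitude and frequency almost exactly; on real speech the per-subband IA and IF tracks vary over time, and it is those variations that the proposed features summarize.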
In addition to the ASVspoof 2017 Challenge database, we also performed experiments on other spoofing databases, namely, BTAS 2016, ASVspoof 2019 Challenge database, and Real PA of ASVspoof 2019 Challenge database. For all the spoofing databases used in this study, the proposed ESA-based feature sets perform significantly better than the other feature sets.
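The comparisons in this abstract are reported as % EER, with some systems combined at the score level. The following is a minimal sketch of how such a figure could be computed, not the evaluation code used in the study. The weighted-sum fusion rule, the weight `alpha`, and the function names `fuse_scores` and `equal_error_rate` are assumptions made for illustration; scores are assumed to be higher for genuine speech.

```python
import numpy as np


def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Score-level fusion as a weighted sum (alpha is a hypothetical tuning weight)."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return alpha * a + (1.0 - alpha) * b


def equal_error_rate(genuine_scores, spoof_scores):
    """Return the EER as a fraction in [0, 1] by sweeping all observed thresholds."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection: genuine trial scored below the threshold.
    frr = np.array([(genuine < th).mean() for th in thresholds])
    # False acceptance: spoofed trial scored at or above the threshold.
    far = np.array([(spoof >= th).mean() for th in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])


if __name__ == "__main__":
    genuine = [2.0, 3.0, 4.0, 5.0]
    spoof = [-1.0, 0.0, 1.0, 2.5]
    print(100.0 * equal_error_rate(genuine, spoof))  # % EER
```

In practice the fusion weight would be tuned on a development set and then fixed before scoring the evaluation set, so that the reported EER is not optimistic.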