Replay Spoofing Countermeasure Using Autoencoder and Siamese Network on ASVspoof 2019 Challenge
Authors: Mohammad Adiban, Hossein Sameti, Saeedreza Shehnepoor
Published: 2019-10-29 16:03:04+00:00
AI Summary
This paper proposes a novel countermeasure against replay spoofing attacks on Automatic Speaker Verification (ASV) systems. The approach extracts Constant Q Cepstral Coefficient (CQCC) features, passes them through an autoencoder to obtain more informative, noise-aware representations, and classifies the result with a Siamese network. Experiments on the ASVspoof 2019 dataset show improvements over the baseline of 10.73% in Equal Error Rate (EER) and 0.2344 in Tandem Detection Cost Function (t-DCF).
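To make the Siamese classification step concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not specify the network configurations or the loss, so the single tanh layer standing in for the shared branch, the contrastive loss (Hadsell et al., 2006 formulation), and the nearest-reference decision rule are all illustrative assumptions, and the function names `embed`, `contrastive_loss`, and `classify` are hypothetical. In the described system, the inputs would be the autoencoder-derived CQCC representations, and the shared branch would be a trained deep network.

```python
import math


def embed(x, weights):
    """Shared embedding branch (hypothetical): one linear layer plus tanh.

    Both inputs of a pair go through this same function with the same
    weights; that weight sharing is what makes the network "Siamese".
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(x1, x2, same_class, weights, margin=1.0):
    """Contrastive loss for one pair (assumed loss, not taken from the paper).

    Same-class pairs (e.g. bona fide vs. bona fide) are pulled together via
    0.5 * d^2; different-class pairs (bona fide vs. replayed) are pushed
    apart until their distance reaches the margin.
    """
    d = euclidean(embed(x1, weights), embed(x2, weights))
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def classify(x, bonafide_ref, spoof_ref, weights):
    """Label an utterance by the nearer reference embedding (assumed rule)."""
    e = embed(x, weights)
    d_bona = euclidean(e, embed(bonafide_ref, weights))
    d_spoof = euclidean(e, embed(spoof_ref, weights))
    return "bonafide" if d_bona <= d_spoof else "spoof"


# Toy usage with 2-dimensional "features" and identity weights.
W = [[1.0, 0.0], [0.0, 1.0]]
print(classify([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], W))
```

The training loop that would minimize the loss over many labelled pairs is omitted. The sketch only shows how one shared branch scores a pair and how a distance-based decision separates bona fide from spoofed inputs.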
Abstract
Automatic Speaker Verification (ASV) is the process of verifying a person's claimed identity based on the voice presented to a system. Various spoofing approaches can deceive ASV systems (ASVs), either by imitating a voice or by reconstructing its features. Attackers try to defeat ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. Replay is considered a common and high-potential spoofing tool because replay attacks are more accessible and require no technical knowledge from the adversary. In this study, we introduce a novel replay spoofing countermeasure for ASVs. We fed Constant Q Cepstral Coefficient (CQCC) features into an autoencoder to obtain more informative features and to capture the noise information of spoofed utterances for discrimination purposes. Finally, we used different configurations of the Siamese network, for the first time in this context, for classification. Experiments on the ASVspoof 2019 challenge dataset, with Equal Error Rate (EER) and Tandem Detection Cost Function (t-DCF) as evaluation metrics, show that the proposed system improves on the baseline by 10.73% in EER and by 0.2344 in t-DCF.
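Because the results are reported as EER, the following sketch shows how an EER can be estimated from countermeasure scores. It makes several assumptions. Higher scores mean "more likely bona fide". A trial is accepted when its score is at least the threshold. The EER is taken at the observed threshold where the false acceptance and false rejection rates are closest. The name `compute_eer` is hypothetical. This simplified sweep is not the official ASVspoof 2019 scoring tool, which also computes t-DCF. t-DCF needs the ASV system's error rates and cost parameters and is not shown here.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the Equal Error Rate by sweeping every observed score.

    Assumed convention: higher score = more likely bona fide, and a trial
    is accepted when score >= threshold. Returns (eer, threshold).
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one bona fide and one spoof score")

    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the first threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores: one bona fide trial scores low and one spoof trial scores high.
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores, one of four trials in each class is misclassified at threshold 0.7, so the EER is 25%. Production scorers usually interpolate between thresholds rather than averaging the two rates at the closest observed point.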