Generalizing Speaker Verification for Spoof Awareness in the Embedding Space


Authors: Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

Published: 2024-01-20 07:30:22+00:00

AI Summary

This paper proposes a generalized standalone ASV (G-SASV) system for speaker verification that is robust to spoofing attacks. It enhances a simple deep neural network backend with limited spoofing data during training, so no separate spoofing countermeasure module is needed at test time. On the ASVspoof 2019 logical access dataset, the approach improves on statistical ASV backends by up to 36.2% (joint condition) and 49.8% (spoofed condition) in equal error rate.

Abstract

It is now well known that automatic speaker verification (ASV) systems can be spoofed using various types of adversaries. The usual approach to protecting ASV systems against such attacks is to develop a separate spoofing countermeasure (CM) module that classifies speech input as either a bonafide or a spoofed utterance. Nevertheless, such a design requires additional computation and utilization effort at the authentication stage. An alternative strategy involves a single monolithic ASV system designed to handle both zero-effort impostors (non-targets) and spoofing attacks. Such spoof-aware ASV systems have the potential to provide stronger protection and more economical computation. To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from CM to enhance a simple backend in the embedding space, without the involvement of a separate CM module during the test (authentication) phase. We propose a novel yet simple backend classifier based on deep neural networks and conduct the study via domain adaptation and multi-task integration of spoof embeddings at the training stage. Experiments are conducted on the ASVspoof 2019 logical access dataset, where we improve the performance of statistical ASV backends on the joint (bonafide and spoofed) and spoofed conditions by a maximum of 36.2% and 49.8% in terms of equal error rates, respectively.

Key findings

The proposed G-SASV system significantly improves performance on the ASVspoof 2019 logical access dataset, achieving a maximum relative improvement of 36.2% and 49.8% in Equal Error Rates (EER) for joint (bonafide and spoofed) and spoofed conditions, respectively. The best results were obtained using cosine similarity as a regression loss function and a soft parameter sharing scheme with an auxiliary classification branch using meta-attributes.
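Since the headline numbers are improvements in EER, a minimal sketch of how an EER and a relative improvement can be computed may help. This is not the authors' evaluation code: the function names and the toy scores are hypothetical, and the threshold sweep is a simple approximation (evaluation toolkits typically interpolate the ROC/DET curve instead).

```python
def equal_error_rate(target_scores, negative_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    target_scores:   scores of trials that should be accepted.
    negative_scores: scores of trials that should be rejected (assumption:
                     in a joint condition this pool would hold both
                     zero-effort impostor and spoofed trials).
    """
    thresholds = sorted(set(target_scores) | set(negative_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: targets scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: negatives scored at or above the threshold.
        far = sum(s >= t for s in negative_scores) / len(negative_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction in percent, with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy scores, made up purely for illustration.
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_negatives = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_negatives)
```

With this definition, halving a baseline EER corresponds to a 50% relative improvement, which is the sense in which the percentages above are read here.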

Approach

The authors generalize a standalone ASV system by using a deep neural network backend classifier trained on limited spoofing data. They employ domain adaptation and multi-task learning to integrate spoof embeddings during training, enhancing the system's robustness without the need for a separate spoofing countermeasure module during the testing phase.

Datasets

ASVspoof 2019 logical access dataset, VoxCeleb1 dataset

Model(s)

ECAPA-TDNN (for feature extraction), 3-layer Multi-Layer Perceptron (MLP) (for backend classification)
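As a rough illustration of a backend that operates in the embedding space, the sketch below passes two embeddings through a 3-layer MLP and scores the trial with cosine similarity. Everything specific here is an assumption made for the example and is not taken from the paper: the layer sizes, the ReLU activation, the random untrained weights, the way the two embeddings are compared, and all function names. In practice the inputs would be embeddings from a speaker encoder such as ECAPA-TDNN and the weights would be learned.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_layer(n_in, n_out, rng):
    """Create one dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def mlp_forward(x, layers):
    """Apply the dense layers in order, with ReLU after all but the last."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


rng = random.Random(0)
EMB_DIM, HIDDEN_DIM = 8, 16  # hypothetical sizes, chosen only to keep the demo small
layers = [
    make_layer(EMB_DIM, HIDDEN_DIM, rng),
    make_layer(HIDDEN_DIM, HIDDEN_DIM, rng),
    make_layer(HIDDEN_DIM, EMB_DIM, rng),
]

# Stand-ins for enrollment and test embeddings.
enroll_emb = [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
test_emb = [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]

# Trial score: cosine similarity between the transformed embeddings.
score = cosine_similarity(mlp_forward(enroll_emb, layers),
                          mlp_forward(test_emb, layers))
```

A higher score would be read as stronger evidence that the test utterance is a bonafide sample from the enrolled speaker. The multi-task and domain-adaptation training described under Approach is not reproduced in this sketch.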

Author countries

France, India, Hong Kong, Finland
