A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems

Authors: Kamel Kamel, Keshav Sood, Hridoy Sankar Dutta, Sunil Aryal

Published: 2025-08-22 23:57:04+00:00

AI Summary

This paper surveys modern threats against voice authentication systems (VAS) and anti-spoofing countermeasures (CMs). It categorizes and analyzes data poisoning, adversarial, deepfake, and adversarial spoofing attacks, covering the methodologies, datasets, and limitations of the existing literature.

Abstract

Voice authentication has undergone significant changes from traditional systems that relied on handcrafted acoustic features to deep learning models that can extract robust speaker embeddings. This advancement has expanded its applications across finance, smart devices, law enforcement, and beyond. However, as adoption has grown, so have the threats. This survey presents a comprehensive review of the modern threat landscape targeting Voice Authentication Systems (VAS) and Anti-Spoofing Countermeasures (CMs), including data poisoning, adversarial, deepfake, and adversarial spoofing attacks. We chronologically trace the development of voice authentication and examine how vulnerabilities have evolved in tandem with technological advancements. For each category of attack, we summarize methodologies, highlight commonly used datasets, compare performance and limitations, and organize existing literature using widely accepted taxonomies. By highlighting emerging risks and open challenges, this survey aims to support the development of more secure and resilient voice authentication systems.

Key findings

The survey finds that voice authentication systems are vulnerable to a wide range of sophisticated attacks. Existing defenses often lack robustness and fail to generalize across attack types and real-world conditions. Deepfake attacks are increasingly effective: they can generate highly realistic synthetic speech capable of bypassing authentication systems.

Approach

The paper conducts a systematic literature review, categorizing and analyzing existing research on various attacks against voice authentication systems and anti-spoofing countermeasures. It examines the methodologies, datasets used, and limitations of each attack type, and provides a structured overview of the current threat landscape.

Datasets

VoxCeleb1, VoxCeleb2, LibriSpeech, TIMIT, NTIMIT, Common Voice, AISHELL datasets, ASVspoof challenges (2015, 2017, 2019, 2021), Fake or Real (FoR), BackdoorVoice, Singing Voice Conversion Challenge (SVCC) datasets.

Model(s)

The survey mentions a range of speaker recognition, anti-spoofing, and speech synthesis models and toolkits, from classical approaches to deep learning, including but not limited to: GMM-UBM, i-vectors, x-vectors, ECAPA-TDNN, DeepSpeaker, CNNs, ResNet, Kaldi, various GAN models, VITS, YourTTS, FreeVC, Wav2Vec, Tacotron 2.

Author countries

Australia
