Transferable Adversarial Attacks on Audio Deepfake Detection

Authors: Muhammad Umar Farooq, Awais Khan, Kutub Uddin, Khalid Mahmood Malik

Published: 2025-01-21 05:46:47+00:00

Journal Ref: WACV 2025

AI Summary

This paper introduces a transferable GAN-based adversarial attack framework to evaluate the resilience of state-of-the-art (SOTA) audio deepfake detection (ADD) systems. The framework generates high-quality adversarial attacks by leveraging an ensemble of surrogate ADD models, a discriminator, and a self-supervised audio model to ensure transcription and perceptual integrity. Experimental results demonstrate significant vulnerabilities in SOTA ADD systems, with substantial accuracy drops across white-box, gray-box, and black-box attack scenarios on benchmark datasets.
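As a rough formalization (our notation, not the paper's), the generator can be viewed as minimizing a weighted objective whose terms map onto the components above: perceptual fidelity to the input audio, fooling an ensemble of K surrogate detectors, preserving spoken content via the self-supervised audio model, and fooling the GAN discriminator:

$$\mathcal{L}_{G} = \lambda_{p}\,\mathcal{L}_{\mathrm{perc}} + \lambda_{f}\,\frac{1}{K}\sum_{k=1}^{K}\mathcal{L}_{\mathrm{ADD}}^{(k)} + \lambda_{t}\,\mathcal{L}_{\mathrm{trans}} + \lambda_{a}\,\mathcal{L}_{\mathrm{adv}}$$

where the \(\lambda\) weights are assumed hyperparameters balancing the four terms.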

Abstract

Audio deepfakes pose significant threats, including impersonation, fraud, and reputation damage. To address these risks, audio deepfake detection (ADD) techniques have been developed, demonstrating success on benchmarks like ASVspoof2019. However, their resilience against transferable adversarial attacks remains largely unexplored. In this paper, we introduce a transferable GAN-based adversarial attack framework to evaluate the resilience of state-of-the-art (SOTA) ADD systems. By leveraging an ensemble of surrogate ADD models and a discriminator, the proposed approach generates transferable adversarial attacks that better reflect real-world scenarios. Unlike previous methods, the proposed framework incorporates a self-supervised audio model to ensure transcription and perceptual integrity, resulting in high-quality adversarial attacks. Experimental results on the benchmark dataset reveal that SOTA ADD systems exhibit significant vulnerabilities, with accuracies dropping from 98% to 26%, 92% to 54%, and 94% to 84% in white-box, gray-box, and black-box scenarios, respectively. When tested on other datasets, performance dropped from 91% to 46% on In-the-Wild and from 94% to 67% on WaveFake. These results highlight the significant vulnerabilities of existing ADD systems and emphasize the need to enhance their robustness against advanced adversarial threats to ensure security and reliability.


Key findings
State-of-the-art ADD systems are highly vulnerable to the proposed transferable adversarial attacks. For instance, on the ASVspoof2019 dataset, accuracies decreased from 98% to 26% (white-box), 92% to 54% (gray-box), and 94% to 84% (black-box). Similar performance degradation was observed on the In-the-Wild and WaveFake datasets, highlighting the urgent need for more robust ADD systems.
Approach
The authors propose a transferable GAN-based adversarial attack framework comprising a generator, a discriminator, an ensemble of surrogate ADD models, and a self-supervised transcription model. The generator creates adversarial audio samples by minimizing a combined loss function that includes perceptual loss, forensics loss (to fool the surrogate ADD models), transcription loss (to preserve spoken content), and adversarial loss (from the discriminator). This keeps the attacks transferable and high-quality while preserving transcription and perceptual integrity.
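A minimal PyTorch-style sketch of this combined generator loss is given below; the function names, label convention (class 0 = bonafide), and loss weights are illustrative assumptions rather than the authors' released implementation:

```python
# Sketch of the combined generator objective described above.
# All names, label conventions, and weights are assumptions,
# not the authors' code.
import torch
import torch.nn.functional as F

def generator_loss(
    x_in: torch.Tensor,              # input spoofed audio, shape (B, T)
    x_adv: torch.Tensor,             # generator output, shape (B, T)
    surrogates: list,                # surrogate ADD models -> (B, 2) logits
    discriminator: torch.nn.Module,  # GAN discriminator -> (B, 1) logits
    transcriber: torch.nn.Module,    # frozen self-supervised audio model -> features
    lambdas=(1.0, 1.0, 1.0, 1.0),    # assumed weights for the four terms
) -> torch.Tensor:
    lam_p, lam_f, lam_t, lam_a = lambdas

    # Perceptual loss: keep the adversarial waveform close to the input.
    l_perc = F.l1_loss(x_adv, x_in)

    # Forensics loss: push every surrogate detector toward "bonafide"
    # (assumed to be class 0), averaged over the ensemble.
    bonafide = torch.zeros(x_adv.size(0), dtype=torch.long, device=x_adv.device)
    l_for = torch.stack(
        [F.cross_entropy(model(x_adv), bonafide) for model in surrogates]
    ).mean()

    # Transcription loss: match the frozen SSL features of the input so
    # the spoken content is preserved.
    with torch.no_grad():
        target_feat = transcriber(x_in)
    l_trans = F.mse_loss(transcriber(x_adv), target_feat)

    # Adversarial loss: fool the discriminator into scoring x_adv as real.
    d_out = discriminator(x_adv)
    l_adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))

    return lam_p * l_perc + lam_f * l_for + lam_t * l_trans + lam_a * l_adv
```

In a full training loop, the discriminator would be updated in the usual alternating GAN fashion while the transcriber stays frozen, so the content constraint does not drift as the generator learns.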
Datasets
ASVspoof2019 (logical access (LA) subset), WaveFake, In-the-Wild
Model(s)
Res-TSSDNet, Inc-TSSDNet, RawNet2, ResNet1D, MS-ResNet (for ADD systems); Wav2Vec, Transformer-based BERT encoder (for the transcription model within the attack framework)
Author countries
USA