Replay Attacks Against Audio Deepfake Detection

Authors: Nicolas Müller, Piotr Kawa, Wei-Herng Choong, Adriana Stan, Aditya Tirumala Bukkapatnam, Karla Pizzi, Alexander Wagner, Philip Sperl

Published: 2025-05-20 19:46:36+00:00

AI Summary

This paper investigates the vulnerability of audio deepfake detection systems to replay attacks, where deepfake audio is played and re-recorded, making it harder to detect. A new dataset, ReplayDF, is introduced to study this, showing significant performance degradation in six open-source detection models when subjected to replay attacks.

Abstract

We show how replay attacks undermine audio deepfake detection: By playing and re-recording deepfake audio through various speakers and microphones, we make spoofed samples appear authentic to the detection model. To study this phenomenon in more detail, we introduce ReplayDF, a dataset of recordings derived from M-AILABS and MLAAD, featuring 109 speaker-microphone combinations across six languages and four TTS models. It includes diverse acoustic conditions, some highly challenging for detection. Our analysis of six open-source detection models across five datasets reveals significant vulnerability, with the top-performing W2V2-AASIST model's Equal Error Rate (EER) surging from 4.7% to 18.2%. Even with adaptive Room Impulse Response (RIR) retraining, performance remains compromised with an 11.0% EER. We release ReplayDF for non-commercial research use.

Key findings

Replay attacks substantially degrade audio deepfake detection: the best-performing model, W2V2-AASIST, sees its Equal Error Rate (EER) rise from 4.7% to 18.2%. Adaptive retraining with room impulse responses (RIRs) recovers only part of the loss, leaving an EER of 11.0%. The degradation is not solely due to added noise; replay also removes key artifacts that the models rely on for detection.
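Since the findings are reported as EER, a minimal sketch of how that metric can be computed may help. It assumes each detector outputs one score per utterance, with higher scores meaning "more likely bona fide"; the function name `compute_eer` and the toy scores are illustrative and not taken from the paper.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detector scores.

    Assumption: a higher score means "more likely bona fide".
    The EER is the error rate at the threshold where the false
    acceptance rate and false rejection rate are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide audio scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
bonafide = [0.9, 0.8, 0.7, 0.6]
spoof = [0.1, 0.2, 0.3, 0.65]  # one spoof scores like bona fide audio
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Under this convention, a replayed deepfake that loses its tell-tale artifacts drifts toward higher scores, which raises the false acceptance rate and therefore the EER.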

Approach

The authors introduce ReplayDF, a dataset of recordings derived from M-AILABS and MLAAD, created by playing audio through various loudspeakers and re-recording it with various microphones. It covers 109 speaker-microphone combinations, six languages, and four TTS models. They evaluate six open-source audio deepfake detection models on this dataset and compare the results with performance on the original, non-replayed audio.
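Physical replay needs real loudspeakers and microphones, but its effect is commonly approximated in software by convolving the audio with a room impulse response (RIR), which is also the idea behind RIR-based retraining. The sketch below shows that convolution in plain Python. It is not the authors' pipeline: the function names and the toy RIR are hypothetical, and a real setup would use measured RIRs and an FFT-based convolution from a numerical library.

```python
def convolve(signal, rir):
    """Direct (time-domain) convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def simulate_replay(signal, rir):
    """Approximate a replayed recording by applying an RIR.

    The output is trimmed to the input length and rescaled so its
    peak matches the input peak (a simplifying assumption).
    """
    wet = convolve(signal, rir)[:len(signal)]
    peak = max(abs(x) for x in wet) or 1.0
    ref = max(abs(x) for x in signal) or 1.0
    return [x * ref / peak for x in wet]


# Hypothetical toy RIR: direct path followed by two decaying echoes.
toy_rir = [1.0, 0.0, 0.5, 0.0, 0.25]
click = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(simulate_replay(click, toy_rir))
```

Applying many different RIRs to the training data is one way to expose a detector to replay-like conditions; as the abstract notes, this kind of retraining reduced but did not eliminate the degradation.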

Datasets

ReplayDF (created by the authors from M-AILABS and MLAAD), M-AILABS, MLAAD, ASVspoof 2019, ASVspoof 5, Fake-or-Real, In-the-Wild, ODSS

Model(s)

Whisper, Raw PC-DARTS, RawNet2, TCM ADD, RawGAT-ST, W2V2-AASIST

Author countries

Germany, Poland, USA, Romania
