AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

Authors: Qizhou Wang, Hanxun Huang, Guansong Pang, Sarah Erfani, Christopher Leckie

Published: 2025-09-04 16:03:44+00:00

AI Summary

This paper introduces AUDETER (AUdio DEepfake TEst Range), a large-scale and highly diverse dataset for deepfake audio detection, aimed at addressing the poor generalization of existing detection methods in real-world, open-world scenarios due to domain shifts. Comprising over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders (totaling 3 million clips), AUDETER is the largest deepfake audio dataset by scale. Experiments demonstrate that models trained on AUDETER achieve significantly improved generalized detection performance, reducing error rates by 44.1% to 51.6% on diverse cross-domain samples.
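As a rough sense of scale, the two headline figures imply an average clip length of about five and a half seconds. This treats both figures as exact, which they are not, since the dataset has *over* 4,500 hours.

$$
\bar{d} \approx \frac{4500\ \text{h} \times 3600\ \text{s/h}}{3 \times 10^{6}\ \text{clips}} = \frac{1.62 \times 10^{7}\ \text{s}}{3 \times 10^{6}\ \text{clips}} = 5.4\ \text{s per clip}
$$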

Abstract

Speech generation systems can produce remarkably realistic vocalisations that are often indistinguishable from human speech, posing significant authenticity challenges. Although numerous deepfake detection methods have been developed, their effectiveness in real-world environments remains unreliable due to the domain shift between training and test samples arising from diverse human speech and fast-evolving speech synthesis systems. This is not adequately addressed by current datasets, which lack the diverse, up-to-date audio in both the real and deepfake categories needed to reflect real-world application challenges. To fill this gap, we introduce AUDETER (AUdio DEepfake TEst Range), a large-scale, highly diverse deepfake audio dataset for comprehensive evaluation and robust development of generalised models for deepfake audio detection. It consists of over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders with a broad range of TTS/vocoder patterns, totalling 3 million audio clips, making it the largest deepfake audio dataset by scale. Through extensive experiments with AUDETER, we reveal that i) state-of-the-art (SOTA) methods trained on existing datasets struggle to generalise to novel deepfake audio samples and suffer from high false positive rates on unseen human voices, underscoring the need for a comprehensive dataset; and ii) these methods, when trained on AUDETER, achieve highly generalised detection performance and reduce the detection error rate by 44.1% to 51.6%, achieving an error rate of only 4.17% on diverse cross-domain samples in the popular In-the-Wild dataset, paving the way for training generalist deepfake audio detectors. AUDETER is available on GitHub.


Key findings
State-of-the-art deepfake audio detection methods trained on existing datasets struggle with open-world scenarios, exhibiting significant performance degradation and high false positive rates on novel deepfake audio and unseen human voices. Training these detection models on the AUDETER dataset dramatically improves their generalization capabilities, reducing the detection error rate by 44.1% to 51.6% and achieving an error rate of only 4.17% on diverse cross-domain samples from the In-the-Wild dataset. This underscores the critical importance of large-scale and diverse training data for developing robust deepfake audio detectors.
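The summary reports "error rates" without naming the metric. A common choice in deepfake audio detection is the equal error rate (EER): the operating point where the rate of real speech flagged as fake equals the rate of fake speech missed. The sketch below *assumes* the reported numbers are EERs and that "reducing the error rate by 44.1% to 51.6%" means a relative reduction. Neither assumption is confirmed here. The functions `equal_error_rate` and `relative_reduction` are hypothetical helpers, not part of the AUDETER release.

```python
import math


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    Assumption: higher score = more likely fake; label 1 = fake, 0 = real.
    A clip is predicted fake when its score >= threshold.
    """
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one real and one fake sample")

    best_gap, eer = math.inf, 1.0
    # Candidate thresholds: every observed score, plus +inf (flag nothing).
    for t in sorted(set(scores)) + [math.inf]:
        fpr = sum(s >= t for s in real) / len(real)  # real speech flagged as fake
        fnr = sum(s < t for s in fake) / len(fake)   # fake speech missed
        gap = abs(fpr - fnr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest,
            # reporting their average as the EER estimate.
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


def relative_reduction(old_error, new_error):
    """Fractional drop from old_error to new_error (assumed reading of 'reduce by X%')."""
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    # Toy, made-up detector scores (not AUDETER results).
    scores = [0.1, 0.6, 0.4, 0.9]
    labels = [0, 0, 1, 1]
    print(f"EER: {equal_error_rate(scores, labels):.2f}")
    print(f"relative reduction: {relative_reduction(10.0, 5.0):.0%}")
```

The threshold sweep is quadratic in the number of clips. That is fine for illustration, but at AUDETER's scale (millions of clips) a sorted, ROC-based computation would be the practical choice.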
Approach
The authors address the poor generalization of deepfake audio detectors by introducing a new, large-scale, and diverse dataset called AUDETER. It provides synthetic audio generated by 21 recent speech synthesis systems (11 TTS models and 10 vocoders), paired with diverse human speech drawn from 4 corpora. By training and evaluating existing detection models on this dataset, they aim to improve the robustness and generalization of deepfake audio detectors.
Datasets
AUDETER (newly introduced), In-the-Wild, ASVspoof (2019, 2021 DF), WaveFake, LibriSeVoc, Common Voice, People's Speech Dataset, Multilingual LibriSpeech (MLS).
Model(s)
RawNet2, RawGAT-ST, AASIST, PC-Dart, SAMO, Neural Vocoder Artifacts (NVA), Purdue M2, XLS-R + RawNet + Assist (XLR+R+A), XLS-R + SLS (XLS+SLS).
Author countries
Australia, Singapore