Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System

Authors: Hashim Ali, Surya Subramani, Lekha Bollinani, Nithin Sai Adupa, Sali El-Loh, Hafiz Malik

Published: 2025-08-28 16:37:50+00:00

AI Summary

This research paper explores multilingual dataset integration strategies for robust audio deepfake detection. By systematically experimenting with self-supervised learning front-ends and various dataset combinations, the authors achieved second place in two of the SAFE Challenge's three tasks (Task 1, unmodified audio, and Task 3, laundered audio), demonstrating strong generalization and robustness.

Abstract

The SAFE Challenge evaluates synthetic speech detection across three tasks: unmodified audio, processed audio with compression artifacts, and laundered audio designed to evade detection. We systematically explore self-supervised learning (SSL) front-ends, training data compositions, and audio length configurations for robust deepfake detection. Our AASIST-based approach incorporates a WavLM Large front-end with RawBoost augmentation, trained on a multilingual dataset of 256,600 samples spanning 9 languages and over 70 TTS systems from CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, and MAILABS. Through extensive experimentation with different SSL front-ends, three training data versions, and two audio lengths, we achieved second place in both Task 1 (unmodified audio detection) and Task 3 (laundered audio detection), demonstrating strong generalization and robustness.
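The abstract mentions RawBoost augmentation. RawBoost combines three noise types: linear and non-linear convolutive noise, impulsive signal-dependent additive noise, and stationary signal-independent additive noise. The sketch below is a simplified stand-in for only the last of these, assuming audio is a plain Python list of 16 kHz samples: it mixes white Gaussian noise in at a randomly drawn SNR, whereas RawBoost first colors the noise with a random FIR filter. The helper names `add_noise_at_snr` and `augment` are hypothetical, and the 10 to 40 dB SNR range is illustrative rather than the authors' configuration.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Mix white Gaussian noise into `signal` at `snr_db` dB SNR.

    Simplified stand-in for RawBoost's stationary additive-noise step
    (RawBoost filters the noise with a random FIR filter; this does not).
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose scale so that signal_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]


def augment(signal, rng, snr_range=(10.0, 40.0)):
    """Draw a fresh SNR per call (illustrative range, not the paper's setting)."""
    snr_db = rng.uniform(*snr_range)
    return add_noise_at_snr(signal, snr_db, rng)


# Usage: a 1 s, 440 Hz tone at 16 kHz, corrupted at a random SNR.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = augment(tone, rng)
```

Drawing a new SNR on every call is what makes this an augmentation rather than a fixed corruption: across epochs the detector sees the same utterance under varying noise levels.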

Key findings

The study indicates that training on diverse multilingual datasets substantially improves audio deepfake detection performance. The WavLM Large front-end consistently outperformed MAE-AST Frame on Tasks 1 and 2 of the SAFE Challenge. Longer audio segments improved detection of processed audio (Task 2), while laundered audio (Task 3) remained a significant challenge.
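The audio-length finding depends on how each clip is brought to a fixed duration before batching. A minimal sketch, assuming 16 kHz input (the rate WavLM expects) and two illustrative segment lengths of 4 s and 8 s, since the lengths the authors compared are not stated in this summary: short clips are repeat-padded, a convention used in widely shared AASIST training code, and long clips are cropped. `fix_length`, `SHORT` and `LONG` are hypothetical names.

```python
import random

SAMPLE_RATE = 16_000          # WavLM models expect 16 kHz audio
SHORT = 4 * SAMPLE_RATE       # illustrative length, not the paper's setting
LONG = 8 * SAMPLE_RATE        # illustrative length, not the paper's setting


def fix_length(wave, num_samples, rng=None):
    """Return exactly `num_samples` samples from `wave`.

    Short clips are repeat-padded (the waveform is tiled); long clips are
    cropped at a random offset when `rng` is given (training) and from the
    start otherwise (evaluation).
    """
    if not wave:
        raise ValueError("empty waveform")
    if len(wave) < num_samples:
        repeats = num_samples // len(wave) + 1
        return (list(wave) * repeats)[:num_samples]
    start = rng.randint(0, len(wave) - num_samples) if rng is not None else 0
    return list(wave[start:start + num_samples])


# Usage: the same 10 s clip prepared under both length configurations.
clip = [0.0] * (10 * SAMPLE_RATE)
train_short = fix_length(clip, SHORT, random.Random(0))
eval_long = fix_length(clip, LONG)
```

Passing `rng` only during training gives random crops as a further augmentation while keeping evaluation deterministic.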

Approach

The authors used a two-stage architecture combining self-supervised learning (SSL) front-ends (WavLM Large and MAE-AST Frame) with an AASIST back-end. They systematically integrated multiple multilingual datasets, exploring different training data compositions and audio lengths to improve robustness and generalization.
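The two-stage design can be read as a simple contract: the front-end turns a waveform into a sequence of frame-level feature vectors, and the back-end turns that sequence into one detection score. The sketch below shows only that contract, using toy stand-ins written for illustration: `toy_frontend` computes two hand-crafted features per 20 ms frame in place of WavLM Large's learned 1024-dimensional embeddings, and `toy_backend` mean-pools and applies a logistic score in place of AASIST's graph attention over spectral and temporal nodes. All function names and the weights in the usage line are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Frames = List[List[float]]  # T frames x D features


def toy_frontend(wave: Sequence[float], hop: int = 320) -> Frames:
    """Stand-in for an SSL front-end: one feature vector per 320-sample (20 ms) frame.

    Features are [log-energy, zero-crossing rate]; a real SSL model such as
    WavLM Large returns learned 1024-dimensional embeddings instead.
    """
    frames = []
    for start in range(0, len(wave) - hop + 1, hop):
        seg = wave[start:start + hop]
        energy = sum(x * x for x in seg) / hop
        zcr = sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0)) / (hop - 1)
        frames.append([math.log(energy + 1e-10), zcr])
    return frames


def toy_backend(frames: Frames, weights: Sequence[float], bias: float) -> float:
    """Stand-in for the AASIST back-end: mean-pool over time, then a logistic score."""
    if not frames:
        raise ValueError("input shorter than one frame")
    dim = len(frames[0])
    pooled = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    # Numerically stable sigmoid.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def detect(wave: Sequence[float],
           frontend: Callable[[Sequence[float]], Frames],
           backend: Callable[[Frames], float]) -> float:
    """Two-stage pipeline: waveform -> front-end features -> back-end score."""
    return backend(frontend(wave))


# Usage with arbitrary, untrained weights.
wave = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
score = detect(wave, toy_frontend, lambda f: toy_backend(f, [0.1, -2.0], 0.0))
```

Keeping the front-end behind a plain callable is what lets front-ends be swapped (for example WavLM Large versus MAE-AST Frame) while the back-end stays fixed.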

Datasets

CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, MAILABS, ASVspoof 2019 LA, In-The-Wild (ITW)
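Combining several of the corpora listed above into one training set can be sketched as a manifest merge. The sketch assumes each corpus has already been converted into a list of hypothetical `Utterance` records (path, label, source corpus, language); the optional `max_per_source` cap is one illustrative knob for varying training data composition and is not taken from the paper, whose three training data versions are not described in this summary. File paths and language tags in the usage lines are made up.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    path: str
    label: str      # "bonafide" or "spoof"
    source: str     # corpus name
    language: str   # language tag


def compose_training_set(corpora, max_per_source=None, seed=0):
    """Merge per-corpus utterance lists into one shuffled training list.

    corpora: dict mapping corpus name -> list of Utterance.
    max_per_source: optional cap so one large corpus cannot dominate.
    """
    rng = random.Random(seed)
    merged = []
    for name in sorted(corpora):  # fixed iteration order for reproducibility
        items = list(corpora[name])
        rng.shuffle(items)
        if max_per_source is not None:
            items = items[:max_per_source]
        merged.extend(items)
    rng.shuffle(merged)
    return merged


# Usage with hypothetical file paths and language tags.
corpora = {
    "MLAAD v5": [Utterance(f"mlaad/{i}.wav", "spoof", "MLAAD v5", "de") for i in range(6)],
    "MAILABS": [Utterance(f"mailabs/{i}.wav", "bonafide", "MAILABS", "fr") for i in range(2)],
}
train = compose_training_set(corpora, max_per_source=4)
```

Tagging each record with its source and language makes it straightforward to audit how a given composition is distributed before training.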

Model(s)

WavLM Large, MAE-AST Frame, AASIST

Author countries

USA
