RADAR Challenge 2026: Robust Audio Deepfake Recognition under Media Transformations

Authors: Hieu-Thi Luong, Xuechen Liu, Ivan Kukanov, Zheng Xin Chai, Kong Aik Lee

Published: 2026-05-10 14:29:35+00:00

Comment: Submitted to APSIPA 2026

AI Summary

This paper presents the RADAR Challenge 2026, an APSIPA Grand Challenge on robust audio deepfake recognition under realistic media transformations such as compression, resampling, noise, and reverberation. The challenge has two phases: an English development phase and a multilingual evaluation phase with more than 100,000 utterances. The paper describes the challenge task, dataset construction, evaluation protocol, and overall results (33 teams submitted to the development phase and 22 to the final evaluation phase), and highlights the persistent difficulty of robust deepfake detection under diverse conditions.

Abstract

RADAR Challenge 2026 is an APSIPA Grand Challenge on Robust Audio Deepfake Recognition under Media Transformations, designed to simulate realistic media conditions in real-world audio distribution pipelines, including compression, resampling, noise, and reverberation. It consists of two phases: an English development phase with labeled data for analysis and paper writing, and a multilingual evaluation phase containing more than 100,000 utterances in English, Singapore English, Mandarin Chinese, Taiwanese Mandarin, Japanese, and Vietnamese. Systems are evaluated using equal error rate (EER) for binary real/fake classification. This paper describes the challenge task, the construction of the dataset, the evaluation protocol, and the overall results. During the challenge, 33 teams submitted to the development phase and 22 teams submitted to the final evaluation phase. The reported results highlight the remaining challenges of robust audio deepfake detection under multilingual and media-transformed conditions.


Key findings
The challenge results indicate that robust audio deepfake detection under multilingual and media-transformed conditions remains difficult, even though the top-ranked systems performed promisingly. Performance differences between teams suggest that model design, training data, augmentation, and score calibration played a crucial role. In addition, strong performance in the development phase did not consistently carry over to the blind evaluation phase, which contained unseen conditions.
Approach
The paper describes the RADAR Challenge 2026, which is designed to evaluate audio deepfake detection systems under realistic media-transformed conditions. It details the construction of a novel multilingual dataset, the application of diverse media transformation pipelines (e.g., compression, noise, reverberation) to both bona fide and spoofed speech, and the equal error rate (EER) evaluation protocol for binary real/fake classification.
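Because systems are ranked by EER, a short sketch of how an EER can be computed from per-utterance scores may help. It is a minimal illustration under stated assumptions, not the official RADAR scoring tool: the function name `compute_eer` is hypothetical, it assumes a higher score means "more likely bona fide", and it takes the EER as the mean of the false rejection rate (FRR) and false acceptance rate (FAR) at the threshold where the two are closest. Official scorers may interpolate along the ROC curve instead.

```python
import math
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate EER and its threshold from per-utterance scores.

    Assumption (hypothetical convention): higher score = more likely bona fide;
    an utterance is accepted as bona fide when score >= threshold.
    """
    n_bona, n_spoof = len(bonafide_scores), len(spoof_scores)
    if n_bona == 0 or n_spoof == 0:
        raise ValueError("need at least one bona fide and one spoofed score")

    # Pool all scores with their labels and sort once: O(n log n) overall.
    labelled = sorted([(s, True) for s in bonafide_scores] +
                      [(s, False) for s in spoof_scores])
    # Candidate thresholds: every distinct score, plus +inf (reject everything).
    thresholds = sorted({s for s, _ in labelled}) + [math.inf]

    bona_rejected = spoof_rejected = 0  # counts of scores strictly below t
    idx = 0
    best_gap, eer, best_threshold = math.inf, 1.0, thresholds[0]
    for t in thresholds:
        # Move every score strictly below the current threshold into "rejected".
        while idx < len(labelled) and labelled[idx][0] < t:
            if labelled[idx][1]:
                bona_rejected += 1
            else:
                spoof_rejected += 1
            idx += 1
        frr = bona_rejected / n_bona                # bona fide wrongly rejected
        far = (n_spoof - spoof_rejected) / n_spoof  # spoofs wrongly accepted
        gap = abs(frr - far)
        if gap < best_gap:
            # EER taken as the mean of FRR and FAR where they are closest.
            best_gap, eer, best_threshold = gap, (frr + far) / 2, t
    return eer, best_threshold


eer, thr = compute_eer([0.9, 0.8, 0.3], [0.1, 0.4, 0.2])
print(f"EER = {eer:.3f} at threshold {thr}")  # EER = 0.333 at threshold 0.4
```

Sorting once and sweeping the thresholds keeps the cost at O(n log n), which matters at the scale of an evaluation set with more than 100,000 utterances.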
Datasets
Development Set: LlamaPartialSpoof (full-fake subset) and LibriTTS (with media transformations using the MIT RIR dataset, MUSAN noise, and FMA small music).
Evaluation Set: a newly developed benchmark dataset. Bona fide speech comes from Common Voice Scripted Speech, People’s Speech, IMDA, MAGICDATA Mandarin Read Speech, FormosaSpeech, CPJD, and FOSD; spoofed speech comes from ten TTS systems (iFlytek, Houshan, ElevenLabs, Cartesia, OpenAI, Chatterbox, CosyVoice 3.0, Qwen3-TTS, Fish Audio S2 Pro, Piper). Both bona fide and spoofed speech are subjected to media transformations including Aachen RIRs, simulated RIRs, synthetic RIRs, FMA small music, and BSD10k sound effects.
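The transformations listed above (room impulse responses, additive noise, music and sound effects) and those named in the abstract (resampling, compression) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption for illustration, not the pipeline used to build the RADAR data: the function names are hypothetical, as are the 5 to 20 dB SNR range, the 8/16 kHz resampling choices and the order of operations. Codec compression (e.g. MP3 or Opus) is omitted because it normally requires an external encoder.

```python
import math
import random
from typing import List, Sequence


def apply_rir(speech: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Reverberate by direct-form convolution with a room impulse response.

    Output is truncated to the input length. Cost is O(len(speech) * len(rir)),
    so this is for illustration only; FFT convolution is the usual choice.
    """
    out = []
    for n in range(len(speech)):
        acc = 0.0
        for k in range(min(len(rir), n + 1)):
            acc += rir[k] * speech[n - k]
        out.append(acc)
    return out


def add_noise_at_snr(speech: Sequence[float], noise: Sequence[float],
                     snr_db: float) -> List[float]:
    """Add noise (or music) scaled so the speech-to-noise power ratio is snr_db."""
    # Loop or crop the noise to match the speech length.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0.0:
        return list(speech)
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def resample_linear(x: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    n_out = round(len(x) * target_sr / orig_sr)
    out = []
    for i in range(n_out):
        pos = i * orig_sr / target_sr
        j = int(pos)
        frac = pos - j
        if j + 1 < len(x):
            out.append(x[j] * (1.0 - frac) + x[j + 1] * frac)
        else:
            out.append(x[-1])
    return out


def transform(speech, sr, noise, rir, rng: random.Random) -> List[float]:
    """Hypothetical chain: reverberation -> additive noise -> resampling round trip."""
    y = apply_rir(speech, rir)
    y = add_noise_at_snr(y, noise, snr_db=rng.uniform(5.0, 20.0))  # assumed SNR range
    low_sr = rng.choice([8000, 16000])                               # assumed rates
    return resample_linear(resample_linear(y, sr, low_sr), low_sr, sr)


rng = random.Random(0)
sr = 16000
speech = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]  # 0.1 s tone
noise = [rng.gauss(0.0, 1.0) for _ in range(800)]
rir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # toy decaying impulse response
degraded = transform(speech, sr, noise, rir, rng)
print(len(degraded))  # 1600
```

Pure Python keeps the sketch dependency-free; a practical pipeline would use array libraries, FFT-based convolution, and a properly filtered resampler.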
Model(s)
The paper uses the SSL-AASIST model (combining a wav2vec 2.0 frontend with the AASIST backend) as a baseline system for the challenge.
Author countries
Singapore, China, Hong Kong SAR