Environmental Sound Deepfake Detection Challenge: An Overview
Authors: Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang
Published: 2025-12-30 11:03:36+00:00
AI Summary
This paper presents an overview of the ICASSP 2026 Environmental Sound Deepfake Detection (ESDD) Challenge. The challenge is built on EnvSDD, the first large-scale curated dataset for ESDD, which the authors introduced to address the limited scale and sound-category diversity of earlier resources. It used EnvSDD across two tracks focusing on detection robustness against unseen and black-box audio generators. The paper analyzes the strategies and results of the top-performing systems.
Abstract
Recent progress in audio generation models has made it possible to create highly realistic and immersive soundscapes, which are now widely used in film and virtual-reality-related applications. However, these audio generators also raise concerns about potential misuse, such as producing deceptive audio for fabricated videos or spreading misleading information. Therefore, it is essential to develop effective methods for detecting fake environmental sounds. Existing datasets for environmental sound deepfake detection (ESDD) remain limited in both scale and the diversity of sound categories they cover. To address this gap, we introduced EnvSDD, the first large-scale curated dataset designed for ESDD. Based on EnvSDD, we launched the ESDD Challenge, recognized as one of the ICASSP 2026 Grand Challenges. This paper presents an overview of the ESDD Challenge, including a detailed analysis of the challenge results.