ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection
Authors: Benjamin Chou, Yi Zhu, Surya Koppisetti
Published: 2026-04-17 23:44:33+00:00
Comment: To appear at ACL Findings 2026
AI Summary
ICLAD introduces an In-Context Learning paradigm with comparison-guidance for audio deepfake detection, addressing the failure of existing systems to generalize to in-the-wild deepfakes. It uses Audio Language Models (ALMs) for training-free detection with textual rationales, and a pairwise comparative reasoning strategy guides the ALM to filter out hallucinations and deepfake-irrelevant acoustic attributes. The ALM works alongside a specialized deepfake detector, with out-of-distribution samples routed to the ALM. On in-the-wild datasets, ICLAD improves macro F1 over the specialized detector alone, by up to 2× in relative terms.
Abstract
Audio deepfakes pose a significant security threat, yet current state-of-the-art (SOTA) detection systems do not generalize well to realistic in-the-wild deepfakes. We introduce a novel In-Context Learning paradigm with comparison-guidance for Audio Deepfake detection (ICLAD). The framework enables the use of audio language models (ALMs) for training-free generalization to unseen deepfakes and provides textual rationales on the detection outcome. At the core of ICLAD is a pairwise comparative reasoning strategy that guides the ALM to discover and filter hallucinations and deepfake-irrelevant acoustic attributes. The ALM works alongside a specialized deepfake detector, whereby a routing mechanism feeds out-of-distribution samples to the ALM. On in-the-wild datasets, ICLAD improves macro F1 over the specialized detector, with up to 2× relative improvement. Further analysis demonstrates the flexibility of ICLAD and its potential for deployment on recent open-source ALMs.
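The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how out-of-distribution samples are identified, so this example assumes a simple confidence threshold on the specialized detector's output. The names `route`, `Verdict`, `detector`, `alm`, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    label: str      # "real" or "fake"
    source: str     # which component decided: "detector" or "alm"
    rationale: str  # textual rationale (empty when the detector decides)


def route(
    sample: object,
    detector: Callable[[object], float],
    alm: Callable[[object], Tuple[str, str]],
    threshold: float = 0.9,
) -> Verdict:
    """Route one audio sample to the detector or the ALM.

    ASSUMPTION: a sample is treated as out-of-distribution when the
    detector's confidence falls below `threshold`. The paper's actual
    routing criterion is not given in the abstract.
    """
    p_fake = detector(sample)                # assumed: probability of "fake"
    confidence = max(p_fake, 1.0 - p_fake)   # confidence in the argmax class
    if confidence >= threshold:
        label = "fake" if p_fake >= 0.5 else "real"
        return Verdict(label, "detector", "")
    # Low confidence: defer to the ALM, which also returns a rationale.
    label, rationale = alm(sample)
    return Verdict(label, "alm", rationale)
```

In this sketch, `detector` and `alm` are placeholders for the specialized deepfake detector and the comparison-guided ALM. A confident detector score is used directly, while an uncertain one hands the sample to the ALM, whose label and textual rationale are returned together.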