Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities

Authors: Jingtong Dou, Chuancheng Shi, Jian Wang, Fei Shen, Zhiyong Wang, Tat-Seng Chua

Published: 2026-04-09 03:35:21+00:00

AI Summary

This paper introduces a paradigm shift from modality-specific feature fusion to modality generalization for multimodal deepfake detection. It proposes the Modality-Agnostic Forgery (MAF) framework, which decouples modality-specific styles to extract essential, cross-modal latent forgery knowledge. Evaluated on the novel DeepModal-Bench benchmark, MAF empirically demonstrates the existence of universal forgery traces and achieves significant performance breakthroughs on unknown modalities.

Abstract

As generative artificial intelligence evolves, deepfake attacks have escalated from single-modality manipulations to complex, multimodal threats. Existing forensic techniques face a severe generalization bottleneck: by relying excessively on superficial, modality-specific artifacts, they neglect the shared latent forgery knowledge hidden beneath variable physical appearances. Consequently, these models suffer catastrophic performance degradation when confronted with unseen dark modalities. To overcome this limitation, this paper introduces a paradigm shift that redefines multimodal forensics from conventional feature fusion to modality generalization. We propose the first modality-agnostic forgery (MAF) detection framework. By explicitly decoupling modality-specific styles, MAF precisely extracts the essential, cross-modal latent forgery knowledge. Furthermore, we define two progressive dimensions to quantify model generalization: transferability toward semantically correlated modalities (Weak MAF), and robustness against the completely isolated signals of dark modalities (Strong MAF). To rigorously assess these generalization limits, we introduce the DeepModal-Bench benchmark, which integrates diverse multimodal forgery detection algorithms and adapts state-of-the-art generalized learning methods. This study not only empirically proves the existence of universal forgery traces but also achieves significant performance breakthroughs on unknown modalities via the MAF framework, offering a pioneering technical pathway for universal multimodal defense.


Key findings
The study empirically validates the existence of universal forgery traces, proving that generative algorithms leave consistent statistical biases across disparate media, independent of surface-level artifacts. The MAF framework achieves significant performance improvements on unknown modalities compared to traditional methods, demonstrating robust source-free domain generalization capabilities against unseen and isolated 'dark modalities'. The intrinsic dimensionality of forgery features sharply decreases in the forensic space, confirming the successful isolation of compact, universal forgery knowledge.
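The intrinsic-dimensionality claim above can be illustrated with a generic probe. The sketch below (not the paper's estimator; the PCA variance threshold and the synthetic data are assumptions for illustration) counts how many principal components are needed to explain most of a feature matrix's variance, which is one simple way to check whether forensic features collapse onto a compact subspace.

```python
# Generic illustration (not the paper's estimator) of probing intrinsic
# dimensionality: count the principal components needed to cover most of
# the variance of a feature matrix.
import numpy as np

def pca_intrinsic_dim(features: np.ndarray, var_threshold: float = 0.95) -> int:
    """Return the number of principal components covering `var_threshold`
    of the total variance of `features` (shape: n_samples x n_dims)."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Singular values of the centered data give per-component variance.
    _, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
    explained = (singular_values ** 2) / np.sum(singular_values ** 2)
    return int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)

# Example: features lying near a low-dimensional subspace give a small estimate.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 256))
print(pca_intrinsic_dim(low_rank + 0.01 * rng.normal(size=(500, 256))))  # ~8
```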
Approach
The MAF framework tackles multimodal deepfake detection by explicitly decoupling modality-specific styles from raw inputs to extract shared, cross-modal latent forgery knowledge. This is achieved using modality-specific perceptors to generate features, which are then fed into a universal lightweight MLP detector. The framework is optimized using domain generalization strategies and evaluated under two progressive scenarios: Weak MAF for semantically correlated unseen modalities and Strong MAF for completely isolated 'dark modalities'.
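A minimal sketch of this pipeline is shown below. The class names, dimensions, and the instance-normalization step used as a stand-in for "decoupling modality-specific styles" are illustrative assumptions, not the authors' exact implementation; the frozen perceptor is represented by a placeholder embedding tensor.

```python
# Minimal sketch of the described pipeline: perceptor features -> style
# decoupling -> shared lightweight MLP detector. Names and the
# normalization-based style removal are assumptions for illustration.
import torch
import torch.nn as nn

class StyleDecoupler(nn.Module):
    """Strips per-sample feature statistics as a simple proxy for removing
    modality-specific 'style', keeping normalized forgery evidence."""
    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        mu = feats.mean(dim=-1, keepdim=True)
        sigma = feats.std(dim=-1, keepdim=True) + 1e-6
        return (feats - mu) / sigma

class UniversalMLPDetector(nn.Module):
    """Lightweight four-layer MLP shared across all modalities."""
    def __init__(self, in_dim: int = 1024, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # single real/fake logit
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)

# Usage: a frozen perceptor (e.g., an ImageBind-style encoder) would produce
# `feats`; a random tensor stands in here for illustration.
feats = torch.randn(8, 1024)                       # batch of per-sample embeddings
decoupler, detector = StyleDecoupler(), UniversalMLPDetector(in_dim=1024)
logits = detector(decoupler(feats))                # shape: (8, 1)
```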
Datasets
LAV-DF, FakeAVCeleb, ASVspoof5, Celeb-DF++
Model(s)
Lightweight four-layer multi-layer perceptron (MLP) as the universal detector. Audio-processing feature extractors (perceptors): pre-trained ImageBind, LanguageBind, and UniBind (Weak MAF); self-supervised ViT encoders with LoRA modules for spectral/audio signals (Strong MAF).
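For the LoRA-adapted encoders, a minimal hand-rolled LoRA layer is sketched below to show how a frozen perceptor projection can be given a small trainable low-rank update. The rank, scaling, and wrapping strategy are assumptions for illustration, not the authors' modules.

```python
# Minimal LoRA sketch: a frozen base linear layer plus a trainable
# low-rank update, as commonly used to adapt frozen ViT encoders.
# Hyperparameters (rank, alpha) are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update x @ A @ B."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # keep pre-trained weights fixed
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.zeros(base.in_features, rank))
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        nn.init.normal_(self.lora_a, std=0.02)  # B stays zero -> no initial shift
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a @ self.lora_b)

# Usage: wrap a projection inside a frozen encoder; only LoRA weights train.
proj = LoRALinear(nn.Linear(768, 768))
trainable = [p for p in proj.parameters() if p.requires_grad]  # lora_a, lora_b
```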
Author countries
Australia, China, Singapore