Unveiling Deepfakes: A Frequency-Aware Triple Branch Network for Deepfake Detection

Authors: Qihao Shen, Jiaxing Xuan, Zhenguang Liu, Sifan Wu, Yutong Xie, Zhaoyan Ming, Yingying Jiao, Kui Ren

Published: 2026-04-19 15:08:05+00:00

AI Summary

This paper introduces a novel frequency-aware triple-branch network for robust deepfake detection, addressing issues of overfitting to specific frequency domains and redundant feature representations. The method jointly captures spatial and frequency features by learning from original images and those reconstructed by different frequency channels. It employs mutual information theory to derive feature decoupling and fusion losses, enhancing the model's focus on task-relevant features and improving generalization across diverse forgery patterns.

Abstract

Advanced deepfake technologies are blurring the lines between real and fake, presenting both revolutionary opportunities and alarming threats. While this technology unlocks novel applications in fields like entertainment and education, its malicious use has sparked urgent ethical and societal concerns, ranging from identity theft to the dissemination of misinformation. To tackle these challenges, feature analysis using frequency features has emerged as a promising direction for deepfake detection. However, one aspect that has been overlooked so far is that existing methods tend to concentrate on one or a few specific frequency domains, which risks overfitting to particular artifacts and significantly undermines their robustness when facing diverse forgery patterns. Another underexplored aspect we observe is that different features often attend to the same forged region, resulting in redundant feature representations and limiting the diversity of the extracted clues. This may undermine the ability of a model to capture complementary information across different facets, thereby compromising its generalization capability to diverse manipulations. In this paper, we seek to tackle these challenges from two aspects: (1) we propose a triple-branch network that jointly captures spatial and frequency features by learning from both the original image and images reconstructed from different frequency channels, and (2) we mathematically derive feature decoupling and fusion losses grounded in mutual information theory, which encourage the model to focus on task-relevant features across the original image and the images reconstructed from different frequency channels. Extensive experiments on six large-scale benchmark datasets demonstrate that our method consistently achieves state-of-the-art performance. Our code is released at https://github.com/injooker/Unveiling Deepfake.


Key findings
The method consistently achieves state-of-the-art performance across six large-scale benchmark datasets, demonstrating superior in-dataset performance (e.g., 0.990 AUC on FF++ and 0.999 AUC on CDF2) and strong cross-dataset generalization. Ablation studies confirm the crucial role of the dynamic frequency channel selection and the mutual information-driven losses in enhancing performance and robustness. The model also shows improved resilience to image degradation and noise, with visualizations revealing specialized and complementary attention patterns across its branches for effective artifact localization.
Approach
The proposed method uses a triple-branch network consisting of an RGB branch and two frequency branches (primary and secondary, built from the top-K and next-K frequency channels, respectively). It integrates a dynamic frequency channel selection module to adaptively identify informative frequency cues and a cross-frequency channel enhancement module to merge frequency features. Feature decoupling and global information alignment losses, derived from mutual information theory, are employed to ensure distinct and complementary feature learning while reducing redundancy.
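The core idea behind the frequency branches can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes, for illustration, that "frequency channels" are radial bands of the 2D Fourier spectrum, scores each band by its spectral energy, and reconstructs two images from the top-K and next-K bands (the inputs to the hypothetical primary and secondary branches). The actual paper's channel definition and selection module may differ.

```python
import numpy as np

def frequency_band_reconstructions(img, num_bands=8, k=2):
    """Illustrative sketch: partition the spectrum into radial bands,
    rank bands by energy, and reconstruct the image from the top-K
    and next-K bands separately (primary / secondary branch inputs)."""
    # Centered 2D spectrum of a single-channel image.
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)

    # Assign each frequency bin to one of `num_bands` radial bands.
    band_ids = np.minimum(
        (r / (r.max() + 1e-8) * num_bands).astype(int), num_bands - 1
    )

    # Score each band by total spectral magnitude (a stand-in for the
    # paper's learned dynamic channel selection).
    energies = np.array(
        [np.abs(F[band_ids == b]).sum() for b in range(num_bands)]
    )
    order = np.argsort(energies)[::-1]
    top_k, next_k = order[:k], order[k : 2 * k]

    def reconstruct(bands):
        # Zero out all bins outside the chosen bands, then invert.
        mask = np.isin(band_ids, bands)
        return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

    return reconstruct(top_k), reconstruct(next_k)
```

Because the two band sets are disjoint, the two reconstructions expose different frequency content of the same face, which is what lets the decoupling loss push the branches toward complementary, non-redundant artifacts.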
Datasets
DFDC-P, DFDC, FaceForensics++ (FF++), Celeb-DF-V1 (CDF1), Celeb-DF-V2 (CDF2), DF40
Model(s)
Xception
Author countries
China