ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection
Authors: Mohammad Romani
Published: 2025-11-18 14:56:34+00:00
AI Summary
ForensicFlow is a tri-modal adaptive network for robust video Deepfake detection that integrates evidence from three complementary domains: RGB, texture, and frequency. Its branches use ConvNeXt-tiny, Swin Transformer-tiny, and a CNN with squeeze-and-excitation (SE) blocks, and attention mechanisms handle both temporal pooling and dynamic feature fusion. On the Celeb-DF (v2) dataset it reaches an AUC of 0.9752 and outperforms single-stream baselines, showing greater resilience against subtle forgeries.
Abstract
Deepfakes generated by advanced GANs and autoencoders severely threaten information integrity and societal stability. Single-stream CNNs fail to capture multi-scale forgery artifacts across the spatial, texture, and frequency domains, which limits their robustness and generalization. We introduce ForensicFlow, a tri-modal forensic framework that fuses RGB, texture, and frequency evidence for video Deepfake detection. The RGB branch (ConvNeXt-tiny) extracts global visual inconsistencies; the texture branch (Swin Transformer-tiny) detects fine-grained blending artifacts; the frequency branch (CNN + SE) identifies periodic spectral noise. Attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive attention fusion balances the contributions of the three branches. Trained on Celeb-DF (v2) with focal loss, ForensicFlow achieves an AUC of 0.9752, an F1-score of 0.9408, and an accuracy of 0.9208, outperforming single-stream baselines. Ablation studies validate the synergy between branches, and Grad-CAM visualizations confirm that the model focuses on forensic cues. This comprehensive feature fusion provides superior resilience against subtle forgeries.
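Attention-based temporal pooling and adaptive attention fusion can both be read as a softmax-weighted sum: score each item, normalize the scores, and average the items with those weights. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the author's implementation: the dot-product scoring against a learned `query` vector, the function names, and the premise that all three branch embeddings share one dimension are hypothetical choices made for clarity.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(items: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Softmax-weighted sum of feature vectors.

    Each item is scored by a dot product with a (hypothetical) learned query
    vector; items with higher scores receive larger weights, so high-evidence
    frames (or branches) dominate the pooled vector.
    """
    weights = softmax([dot(v, query) for v in items])
    dim = len(items[0])
    return [sum(w * v[d] for w, v in zip(weights, items)) for d in range(dim)]


# Temporal pooling: per-frame embeddings -> one clip-level vector per branch.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
clip_rgb = attention_pool(frames, query=[2.0, 0.0])

# Adaptive fusion (assumes every branch is projected to the same dimension):
# the same operation, applied across the RGB, texture, and frequency vectors.
clip_tex, clip_freq = [0.2, 0.8], [0.6, 0.4]
fused = attention_pool([clip_rgb, clip_tex, clip_freq], query=[1.0, 1.0])
```

With a zero query every item gets equal weight, so pooling reduces to a plain mean; a learned query lets the model shift weight toward the frames or branches that carry the strongest forgery evidence.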
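The frequency branch is described as "CNN + SE". Assuming SE refers to the standard squeeze-and-excitation block, each channel of a feature map is reweighted by a learned gate: global average pooling squeezes the channel to a scalar, a bottleneck layer with ReLU followed by a layer with a sigmoid produces a gate in (0, 1), and the channel is scaled by that gate. The sketch below shows only that generic mechanism in plain Python; the weight matrices `w1` and `w2`, their shapes, and where the block sits inside the branch are illustrative assumptions, since the abstract does not specify them.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector (no bias term)."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def squeeze_excite(fmap: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Generic squeeze-and-excitation over a C x H x W feature map.

    w1: (C/r) x C bottleneck weights; w2: C x (C/r) expansion weights.
    Both are hypothetical stand-ins for learned parameters.
    """
    # Squeeze: global average pool each channel's H x W map to one scalar.
    s = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck + ReLU, then expansion + sigmoid -> per-channel gates.
    z = [max(0.0, x) for x in matvec(w1, s)]
    gates = [1.0 / (1.0 + math.exp(-x)) for x in matvec(w2, z)]
    # Scale: multiply every spatial position of channel c by gates[c].
    return [[[g * x for x in row] for row in ch] for g, ch in zip(gates, fmap)]


# Two channels of a 1 x 2 map; the weights strongly favor channel 0.
fmap = [[[1.0, 3.0]], [[2.0, 2.0]]]
out = squeeze_excite(fmap, w1=[[1.0, 0.0]], w2=[[10.0], [-10.0]])
```

In this toy call, channel 0 passes through almost unchanged while channel 1 is suppressed, which is how an SE block can emphasize the spectral channels that carry periodic forgery noise.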
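The model is trained with focal loss, which down-weights easy examples so training concentrates on hard, subtle forgeries. For binary classification it is $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$, where $p_t$ is the predicted probability of the true class. The sketch below implements that formula in plain Python. The defaults `alpha=0.25` and `gamma=2.0` are the commonly used values from the original focal loss formulation, not hyperparameters reported for ForensicFlow, and treating label 1 as "fake" is an assumption.

```python
import math
from typing import Sequence


def binary_focal_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    alpha: float = 0.25,
    gamma: float = 2.0,
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss.

    probs:  predicted probability of the positive class (assumed: "fake").
    labels: 1 for positive, 0 for negative.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of confidently correct examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


easy = binary_focal_loss([0.95], [1])   # confident, correct "fake" prediction
hard = binary_focal_loss([0.30], [1])   # under-confident prediction on a fake
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy; raising `gamma` widens the gap between `easy` and `hard`, which is the behavior that motivates focal loss for subtle forgeries.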