Deepfake Detection in Social Media: A Temporal Artifact Analysis Using 3D Convolutional Neural Networks
Authors: Mohammadreza Rashidi, Raja Hashim Ali, Sami Ur Rahman
Published: 2026-05-17 18:01:32+00:00
Comment: 13 pages, 6 figures
AI Summary
The paper addresses the degradation of frame-level deepfake detectors on high-quality synthetic facial videos by proposing a detector based on a 3D Convolutional Neural Network (R3D-18) that exploits temporal inconsistencies. Trained with a composite loss that combines binary cross-entropy with a temporal-consistency regularizer, the model reaches 92.8% intra-dataset accuracy on DeepfakeTIMIT at 128x128 resolution and 76.4% on FaceForensics++ without fine-tuning, which the authors take as evidence that temporal artifacts are a robust detection signal.
Abstract
Synthetic facial videos have proliferated across social media faster than platform moderation can respond, raising the cost of disinformation and identity-based attacks. Frame-level deepfake detectors degrade sharply as generator quality increases; high-quality 128x128 GAN output cuts spatial-only accuracy by five percentage points while leaving temporal inconsistencies largely intact. We address this gap with a 3D Convolutional Neural Network detector based on R3D-18, trained with a composite loss that combines binary cross-entropy with a temporal-consistency regularizer. The model processes 16-frame clips from the DeepfakeTIMIT dataset and is initialized from Kinetics-400 action-recognition weights. We report 92.8% accuracy on intra-dataset evaluation at 128x128 resolution; cross-dataset transfer to FaceForensics++ without fine-tuning reaches 76.4%, rising after minimal fine-tuning. Ablation studies show that transfer learning contributes 7.2 percentage points and face tracking adds 3.5 points, while temporal consistency regularization provides additional gains on high-quality fakes. The results establish that temporal artifacts generalize more broadly than spatial ones, providing a detection signal that survives social-media re-encoding.
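The composite loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the temporal-consistency regularizer, so the version below is an assumption: it penalizes the mean squared difference between feature vectors of adjacent frames and adds that penalty to binary cross-entropy with a hypothetical weight `lam`. The function names, the per-frame feature representation, and the default `lam=0.1` are illustrative and not taken from the paper. Plain Python is used to keep the example self-contained; a real training loop would compute the same quantities on tensors.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for one clip-level logit.

    label is 1 for fake and 0 for real (an assumed convention).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def temporal_consistency_penalty(frame_features):
    """Mean squared difference between features of adjacent frames.

    frame_features is a list of equal-length feature vectors, one per frame.
    This form of the regularizer is an assumption, not the paper's definition.
    """
    if len(frame_features) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, curr in zip(frame_features, frame_features[1:]):
        for a, b in zip(prev, curr):
            total += (a - b) ** 2
            count += 1
    return total / count


def composite_loss(logit, label, frame_features, lam=0.1):
    """Binary cross-entropy plus a weighted temporal-consistency term.

    lam is a hypothetical weighting hyperparameter.
    """
    return bce_with_logits(logit, label) + lam * temporal_consistency_penalty(frame_features)


if __name__ == "__main__":
    # A 16-frame clip with constant 2-dimensional features has zero penalty,
    # so the composite loss reduces to plain binary cross-entropy.
    clip = [[0.5, -0.5] for _ in range(16)]
    print(composite_loss(logit=2.0, label=1, frame_features=clip))
```

With this form, a clip whose features do not change between frames incurs no penalty, and `lam` controls how strongly frame-to-frame variation contributes relative to the classification term.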