Deepfake Synthesis vs. Detection: An Uneven Contest

Authors: Md. Tarek Hasan, Sanjay Saha, Shaojing Fan, Swakkhar Shatabda, Terence Sim

Published: 2026-02-08 14:26:14+00:00

AI Summary

This study presents a comprehensive empirical analysis of state-of-the-art deepfake detection techniques, together with human evaluation experiments, against cutting-edge synthesis methods. It finds that current detection models perform markedly poorly against modern deepfakes, including those produced by diffusion-based and NeRF techniques. The results highlight a critical and widening gap between deepfake generation capabilities and current detection methodologies, calling for urgent refinement of detectors.

Abstract

The rapid advancement of deepfake technology has significantly elevated the realism and accessibility of synthetic media. Emerging techniques, such as diffusion-based models and Neural Radiance Fields (NeRF), alongside enhancements in traditional Generative Adversarial Networks (GANs), have contributed to the sophisticated generation of deepfake videos. Concurrently, deepfake detection methods have seen notable progress, driven by innovations in Transformer architectures, contrastive learning, and other machine learning approaches. In this study, we conduct a comprehensive empirical analysis of state-of-the-art deepfake detection techniques, including human evaluation experiments against cutting-edge synthesis methods. Our findings highlight a concerning trend: many state-of-the-art detection models exhibit markedly poor performance when challenged with deepfakes produced by modern synthesis techniques, including poor performance by human participants against the best quality deepfakes. Through extensive experimentation, we provide evidence that underscores the urgent need for continued refinement of detection models to keep pace with the evolving capabilities of deepfake generation technologies. This research emphasizes the critical gap between current detection methodologies and the sophistication of new generation techniques, calling for intensified efforts in this crucial area of study.


Key findings

State-of-the-art deepfake detection models perform poorly when challenged with deepfakes produced by modern synthesis techniques, particularly diffusion-based and NeRF methods. Human evaluators generally outperform automated methods but also struggle with the highest-quality deepfakes, although prior AI experience improves human detection accuracy. This demonstrates a critical and growing gap between the advancement of deepfake synthesis and current detection capabilities.
Approach

The authors perform a comprehensive empirical evaluation of various state-of-the-art deepfake detection models, alongside human evaluation experiments, against deepfakes generated by cutting-edge synthesis techniques. They analyze performance using metrics such as AUC, AP, Precision, Recall, and signal detection theory measures (d′ and criterion C) across different video durations and resolutions to assess the robustness and generalization capabilities of detectors.
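The signal detection theory measures mentioned above can be computed directly from hit and false-alarm counts. The sketch below is illustrative only (not the authors' code) and uses only the Python standard library; the log-linear correction (adding 0.5 to each cell) is one common convention for avoiding infinite z-scores when a rate is exactly 0 or 1.

```python
from statistics import NormalDist

def d_prime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Compute sensitivity d' and response bias C from raw counts.

    Applies a log-linear correction (add 0.5 to each cell) so that
    hit/false-alarm rates of exactly 0 or 1 do not produce infinities.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)          # sensitivity: separation of signal/noise
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # bias: positive = conservative "fake" calls
    return d_prime, criterion

# Hypothetical example: a rater labels 40 of 50 fakes as fake
# and 45 of 50 real videos as real.
dp, c = d_prime_and_criterion(hits=40, misses=10,
                              false_alarms=5, correct_rejections=45)
```

Here d′ near 0 would indicate chance-level discrimination of real from fake, while larger values indicate better sensitivity; C captures whether a rater is biased toward calling videos "fake" or "real" independently of sensitivity.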
Datasets

VoxCeleb (for generating test deepfakes), FaceForensics++, Celeb-DF, WildDeepfake, DFDC, ImageNet, Diverse Fake Face Dataset.

Model(s)

MesoNet, Xception, Capsule, EfficientNet-B4, FFD, SRM, RECCE, CORE, UCF, Meso4Inception.

Author countries

Bangladesh, Singapore