Towards Real-World Deepfake Detection: A Diverse In-the-wild Dataset of Forgery Faces

Authors: Junyu Shi, Minghui Li, Junguo Zuo, Zhifei Yu, Yipeng Lin, Shengshan Hu, Ziqi Zhou, Yechao Zhang, Wei Wan, Yinzhe Xu, Leo Yu Zhang

Published: 2025-10-09 10:54:38+00:00

AI Summary

This paper introduces RedFace (Real-world-oriented Deepfake Face), a large-scale dataset comprising over 60,000 forged images and 1,000 manipulated videos of faces, designed to bridge the gap between academic benchmarks and real-world threats. RedFace utilizes nine commercial online platforms to generate diverse deepfakes, simulating black-box scenarios and integrating evolving in-the-wild deepfake technologies. Extensive evaluations confirm the limited generalization capabilities and practicality of existing deepfake detection methods against this challenging dataset.

Abstract

Deepfakes, leveraging advanced AIGC (Artificial Intelligence-Generated Content) techniques, create hyper-realistic synthetic images and videos of human faces, posing a significant threat to the authenticity of social media. While this real-world threat is increasingly prevalent, existing academic evaluations and benchmarks for deepfake forgery detection often fall short of effective application due to their lack of specificity, limited deepfake diversity, and restricted manipulation techniques. To address these limitations, we introduce RedFace (Real-world-oriented Deepfake Face), a specialized facial deepfake dataset comprising over 60,000 forged images and 1,000 manipulated videos derived from authentic facial features, to bridge the gap between academic evaluations and real-world necessity. Unlike prior benchmarks, which typically rely on academic methods to generate deepfakes, RedFace utilizes nine commercial online platforms to integrate the latest deepfake technologies found in the wild, effectively simulating real-world black-box scenarios. Moreover, RedFace's deepfakes are synthesized using bespoke algorithms, allowing it to capture the diverse and evolving methods used by real-world deepfake creators. Extensive experimental results on RedFace (including cross-domain, intra-domain, and real-world social-network dissemination simulations) verify the limited practicality of existing deepfake detection schemes in real-world applications. We further perform a detailed analysis of the RedFace dataset, elucidating the reasons for its impact on detection performance compared to conventional datasets. Our dataset is available at: https://github.com/kikyou-220/RedFace.


Key findings
Existing deepfake detection methods show limited practical applicability: when trained on conventional datasets and tested on RedFace, their performance drops sharply, with AUC plummeting to around 50% (chance level). Degradation simulations (e.g., severe JPEG compression) further reduced the effectiveness of most detectors, confirming that real-world media dissemination disrupts deepfake artifacts. Among the tested detectors, DIRE generally showed the most stable performance.
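An AUC near 50% means a detector's scores barely separate real faces from forged ones, i.e., it performs at chance. As a minimal, dependency-free sketch of the ROC-AUC metric used in such benchmarks (the function name and toy scores below are illustrative, not taken from the paper):

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation:
    the probability that a randomly chosen forged sample scores
    higher than a randomly chosen real one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # forged
    neg = [s for s, y in zip(scores, labels) if y == 0]  # real
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# An uninformative detector sits near 0.5 (chance);
# a well-separated detector approaches 1.0.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```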
Approach
The authors created the RedFace dataset using nine commercial black-box online platforms to generate four categories of facial deepfakes: Entire Face Synthesis (EFS), Face Swapping (FS), Face Attribute Manipulation (FAM), and Face Reenactment (FR). They then conducted comprehensive cross-domain, intra-domain, and real-world degradation experiments using state-of-the-art deepfake detectors to benchmark real-world performance.
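The real-world degradation experiments amount to lossy re-encoding of images before they reach the detector, mimicking what social platforms do on upload. A hedged sketch of one such degradation step using Pillow (the function name and quality value are illustrative; the paper's exact degradation pipeline may differ):

```python
from io import BytesIO

from PIL import Image  # Pillow


def simulate_dissemination(img: Image.Image, quality: int = 30) -> Image.Image:
    """Re-encode an image with lossy JPEG compression to mimic the
    quality loss media typically suffers on social platforms.
    Lower `quality` means heavier compression, which tends to erase
    the subtle artifacts many deepfake detectors rely on."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```

Running a trained detector on both the original and the re-encoded copy of each test image quantifies how much of its accuracy depends on compression-fragile artifacts.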
Datasets
RedFace (newly introduced), CelebA (source material), FF++, DFDC, GenImage, DiFF (for detector training and cross-domain evaluation).
Model(s)
Xception, CViT, F3Net, GramNet, DIRE, UFD.
Author countries
China, Australia