Data-Driven Deepfake Image Detection Method -- The 2024 Global Deepfake Image Detection Challenge

Authors: Xiaoya Zhu, Yibing Nan, Shiguo Lian

Published: 2025-08-15 13:24:47+00:00

AI Summary

This paper describes a deepfake image detection method that received an award of excellence in the 2024 Global Deepfake Image Detection Challenge. The approach is built on the Swin Transformer V2-B network and combines online data augmentation with offline sample generation to improve model generalization.

Abstract

With the rapid development of AI technology, deepfake technology has emerged as a double-edged sword: it has created a large amount of AI-generated content while posing unprecedented challenges to digital security. The competition task is to determine whether a face image is a deepfake and to output its probability of being one. In the image track, our approach is based on the Swin Transformer V2-B classification network, and online data augmentation and offline sample generation are employed to enrich the diversity of training samples and increase the generalization ability of the model. Our method received the award of excellence in deepfake image detection.


Key findings
The proposed method achieved a score above 0.96 in the competition, demonstrating its effectiveness for deepfake image detection. Data augmentation significantly improved the model's generalization and its robustness to a variety of deepfake techniques. Inference speed reached 39.215 FPS on an NVIDIA A100 GPU.
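The paper reports throughput as frames per second on an A100. A common way to obtain such a figure (a sketch, not the authors' benchmarking code; `measure_fps` and the dummy workload are illustrative) is to time repeated inference calls after a few warm-up iterations so one-time setup cost is excluded:

```python
import time

def measure_fps(infer, frames, warmup=5):
    """Return inference throughput in frames per second.

    `infer` is any callable that processes one frame; a few warm-up
    calls run first so setup cost does not skew the measurement.
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Illustrative stand-in for the detector: a trivial per-frame computation.
frames = [[0.0] * 64 for _ in range(100)]
fps = measure_fps(lambda f: sum(f), frames)
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock intended for interval timing.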
Approach
The authors used the Swin Transformer V2-B classification network for deepfake detection. They enhanced the model's robustness by employing online data augmentation and generating diverse offline training samples to address various deepfake techniques. Post-processing with a face keypoint detector and face detector was also applied.
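The competition requires outputting a probability score that an image is a deepfake. With a two-class classification head such as the one on a Swin Transformer V2-B backbone, this score is typically the softmax probability of the "fake" class; a minimal sketch (the `[real, fake]` logit ordering is an assumption for illustration):

```python
import math

def deepfake_probability(logits):
    """Softmax over [real, fake] logits; returns P(fake).

    Subtracting the max logit before exp() keeps the
    computation numerically stable for large logits.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

# Example: a head whose output slightly favors the "fake" class.
p = deepfake_probability([0.2, 1.3])
```

Equal logits yield a probability of exactly 0.5, and a large positive "fake" logit pushes the score toward 1.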
Datasets
MultiFF dataset (524K images), along with augmented datasets created through techniques like random facial region cutout, local cropping, grayscaling, translation, overlaying, cartoonization, sketching, and binarization.
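Of the augmentations listed above, random facial region cutout can be sketched as masking a random rectangle of the image. This is a simplified stand-in (the paper presumably masks regions guided by detected facial landmarks; the function and parameters here are illustrative):

```python
import random

def random_region_cutout(image, max_frac=0.4, fill=0, rng=None):
    """Fill a random rectangular region of an H x W image with `fill`.

    Simplified cutout: the region is chosen uniformly at random, with
    each side at most `max_frac` of the corresponding image dimension.
    Returns a new image; the input is left untouched.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    cut_h = rng.randint(1, max(1, int(h * max_frac)))
    cut_w = rng.randint(1, max(1, int(w * max_frac)))
    top = rng.randint(0, h - cut_h)
    left = rng.randint(0, w - cut_w)
    out = [row[:] for row in image]  # copy so the original is untouched
    for r in range(top, top + cut_h):
        for c in range(left, left + cut_w):
            out[r][c] = fill
    return out

img = [[1] * 8 for _ in range(8)]
aug = random_region_cutout(img, rng=random.Random(0))
```

Passing an explicit `rng` makes the augmentation reproducible, which is useful when debugging a training pipeline.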
Model(s)
Swin Transformer V2-B
Author countries
China