Face Forgery Detection with Elaborate Backbone

Authors: Zonghui Guo, Yingjie Liu, Jie Zhang, Haiyong Zheng, Shiguang Shan

Published: 2024-09-25 13:57:16+00:00

AI Summary

This paper addresses the poor generalization of Face Forgery Detection (FFD) models to unseen forgeries by proposing a revitalized FFD pipeline. It introduces an elaborate backbone pre-trained with self-supervised learning on real-face datasets, a competitive fine-tuning framework, and an uncertainty-based threshold optimization mechanism. The approach aims to equip FFD models with superior facial representation capabilities, enhance extraction of diverse forgery cues, and improve inference reliability for better generalization.

Abstract

Face Forgery Detection (FFD), or Deepfake detection, aims to determine whether a digital face is real or fake. Due to different face synthesis algorithms with diverse forgery patterns, FFD models often overfit to specific patterns in training datasets, resulting in poor generalization to other unseen forgeries. This severe challenge requires FFD models to possess strong capabilities in representing complex facial features and extracting subtle forgery cues. Although previous FFD models directly employ existing backbones to represent and extract facial forgery cues, the critical role of backbones is often overlooked, particularly as their knowledge and capabilities are insufficient to address FFD challenges, inevitably limiting generalization. Therefore, it is essential to integrate the backbone pre-training configurations and seek practical solutions by revisiting the complete FFD workflow, from backbone pre-training and fine-tuning to inference of discriminant results. Specifically, we analyze the crucial contributions of backbones with different configurations in the FFD task and propose leveraging the ViT network with self-supervised learning on real-face datasets to pre-train a backbone, equipping it with superior facial representation capabilities. We then build a competitive backbone fine-tuning framework that strengthens the backbone's ability to extract diverse forgery cues within a competitive learning mechanism. Moreover, we devise a threshold optimization mechanism that utilizes prediction confidence to improve the inference reliability. Comprehensive experiments demonstrate that our FFD model with the elaborate backbone achieves excellent performance in FFD and extra face-related tasks, i.e., presentation attack detection. Code and models are available at https://github.com/zhenglab/FFDBackbone.


Key findings

The proposed FFD model with its elaborate backbone achieves excellent performance and significantly better generalization across diverse unseen forgery patterns compared to previous state-of-the-art methods. The integration of SSL on real faces, competitive fine-tuning, and uncertainty-based threshold optimization proved crucial for enhancing accuracy and reliability. The method also demonstrates strong performance and robustness in the related Presentation Attack Detection (PAD) task.

Approach

The method involves pre-training a Vision Transformer (ViT) backbone using self-supervised learning (SSL) on large-scale real-face datasets to gain robust facial representation. A competitive backbone fine-tuning framework with a dual-branch architecture, decorrelation constraint, and an uncertainty-based fusion module is then applied to extract diverse forgery cues. Finally, an uncertainty-based threshold optimization mechanism is used during inference to enhance the accuracy and reliability of real/fake classifications.
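
The idea of fusing two branches by their uncertainty and then checking confidence at inference can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes normalized softmax entropy as a stand-in for the uncertainty estimate, a simple confidence-weighted average as the fusion rule, and hypothetical names and defaults (`fuse_branches`, `decide`, `threshold`, `max_uncertainty`). The actual mechanism is in the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Entropy divided by log(K): 0 means fully confident, 1 means uniform."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def fuse_branches(logits_a, logits_b):
    """Fuse two branches' [real, fake] logits, weighting each by its confidence.

    Assumption: confidence = 1 - normalized entropy. The paper's fusion
    module may estimate and use uncertainty differently.
    """
    probs_a, probs_b = softmax(logits_a), softmax(logits_b)
    weight_a = max(0.0, 1.0 - normalized_entropy(probs_a))
    weight_b = max(0.0, 1.0 - normalized_entropy(probs_b))
    total = weight_a + weight_b
    if total < 1e-12:
        # Both branches are maximally uncertain: fall back to a plain average.
        weight_a = weight_b = 1.0
        total = 2.0
    p_fake = (weight_a * probs_a[1] + weight_b * probs_b[1]) / total
    uncertainty = normalized_entropy([1.0 - p_fake, p_fake])
    return p_fake, uncertainty


def decide(p_fake, uncertainty, threshold=0.5, max_uncertainty=0.9):
    """Return (label, reliable); both cutoffs are illustrative defaults."""
    label = "fake" if p_fake >= threshold else "real"
    return label, uncertainty <= max_uncertainty


# One confident branch and one near-uniform branch: the confident one dominates.
p_fake, uncertainty = fuse_branches([0.0, 4.0], [0.1, 0.0])
label, reliable = decide(p_fake, uncertainty)
```

In this sketch a near-uniform branch contributes almost nothing to the fused score, and predictions whose fused uncertainty exceeds `max_uncertainty` are flagged as unreliable rather than silently accepted. In practice both cutoffs would be tuned on held-out data.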

Datasets

CelebA, CelebV-Text, FFHQ (for real-face pre-training); FaceForensics++ (FF++) c23 (for fine-tuning); Celeb-DF (CDF), DFDC, FFIW, and nine custom cross-datasets generated by DCGAN, StyleGAN3, StarGAN2, AttnGAN, E4S, Wav2Lip, SD-1.5, DiffFace, and Diffused Heads (for evaluation); OULU-NPU (O), CASIA-FASD (C), Idiap Replay-Attack (I), and MSU-MFSD (M) (for the PAD task).

Model(s)

UNKNOWN

Author countries

China