Patch-Discontinuity Mining for Generalized Deepfake Detection

Authors: Huanhuan Yuan, Yang Ping, Zhengqin Xu, Junyi Cao, Shuai Jia, Chao Ma

Published: 2025-12-26 13:18:14+00:00

AI Summary

This paper proposes GenDF, a generalized deepfake detection framework that transfers knowledge from large-scale vision models (like ViT) using a parameter-efficient approach to detect fake facial images. GenDF incorporates Deepfake-Specific Representation Learning (DSRL) to capture discontinuity patterns, Feature Space Redistribution (FSR) to mitigate domain mismatch, and Classification-Invariant Feature Augmentation (CIFAug) to enhance generalization. Experiments show GenDF achieves state-of-the-art cross-domain and cross-manipulation performance with minimal trainable parameters.

Abstract

The rapid advancement of generative artificial intelligence has enabled the creation of highly realistic fake facial images, posing serious threats to personal privacy and the integrity of online information. Existing deepfake detection methods often rely on handcrafted forensic cues and complex architectures, achieving strong performance in intra-domain settings but suffering significant degradation when confronted with unseen forgery patterns. In this paper, we propose GenDF, a simple yet effective framework that transfers a powerful large-scale vision model to the deepfake detection task with a compact and neat network design. GenDF incorporates deepfake-specific representation learning to capture discriminative patterns between real and fake facial images, feature space redistribution to mitigate distribution mismatch, and a classification-invariant feature augmentation strategy to enhance generalization without introducing additional trainable parameters. Extensive experiments demonstrate that GenDF achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings while requiring only 0.28M trainable parameters, validating the effectiveness and efficiency of the proposed framework.


Key findings
GenDF achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings. It significantly improves AUC scores across unseen datasets (e.g., DFD, Celeb-DF, DFDC) compared to strong baselines. Critically, GenDF achieves these results with extremely high parameter efficiency, requiring only 0.28M trainable parameters.
Approach
In a step termed Deepfake-Specific Representation Learning (DSRL), the authors fine-tune a pre-trained Vision Transformer (ViT) using Low-Rank Adaptation (LoRA) to efficiently learn the patch continuity/discontinuity patterns characteristic of real and fake faces. This fine-tuning is followed by Feature Space Redistribution (FSR), which optimizes feature space separation, and Classification-Invariant Feature Augmentation (CIFAug), which augments features along directions orthogonal to the classification boundary to improve robustness.
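The augmentation idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear classifier head with weight vector w, Gaussian noise, and reads "classification-invariant" as "the perturbation leaves the logit w . f + b unchanged". The function names and the noise scale are hypothetical.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def logit(w, b, f):
    """Score of an assumed linear classifier head: w . f + b."""
    return dot(w, f) + b

def invariant_augment(f, w, scale=0.1, rng=random):
    """Perturb feature f with Gaussian noise whose component along w
    has been removed, so the linear logit stays the same up to rounding.
    Assumption: this is one simple reading of CIFAug, not the paper's code."""
    noise = [rng.gauss(0.0, scale) for _ in f]
    coef = dot(noise, w) / dot(w, w)  # how much of the noise points along w
    ortho = [n - coef * wi for n, wi in zip(noise, w)]
    return [fi + oi for fi, oi in zip(f, ortho)]

# Toy example: the feature changes, the classifier score does not.
w = [0.5, -1.0, 2.0, 0.25]
f = [1.0, 2.0, -0.5, 3.0]
g = invariant_augment(f, w, scale=0.5)
print("logit before:", logit(w, 0.1, f))
print("logit after: ", logit(w, 0.1, g))
```

Because the perturbation only moves features within the classifier's level sets, it adds variety to the training features without adding trainable parameters, which is consistent with the abstract's description of the strategy.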
Datasets
FaceForensics++ (FF++), Celeb-DF, Deepfake Detection Challenge (DFDC), DeepfakeDetection (DFD)
Model(s)
Vision Transformer (ViT-B/16), Low-Rank Adaptation (LoRA)
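
As a rough illustration of why LoRA keeps the trainable parameter count small, the sketch below applies a low-rank update to a single frozen weight matrix. It is a generic LoRA example, not GenDF's configuration: the rank, the scaling factor alpha, and which ViT layers are adapted are not stated in this summary, so the values used here are hypothetical.

```python
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x).
    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) train."""
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA pair (A and B)."""
    return rank * (d_in + d_out)

# Toy dimensions; B starts at zero so the adapted layer initially
# reproduces the frozen layer exactly (the standard LoRA initialization).
d, r = 8, 2
W = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0.0, 0.01) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]
x = [random.gauss(0.0, 1.0) for _ in range(d)]
y = lora_forward(W, A, B, x, alpha=4.0)

# One 768 x 768 projection (ViT-B hidden size) at a hypothetical rank of 4.
print("full matrix:", 768 * 768, "LoRA pair:", lora_param_count(768, 768, 4))
```

The count for a single adapted projection grows linearly with the hidden size rather than quadratically, which is the general mechanism behind small trainable-parameter budgets such as the 0.28M reported for GenDF.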
Author countries
China