Towards Generalizable Deepfake Detection via Real Distribution Bias Correction

Authors: Ming-Hui Liu, Harry Cheng, Xin Luo, Xin-Shun Xu, Mohan S. Kankanhalli

Published: 2026-03-14 16:11:00+00:00

Comment: First Version

AI Summary

This paper introduces the Real Distribution Bias Correction (RDBC) framework to enhance deepfake detector generalizability by exploiting the invariance of real data. It leverages the fixed population distribution and inherent Gaussianity of real images, employing two modules to estimate real data statistics and amplify the Gaussianity gap between real and fake samples. This approach allows detectors to effectively generalize to unseen target domains, demonstrating state-of-the-art performance in both in-domain and cross-domain settings.

Abstract

To generalize deepfake detectors to future unseen forgeries, most existing methods attempt to simulate the dynamically evolving forgery types using available source domain data. However, predicting an unbounded set of future manipulations from limited prior examples is infeasible. To overcome this limitation, we propose to exploit the invariance of real data from two complementary perspectives: the fixed population distribution of the entire real class and the inherent Gaussianity of individual real images. Building on these properties, we introduce the Real Distribution Bias Correction (RDBC) framework, which consists of two key components: the Real Population Distribution Estimation module and the Distribution-Sampled Feature Whitening module. The former utilizes the independent and identically distributed (i.i.d.) property of real samples to derive the normal distribution form of their statistics, from which the distribution parameters can be estimated using limited source domain data. Based on the learned population distribution, the latter utilizes the inherent Gaussianity of real data as a discriminative prior and performs a sampling-based whitening operation to amplify the Gaussianity gap between real and fake samples. Through synergistic coupling of the two modules, our model captures the real-world properties of real samples, thereby enhancing its generalizability to unseen target domains. Extensive experiments demonstrate that RDBC achieves state-of-the-art performance in both in-domain and cross-domain deepfake detection.
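
The step from the i.i.d. property to a normal distribution over statistics can be illustrated with the multivariate central limit theorem. The notation below is an assumption made for illustration and may differ from the paper's own derivation: $x_1, \dots, x_n$ denote real-sample features with population mean $\mu$ and finite covariance $\Sigma$.

```latex
\[
\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\sqrt{n}\,\bigl(\bar{x}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}(0, \Sigma)
\quad \text{as } n \to \infty ,
\]
\[
\text{so for large } n: \quad \bar{x}_n \;\approx\; \mathcal{N}\!\Bigl(\mu, \tfrac{1}{n}\Sigma\Bigr).
\]
```

Under this reading, a batch mean computed from limited source-domain data is a noisy draw centered on the fixed population mean, which is why the population parameters can in principle be estimated from such batches.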


Key findings
The RDBC framework achieved state-of-the-art performance, significantly outperforming existing methods in both in-domain (e.g., FF++) and cross-domain (Celeb-DF, DFDC, DFDCp, UADFV) deepfake detection. It demonstrated consistent performance improvements across various backbone networks and showed remarkable robustness against image corruptions like JPEG compression, Gaussian blur, and Gaussian noise. The method effectively corrected real data distribution biases, leading to a more domain-invariant representation and improved accuracy on real samples.
Approach
The RDBC framework comprises two modules. The Real Population Distribution Estimation module uses the independent and identically distributed (i.i.d.) property of real samples to estimate the parameters of the normal distribution that their statistics (mean and covariance) follow. The Distribution-Sampled Feature Whitening module then uses this learned population distribution to perform a sampling-based whitening operation, amplifying the Gaussianity gap between real and fake samples to improve discriminability and generalization.
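
The two-module idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`estimate_real_statistics`, `sample_mean`, `whiten`, `gaussianity_gap`) are hypothetical, ZCA whitening is swapped in as a generic whitening transform, and the "gap" is a crude second-moment proxy (distance of the whitened covariance from the identity) rather than the paper's measure.

```python
import numpy as np


def estimate_real_statistics(real_feats):
    """Estimate mean and covariance from real features of shape (n, d)."""
    mu = real_feats.mean(axis=0)
    sigma = np.cov(real_feats, rowvar=False)
    return mu, sigma


def sample_mean(mu, sigma, n, rng):
    """Draw a plausible batch mean from N(mu, sigma / n).

    Assumption: the sample mean of n i.i.d. features is approximately
    normal around the population mean (central limit theorem).
    """
    return rng.multivariate_normal(mu, sigma / n)


def whiten(feats, mu, sigma, eps=1e-5):
    """ZCA-whiten features using the given (real-data) statistics."""
    d = sigma.shape[0]
    # eps keeps the inverse square root stable for near-singular covariances.
    eigvals, eigvecs = np.linalg.eigh(sigma + eps * np.eye(d))
    w = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return (feats - mu) @ w


def gaussianity_gap(whitened):
    """Hypothetical proxy: Frobenius distance of the covariance from identity."""
    d = whitened.shape[1]
    return np.linalg.norm(np.cov(whitened, rowvar=False) - np.eye(d))
```

Features whitened with statistics estimated from real data should have a covariance close to the identity when they are themselves real, while features from a different distribution should not. In this toy setting, that difference is what the proxy gap picks up.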
Datasets
FaceForensics++ (FF++), Celeb-DF, DFDC, DFDCp, UADFV
Model(s)
EfficientNet, Xception, ViT-L-32, ViT-B-16
Author countries
China, Singapore