Towards Sustainable Universal Deepfake Detection with Frequency-Domain Masking

Authors: Chandler Timm C. Doloriel, Habib Ullah, Kristian Hovde Liland, Fadi Al Machot, Ngai-Man Cheung

Published: 2025-12-08 21:08:25+00:00

AI Summary

This paper proposes frequency-domain masking as a training augmentation technique to achieve sustainable and universal deepfake detection for AI-generated images. The method enhances generalization across diverse and unseen generative models (GANs and diffusion models) by suppressing reliance on generator-specific frequency artifacts. It also demonstrates consistent performance retention even under significant model pruning, aligning with Green AI principles.

Abstract

Universal deepfake detection aims to identify AI-generated images across a broad range of generative models, including unseen ones. This requires robust generalization to new and unseen deepfakes, which emerge frequently, while minimizing computational overhead to enable large-scale deepfake screening, a critical objective in the era of Green AI. In this work, we explore frequency-domain masking as a training strategy for deepfake detectors. Unlike traditional methods that rely heavily on spatial features or large-scale pretrained models, our approach introduces random masking and geometric transformations, with a focus on frequency masking due to its superior generalization properties. We demonstrate that frequency masking not only enhances detection accuracy across diverse generators but also maintains performance under significant model pruning, offering a scalable and resource-conscious solution. Our method achieves state-of-the-art generalization on GAN- and diffusion-generated image datasets and exhibits consistent robustness under structured pruning. These results highlight the potential of frequency-based masking as a practical step toward sustainable and generalizable deepfake detection. Code and models are available at: [https://github.com/chandlerbing65nm/FakeImageDetection](https://github.com/chandlerbing65nm/FakeImageDetection).


Key findings
Frequency masking consistently outperformed spatial masking and geometric transformations, achieving state-of-the-art generalization on GAN and diffusion benchmarks (e.g., 88.10% average mAP for ResNet50). Combining frequency masking with translation augmentation yielded the strongest results (90.51% mAP). The benefits of frequency masking largely persisted under moderate structured pruning (up to 50% parameter reduction), which supports its use for resource-conscious deepfake detection.
Approach
The proposed approach applies frequency-domain masking as an augmentation during supervised training. Each input image is transformed with the Fast Fourier Transform (FFT), a randomly selected fraction of its frequency components (amplitude spectrum) is zeroed out according to a masking ratio (15% performed best), and the result is inverse-transformed back to the spatial domain before being passed to the classifier. Because generator-specific frequency artifacts are intermittently removed, the detector is pushed toward features that generalize across different synthesis pipelines.
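The masking step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `frequency_mask`, the independent per-bin random mask, the sharing of one mask across color channels, and the choice to keep only the real part after the inverse transform are all assumptions made for illustration. The released code may select frequencies differently.

```python
import numpy as np


def frequency_mask(image, ratio=0.15, rng=None):
    """Zero out a random fraction of FFT coefficients of an H x W x C image.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=np.float64)

    # 2-D FFT over the two spatial axes, giving one spectrum per channel.
    spectrum = np.fft.fft2(image, axes=(0, 1))

    # Assumption: each frequency bin is dropped independently with
    # probability `ratio`, and the same mask is used for every channel.
    keep = rng.random(image.shape[:2]) >= ratio
    masked = spectrum * keep[:, :, None]

    # A random mask breaks conjugate symmetry, so the inverse transform is
    # complex in general; this sketch keeps only the real part.
    return np.fft.ifft2(masked, axes=(0, 1)).real


# Usage: augment one random "image" at the 15% ratio reported as optimal.
example = np.random.default_rng(0).random((64, 64, 3))
augmented = frequency_mask(example, ratio=0.15)
```

In a training pipeline, a function like this would be applied to each image before normalization, with the label left unchanged, so the classifier sees real and fake images with part of their spectrum missing.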
Datasets
Wang et al. [38] benchmarks (ProGAN, StyleGAN, BigGAN, etc.), Ojha et al. [30] diffusion datasets (Guided Diffusion, LDM, Glide, DALL-E-mini), and a specialized FakeFish dataset (ControlNet, Stable Diffusion).
Model(s)
ResNet50, VGG11, and MobileNetV2, evaluated under several structured pruning methods (Slim, LAMP, GReg, DepGraph).
Author countries
Norway, Singapore