A Novel Local Focusing Mechanism for Deepfake Detection Generalization

Authors: Mingliang Li, Lin Yuanbo Wu, Changhong Liu, Hanxi Li

Published: 2025-08-23 14:06:30+00:00

AI Summary

This paper proposes a Local Focus Mechanism (LFM) for deepfake detection that addresses the poor generalization of existing methods. LFM uses a Salience Network and Top-K Pooling to focus on discriminative local features, improving accuracy and average precision while maintaining high efficiency.

Abstract

The rapid advancement of deepfake generation techniques has intensified the need for robust and generalizable detection methods. Existing approaches based on reconstruction learning typically leverage deep convolutional networks to extract differential features. However, these methods show poor generalization across object categories (e.g., from faces to cars) and generation domains (e.g., from GANs to Stable Diffusion), due to intrinsic limitations of deep CNNs. First, models trained on a specific category tend to overfit to semantic feature distributions, making them less transferable to other categories, especially as network depth increases. Second, Global Average Pooling (GAP) compresses critical local forgery cues into a single vector, thus discarding discriminative patterns vital for real-fake classification. To address these issues, we propose a novel Local Focus Mechanism (LFM) that explicitly attends to discriminative local features for differentiating fake from real images. LFM integrates a Salience Network (SNet) with a task-specific Top-K Pooling (TKP) module to select the K most informative local patterns. To mitigate potential overfitting introduced by Top-K pooling, we introduce two regularization techniques: Rank-Based Linear Dropout (RBLD) and Random-K Sampling (RKS), which enhance the model's robustness. LFM achieves a 3.7% improvement in accuracy and a 2.8% increase in average precision over the state-of-the-art Neighboring Pixel Relationships (NPR) method, while maintaining exceptional efficiency at 1789 FPS on a single NVIDIA A6000 GPU. Our approach sets a new benchmark for cross-domain deepfake detection. The source code is available at https://github.com/lmlpy/LFM.git
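
To make the pooling contrast concrete, below is a minimal PyTorch sketch of salience-guided Top-K pooling over a (B, C, H, W) feature map. The 1x1-conv salience head and the class name TopKSaliencePool are illustrative stand-ins for the paper's SNet and TKP modules, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TopKSaliencePool(nn.Module):
    """Illustrative stand-in for SNet + Top-K Pooling: score each spatial
    position with a small salience head, then pool only the K highest-scoring
    local features instead of averaging all of them (as GAP does)."""

    def __init__(self, channels: int, k: int = 16):
        super().__init__()
        self.k = k
        # Hypothetical salience head: one score per spatial location.
        self.salience = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feats.shape
        scores = self.salience(feats).flatten(2)        # (B, 1, H*W)
        feats_flat = feats.flatten(2)                   # (B, C, H*W)
        topk_idx = scores.topk(self.k, dim=-1).indices  # K most salient positions
        topk_idx = topk_idx.expand(-1, c, -1)           # (B, C, K)
        selected = feats_flat.gather(-1, topk_idx)      # (B, C, K)
        # Aggregate only the selected local patterns, unlike GAP over all H*W.
        return selected.mean(dim=-1)                    # (B, C)
```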


Key findings
LFM achieves a significant improvement in accuracy (95.9% vs 92.2% for NPR) and average precision (98.6% vs 95.8% for NPR) across various deepfake datasets and generation sources. The method also maintains high efficiency at 1789 FPS on a single NVIDIA A6000 GPU.
Approach
The proposed LFM integrates a Salience Network (SNet) with a task-specific Top-K Pooling (TKP) module to select the most informative local patterns. To mitigate overfitting, Rank-Based Linear Dropout (RBLD) and Random-K Sampling (RKS) regularization techniques are employed.
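The following hedged sketch shows one plausible training-time form of the two regularizers, operating on the (B, C, K) selected features and the salience scores from the pooling sketch above. The linear-in-rank drop probabilities and the top-M candidate pool for RKS are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def rank_based_linear_dropout(selected: torch.Tensor, p_max: float = 0.5) -> torch.Tensor:
    """Assumed RBLD form: features ranked 0..K-1 by salience are dropped with a
    probability that varies linearly with rank (here highest for the most salient
    position), discouraging reliance on any single local pattern."""
    b, c, k = selected.shape
    ranks = torch.arange(k, device=selected.device, dtype=selected.dtype)
    drop_prob = p_max * (1.0 - ranks / max(k - 1, 1))   # p_max at rank 0, 0 at rank K-1
    keep = (torch.rand(b, 1, k, device=selected.device) >= drop_prob).to(selected.dtype)
    return selected * keep

def random_k_sampling(scores: torch.Tensor, k: int, m: int) -> torch.Tensor:
    """Assumed RKS form: at training time, draw K indices at random from the
    top-M candidates (M > K) rather than always taking the exact top-K."""
    candidate_idx = scores.topk(m, dim=-1).indices      # (B, 1, M) candidate positions
    noise = torch.rand(candidate_idx.shape, device=candidate_idx.device)
    subset = noise.argsort(dim=-1)[..., :k]             # random K of the M candidates
    return candidate_idx.gather(-1, subset)             # (B, 1, K)
```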
Datasets
Training: four categories (car, cat, horse, chair) from ForenSynths. Testing: ForenSynths, NPR's GAN data, DIRE's diffusion model dataset, Ojha's diffusion model dataset, NPR's diffusion model dataset, LSUN, ImageNet, CelebA, CelebA-HQ, COCO, and FaceForensics++.
Model(s)
A custom model incorporating a Salience Network (SNet), Top-K Pooling (TKP), Rank-Based Linear Dropout (RBLD), and Random-K Sampling (RKS). The NPR method is used as a basis for feature extraction.
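As a usage illustration only, the pooling sketch above could sit on top of any convolutional feature extractor; the toy backbone below is a placeholder, not the NPR architecture the paper builds on.

```python
import torch
import torch.nn as nn

# Illustrative wiring: a toy backbone standing in for an NPR-style feature
# extractor, followed by the TopKSaliencePool sketch above and a linear head.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
)
pool = TopKSaliencePool(channels=64, k=16)
classifier = nn.Linear(64, 1)                  # real-vs-fake logit

x = torch.randn(4, 3, 224, 224)                # dummy image batch
logits = classifier(pool(backbone(x)))         # shape (4, 1)
```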
Author countries
China, United Kingdom