LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

Authors: Dat Nguyen, Enjie Ghorbel, Anis Kacem, Marcella Astrid, Djamila Aouada

Published: 2026-04-05 12:08:48+00:00

Comment: Journal version of LAA-Net (CVPR 2024)

AI Summary

The paper introduces Localized Artifact Attention X (LAA-X), a novel deepfake detection framework designed for robustness against high-quality forgeries and generalization to unseen manipulations. It employs an explicit attention strategy through a multi-task learning framework combined with blending-based data synthesis to guide the model toward localized, artifact-prone regions. LAA-X is compatible with both CNN (LAA-Net) and transformer (LAA-Former/LAA-Swin) backbones, achieving state-of-the-art performance across multiple benchmarks despite being trained only on real and pseudo-fake samples.

Abstract

In this paper, we propose Localized Artifact Attention X (LAA-X), a novel deepfake detection framework that is both robust to high-quality forgeries and capable of generalizing to unseen manipulations. Existing approaches typically rely on binary classifiers coupled with implicit attention mechanisms, which often fail to generalize beyond known manipulations. In contrast, LAA-X introduces an explicit attention strategy based on a multi-task learning framework combined with blending-based data synthesis. Auxiliary tasks are designed to guide the model toward localized, artifact-prone (i.e., vulnerable) regions. The proposed framework is compatible with both CNN and transformer backbones, resulting in two different versions, namely, LAA-Net and LAA-Former, respectively. Despite being trained only on real and pseudo-fake samples, LAA-X competes with state-of-the-art methods across multiple benchmarks. Code and pre-trained weights for LAA-Net (https://github.com/10Ring/LAA-Net) and LAA-Former (https://github.com/10Ring/LAA-Former) are publicly available.
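
To make the blending-based synthesis concrete, the sketch below illustrates one plausible pipeline in Python: a face region from one real image is composited into another under a softened mask, and a vulnerability heatmap is derived from the soft blending boundary. The helper name synthesize_pseudo_fake and the boundary-based heatmap rule are illustrative assumptions, not the paper's exact procedure (which relies on landmark-driven masks and a more elaborate pseudo-fake generation scheme).

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def synthesize_pseudo_fake(source, target, face_mask, sigma=7.0):
        """Blend the face region of `source` into `target` and derive a
        vulnerability heatmap from the soft blending boundary.
        source, target: float32 images in [0, 1], shape (H, W, 3), pre-aligned.
        face_mask: float32 mask in [0, 1], shape (H, W), 1 inside the face."""
        # Soften the mask so the composite has a smooth transition band.
        soft_mask = gaussian_filter(face_mask, sigma=sigma)
        blended = soft_mask[..., None] * source + (1.0 - soft_mask[..., None]) * target
        # Artifact-prone (vulnerable) pixels sit on the blending boundary,
        # where the soft mask is neither fully source nor fully target.
        heatmap = 4.0 * soft_mask * (1.0 - soft_mask)  # peaks where mask == 0.5
        return blended, heatmap

    # Toy usage with random stand-ins for two aligned real faces.
    rng = np.random.default_rng(0)
    src = rng.random((256, 256, 3), dtype=np.float32)
    tgt = rng.random((256, 256, 3), dtype=np.float32)
    mask = np.zeros((256, 256), dtype=np.float32)
    mask[64:192, 64:192] = 1.0  # stand-in for a landmark-derived face region
    fake, heat = synthesize_pseudo_fake(src, tgt, mask)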


Key findings
LAA-X consistently achieves state-of-the-art results on challenging cross-dataset benchmarks, including large-scale and diffusion-based deepfakes, demonstrating strong generalization. The explicit attention to vulnerable regions significantly enhances performance, and LAA-Former and LAA-Swin show improved robustness to noise perturbations compared to LAA-Net. The proposed E-FPN for CNNs and L2-Att for transformers effectively guide the models toward subtle artifact-prone areas, yielding superior detection and localization.
Approach
LAA-X is a multi-task learning framework that uses blending-based data synthesis to create pseudo-fakes and estimate 'vulnerable regions' (artifact-prone areas). Auxiliary tasks, such as heatmap regression and self-consistency for CNNs (LAA-Net) or vulnerable patch prediction for transformers (LAA-Former/LAA-Swin), enforce explicit attention to these regions, complementing the main binary classification task. This unified framework is designed to improve generalization and robustness to subtle deepfake artifacts.
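
The snippet below is a minimal PyTorch sketch of how the main and auxiliary objectives could be combined during training. The loss weights and the MSE choices for the heatmap-regression and self-consistency terms are assumptions for illustration; the paper defines its own auxiliary losses and weighting.

    import torch
    import torch.nn.functional as F

    def laa_multitask_loss(cls_logits, heatmap_pred, consistency_pred,
                           cls_labels, heatmap_gt, consistency_gt,
                           w_cls=1.0, w_hm=10.0, w_cons=1.0):
        """Combine the binary real/fake objective with the two auxiliary
        signals described above. Weights are illustrative, not the paper's."""
        loss_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_labels)
        loss_hm = F.mse_loss(heatmap_pred, heatmap_gt)             # heatmap regression
        loss_cons = F.mse_loss(consistency_pred, consistency_gt)   # self-consistency
        return w_cls * loss_cls + w_hm * loss_hm + w_cons * loss_cons

    # Toy usage with random tensors (batch of 4, 64x64 heatmaps).
    logits = torch.randn(4, 1)
    labels = torch.randint(0, 2, (4, 1)).float()
    hm_pred, hm_gt = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
    cons_pred, cons_gt = torch.rand(4, 16, 16), torch.rand(4, 16, 16)
    loss = laa_multitask_loss(logits, hm_pred, cons_pred, labels, hm_gt, cons_gt)
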
Datasets
FF++ (training and validation); Celeb-DFv2 (CDF2), Google DeepFake Detection (DFD), DeepFake Detection Challenge (DFDC), DeepFake Detection Challenge Preview (DFDCP), WildDeepfake (DFW), DiffSwap, and DF40 (cross-dataset evaluation). LAA-Net is initialized with ImageNet-pretrained weights, and LAA-Former with DINO self-supervised weights pretrained on ImageNet.
Model(s)
LAA-Net (EfficientNet-B4 backbone with an Enhanced Feature Pyramid Network, E-FPN), LAA-Former (vanilla Vision Transformer, ViT, with a Learning-based Local Attention module, L2-Att), and LAA-Swin (Swin Transformer with the L2-Att module).
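
For the transformer variants, the vulnerable-patch prediction task needs per-patch targets. The sketch below shows one simple way such labels could be derived from a pixel-level vulnerability heatmap by pooling it onto the ViT patch grid; the pooling rule and threshold are assumptions, not the paper's exact construction for L2-Att.

    import torch
    import torch.nn.functional as F

    def patch_vulnerability_labels(heatmap, patch_size=16, threshold=0.5):
        """Downsample a pixel-level heatmap (B, 1, H, W) to the ViT patch grid
        and mark patches whose pooled response exceeds `threshold`."""
        pooled = F.avg_pool2d(heatmap, kernel_size=patch_size)  # (B, 1, H/p, W/p)
        return (pooled > threshold).float().flatten(1)          # (B, num_patches)

    # Example: a 224x224 heatmap maps to a 14x14 grid, i.e. 196 patch labels.
    hm = torch.rand(2, 1, 224, 224)
    labels = patch_vulnerability_labels(hm)
    assert labels.shape == (2, 196)
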
Author countries
Luxembourg, Tunisia