Seeing Through the Blur: Unlocking Defocus Maps for Deepfake Detection

Authors: Minsun Jeon, Simon S. Woo

Published: 2025-09-27 13:02:53+00:00

AI Summary

The paper proposes a physically interpretable deepfake detection framework that leverages defocus blur as a robust forensic signal. Since synthetic images often lack realistic depth-of-field (DoF) characteristics, discrepancies captured in the defocus blur map serve as a powerful discriminative feature. The method constructs a defocus blur map based on optical principles and uses it as input for classification models to identify manipulated content.

Abstract

The rapid advancement of generative AI has enabled the mass production of photorealistic synthetic images, blurring the boundary between authentic and fabricated visual content. This challenge is particularly evident in deepfake scenarios involving facial manipulation, but also extends to broader AI-generated content (AIGC) cases involving fully synthesized scenes. As such content becomes increasingly difficult to distinguish from reality, the integrity of visual media is under threat. To address this issue, we propose a physically interpretable deepfake detection framework and demonstrate that defocus blur can serve as an effective forensic signal. Defocus blur is a depth-dependent optical phenomenon that naturally occurs in camera-captured images due to lens focus and scene geometry. In contrast, synthetic images often lack realistic depth-of-field (DoF) characteristics. To capture these discrepancies, we construct a defocus blur map and use it as a discriminative feature for detecting manipulated content. Unlike RGB textures or frequency-domain signals, defocus blur arises universally from optical imaging principles and encodes physical scene structure. This makes it a robust and generalizable forensic cue. Our approach is supported by three in-depth feature analyses, and experimental results confirm that defocus blur provides a reliable and interpretable cue for identifying synthetic images. We aim for our defocus-based detection pipeline and interpretability tools to contribute meaningfully to ongoing research in media forensics. The implementation is publicly available at: https://github.com/irissun9602/Defocus-Deepfake-Detection


Key findings
The defocus-only model achieved state-of-the-art performance on the FF++ dataset, reaching an average AUC of 0.998. The integration of defocus features consistently improved accuracy and recall across various backbones for AIGC detection on the SYNDOF dataset. Model-level SHAP analysis confirmed that the defocus-based model focuses its attention on manipulated facial regions exhibiting physically inconsistent blur, demonstrating high interpretability.
Approach
The approach first estimates a pixel-wise defocus blur map using an edge-based algorithm that quantifies depth-dependent blur. This map is then fed into a deep learning classifier, typically XceptionNet, either as the sole input (the defocus-only pipeline for deepfake detection) or fused with the raw RGB image via a dual-branch architecture (for general AIGC detection).
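The paper does not spell out its map-construction algorithm in this summary, but a classic edge-based estimator works by re-blurring the image at two known Gaussian scales and using the ratio of gradient magnitudes at edges to solve for the unknown local blur (in the spirit of Zhuo and Sim's gradient-ratio method). The sketch below is an illustrative assumption, not the authors' implementation; all parameter values are placeholders:

```python
import numpy as np
from scipy import ndimage

def defocus_blur_map(gray, sigma0=1.0, sigma1=3.0, edge_thresh=0.3, max_sigma=10.0):
    """Sparse defocus-blur estimate at edge pixels via the gradient-ratio method.

    Returns (sigma_map, edge_mask): sigma_map holds the estimated local blur
    standard deviation at edge pixels and 0 elsewhere.
    """
    gray = np.asarray(gray, dtype=float)
    # Gradient magnitude of the image re-blurred at two known scales.
    m0 = ndimage.gaussian_gradient_magnitude(gray, sigma0)
    m1 = ndimage.gaussian_gradient_magnitude(gray, sigma1)
    edge_mask = m0 > edge_thresh * m0.max()
    ratio = np.where(edge_mask, m0 / (m1 + 1e-12), 1.0)
    # For a step edge with unknown blur sigma, at the edge location:
    #   ratio^2 = (sigma^2 + sigma1^2) / (sigma^2 + sigma0^2)
    # Solving for sigma^2:
    num = sigma1**2 - ratio**2 * sigma0**2
    den = ratio**2 - 1.0
    sigma2 = np.where((den > 1e-6) & (num > 0), num / np.maximum(den, 1e-6), 0.0)
    sigma_map = np.clip(np.sqrt(sigma2), 0.0, max_sigma)
    sigma_map[~edge_mask] = 0.0
    return sigma_map, edge_mask
```

A dense map would then be obtained by propagating these sparse edge estimates across the image (e.g. via interpolation or matting), a step omitted here for brevity.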
Datasets
FaceForensics++ (FF++ raw), SYNDOF (including data from LFDOF, CUHK, Flickr, Middlebury, SYNTHIA, MPI Sintel).
Model(s)
XceptionNet, ResNet-50, EfficientNet-B4, ViT-16.
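The dual-branch fusion described under Approach, with any of the backbones above, could be wired up roughly as follows. This is a schematic sketch only: tiny stand-in CNNs replace the actual backbones (e.g. XceptionNet), and all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class DualBranchDetector(nn.Module):
    """Schematic dual-branch classifier: one branch sees the RGB image, the
    other the single-channel defocus blur map; the two feature vectors are
    concatenated before a shared classification head."""
    def __init__(self, feat_dim=32):
        super().__init__()
        def branch(in_ch):
            # Stand-in for a real backbone such as XceptionNet or ResNet-50.
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.rgb_branch = branch(3)    # raw RGB input
        self.blur_branch = branch(1)   # estimated defocus blur map
        self.head = nn.Linear(2 * feat_dim, 2)  # real vs. synthetic logits

    def forward(self, rgb, blur_map):
        feats = torch.cat([self.rgb_branch(rgb), self.blur_branch(blur_map)], dim=1)
        return self.head(feats)
```

Swapping the stand-in branches for pretrained backbones and tuning the fusion point would follow standard practice; the summary does not specify where in the network the two streams are merged.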
Author countries
Republic of Korea