Detecting Localized Deepfakes: How Well Do Synthetic Image Detectors Handle Inpainting?

Authors: Serafino Pandolfini, Lorenzo Pellegrini, Matteo Ferrara, Davide Maltoni

Published: 2025-12-18 15:54:51+00:00

AI Summary

This study systematically evaluates the generalization capability of detectors trained on fully synthetic images when applied to localized deepfakes created via image inpainting. The evaluation uses multiple datasets covering diverse generators, mask sizes, and manipulation techniques. Results show strong transferability for medium- and large-area manipulations, often outperforming dedicated ad hoc detection methods.

Abstract

The rapid progress of generative AI has enabled highly realistic image manipulations, including inpainting and region-level editing. These approaches preserve most of the original visual context and are increasingly exploited in cybersecurity-relevant threat scenarios. While numerous detectors have been proposed for identifying fully synthetic images, their ability to generalize to localized manipulations remains insufficiently characterized. This work presents a systematic evaluation of state-of-the-art detectors, originally trained for deepfake detection on fully synthetic images, when applied to a distinct challenge: localized inpainting detection. The study leverages multiple datasets spanning diverse generators, mask sizes, and inpainting techniques. Our experiments show that models trained on a large set of generators exhibit partial transferability to inpainting-based edits and can reliably detect medium- and large-area manipulations or regeneration-style inpainting, outperforming many existing ad hoc detection approaches.


Key findings
Detectors show strong transferability to inpainting, especially for manipulations covering medium (>20%) or large portions of the image. Performance degrades significantly for small manipulations (<5%) and for edits produced by advanced generative models such as Flux and Firefly. DINOv3 ViT-L/16 consistently achieved the highest AUROC among the tested models, highlighting the effectiveness of self-supervised vision transformers as detection backbones.
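As a concrete illustration of this kind of detector, the following is a minimal sketch of a real-versus-fake linear probe on a frozen self-supervised ViT backbone. It uses the publicly available DINOv2 ViT-L/14 torch.hub entry point (one of the evaluated backbones); the DINOv3 ViT-L/16 variant would follow the same pattern with its own checkpoint. The pooling, head, and preprocessing choices here are assumptions for illustration, not the authors' exact training setup.

```python
# Sketch: linear probe on a frozen self-supervised ViT backbone.
# Assumes the public DINOv2 ViT-L/14 torch.hub entry point; head/pooling
# details are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn

class FrozenViTDetector(nn.Module):
    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        # Frozen self-supervised backbone; only the linear head is trained.
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.head = nn.Linear(feat_dim, 1)  # real (0) vs. inpainted/fake (1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.backbone(x)          # [B, feat_dim] CLS features
        return self.head(feats).squeeze(-1)   # per-image logits

if __name__ == "__main__":
    detector = FrozenViTDetector().eval()
    images = torch.randn(4, 3, 224, 224)      # placeholder normalized RGB batch
    with torch.no_grad():
        scores = torch.sigmoid(detector(images))
    print(scores)  # per-image probability of containing synthetic content
```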
Approach
The study evaluates state-of-the-art deepfake detectors, originally trained for binary real-versus-fake classification on fully synthetic images (using AI-GenBench), against a corpus of locally inpainted images. Performance is analyzed along several dimensions, including mask size, generator type, and compression robustness, using image-level classification without spatial supervision.
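To make the evaluation protocol concrete, here is an illustrative sketch of image-level AUROC computed per mask-size bucket. The bucket edges reuse the <5% and >20% thresholds mentioned in the key findings, with an intermediate 5-20% bin added for completeness; the function and variable names are hypothetical and not taken from the paper.

```python
# Sketch: per-mask-size AUROC for an image-level real-vs-inpainted detector.
# Bucket edges mirror the <5% / >20% thresholds cited above; all names are
# hypothetical and the grouping is an assumption about the protocol.
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_by_mask_area(scores, labels, mask_fractions):
    """scores: detector outputs (higher = more likely manipulated);
    labels: 0 = pristine, 1 = inpainted;
    mask_fractions: manipulated-area ratio per image (0.0 for pristine)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    mask_fractions = np.asarray(mask_fractions, dtype=float)

    buckets = {
        "small (<5%)": (0.0, 0.05),
        "intermediate (5-20%)": (0.05, 0.20),
        "medium/large (>20%)": (0.20, 1.0 + 1e-9),
    }
    pristine = labels == 0
    results = {}
    for name, (lo, hi) in buckets.items():
        fake = (labels == 1) & (mask_fractions >= lo) & (mask_fractions < hi)
        if not fake.any():
            continue  # no manipulated samples fall in this bucket
        keep = pristine | fake  # pristine images are shared across buckets
        results[name] = roc_auc_score(labels[keep], scores[keep])
    return results

# Toy example with made-up numbers:
print(auroc_by_mask_area(
    scores=[0.10, 0.20, 0.90, 0.60, 0.30],
    labels=[0, 0, 1, 1, 1],
    mask_fractions=[0.00, 0.00, 0.50, 0.10, 0.02],
))
```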
Datasets
BR-Gen, TGIF, TGIF2 (evaluation); AI-GenBench (detector training)
Model(s)
ResNet-50 CLIP, DINOv2 ViT-L/14, DINOv3 ViT-L/16
Author countries
Italy