Who Made This? Fake Detection and Source Attribution with Diffusion Features

Authors: Simone Bonechi, Paolo Andreini, Barbara Toniella Corradini

Published: 2025-10-31 16:27:34+00:00

AI Summary

The paper introduces FRIDA, a lightweight and training-free framework that leverages internal latent activations (diffusion features) from a pre-trained Stable Diffusion Model (SDM) U-Net for synthetic image forensics. FRIDA achieves state-of-the-art cross-generator performance for deepfake detection using a k-NN classifier and enables accurate source generator attribution using a compact neural model. The results indicate that diffusion representations inherently encode robust, generator-specific patterns useful for detecting and attributing synthetic media.

Abstract

The rapid progress of generative diffusion models has enabled the creation of synthetic images that are increasingly difficult to distinguish from real ones, raising concerns about authenticity, copyright, and misinformation. Existing supervised detectors often struggle to generalize across unseen generators, requiring extensive labeled data and frequent retraining. We introduce FRIDA (Fake-image Recognition and source Identification via Diffusion-features Analysis), a lightweight framework that leverages internal activations from a pre-trained diffusion model for deepfake detection and source generator attribution. A k-nearest-neighbor classifier applied to diffusion features achieves state-of-the-art cross-generator performance without fine-tuning, while a compact neural model enables accurate source attribution. These results show that diffusion representations inherently encode generator-specific patterns, providing a simple and interpretable foundation for synthetic image forensics.


Key findings

The k-NN approach achieved a new state-of-the-art average cross-generator accuracy of 88.1% on the GenImage test set, significantly outperforming supervised methods while requiring no conventional training. For the multi-class source attribution task, the MLP achieved 84.36% accuracy, demonstrating that diffusion features encode the model-specific characteristics needed for attribution, although models with closely related architectures (e.g., SDM v1.4 and v1.5) remain difficult to distinguish from one another.
Approach

FRIDA extracts compact "Image Prototypes" from a specific layer (Decoder 16x16) of a pre-trained Stable Diffusion U-Net at the final denoising step (t=0). Deepfake detection is performed by a training-free k-Nearest Neighbor (k-NN) classifier operating on these features, requiring only a small support set. Source attribution is handled by a lightweight Multi-Layer Perceptron (MLP) trained on the same feature representations.
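The two-stage pipeline described above can be sketched with off-the-shelf scikit-learn components. The diffusion-feature extraction itself (pooling activations from the Decoder 16x16 layer of a pre-trained Stable Diffusion U-Net at t=0) is abbreviated here to a placeholder function, so the feature vectors below are random stand-ins rather than real diffusion features; the classifier wiring, however, mirrors the paper's setup of a training-free k-NN detector over a small support set plus a lightweight MLP attributor.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def extract_diffusion_features(images):
    # Placeholder for FRIDA's extractor: in the paper, each image is
    # encoded as a compact prototype pooled from the Decoder 16x16
    # layer of a pre-trained SDM v1.5 U-Net at the final denoising
    # step (t=0). Here we return random 256-d vectors for illustration.
    return rng.normal(size=(len(images), 256))

# --- Deepfake detection: training-free k-NN over a small support set ---
support_images = [None] * 40                    # e.g. 20 real + 20 fake
support_feats = extract_diffusion_features(support_images)
support_labels = np.array([0] * 20 + [1] * 20)  # 0 = real, 1 = fake

detector = KNeighborsClassifier(n_neighbors=5)
detector.fit(support_feats, support_labels)     # just stores the support set

query_feats = extract_diffusion_features([None] * 4)
is_fake = detector.predict(query_feats)

# --- Source attribution: lightweight MLP on the same features ---
train_feats = extract_diffusion_features([None] * 120)
generator_ids = rng.integers(0, 4, size=120)    # hypothetical: 4 generators

attributor = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300)
attributor.fit(train_feats, generator_ids)
pred_source = attributor.predict(query_feats)
```

Note that `fit` on the k-NN classifier involves no gradient training, only memorizing the support set, which is why the detection stage can adapt to a new generator by swapping in a handful of labeled examples.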
Datasets

GenImage, ImageNet
Model(s)

Stable Diffusion Model (SDM v1.5) U-Net (for feature extraction), k-Nearest Neighbor (k-NN), Multi-Layer Perceptron (MLP)
Author countries

Italy