Exploiting the Final Component of Generator Architectures for AI-Generated Image Detection

Authors: Yanzhu Liu, Xiao Liu, Yuexuan Wang, Mondal Soumik

Published: 2026-01-28 10:35:05+00:00

AI Summary

This paper proposes a novel approach for AI-generated image detection by exploiting the common final architectural components of various image generators. The method contaminates real images using these components and trains a detector to distinguish them from original real images. Using only 300 representative training samples, their detector, fine-tuned on a DINOv3 backbone, achieves an average accuracy of 98.83% across 22 testing sets from unseen generators.

Abstract

With the rapid proliferation of powerful image generators, accurate detection of AI-generated images has become essential for maintaining a trustworthy online environment. However, existing deepfake detectors often generalize poorly to images produced by unseen generators. Notably, despite being trained under vastly different paradigms, such as diffusion or autoregressive modeling, many modern image generators share common final architectural components that serve as the last stage for converting intermediate representations into images. Motivated by this insight, we propose to contaminate real images using the generator's final component and train a detector to distinguish them from the original real images. We further introduce a taxonomy based on generators' final components and categorize 21 widely used generators accordingly, enabling a comprehensive investigation of our method's generalization capability. Using only 100 samples from each of three representative categories, our detector, fine-tuned on the DINOv3 backbone, achieves an average accuracy of 98.83% across 22 testing sets from unseen generators.


Key findings

The proposed method achieves strong zero-shot generalization, demonstrating that the traces left by a generator's final component are sufficient for robust AI-generated image detection. The detector, fine-tuned from a DINOv3 backbone on only 300 representative samples, reached an average accuracy of 98.83% across 22 testing sets from unseen generators. It consistently outperformed baseline methods across diverse benchmarks, including in-the-wild and fine-tuned generators.
Approach

The authors propose to "contaminate" real images by encoding them with the encoder paired with a generator's final component and then reconstructing them through that final component (e.g., a VAE decoder, VQ de-tokenizer, or super-resolution diffuser). A detector is then trained to distinguish these contaminated images from the original real images, learning the subtle traces the final components leave behind. Because the detector focuses on architectural artifacts shared across generator families rather than the weights of any specific model, it generalizes to images from unseen generators.
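The encode-then-reconstruct pipeline can be sketched in a few lines. This is a toy illustration only: a 4x average-pool stands in for the generator's encoder and nearest-neighbour upsampling stands in for its final component (the paper uses real components such as a VAE decoder); the function names are hypothetical.

```python
import numpy as np

def toy_encoder(img, factor=4):
    """Stand-in for a generator's encoder: 4x average-pool downsampling."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def toy_final_component(latent, factor=4):
    """Stand-in for a final component (e.g. a VAE decoder):
    nearest-neighbour upsampling back to the original resolution."""
    return latent.repeat(factor, axis=0).repeat(factor, axis=1)

def contaminate(real_img):
    """Round-trip a real image through encoder + final component.
    The reconstruction residue is the 'trace' a detector can learn to spot."""
    return toy_final_component(toy_encoder(real_img))

rng = np.random.default_rng(0)
real = rng.random((64, 64, 3))
fake_like = contaminate(real)

# Training pairs: label 0 = original real image, label 1 = contaminated image
dataset = [(real, 0), (fake_like, 1)]
```

The key property the sketch preserves is that the contaminated image has the same resolution and content as the real one, differing only in the low-level reconstruction artifacts the detector is meant to pick up.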
Datasets

MS-COCO 2014, Synthbuster (SD1.3, SD1.4, SD2, SDXL, DALL·E 2, DALL·E 3, Glide, Adobe Firefly, MidJourney v5), HiDream-I1 (generated), Emu3 (generated), LlamaGen (generated), SD3.5 (generated), Flux-1-dev (generated), PASCAL VOC training split, FakeBench 3, WildRF (Reddit, Facebook, Twitter), SatelliteDiffusion (amusement park, car dealership, electric substation, stadium)
Model(s)

For detection: DINOv3 backbone (and DINOv2 for comparison) with an appended fully-connected layer. For contaminating samples, one model per final-component category: Stable Diffusion 2.1 (VAE, category 1.1), JanusPro (VQ, category 2.2), PixelFlow (super-resolution, category 3.3).
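A minimal sketch of the detector head described above, with a fixed random projection standing in for frozen DINOv3 feature extraction (an assumption; the real backbone is fine-tuned and its feature dimension depends on the model variant) and a logistic-regression step playing the role of the appended fully-connected layer:

```python
import numpy as np

rng = np.random.default_rng(1)
FEAT_DIM = 384  # assumed feature size; DINOv3 variants differ
PROJ = rng.standard_normal((64 * 64 * 3, FEAT_DIM)) / 200.0

def frozen_backbone(images):
    """Placeholder for backbone features: a fixed random projection
    of flattened 64x64x3 images to FEAT_DIM-dimensional vectors."""
    return images.reshape(len(images), -1) @ PROJ

def train_linear_head(feats, labels, lr=0.1, steps=200):
    """Train the appended fully-connected layer as a binary
    logistic-regression head via plain gradient descent."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid probabilities
        grad = p - labels                            # dL/dz for cross-entropy
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

# Synthetic stand-in data: "real" images vs. lightly perturbed copies,
# mimicking the real-vs-contaminated training pairs
real = rng.random((50, 64, 64, 3))
contaminated = real + 0.05 * rng.standard_normal(real.shape)
x = np.concatenate([real, contaminated])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = train_linear_head(frozen_backbone(x), y)
```

In the paper's setting the backbone is fine-tuned jointly with the head rather than frozen; the sketch only shows the overall shape of the classifier, not its training recipe.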
Author countries

Singapore