Backbone is All You Need: Assessing Vulnerabilities of Frozen Foundation Models in Synthetic Image Forensics

Authors: Chiara Musso, Joy Battocchio, Andrea Montibeller, Giulia Boato

Published: 2026-05-13 11:35:56+00:00

AI Summary

This paper introduces the Surrogate Iterative Adversarial Attack (SIAA), a gray-box attack that exploits knowledge of the detector's Vision Transformer (ViT) backbone to undermine synthetic image detectors. SIAA first trains a surrogate feature-processing head to align the backbone's ViT features with CLIP text embeddings, then runs Projected Gradient Descent (PGD) in this feature space to craft adversarial examples. The attack achieves high success rates, often approaching white-box performance, which exposes a critical vulnerability in ViT-based deepfake detectors.
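The surrogate-head idea can be illustrated with a toy NumPy sketch. Everything in it is an assumption for illustration: random vectors stand in for frozen ViT features and for the CLIP text embeddings of the two classes, the head is a single linear projection, and the logits are scaled dot products between projected features and unit-norm text embeddings trained with cross-entropy. The names `train_fp_head` and `predict` are hypothetical; this is not the authors' head architecture or loss.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_fp_head(features, labels, text_emb, scale=5.0, lr=0.05, steps=200, seed=0):
    """Hypothetical linear FP-head: scale * (W f) . t_c should favour the true class c.

    features: (n, d_feat) frozen backbone features (random stand-ins here).
    text_emb: (n_classes, d_text) unit-norm text embeddings (random stand-ins here).
    Returns W with shape (d_text, d_feat). The backbone itself is never updated.
    """
    rng = np.random.default_rng(seed)
    n, d_feat = features.shape
    d_text = text_emb.shape[1]
    W = 0.01 * rng.standard_normal((d_text, d_feat))
    onehot = np.eye(text_emb.shape[0])[labels]
    for _ in range(steps):
        z = features @ W.T                          # (n, d_text) projected features
        probs = softmax(scale * z @ text_emb.T)     # similarity to each class prompt
        # Cross-entropy gradient chained through logits -> z -> W.
        grad_z = scale * (probs - onehot) @ text_emb / n
        W -= lr * grad_z.T @ features
    return W


def predict(features, W, text_emb, scale=5.0):
    """Pick the class whose text embedding is most similar to the projected feature."""
    return np.argmax(scale * (features @ W.T) @ text_emb.T, axis=1)


# Toy data: two L2-normalised feature clusters (0 = real, 1 = synthetic).
rng = np.random.default_rng(42)
d_feat, d_text, n_per_class = 16, 8, 100
mu = rng.standard_normal(d_feat)
feats = np.vstack([mu + rng.standard_normal((n_per_class, d_feat)),
                   -mu + rng.standard_normal((n_per_class, d_feat))])
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = np.repeat([0, 1], n_per_class)
text = rng.standard_normal((2, d_text))
text /= np.linalg.norm(text, axis=1, keepdims=True)

W = train_fp_head(feats, labels, text)
accuracy = float(np.mean(predict(feats, W, text) == labels))
print(f"surrogate head accuracy: {accuracy:.2f}")
```

Only the head is trained while the backbone stays frozen, which keeps the surrogate cheap to fit. With a linear head and fixed text embeddings the logits are linear in `W`, so the cross-entropy objective in this toy is convex.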

Abstract

As AI-generated synthetic images become increasingly realistic, Vision Transformers (ViTs) have emerged as a cornerstone of modern deepfake detection. However, the prevailing reliance on frozen, pre-trained backbones introduces a subtle yet critical vulnerability. In this work, we present the Surrogate Iterative Adversarial Attack (SIAA), a gray-box attack that exploits knowledge of the detector's ViT backbone alone and operates entirely within the target detector's feature space to craft highly effective adversarial examples. Through experiments involving multiple ViT-based detectors and diverse gray-box scenarios, including few-shot learning, complete training misalignment, and attack transferability tests, we demonstrate that this vulnerability consistently yields high attack success rates, often approaching white-box performance. By doing so, we reveal that backbone knowledge alone is sufficient to undermine detector reliability, highlighting the urgent need for more resilient defenses in adversarial multimedia forensics.


Key findings
SIAA achieves high attack success rates (ASR), often approaching those of white-box attacks, which shows that knowledge of the detector's ViT backbone alone is sufficient to craft effective adversarial examples. DINOv2-based detectors are more resistant, but the attack remains strong against CLIP and Swin backbones, and perturbations crafted with a DINOv2 backbone can still transfer. The vulnerability persists across gray-box scenarios, including few-shot learning and substantial mismatches in training data or data augmentation strategies.
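Attack success rate can be computed in several ways. A minimal sketch, assuming the common convention of counting only images the detector classified correctly before the attack and measuring how many of them are misclassified afterwards; the paper's exact definition may differ, and the function name and sample predictions below are hypothetical.

```python
def attack_success_rate(labels, clean_preds, adv_preds):
    """Fraction of initially correct predictions that become wrong after the attack.

    One common ASR convention (assumed here, not taken from the paper).
    """
    correct = [i for i, (y, p) in enumerate(zip(labels, clean_preds)) if y == p]
    if not correct:
        return 0.0  # nothing to attack: no correctly classified images
    flipped = sum(1 for i in correct if adv_preds[i] != labels[i])
    return flipped / len(correct)


# Hypothetical detector outputs: 0 = real, 1 = synthetic.
labels = [1, 1, 0, 0, 1]
clean_preds = [1, 1, 0, 1, 1]   # 4 of 5 correct before the attack
adv_preds = [0, 1, 1, 1, 0]     # 3 of those 4 flipped by the attack
print(attack_success_rate(labels, clean_preds, adv_preds))  # 0.75
```

Excluding images that were already misclassified keeps the metric from crediting the attack with errors the detector made on its own.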
Approach
The Surrogate Iterative Adversarial Attack (SIAA) is a two-phase gray-box attack. First, a surrogate Feature-Processing (FP) head is trained to align frozen ViT features from the target detector's backbone with CLIP text embeddings, approximating the detector's decision boundaries. Second, a Projected Gradient Descent (PGD) attack optimizes the perturbation in this learned feature space, pushing the image's representation away from its true class and towards the adversarial target class.
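The second phase can be sketched as a targeted L-infinity PGD loop in NumPy, under loudly stated assumptions: a fixed random linear map stands in for the frozen ViT backbone, a random matrix stands in for the trained surrogate FP-head, random unit vectors stand in for the CLIP text embeddings, and the loss is cross-entropy toward the target class. The loss is defined on the head's output over the frozen features, and its gradient is taken with respect to the input pixels. The budget `eps`, step size `alpha`, and step count are illustrative values, not the paper's settings, and `pgd_targeted` is a hypothetical name.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def class_probs(x, backbone, head, text_emb, scale=5.0):
    """Pixels -> frozen features -> surrogate head -> similarity with text embeddings."""
    return softmax(scale * text_emb @ (head @ (backbone @ x)))


def pgd_targeted(x, target, backbone, head, text_emb, eps=0.05, alpha=0.01,
                 steps=20, scale=5.0):
    """Targeted L-infinity PGD: minimise cross-entropy toward `target`."""
    # Linear stand-in model, so the input gradient has a closed form:
    # logits = M @ x with M = scale * text_emb @ head @ backbone.
    M = scale * text_emb @ head @ backbone
    onehot = np.eye(text_emb.shape[0])[target]
    x_adv = x.copy()
    for _ in range(steps):
        p = class_probs(x_adv, backbone, head, text_emb, scale)
        grad = M.T @ (p - onehot)                  # d(cross-entropy) / d(pixels)
        x_adv = x_adv - alpha * np.sign(grad)       # signed descent step
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep a valid image
    return x_adv


rng = np.random.default_rng(0)
n_pix, d_feat, d_text = 256, 16, 8
backbone = rng.standard_normal((d_feat, n_pix)) / np.sqrt(n_pix)
# Centre rows so a flat mid-grey image maps to the zero feature (keeps the toy balanced).
backbone -= backbone.mean(axis=1, keepdims=True)
head = rng.standard_normal((d_text, d_feat))        # stand-in for a trained FP-head
text = rng.standard_normal((2, d_text))
text /= np.linalg.norm(text, axis=1, keepdims=True)

x = rng.uniform(0.25, 0.75, n_pix)                  # toy "image" with pixels in [0, 1]
clean_class = int(np.argmax(class_probs(x, backbone, head, text)))
target = 1 - clean_class                            # push toward the other class
x_adv = pgd_targeted(x, target, backbone, head, text)
print("clean:", clean_class,
      "adversarial:", int(np.argmax(class_probs(x_adv, backbone, head, text))),
      "max |delta|:", float(np.abs(x_adv - x).max()))
```

Because this toy model is linear, the gradient sign never changes, so PGD reaches a corner of the eps-ball after `eps / alpha` steps. With a real ViT backbone the gradient changes at every iteration, which is why the repeated step-and-project loop matters.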
Datasets
MS-COCO, Latent Diffusion, Synthbuster, RAISE, Stable Diffusion 2.1, FORLAB, FFHQ, TrueFake Dataset
Model(s)
ViT-based synthetic image detectors with CLIP, DINOv2, and Swin backbones
Author countries
Italy