A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement

Authors: Gaozheng Pei, Ke Ma, Dongpeng Zhang, Chengzhi Sun, Qianqian Xu, Qingming Huang

Published: 2025-06-30 09:59:09+00:00

AI Summary

This paper introduces a unified framework for generating stealthy and highly transferable adversarial examples against deepfake detectors, leveraging diffusion-based image editing. It addresses the limitations of existing diffusion methods in generalizing beyond conventional image classification and integrating traditional transferability enhancement strategies. The proposed framework seamlessly incorporates various transferability techniques into the diffusion model's latent optimization process, enabling its application across a wider range of downstream tasks.

Abstract

Owing to their powerful image generation capabilities, diffusion-based methods that generate adversarial examples through image editing are rapidly gaining popularity. However, because they rely on the discriminative capability of the diffusion model, these methods often struggle to generalize to tasks beyond conventional image classification, such as Deepfake detection. Moreover, traditional strategies for enhancing adversarial example transferability are difficult to adapt to these methods. To address these challenges, we propose a unified framework that seamlessly incorporates traditional transferability enhancement strategies into diffusion model-based adversarial example generation via image editing, enabling their application across a wider range of downstream tasks. Our method won first place in the 1st Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media competition at ACM MM25, which validates the effectiveness of our approach.


Key findings
The method achieved a 100% transfer success rate against deepfake detectors while maintaining high perceptual similarity (SSIM comparable to traditional methods). It secured first place in the 1st Adversarial Attacks on Deepfake Detectors (AADD-2025) Challenge, demonstrating superior stealth characteristics with perturbations concentrated on foreground subjects. These results validate the effectiveness and robustness of the proposed approach.
Approach
The framework generates adversarial examples by performing DDIM inversion to obtain latent representations of a clean image, then optimizing these latents at intermediate timesteps. It incorporates textual guidance from a vision-language model during image restoration and allows for the seamless integration of traditional adversarial transferability enhancement strategies (e.g., gradient-based, transform-based, ensemble-based attacks) into the latent space optimization.
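The sketch below illustrates, in toy form, how gradient-based (momentum), transform-based (input scaling), and ensemble-based strategies can be combined in one latent optimization loop. It is not the authors' implementation. The latent is a short list of floats, `decode` is a hypothetical stand-in for the diffusion restoration path, and the detectors are made-up linear scorers. All function names, weights, and hyperparameters are assumptions chosen for illustration, and gradients are estimated numerically so the example needs only the standard library.

```python
import math

def decode(latent):
    # Hypothetical stand-in for denoising/decoding a latent into an image.
    return [math.tanh(z) for z in latent]

def detector_logit(weights, image):
    # Toy linear "deepfake detector": a larger logit means "more likely fake".
    return sum(w * x for w, x in zip(weights, image))

def ensemble_loss(latent, detectors, scales):
    # Average "fake" logit over several surrogate detectors (ensemble-based)
    # and several scaled copies of the decoded image (transform-based).
    image = decode(latent)
    total = 0.0
    for weights in detectors:
        for s in scales:
            total += detector_logit(weights, [s * x for x in image])
    return total / (len(detectors) * len(scales))

def numeric_grad(f, latent, eps=1e-5):
    # Central-difference gradient; a real pipeline would use autodiff instead.
    grad = []
    for i in range(len(latent)):
        up = list(latent)
        down = list(latent)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def sign(x):
    return (x > 0) - (x < 0)

def optimize_latent(latent, detectors, scales=(1.0, 0.5, 0.25),
                    steps=50, lr=0.05, mu=0.9):
    # Momentum-accumulated sign descent on the latent (gradient-based),
    # in the style of momentum iterative attacks, applied to z rather than pixels.
    z = list(latent)
    momentum = [0.0] * len(z)
    for _ in range(steps):
        g = numeric_grad(lambda v: ensemble_loss(v, detectors, scales), z)
        norm = sum(abs(x) for x in g) or 1.0  # L1 normalization, guarded against zero
        momentum = [mu * m + x / norm for m, x in zip(momentum, g)]
        z = [zi - lr * sign(m) for zi, m in zip(z, momentum)]
    return z

# Made-up surrogate detectors and a held-out detector used only for evaluation.
surrogates = [[0.8, -0.3, 0.5, 0.1], [0.6, -0.1, 0.7, -0.2]]
held_out = [0.7, -0.2, 0.4, -0.1]
clean_latent = [0.5, -0.2, 0.3, 0.1]  # pretend this came from DDIM inversion

adv_latent = optimize_latent(clean_latent, surrogates)

loss_before = ensemble_loss(clean_latent, surrogates, (1.0,))
loss_after = ensemble_loss(adv_latent, surrogates, (1.0,))
held_before = detector_logit(held_out, decode(clean_latent))
held_after = detector_logit(held_out, decode(adv_latent))
print("surrogate loss dropped:", loss_after < loss_before)
print("held-out logit dropped:", held_after < held_before)
```

In this toy setting the held-out detector is deliberately similar to the surrogates, so the drop in its logit only illustrates the idea of transfer. It says nothing about the success rates reported for the actual method.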
Datasets
AADD-2025 Challenge dataset
Model(s)
Stable Diffusion, Qwen (vision-language model), ResNet50, DenseNet121
Author countries
China