A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement

Authors: Gaozheng Pei, Ke Ma, Dongpeng Zhang, Chengzhi Sun, Qianqian Xu, Qingming Huang

Published: 2025-06-30 09:59:09+00:00

AI Summary

This paper proposes a unified framework for generating stealthy adversarial examples against deepfake detectors. The framework integrates traditional transferability enhancement strategies into diffusion model-based adversarial example generation, improving generalization and winning first place in the ACM MM25 Deepfake Detection adversarial attack competition.

Abstract

Diffusion-based methods that generate adversarial examples through image editing are rapidly gaining popularity because of their powerful image generation capabilities. However, because they rely on the discriminative capability of the diffusion model, these methods often struggle to generalize beyond conventional image classification to tasks such as Deepfake detection. Moreover, traditional strategies for enhancing adversarial example transferability are challenging to adapt to these methods. To address these challenges, we propose a unified framework that seamlessly incorporates traditional transferability enhancement strategies into diffusion model-based adversarial example generation via image editing, enabling their application across a wider range of downstream tasks. Our method won first place in the 1st Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media competition at ACM MM25, which validates the effectiveness of our approach.

Key findings

The proposed method achieved a 100% transfer attack success rate while maintaining high structural similarity (SSIM) to the original images, outperforming traditional methods and other competing teams in the ACM MM25 competition. The generated adversarial examples were highly stealthy, with perturbations concentrated on foreground subjects.
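The two quantities named above can be illustrated with a minimal sketch. It assumes images are float arrays in [0, 1] and labels are integers. The function names are hypothetical, and `global_ssim` is a simplified single-window variant of SSIM (standard SSIM averages the same expression over local windows). The competition's exact scoring formula is not given here, so this is not a reproduction of it.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as a single window.

    Assumption: this is a teaching variant; standard SSIM evaluates the
    same expression on local windows and averages the results.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def transfer_success_rate(predicted_labels, true_labels):
    """Fraction of adversarial examples that a held-out detector misclassifies."""
    predicted = np.asarray(predicted_labels)
    truth = np.asarray(true_labels)
    return float(np.mean(predicted != truth))
```

Under these definitions, a 100% transfer success rate means every adversarial image is misclassified by detectors that were not used during optimization, while an SSIM close to 1 indicates that the adversarial image stays visually close to its source.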

Approach

The approach uses DDIM inversion to obtain latent variables of deepfake images. It then optimizes a selected latent at a specific timestep, incorporating traditional transferability enhancement strategies (gradient-based, input transformation-based, etc.) and text guidance from a vision-language model to maintain image consistency while creating adversarial examples.
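The inversion and latent optimization steps can be sketched on toy data as follows. This is a sketch under stated assumptions, not the authors' implementation. The noise predictor `eps_model`, the linear surrogate detector (`w`, `b`), and all function names are hypothetical stand-ins. The momentum update follows the general MI-FGSM pattern as one example of a gradient-based transferability strategy. Text guidance and input transformations are omitted, and the toy detector scores the latent directly, whereas a real pipeline would presumably denoise and decode the latent before scoring it.

```python
import numpy as np


def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM move between two cumulative-alpha noise levels."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def ddim_invert(x0, eps_model, alphas, t_stop):
    """Walk a clean sample up the noise schedule to index t_stop."""
    x = x0
    for t in range(t_stop):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t + 1])
    return x


def ddim_sample(xt, eps_model, alphas, t_start):
    """Walk a latent at index t_start back down to index 0."""
    x = xt
    for t in range(t_start, 0, -1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t - 1])
    return x


def fake_probability(z, w, b):
    """Toy surrogate detector: logistic regression on the flattened input."""
    return 1.0 / (1.0 + np.exp(-(z.ravel() @ w + b)))


def momentum_attack(z, w, b, steps=10, alpha=0.01, mu=1.0):
    """Momentum sign-gradient updates that lower the detector's 'fake' score."""
    z_adv = z.copy()
    g = np.zeros_like(z_adv)
    for _ in range(steps):
        p = fake_probability(z_adv, w, b)
        grad = (1.0 - p) * w.reshape(z.shape)  # gradient of log p_fake w.r.t. z
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # L1-normalized momentum
        z_adv = z_adv - alpha * np.sign(g)  # descend, so the sample looks "real"
    return z_adv


# Usage on toy data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
alphas = np.linspace(0.999, 0.5, 11)           # decreasing cumulative alphas
toy_eps = lambda x, t: np.full_like(x, 0.3)    # stand-in for a trained noise predictor
image = rng.normal(size=(4, 4))
latent = ddim_invert(image, toy_eps, alphas, t_stop=6)
adv_latent = momentum_attack(latent, 0.1 * rng.normal(size=16), 0.0)
adv_image = ddim_sample(adv_latent, toy_eps, alphas, t_start=6)
```

Optimizing an intermediate latent rather than the pixels lets the remaining denoising steps absorb the perturbation into image content, which is one plausible reason the resulting changes stay visually inconspicuous.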

Datasets

AADD-2025 Challenge dataset, containing GAN and diffusion-based generated images of varying qualities and resolutions.

Model(s)

Stable Diffusion, various deepfake detectors (ResNet50, DenseNet121, and others from the competition), and a vision-language model for text guidance.

Author countries

China
