DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models

Authors: Seunghoo Hong, Geonho Son, Juhun Lee, Simon S. Woo

Published: 2025-10-01 11:20:03+00:00

AI Summary

DIA proposes a novel adversarial defense against malicious deepfake creation with diffusion models, specifically targeting the deterministic DDIM inversion process used in real image editing. The method, the DDIM Inversion Attack (DIA), disrupts the integrated DDIM trajectory, overcoming the objective-trajectory misalignment that weakened previous defensive algorithms. The framework offers an effective, practical immunization tool against the misuse of AI for generating unethical or misinformative visual content.

Abstract

Diffusion models have been shown to be strong representation learners, showcasing state-of-the-art performance across multiple domains. Aside from accelerated sampling, DDIM also enables the inversion of real images back to their latent codes. A direct application inheriting from this inversion operation is real image editing, where the inversion yields latent trajectories to be utilized during the synthesis of the edited image. Unfortunately, this practical tool has enabled malicious users to freely synthesize misinformative or deepfake content with greater ease, which promotes the spread of unethical, abusive, privacy-infringing, and copyright-infringing content. While defensive algorithms such as AdvDM and Photoguard have been shown to disrupt the diffusion process on these images, the misalignment between their objectives and the iterative denoising trajectory at test time results in weak disruptive performance. In this work, we present the DDIM Inversion Attack (DIA), which attacks the integrated DDIM trajectory path. Our results demonstrate effective disruption, surpassing previous defensive methods across various editing methods. We believe that our framework and results can provide practical defense methods against the malicious use of AI for both industry and the research community. Our code is available here: https://anonymous.4open.science/r/DIA-13419/.
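
For context, the deterministic DDIM update the abstract builds on can be written as follows (standard notation from the DDIM literature, not quoted from this paper):

    x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1 - \bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t),
    \qquad
    \hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}

Because the map injects no random noise, it can be run in reverse (from x_t to x_{t+1}) under the approximation \epsilon_\theta(x_{t+1}, t+1) \approx \epsilon_\theta(x_t, t); iterating this on a real image yields the latent trajectory that editing pipelines replay during synthesis, and that DIA perturbs.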


Key findings

The proposed DIA methods consistently demonstrated superior disruption performance compared to the baselines (AdvDM, Photoguard) across various inversion-editing method combinations. DIA-R, which accumulates residual error throughout the entire learned diffusion process, often showed the strongest disruption, achieving significantly lower CLIP similarity scores (indicating misalignment with the malicious edit prompt). The methods also maintained robust attack performance across varying noise budgets and sampling steps.
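
For context on the metric, here is a minimal sketch of how a CLIP image-text similarity score might be computed; the checkpoint choice and the helper function are our assumptions, not the paper's evaluation code:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Hypothetical checkpoint choice; any CLIP encoder would do for this sketch.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def clip_similarity(image: Image.Image, prompt: str) -> float:
        # Embed both modalities, L2-normalize, take the cosine similarity.
        inputs = processor(text=[prompt], images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            img = model.get_image_features(pixel_values=inputs["pixel_values"])
            txt = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        return float((img * txt).sum())

Under this reading, a successful defense drives the score between the attacker's edited output and the edit prompt down relative to an unprotected image, which is what the lower CLIP similarity reported above indicates.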

Approach

DIA introduces adversarial perturbations optimized to disrupt the deterministic DDIM inversion trajectory used for real image editing. It comes in two variants: DIA-PT, which targets the inversion process trajectory itself, and DIA-R, which maximizes the reconstruction loss after the full inversion-denoising cycle. Gradients through the long differentiable trajectory are computed efficiently with decomposed back-propagation via vector-Jacobian products.
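
To make the mechanics concrete, here is a minimal sketch, assuming a generic noise predictor eps_model(x, t) (text conditioning omitted for brevity), a cumulative alpha-bar table acp, and an ascending timestep schedule ts. The function names, loss choice, and PGD hyperparameters are illustrative assumptions, not the authors' implementation:

    import torch
    import torch.nn.functional as F

    def ddim_step(x, t_from, t_to, eps_model, acp):
        # One deterministic DDIM move between timesteps: t_to > t_from inverts,
        # t_to < t_from denoises. acp[t] is the cumulative alpha-bar at step t.
        a_from, a_to = acp[t_from], acp[t_to]
        eps = eps_model(x, t_from)
        x0_pred = (x - (1.0 - a_from).sqrt() * eps) / a_from.sqrt()
        return a_to.sqrt() * x0_pred + (1.0 - a_to).sqrt() * eps

    def vjp_trajectory_grad(x_in, step_fns, loss_fn):
        # Gradient of loss_fn(final state) w.r.t. x_in, computed one step at a
        # time with vector-Jacobian products, so only a single step's autograd
        # graph is ever alive instead of the whole inversion-denoising chain.
        states = [x_in.detach()]
        with torch.no_grad():                      # cheap forward, no graph kept
            for f in step_fns:
                states.append(f(states[-1]))
        z = states[-1].detach().requires_grad_(True)
        v = torch.autograd.grad(loss_fn(z), z)[0]  # cotangent dL/dz_final
        for f, s in zip(reversed(step_fns), reversed(states[:-1])):
            s = s.detach().requires_grad_(True)
            v = torch.autograd.grad(f(s), s, grad_outputs=v)[0]  # one VJP per step
        return v                                   # equals dL/dx_in

    def dia_r_attack(x, eps_model, acp, ts, budget=8 / 255, lr=1 / 255, iters=40):
        # PGD-style perturbation that *maximizes* the reconstruction error after
        # a full DDIM inversion-then-denoising round trip (our reading of DIA-R).
        fwd = [lambda z, a=a, b=b: ddim_step(z, a, b, eps_model, acp)
               for a, b in zip(ts[:-1], ts[1:])]              # inversion half
        bwd = [lambda z, a=a, b=b: ddim_step(z, a, b, eps_model, acp)
               for a, b in zip(ts[::-1][:-1], ts[::-1][1:])]  # denoising half
        delta = torch.zeros_like(x)
        for _ in range(iters):
            g = vjp_trajectory_grad(x + delta, fwd + bwd,
                                    lambda z: F.mse_loss(z, x))
            delta = (delta + lr * g.sign()).clamp(-budget, budget)  # ascend, project
        return (x + delta).detach()

Recomputing each step during the backward sweep trades compute for memory, bounding the live graph to a single noise-predictor call; in the spirit of the decomposed back-propagation described above, this is what makes differentiating through the full integrated trajectory tractable.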

Datasets

PIE benchmark (PIE-Bench)

Model(s)

Stable Diffusion v1.4 (SD v1.4)

Author countries

South Korea