Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection

Authors: Chanhui Lee, Seunghyun Shin, Donggyu Choi, Hae-gon Jeon, Jeany Son

Published: 2026-02-16 12:08:37+00:00

Comment: Working paper

AI Summary

This paper proposes the first universal image immunization framework designed to defend against diffusion-based image editing. It generates a single, broadly applicable universal adversarial perturbation (UAP) that injects a semantic target into protected images while suppressing their original content, thereby misdirecting editing models. The method effectively blocks malicious editing attempts, scales well because one perturbation covers many images, and operates even in data-free settings without requiring access to training data or domain knowledge.

Abstract

Recent advances in diffusion models have enabled powerful image editing capabilities guided by natural language prompts, unlocking new creative possibilities. However, they introduce significant ethical and legal risks, such as deepfakes and unauthorized use of copyrighted visual content. To address these risks, image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet, most existing approaches rely on image-specific adversarial perturbations that require individual optimization for each image, thereby limiting scalability and practicality. In this paper, we propose the first universal image immunization framework that generates a single, broadly applicable adversarial perturbation specifically designed for diffusion-based editing pipelines. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into images to be protected. Simultaneously, it suppresses original content to effectively misdirect the model's attention during editing. As a result, our approach effectively blocks malicious editing attempts by overwriting the original semantic content in the image via the UAP. Moreover, our method operates effectively even in data-free settings without requiring access to training data or domain knowledge, further enhancing its practicality and broad applicability in real-world scenarios. Extensive experiments show that our method, as the first universal immunization approach, significantly outperforms several baselines in the UAP setting. In addition, despite the inherent difficulty of universal perturbations, our method also achieves performance on par with image-specific methods under a more restricted perturbation budget, while also exhibiting strong black-box transferability across different diffusion models.


Key findings

The method significantly outperforms several universal baselines in blocking diffusion-based image editing and achieves comparable performance to image-specific methods under a more restricted perturbation budget. It demonstrates strong black-box transferability across diverse diffusion models (including U-Net and DiT architectures) and robust performance against various purification techniques. The approach is also effective in data-free settings and offers near-zero inference-time cost, enhancing its practicality for real-world deployment.
Approach

The proposed framework generates a Universal Adversarial Perturbation (UAP) via semantic injection to immunize images. This UAP is trained using two loss functions: a target semantic injection loss that encourages alignment with intended target semantics, and a source semantic suppression loss that minimizes the influence of the original image's content. This causes diffusion-based editing models to misinterpret the source image, leading edits towards the injected target semantics rather than the original content.
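The two-loss training described above can be sketched as projected gradient descent on a single shared perturbation. This is a minimal toy illustration, not the paper's implementation: a fixed linear map stands in for the diffusion model's feature encoder, and all names (`encode`, `target_feat`, the weight `lam`, the budget `eps`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen feature encoder of the editing pipeline
# (assumption: the real method works on diffusion-model internals).
D, F = 64, 32                              # pixel dim, feature dim
W = rng.normal(size=(F, D)) / np.sqrt(D)   # fixed linear "encoder"
encode = lambda x: W @ x

images = rng.normal(size=(8, D))           # small training batch
target_feat = encode(rng.normal(size=D))   # semantic target embedding
eps, lam, lr = 0.1, 0.5, 0.05              # L_inf budget, suppression weight, step

def total_loss(delta):
    """Injection term pulls perturbed features toward the target;
    suppression term (subtracted) pushes them away from the source."""
    inj = sup = 0.0
    for x in images:
        z = encode(x + delta)
        inj += np.sum((z - target_feat) ** 2)
        sup += np.sum((z - encode(x)) ** 2)
    return (inj - lam * sup) / len(images)

delta = np.zeros(D)                        # the single universal perturbation
for _ in range(200):
    grad = np.zeros(D)
    for x in images:
        z = encode(x + delta)
        grad += 2 * W.T @ (z - target_feat)       # d(injection)/d(delta)
        grad -= 2 * lam * W.T @ (z - encode(x))   # d(-lam * suppression)/d(delta)
    delta -= lr * grad / len(images)
    delta = np.clip(delta, -eps, eps)      # project onto the L_inf ball
```

After training, adding `delta` to any image in the batch yields features biased toward the target semantics, mirroring in miniature how the UAP is meant to overwrite original content before an edit is applied.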
Datasets

LAION-2B-en (for data-dependent training), randomly generated jigsaw puzzle images (for data-free training), custom-generated evaluation dataset (500 images across 10 object classes using Stable Diffusion V3), MS-COCO, DomainNet, and a custom inpainting dataset (DI).
Model(s)

Stable Diffusion (V1.5, V1.4, V2.0, V3), InstructPix2Pix, FLUX (DiT-based model). Baselines adapted for universal immunization include methods based on PhotoGuard (Encoder Attack, Diffusion Attack), AdvPaint, Semantic Attack, and EditShield.
Author countries

South Korea