MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation

Authors: Tanjim Rahaman Fardin, S M Zunaid Alam, Mahadi Hasan Fahim, Md Faysal Mahfuz

Published: 2026-04-20 17:32:48+00:00

Comment: 8 pages, 5 figures

AI Summary

This paper introduces MetaCloak-JPEG, an adversarial perturbation method that protects face images against unauthorized DreamBooth-based deepfake generation and remains effective after JPEG compression. Unlike prior methods, MetaCloak-JPEG backpropagates gradients through the JPEG compression pipeline using a Differentiable JPEG (DiffJPEG) layer built on the Straight-Through Estimator (STE). As a result, adversarial energy concentrates in the low- and mid-frequency DCT bands that survive compression, so the protective signal is preserved.

Abstract

The rapid progress of subject-driven text-to-image synthesis, and in particular DreamBooth, has enabled a consent-free deepfake pipeline: an adversary needs only 4-8 publicly available face images to fine-tune a personalized diffusion model and produce photorealistic harmful content. Current adversarial face-protection systems -- PhotoGuard, Anti-DreamBooth, and MetaCloak -- perturb user images to disrupt surrogate fine-tuning, but all share a structural blindness: none backpropagates gradients through the JPEG compression pipeline that every major social-media platform applies before adversary access. Because JPEG quantization relies on round(), whose derivative is zero almost everywhere, adversarial energy concentrates in high-frequency DCT bands that JPEG discards, eliminating 60-80% of the protective signal. We introduce MetaCloak-JPEG, which closes this gap by inserting a Differentiable JPEG (DiffJPEG) layer built on the Straight-Through Estimator (STE): the forward pass applies standard JPEG compression, while the backward pass replaces round() with the identity. DiffJPEG is embedded in a JPEG-aware EOT distribution (~70% of augmentations include DiffJPEG) and a curriculum quality-factor schedule (QF: 95 to 50) inside a bilevel meta-learning loop. Under an l-inf perturbation budget of eps=8/255, MetaCloak-JPEG attains 32.7 dB PSNR, a 91.3% JPEG survival rate, and outperforms PhotoGuard on all 9 evaluated JPEG quality factors (9/9 wins, mean denoising-loss gain +0.125) within a 4.1 GB training-memory budget.
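The round() problem and the STE fix described in the abstract can be illustrated on a single DCT coefficient. The following is a minimal sketch, not the authors' DiffJPEG layer: the function names are hypothetical, the quality-factor scaling follows the common IJG/libjpeg convention, and the base steps 16 and 99 are the lowest- and highest-frequency entries of the standard JPEG luminance table.

```python
def ijg_step(base, qf):
    """Scale a base quantization step by quality factor (IJG/libjpeg convention)."""
    scale = 5000 / qf if qf < 50 else 200 - 2 * qf
    return min(max(int((base * scale + 50) // 100), 1), 255)


def quantize_forward(coeff, step):
    """Forward pass: ordinary JPEG quantize-then-dequantize of one coefficient."""
    return step * round(coeff / step)


def quantize_backward_true(upstream_grad):
    """Exact derivative of round() is zero almost everywhere, so nothing flows back."""
    return 0.0


def quantize_backward_ste(upstream_grad):
    """Straight-Through Estimator: treat round() as the identity in the backward pass.

    d/dc [step * (c / step)] = 1, so the upstream gradient passes through unchanged.
    """
    return upstream_grad


# Hypothetical perturbation of amplitude 20 added to an otherwise zero coefficient.
delta = 20.0
low_step = ijg_step(16, 50)    # lowest-frequency luminance step at QF 50
high_step = ijg_step(99, 50)   # highest-frequency luminance step at QF 50

survives_low = quantize_forward(delta, low_step)    # coarse but non-zero
survives_high = quantize_forward(delta, high_step)  # rounded away to zero

print(low_step, high_step, survives_low, survives_high)
print(quantize_backward_true(0.5), quantize_backward_ste(0.5))
```

With the same amplitude, the low-frequency coefficient keeps a non-zero value after quantization while the high-frequency one is rounded to zero. Because the exact backward pass returns zero, an optimizer without the STE receives no signal telling it to move energy toward the bands that survive.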

Key findings

MetaCloak-JPEG attains 32.7 dB PSNR (the imperceptibility measure) and a 91.3% JPEG survival rate for the perturbation. It outperforms PhotoGuard at all 9 evaluated JPEG quality factors (QF 50-100), with a mean denoising-loss gain of +0.125. FFT spectrum analysis confirms that the method channels adversarial energy into the low- and mid-frequency DCT bands that survive compression.
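To put the PSNR figure in context, it can be compared with the worst case allowed by the l-inf budget. The sketch below assumes PSNR is computed between the clean and protected images with pixel values normalized to [0, 1]; the helper names are hypothetical and the paper's exact evaluation protocol may differ.

```python
import math


def psnr_db(rms_error, peak=1.0):
    """PSNR in dB for a given root-mean-square error and peak signal value."""
    return 20.0 * math.log10(peak / rms_error)


def rms_from_psnr(psnr, peak=1.0):
    """Invert the PSNR formula to recover the RMS error."""
    return peak * 10 ** (-psnr / 20.0)


eps = 8 / 255

# Worst case: every pixel sits at +eps or -eps, so the RMS error equals eps.
floor_db = psnr_db(eps)

# RMS perturbation implied by the reported 32.7 dB, on a 0-255 scale.
rms_255 = rms_from_psnr(32.7) * 255

print(f"PSNR floor at eps=8/255: {floor_db:.2f} dB")
print(f"RMS perturbation implied by 32.7 dB: {rms_255:.2f} / 255")
```

Under these assumptions the budget alone guarantees roughly 30.1 dB, so the reported 32.7 dB corresponds to an RMS perturbation of roughly 6/255, somewhat below the full per-pixel budget of 8/255.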

Approach

JPEG compression strips much of the adversarial perturbation from protected images. MetaCloak-JPEG counters this by placing a DiffJPEG layer, built on the Straight-Through Estimator, inside a bilevel meta-learning loop, so that gradients flow through the quantization step and the perturbation is optimized with compression taken into account. The method also uses a JPEG-aware Expectation Over Transformations (EOT) distribution and a curriculum quality-factor schedule to improve robustness across compression levels.
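The JPEG-aware EOT sampling and the curriculum schedule can be sketched as follows. The 70% DiffJPEG share and the QF range of 95 to 50 come from the abstract; the linear shape of the schedule, the single "identity" alternative, and all function names are assumptions made for illustration only.

```python
import random


def curriculum_qf(step, total_steps, qf_start=95, qf_end=50):
    """Quality factor for a training step, interpolated linearly (assumed shape)."""
    if total_steps <= 1:
        return qf_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return round(qf_start + frac * (qf_end - qf_start))


def sample_transform(step, total_steps, rng, p_jpeg=0.7):
    """Draw one EOT augmentation: DiffJPEG with probability p_jpeg, else identity.

    Returns a (name, quality_factor) pair; quality_factor is None for identity.
    """
    if rng.random() < p_jpeg:
        return ("diffjpeg", curriculum_qf(step, total_steps))
    return ("identity", None)


rng = random.Random(0)
total = 100
draws = [sample_transform(s % total, total, rng) for s in range(10000)]
jpeg_share = sum(name == "diffjpeg" for name, _ in draws) / len(draws)

print(curriculum_qf(0, total), curriculum_qf(total - 1, total))
print(f"share of draws that include DiffJPEG: {jpeg_share:.3f}")
```

Starting at a high quality factor exposes the perturbation to mild compression first and tightens it gradually, which is the usual motivation for a curriculum of this kind.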

Datasets

CelebA-HQ

Model(s)

CompVis stable-diffusion-v1-4 (as surrogate), VAE encoder, UNet denoisers

Author countries

Bangladesh
