LoRA Patching: Exposing the Fragility of Proactive Defenses against Deepfakes

Authors: Zuomin Qu, Yimao Guo, Qianyue Hu, Wei Lu

Published: 2025-10-04 09:22:26+00:00

AI Summary

This paper reveals the fragility of proactive Deepfake defenses, which rely on embedding adversarial perturbations in facial images to prevent manipulation. The authors propose LoRA Patching, a novel attack that injects Low-Rank Adaptation (LoRA) patches into Deepfake generators to bypass these defenses efficiently. The method includes a learnable gating mechanism that stabilizes fine-tuning and a Multi-Modal Feature Alignment (MMFA) loss that keeps the outputs produced from adversarial (defended) inputs high-quality.

Abstract

Deepfakes pose significant societal risks, motivating the development of proactive defenses that embed adversarial perturbations in facial images to prevent manipulation. However, in this paper, we show that these preemptive defenses often lack robustness and reliability. We propose a novel approach, Low-Rank Adaptation (LoRA) patching, which injects a plug-and-play LoRA patch into Deepfake generators to bypass state-of-the-art defenses. A learnable gating mechanism adaptively controls the effect of the LoRA patch and prevents gradient explosions during fine-tuning. We also introduce a Multi-Modal Feature Alignment (MMFA) loss, encouraging the features of adversarial outputs to align with those of the desired outputs at the semantic level. Beyond bypassing, we present defensive LoRA patching, embedding visible warnings in the outputs as a complementary solution to mitigate this newly identified security vulnerability. With only 1,000 facial examples and a single epoch of fine-tuning, LoRA patching successfully defeats multiple proactive defenses. These results reveal a critical weakness in current paradigms and underscore the need for more robust Deepfake defense strategies. Our code is available at https://github.com/ZOMIN28/LoRA-Patching.
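
A minimal PyTorch sketch of how such a gated LoRA patch could wrap a generator convolution. The class name, rank, and gate initialization are illustrative assumptions for exposition, not the authors' released code.

```python
import torch
import torch.nn as nn

class GatedLoRAConv(nn.Module):
    """Wraps a frozen Conv2d with a low-rank residual branch scaled by a learnable gate.
    Sketch only: rank and gate_init are assumed values, not the paper's settings."""

    def __init__(self, base_conv: nn.Conv2d, rank: int = 4, gate_init: float = -4.0):
        super().__init__()
        self.base = base_conv
        for p in self.base.parameters():          # the pre-trained generator stays frozen
            p.requires_grad_(False)

        in_ch, out_ch = base_conv.in_channels, base_conv.out_channels
        # Down-projection copies the base kernel geometry; up-projection is a 1x1 conv.
        self.lora_down = nn.Conv2d(in_ch, rank, base_conv.kernel_size,
                                   stride=base_conv.stride, padding=base_conv.padding,
                                   bias=False)
        self.lora_up = nn.Conv2d(rank, out_ch, 1, bias=False)
        nn.init.kaiming_uniform_(self.lora_down.weight)
        nn.init.zeros_(self.lora_up.weight)       # patch starts as an identity mapping

        # Learnable scalar gate; sigmoid keeps it in (0, 1), and a negative init keeps
        # the patch weak early in fine-tuning, damping gradient spikes.
        self.gate = nn.Parameter(torch.tensor(gate_init))

    def forward(self, x):
        return self.base(x) + torch.sigmoid(self.gate) * self.lora_up(self.lora_down(x))
```

Replacing the generator's convolutional layers with such a wrapper and updating only the LoRA and gate parameters would keep the fine-tuning lightweight, consistent with the paper's 1,000-example, single-epoch budget.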


Key findings
LoRA patching successfully defeats multiple proactive defenses (Disrupting, PG, CMUA, DF-RAP), reducing the average Defense Success Rate (DSR) from 83.8% to 1.6%. The method is highly efficient, requiring only 1,000 facial examples, significantly less training time, and far fewer parameters than traditional adversarial training methods. Defensive LoRA patching was also shown to effectively embed visible warnings in generated outputs as a complementary solution.
Approach
The approach uses LoRA patching, injecting plug-and-play LoRA blocks into the convolutional layers of pre-trained Deepfake generators (StarGAN, AttGAN, HiSD). The LoRA parameters are fine-tuned via a bi-level min-max optimization based on adversarial training. A learnable gating mechanism stabilizes the fine-tuning, and the MMFA loss, which combines pixel-level differences with ResNet-50 image features and BLIP semantic features, ensures the patched model maps both benign and adversarial inputs to high-quality fake images; a sketch of such a loss is given below.
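
A minimal sketch of an MMFA-style alignment loss, assuming it sums pixel-level, ResNet-50 feature, and BLIP embedding distances with scalar weights. The distance choices, weights, and the blip_encode wrapper are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen ResNet-50 used as an image feature extractor (fc head removed).
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval().requires_grad_(False)

def mmfa_loss(adv_out, target_out, blip_encode,
              lam_pix=1.0, lam_img=1.0, lam_sem=1.0):
    """adv_out: generator output on perturbed (defended) faces.
    target_out: desired output, e.g. the generator's output on clean faces.
    blip_encode: callable returning BLIP image embeddings (hypothetical wrapper).
    Inputs are assumed to be already resized/normalized for both extractors."""
    pix = F.l1_loss(adv_out, target_out)                      # pixel-level term
    img = F.mse_loss(resnet(adv_out), resnet(target_out))     # ResNet-50 feature term
    sem = 1 - F.cosine_similarity(blip_encode(adv_out),
                                  blip_encode(target_out), dim=-1).mean()  # semantic term
    return lam_pix * pix + lam_img * img + lam_sem * sem
```

In the bi-level min-max setup described above, a loss of this form would be minimized over the LoRA and gate parameters while the adversarial perturbations are maximized against the patched generator.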
Datasets
CelebA
Model(s)
LoRA, StarGAN, AttGAN, HiSD, ResNet-50, BLIP image encoder
Author countries
China