Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

Published: 2024-04-27 11:20:49+00:00

Comment: Accepted by IJCAI 2024

AI Summary

This paper introduces AdvMark, a novel adversarial watermarking technique designed to prevent existing robust watermarks from degrading Deepfake detector performance. AdvMark fine-tunes robust watermarking models to embed adversarial watermarks that enhance the detectability of forged images by passive Deepfake detectors, while still allowing for provenance tracking. This plug-and-play solution improves detection accuracy without requiring modifications to deployed Deepfake detectors.

Abstract

AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.


Key findings
AdvMark substantially improves Deepfake detection accuracy on watermarked images in both white-box settings (approaching 100%) and black-box settings (via adversarial transferability and ensemble attacks), turning watermarks from harmful to helpful for detectors. Watermark extraction remains comparable to the baseline watermarking models, and the watermarked images retain good visual quality. The method thus supports provenance tracking while enhancing forensic detectability.
Approach
AdvMark uses a two-stage approach. First, a robust watermarking encoder-decoder is pre-trained to embed and extract watermarks while preserving visual quality. Second, the model is fine-tuned with an adversarial loss computed against a frozen Deepfake detector, so that the embedded watermark steers the detector toward the correct label: fake for watermarked forged images and real for watermarked genuine images. The "fooling" is therefore benign: it raises forensic detectability without modifying the detector.
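
The second-stage objective described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual loss: the function names, the use of mean squared error for the image and message terms, the label convention (0 = real, 1 = fake), and the weights `w_img`, `w_msg`, and `w_adv` are all assumptions made for the example. In practice the three terms would be computed on tensors and back-propagated through the encoder and decoder only, with the detector kept frozen.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against the integer class `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def finetune_loss(cover, watermarked, message, decoded,
                  detector_logits, is_fake,
                  w_img=1.0, w_msg=1.0, w_adv=0.1):
    """Hypothetical fine-tuning loss: fidelity + extraction + adversarial term.

    detector_logits are the frozen detector's outputs on the watermarked
    image, ordered as [real, fake] (an assumed convention).
    """
    image_term = mse(cover, watermarked)   # keep the watermark imperceptible
    message_term = mse(message, decoded)   # keep the message extractable
    true_label = 1 if is_fake else 0
    # Push the frozen detector toward the ground-truth label.
    adv_term = softmax_cross_entropy(detector_logits, true_label)
    return w_img * image_term + w_msg * message_term + w_adv * adv_term
```

With the image and message terms held fixed, the loss is lower when the detector's logits favour the ground-truth label, which is the sense in which the watermark "fools" the detector for good.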
Datasets
CelebA-HQ (real face images); forged images generated with SimSwap, FOMM, StarGAN, and StyleGAN
Model(s)
Watermarking models: MBRS, SepMark. Deepfake detectors: Xception, EfficientNet, CNND, FFD, PatchForensics, MultiAtt, RFM, RECCE, SBI.
Author countries
China