Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics


Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

Published: 2024-04-27 11:20:49+00:00

AI Summary

This paper proposes AdvMark, a plug-and-play procedure for fine-tuning robust watermarking models so that watermarked images are classified more accurately by deepfake detectors. AdvMark exploits the adversarial vulnerability of passive detectors to improve detection accuracy without modifying the detectors themselves, while keeping the watermark extractable for provenance tracking.

Abstract

AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on the harmless proactive forensics against Deepfake.

Key findings

Experiments demonstrate AdvMark's effectiveness in improving deepfake detection accuracy in both white-box and black-box settings. Watermark extractability remains high, ensuring provenance tracking capabilities are preserved. The approach shows promise for enhancing the reliability of deepfake detection systems without requiring retraining or modification of existing detectors.

Approach

AdvMark fine-tunes existing robust watermarking models by adding an adversarial loss term during training. This loss encourages watermarked images to be correctly classified by a deepfake detector, so the watermark acts as a helpful adversarial perturbation: detection performance improves while the detector itself is left unchanged.
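
The fine-tuning objective described above can be sketched as a weighted sum of three terms: an image-fidelity term, a message-recovery term, and an adversarial term computed from a frozen detector's output. The sketch below is illustrative only. The function name `advmark_style_loss`, the weights `w_img`, `w_msg`, and `w_adv`, and the choice of mean squared error and binary cross-entropy are assumptions made for this example, not the loss definitions from the paper.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length lists of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bce(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def advmark_style_loss(cover, watermarked, msg, decoded_msg, p_fake, is_fake,
                       w_img=1.0, w_msg=1.0, w_adv=0.1):
    """Hypothetical combined objective for fine-tuning a watermarking model.

    cover, watermarked: flattened images before and after embedding.
    msg, decoded_msg: embedded bits and the decoder's per-bit probabilities.
    p_fake: probability of "fake" from a detector that is NOT updated.
    is_fake: ground-truth label of the image (1 = forged, 0 = genuine).
    The weights are assumed values chosen only for illustration.
    """
    image_loss = mse(cover, watermarked)  # keep the watermark visually subtle
    message_loss = sum(bce(p, m) for p, m in zip(decoded_msg, msg)) / len(msg)
    adv_loss = bce(p_fake, is_fake)  # push the detector toward the true label
    return w_img * image_loss + w_msg * message_loss + w_adv * adv_loss


if __name__ == "__main__":
    cover = [0.2, 0.5, 0.8]
    marked = [0.21, 0.49, 0.8]
    print(advmark_style_loss(cover, marked, [1, 0], [0.9, 0.2],
                             p_fake=0.7, is_fake=1))
```

In this toy form only the watermarking encoder and decoder would receive gradient updates; the detector's probability is treated as a fixed function of the watermarked image, which mirrors the summary's point that detectors are not modified.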

Datasets

CelebA-HQ face images, together with deepfakes generated by SimSwap, FOMM, StarGAN, and StyleGAN.

Model(s)

MBRS and SepMark watermarking models; various deepfake detectors including Xception, EfficientNet, CNND, FFD, PatchForensics, MultiAtt, RFM, RECCE, and SBI.

Author countries

China
