DeepForgeSeal: Latent Space-Driven Semi-Fragile Watermarking for Deepfake Detection Using Multi-Agent Adversarial Reinforcement Learning

Authors: Tharindu Fernando, Clinton Fookes, Sridha Sridharan

Published: 2025-11-07 03:24:50+00:00

AI Summary

This paper introduces DeepForgeSeal, a proactive deepfake detection framework utilizing latent space-driven semi-fragile watermarking. The system employs Multi-Agent Adversarial Reinforcement Learning (MAARL) to achieve an optimal balance between watermark robustness against benign distortions and sensitivity to malicious tampering. Evaluations show that DeepForgeSeal significantly outperforms current state-of-the-art methods on CelebA and CelebA-HQ benchmarks.

Abstract

Rapid advances in generative AI have led to increasingly realistic deepfakes, posing growing challenges for law enforcement and public trust. Existing passive deepfake detectors struggle to keep pace, largely due to their dependence on specific forgery artifacts, which limits their ability to generalize to new deepfake types. Proactive deepfake detection using watermarks has emerged to address the challenge of identifying high-quality synthetic media. However, these methods often struggle to balance robustness against benign distortions with sensitivity to malicious tampering. This paper introduces a novel deep learning framework that harnesses high-dimensional latent space representations and the Multi-Agent Adversarial Reinforcement Learning (MAARL) paradigm to develop a robust and adaptive watermarking approach. Specifically, we develop a learnable watermark embedder that operates in the latent space, capturing high-level image semantics, while offering precise control over message encoding and extraction. The MAARL paradigm empowers the learnable watermarking agent to pursue an optimal balance between robustness and fragility by interacting with a dynamic curriculum of benign and malicious image manipulations simulated by an adversarial attacker agent. Comprehensive evaluations on the CelebA and CelebA-HQ benchmarks reveal that our method consistently outperforms state-of-the-art approaches, achieving improvements of over 4.5% on CelebA and more than 5.3% on CelebA-HQ under challenging manipulation scenarios.

Key findings

DeepForgeSeal achieves state-of-the-art deepfake detection accuracy, with improvements of over 4.5% on CelebA and more than 5.3% on CelebA-HQ under challenging manipulations. It is also strongly semi-fragile: watermark recovery is near-perfect after benign edits (e.g., BRA up to 1.00) and collapses after malicious face manipulations (BRA as low as 0.06). Embedding in the latent space also yields better visual fidelity, with the highest PSNR (48.39 dB) and SSIM (0.97).
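
The metrics above can be made concrete with a minimal sketch. It assumes that BRA denotes the fraction of watermark bits recovered correctly and that PSNR is computed on 8-bit pixel values; neither definition is spelled out in this summary. The function names and the 0.8 decision threshold are hypothetical and chosen only for illustration.

```python
import math

def bit_recovery_accuracy(sent, recovered):
    """Fraction of watermark bits that match (assumed meaning of BRA)."""
    if not sent or len(sent) != len(recovered):
        raise ValueError("bit sequences must be non-empty and of equal length")
    matches = sum(1 for s, r in zip(sent, recovered) if s == r)
    return matches / len(sent)

def psnr(original, watermarked, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if not original or len(original) != len(watermarked):
        raise ValueError("pixel sequences must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, watermarked)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / mse)

def looks_authentic(bra, threshold=0.8):
    """Hypothetical decision rule: a watermark that survives implies no tampering."""
    return bra >= threshold
```

Under this reading, a semi-fragile watermark should keep `bit_recovery_accuracy` close to 1.0 after benign edits such as compression and drive it well below the threshold after a face manipulation, which is the pattern the reported BRA values of 1.00 and 0.06 describe.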

Approach

The framework uses a learnable watermark embedder that operates in a high-dimensional spherical latent space derived from CLIP features, which provides semantic stealth. MAARL pits a watermarking agent against an adversarial attacker agent. The attacker generates a dynamic curriculum of combinatorial attacks (benign and malicious), guided by a reward function that encourages semantic drift and targets failure regions.
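
To illustrate what embedding a message on a spherical latent space can look like, here is a toy sketch. It is not the paper's learned embedder: it swaps in a fixed, hand-written scheme that nudges a unit-norm latent vector along orthonormal carrier directions, one per bit, and reads the bits back from the sign of each projection. All names (`make_carriers`, `embed`, `extract`), the `strength` parameter, and the use of random carriers are assumptions made for this example; in the actual framework the latent would come from CLIP features and the embedder would be trained.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def make_carriers(n_bits, dim, seed=0):
    """Hypothetical carriers: random directions made orthonormal (Gram-Schmidt)."""
    if n_bits > dim:
        raise ValueError("need n_bits <= dim for orthonormal carriers")
    rng = random.Random(seed)
    carriers = []
    while len(carriers) < n_bits:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for c in carriers:
            proj = dot(v, c)
            v = [vi - proj * ci for vi, ci in zip(v, c)]
        carriers.append(normalize(v))
    return carriers

def embed(latent, bits, carriers, strength=0.5):
    """Shift the unit-norm latent along +carrier (bit 1) or -carrier (bit 0)."""
    z = normalize(latent)
    for bit, c in zip(bits, carriers):
        sign = 1.0 if bit else -1.0
        z = [zi + sign * strength * ci for zi, ci in zip(z, c)]
    return normalize(z)  # stay on the sphere after the shift

def extract(latent, carriers):
    """Read each bit from the sign of the projection onto its carrier."""
    return [1 if dot(latent, c) > 0 else 0 for c in carriers]

# Usage: embed a 4-bit message into a 16-dimensional stand-in latent.
carriers = make_carriers(n_bits=4, dim=16, seed=1)
message = [1, 0, 1, 0]
marked = embed([1.0] * 16, message, carriers, strength=1.5)
recovered = extract(marked, carriers)
```

With orthonormal carriers, each bit is recovered correctly whenever `strength` exceeds the magnitude of the original latent's projection onto that carrier, so a larger `strength` trades imperceptibility for robustness. A learned embedder trained against an attacker agent, as described above, would tune that trade-off per image instead of relying on a fixed constant.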

Datasets

Flickr-Faces-HQ (FFHQ), CelebA, CelebA-HQ

Model(s)

CLIP image/text encoders, MLP, Transposed Convolutional Network Decoder

Author countries

Australia