GIFGuard: Proactive Forensics against Deepfakes in Facial GIFs via Spatiotemporal Watermarking

Authors: Shupeng Che, Zhiqing Guo, Changtao Miao, Dan Ma, Gaobo Yang

Published: 2026-04-29 10:37:05+00:00

AI Summary

This paper introduces GIFGuard, the first spatiotemporal watermarking framework designed for proactive deepfake forensics in facial GIFs. It employs a Spatiotemporal Adaptive Residual Encoder (STARE) for robust embedding and a Deep Integrity Restoration Decoder (DIRD) for accurate watermark extraction, even under severe facial manipulation. The framework also constructs and utilizes GIFfaces, a new large-scale benchmark dataset for GIF forensics, demonstrating high-fidelity visual quality and remarkable robustness against deepfakes.
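GIFGuard learns its embedder (STARE) and extractor (DIRD) end to end, and the toy below is not that method. It substitutes classical spread-spectrum watermarking to make the embed/extract contract concrete. A bit message becomes a small additive residual spread across every frame and pixel of the clip. Extraction then recovers the bits blindly from a distorted copy by correlating it with secret carrier patterns. The carriers, `strength`, the clip size, and all function names are illustrative assumptions.

```python
import numpy as np

def make_patterns(num_bits, shape, seed=0):
    """One pseudorandom +/-1 carrier per bit; the seed acts as the shared secret key."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(num_bits, *shape))

def embed(frames, bits, patterns, strength=0.02):
    """Add a spatiotemporal residual: each bit flips the sign of its carrier over all frames."""
    signs = 2.0 * np.asarray(bits, dtype=float) - 1.0           # {0,1} -> {-1,+1}
    residual = strength * np.tensordot(signs, patterns, axes=1)  # shape (T, H, W)
    return np.clip(frames + residual, 0.0, 1.0)

def extract(frames, patterns):
    """Blind extraction: correlate the mean-removed clip with each carrier, keep the sign."""
    centered = frames - frames.mean()
    scores = np.tensordot(patterns, centered, axes=3)  # one correlation score per bit
    return (scores > 0).astype(int)

# Toy grayscale "GIF": T=8 frames of 32x32 pixels with values in [0.2, 0.8].
rng = np.random.default_rng(1)
gif = rng.uniform(0.2, 0.8, size=(8, 32, 32))
message = rng.integers(0, 2, size=16)
keys = make_patterns(16, gif.shape)

marked = embed(gif, message, keys)
# Stand-in distortion (additive noise), far milder than a real face swap.
attacked = np.clip(marked + rng.normal(0.0, 0.05, marked.shape), 0.0, 1.0)
recovered = extract(attacked, keys)
ber = np.mean(recovered != message)
```

With these toy settings the correlation signal is several times larger than the interference from the cover and the noise, so the bits should survive. A semantic edit such as a face swap replaces whole regions instead of adding noise. That kind of tampering is what motivates a learned encoder/decoder trained against deepfakes rather than a fixed carrier.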

Abstract

The rapid evolution of deepfake technology poses an unprecedented threat to the authenticity of Graphics Interchange Format (GIF) imagery, which serves as a representative of short-loop temporal media in social networks. However, existing proactive forensics works are designed for static images, which limits their applicability to animated GIFs. To bridge this gap, we propose GIFGuard, the first spatiotemporal watermarking framework tailored for deepfake proactive forensics in GIFs. In the embedding stage, we propose the Spatiotemporal Adaptive Residual Encoder (STARE) to ensure robustness against high-level semantic tampering. It employs a 3D convolutional backbone with adaptive channel recalibration to capture globally coherent temporal dependencies. In the extraction stage, we design the Deep Integrity Restoration Decoder (DIRD). It utilizes a spatiotemporal hourglass architecture equipped with 3D attention to restore latent features, allowing for the accurate extraction of watermark signals even under severe facial manipulation. Furthermore, we construct GIFfaces, the first large-scale benchmark dataset curated for GIF proactive forensics to facilitate research in this domain. Extensive results show that GIFGuard achieves high-fidelity visual quality and remarkable robustness performance against deepfakes. Related code and dataset will be released.
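The abstract's "adaptive channel recalibration" corresponds, per the Model(s) field, to a 3D Squeeze-and-Excitation (SE) block. Below is a minimal NumPy sketch of the standard SE recipe with the squeeze taken over time as well as space. The reduction ratio `r = 4`, the random weights, and the function name `se3d` are assumptions. In GIFGuard the block is trained and sits inside the 3D U-Net, and its exact configuration is not given here.

```python
import numpy as np

def se3d(x, w1, b1, w2, b2):
    """3D squeeze-and-excitation on a feature map x of shape (C, T, H, W).

    Squeeze: average over time and space -> one descriptor per channel.
    Excite: bottleneck MLP (C -> C/r -> C), ReLU then sigmoid -> per-channel gates.
    Recalibrate: scale every channel of x by its gate.
    """
    z = x.mean(axis=(1, 2, 3))                  # (C,)
    h = np.maximum(w1 @ z + b1, 0.0)            # (C/r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # (C,), each gate in (0, 1)
    return x * s[:, None, None, None], s

rng = np.random.default_rng(0)
C, r = 16, 4
feat = rng.normal(size=(C, 4, 8, 8))            # C channels over T=4 frames of 8x8
w1, b1 = rng.normal(scale=0.1, size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(scale=0.1, size=(C, C // r)), np.zeros(C)
out, gates = se3d(feat, w1, b1, w2, b2)
```

Because the squeeze averages over all frames, each channel's gate reflects its behavior across the whole clip rather than within a single frame. This is one plausible reading of "globally coherent temporal dependencies" at the channel level.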


Key findings
GIFGuard achieves strong visual imperceptibility, with an LPIPS score of 0.0035 and a PSNR of 49.54 dB. It is also robust against deepfake attacks, maintaining an average bit error rate (BER) of approximately 0.0105% and significantly outperforming existing video watermarking baselines. The framework addresses both temporal consistency and semantic robustness in GIF forensics, at the cost of a larger model (429.4M parameters).
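To make the reported numbers concrete, the short calculation below inverts the PSNR formula and reads the BER as a percentage, as printed. It assumes 8-bit channels (peak value 255), which the summary does not state. Under that assumption, 49.54 dB corresponds to a mean squared error of about 0.72, an average change of less than one gray level per pixel. A BER of 0.0105% means roughly one flipped bit per ~9,500 extracted bits. The helper names are hypothetical.

```python
import math

def psnr_to_mse(psnr_db, max_val=255.0):
    """Invert PSNR = 10 * log10(MAX^2 / MSE); max_val=255 assumes 8-bit pixels."""
    return max_val ** 2 / 10 ** (psnr_db / 10)

def bit_error_rate(sent, received):
    """Fraction of watermark bits that differ after extraction."""
    assert len(sent) == len(received)
    return sum(a != b for a, b in zip(sent, received)) / len(sent)

mse = psnr_to_mse(49.54)             # ~0.72 for 8-bit pixels
rmse = math.sqrt(mse)                # ~0.85 gray levels per pixel on average

reported_ber = 0.0105 / 100          # reported average BER, read as a percentage
bits_per_error = 1 / reported_ber    # ~9,500 extracted bits per flipped bit

example = bit_error_rate([1, 0, 1, 1], [1, 0, 0, 1])  # one of four bits wrong
```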
Approach
GIFGuard proactively embeds watermarks into GIFs using a Spatiotemporal Adaptive Residual Encoder (STARE) that leverages a 3D convolutional backbone with adaptive channel recalibration. A Realistic Distortion Simulator (RDS) subjects the watermarked GIFs to deepfake forgeries (e.g., SimSwap, MobileFaceSwap) and other distortions for adversarial training. Watermark extraction is achieved by the Deep Integrity Restoration Decoder (DIRD), which uses a spatiotemporal hourglass architecture with 3D attention to restore latent features and accurately recover the signal.
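The sketch below illustrates the idea behind a distortion simulator: during training, each watermarked clip passes through one distortion drawn from a pool before it reaches the decoder. The real RDS runs face-swap models (SimSwap, MobileFaceSwap, Ghost), which cannot be reproduced here. `face_swap_proxy` is a deliberately crude, hypothetical stand-in that pastes a donor clip's central region into every frame. Sampling one distortion uniformly per step is a common noise-layer scheme; whether RDS samples this way is an assumption, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def identity(clip):
    return clip

def gaussian_noise(clip, sigma=0.02):
    return np.clip(clip + rng.normal(0.0, sigma, clip.shape), 0.0, 1.0)

def frame_repeat(clip):
    """Temporal distortion: overwrite one random frame with its predecessor (length unchanged)."""
    out = clip.copy()
    t = int(rng.integers(1, clip.shape[0]))
    out[t] = out[t - 1]
    return out

def face_swap_proxy(clip, donor):
    """Hypothetical stand-in for a face-swap model: paste the donor's central region."""
    out = clip.copy()
    H, W = clip.shape[1:3]
    h0, w0 = H // 4, W // 4
    out[:, h0:H - h0, w0:W - w0] = donor[:, h0:H - h0, w0:W - w0]
    return out

def simulate_distortion(clip, donor):
    """Apply one distortion, sampled uniformly from the pool, to a (T, H, W, 3) clip."""
    pool = [identity, gaussian_noise, frame_repeat, lambda c: face_swap_proxy(c, donor)]
    return pool[int(rng.integers(len(pool)))](clip)

clip = rng.uniform(size=(8, 32, 32, 3))    # T, H, W, RGB with values in [0, 1)
donor = rng.uniform(size=(8, 32, 32, 3))
distorted = simulate_distortion(clip, donor)
```

In actual training the decoder loss must still drive the encoder through these distortions. The summary does not say how RDS handles gradients for the deepfake models.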
Datasets
GIFfaces (newly constructed), Celeb-DF-v2, DFEW, CAER, DFDC, FaceForensics++ (FF++)
Model(s)
Spatiotemporal Adaptive Residual Encoder (STARE) with 3D U-Net backbone and 3D Squeeze-and-Excitation (SE) block; Deep Integrity Restoration Decoder (DIRD) with spatiotemporal hourglass architecture and 3D attention; Discriminator; Deepfake models (SimSwap, MobileFaceSwap, Ghost) for distortion simulation.
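The summary does not specify the form of DIRD's 3D attention. As one hedged illustration, the sketch extends a CBAM-style spatial attention map to space and time. Channel-wise mean and max summarize each (t, h, w) location, a pointwise linear combination plus sigmoid turns them into a gate in (0, 1), and the gate reweights all channels at that location. CBAM applies a learned convolution to the pooled maps; the fixed pointwise weights here are a simplifying assumption, and the hourglass encoder-decoder around the attention is omitted.

```python
import numpy as np

def spatiotemporal_attention(x, w_avg=1.0, w_max=1.0, bias=0.0):
    """Gate a (C, T, H, W) feature map with one weight per (t, h, w) location.

    The gate is shared across channels, so locations the map deems informative
    (e.g. regions less affected by tampering) are emphasized for the decoder.
    """
    avg = x.mean(axis=0)                        # (T, H, W)
    mx = x.max(axis=0)                          # (T, H, W)
    gate = 1.0 / (1.0 + np.exp(-(w_avg * avg + w_max * mx + bias)))
    return x * gate[None], gate

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 4, 8, 8))           # 16 channels, T=4 frames of 8x8
out, gate = spatiotemporal_attention(feat)
```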
Author countries
China