Which Face and Whose Identity? Solving the Dual Challenge of Deepfake Proactive Forensics in Multi-Face Scenarios
Authors: Lei Zhang, Zhiqing Guo, Dan Ma, Gaobo Yang
Published: 2026-04-29 06:50:19+00:00
AI Summary
This paper addresses the dual challenge of deepfake localization and source tracing in complex multi-person scenarios, proposing the Deep Attributable Watermarking Framework (DAWF). DAWF features a novel multi-face encoder-decoder architecture that efficiently embeds watermarks in parallel across multiple faces, bypassing traditional offline pre-processing. Leveraging a selective regional supervision loss and embedded identity payloads, DAWF achieves the 'which + who' goal, precisely identifying forged facial regions and their original identities.
Abstract
Unlike single-face forgeries, deepfakes in complex multi-person interaction scenarios (such as group photos and multi-person meetings) more closely reflect real-world threats. Although existing proactive forensics solutions perform well, they rely heavily on a single-face setting and therefore struggle with deepfake localization and source tracing in complex multi-person environments. To address this challenge, we propose the Deep Attributable Watermarking Framework (DAWF). The framework adopts a novel multi-face encoder-decoder architecture that bypasses the cumbersome offline pre-processing steps of traditional forensics, enabling efficient in-network parallel watermark embedding and cross-face collaborative processing. Crucially, we propose a selective regional supervision loss that guides the decoder to focus exclusively on the facial regions tampered with by deepfakes. Combining this mechanism with the embedded identity payloads, DAWF realizes the 'which + who' goal: it answers both which facial region was forged and whose identity was forged. Extensive experiments on challenging multi-face datasets show that DAWF achieves excellent deepfake localization and traceability in complex multi-person scenes.
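The abstract does not give the form of the selective regional supervision loss, so the following is only an illustrative sketch of the general idea of restricting supervision to selected facial regions. It assumes a per-pixel binary cross-entropy averaged over a region mask; the function name `selective_regional_loss`, the mask convention, and the choice of cross-entropy are all hypothetical and are not taken from the paper.

```python
import numpy as np


def selective_regional_loss(pred, target, region_mask, eps=1e-7):
    """Mean binary cross-entropy over pixels where region_mask == 1.

    Hypothetical illustration only; not the loss defined by DAWF.
    pred        : predicted probabilities in [0, 1], shape (H, W)
    target      : ground-truth values in {0, 1}, shape (H, W)
    region_mask : 1 for pixels inside the supervised facial regions, else 0
    """
    # Clip to avoid log(0) at saturated predictions.
    pred = np.clip(pred, eps, 1.0 - eps)
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

    selected = region_mask.sum()
    if selected == 0:
        # No supervised region in this image: contribute nothing.
        return 0.0
    # Pixels outside the mask are multiplied by zero and do not affect the loss.
    return float((bce * region_mask).sum() / selected)


# Toy example: only the top-left 2x2 block is supervised.
pred = np.full((4, 4), 0.5)
target = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
loss = selective_regional_loss(pred, target, mask)
```

In this sketch, errors outside the masked regions contribute nothing to the loss, which is one simple way a decoder could be steered toward specific faces in a multi-face image.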