MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models

Authors: Wenbo Xu, Wei Lu, Xiangyang Luo, Jiantao Zhou

Published: 2026-01-28 09:44:31+00:00

AI Summary

This paper introduces MARE, a novel framework for explainable Deepfake detection leveraging Vision-Language Models (VLMs). MARE enhances VLM accuracy and reliability by incorporating reinforcement learning from human feedback (RLHF) with comprehensive reward functions, which incentivize the generation of text-spatially aligned reasoning content. Additionally, it features a forgery disentanglement module designed to capture intrinsic forgery traces from high-level facial semantics.

Abstract

Deepfake detection is a widely researched topic that is crucial for combating the spread of malicious content, with existing methods mainly modeling the problem as classification or spatial localization. The rapid advancement of generative models imposes new demands on Deepfake detection. In this paper, we propose multimodal alignment and reinforcement for explainable Deepfake detection via vision-language models, termed MARE, which aims to enhance the accuracy and reliability of Vision-Language Models (VLMs) in Deepfake detection and reasoning. Specifically, MARE designs comprehensive reward functions, incorporating reinforcement learning from human feedback (RLHF), to incentivize the generation of text-spatially aligned reasoning content that adheres to human preferences. In addition, MARE introduces a forgery disentanglement module to capture intrinsic forgery traces from high-level facial semantics, thereby improving its authenticity detection capability. We conduct thorough evaluations of the reasoning content generated by MARE. Both quantitative and qualitative experimental results demonstrate that MARE achieves state-of-the-art performance in terms of accuracy and reliability.


Key findings
MARE achieves state-of-the-art Deepfake detection performance, with notable gains on challenging datasets such as WDF and DFDC under both intra- and fuse-dataset setups. It also generates superior explainable reasoning content, outperforming baseline VLMs and other methods in identification accuracy and F1 score while providing reliable, text-spatially aligned explanations of Deepfake traces.
Approach
MARE enhances Vision-Language Models (VLMs) for Deepfake detection and reasoning using a Reinforcement Learning from Human Feedback (RLHF) paradigm. It designs multi-dimensional reward functions (format, accuracy, text relevance, ROI, and alignment) to encourage text-spatially aligned reasoning content. Furthermore, a forgery disentanglement module decouples identity, structural, and forgery-trace features from face images, improving authenticity detection by focusing the model on subtle forgery traces.
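The reward design above can be sketched as a weighted combination of the five reward dimensions, with the resulting rewards converted into group-relative advantages as in GRPO. This is an illustrative sketch, not MARE's actual formulation: the component names mirror the five reward dimensions listed in the approach, but the uniform weights, the linear combination, and the helper names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RewardComponents:
    """Hypothetical per-response reward terms, one per reward dimension."""
    format: float     # does the output follow the required reasoning format?
    accuracy: float   # is the real/fake verdict correct?
    relevance: float  # does the text match reference forgery descriptions?
    roi: float        # do predicted regions overlap annotated forgery ROIs?
    alignment: float  # are the textual claims spatially grounded in the ROIs?


def total_reward(r: RewardComponents,
                 weights=(1.0, 1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the five reward terms (weights are placeholders)."""
    parts = (r.format, r.accuracy, r.relevance, r.roi, r.alignment)
    return sum(w * p for w, p in zip(weights, parts))


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages in the GRPO style: normalize each sampled
    response's reward by the mean and standard deviation of its group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((x - mean) ** 2 for x in rewards) / n) ** 0.5
    return [(x - mean) / (std + 1e-8) for x in rewards]
```

In GRPO the policy samples a group of responses per prompt, so responses are ranked only against siblings from the same prompt; the normalization above makes the advantage scale-free, which is why no learned value model is needed.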
Datasets
FaceForensics++ (FF++), Celeb-DF, WildDeepfake (WDF), DFDC, DFD, Deepfake multimodal alignment dataset (DMA) augmented from DDVQA.
Model(s)
Qwen2.5-VL-3B, Qwen2.5-VL-7B, InternVL2.5-4B (as VLMs), VLM-R1 (VLM toolkit), GRPO (RL algorithm), Xception (FDM backbone), SentenceTransformer (sentence encoder), MediaPipe Face Mesh (face landmark detection).
Author countries
China, Macau