Explainable Deepfake Detection with RL Enhanced Self-Blended Images

Authors: Ning Jiang, Dingheng Zeng, Yanhong Liu, Haiyang Yi, Shijie Yu, Minghe Weng, Haifeng Shen, Ying Li

Published: 2026-01-22 03:55:46+00:00

Comment: Accepted at ICASSP 2026

AI Summary

This paper introduces an explainable deepfake detection method addressing the scarcity of annotated data for Multimodal Large Language Models (MLLMs). It proposes an automated Chain-of-Thought (CoT) data generation framework based on Self-Blended Images (SBI) and an RL-enhanced detection framework. The approach reduces annotation costs and improves cross-domain generalization, achieving competitive performance on multiple cross-dataset benchmarks.

Abstract

Most prior deepfake detection methods lack explainable outputs. With the growing interest in multimodal large language models (MLLMs), researchers have started exploring their use in interpretable deepfake detection. However, a major obstacle in applying MLLMs to this task is the scarcity of high-quality datasets with detailed forgery attribution annotations, as textual annotation is both costly and challenging, particularly for high-fidelity forged images or videos. Moreover, multiple studies have shown that reinforcement learning (RL) can substantially enhance performance in visual tasks, especially in improving cross-domain generalization. To facilitate the adoption of mainstream MLLM frameworks in deepfake detection with reduced annotation cost, and to investigate the potential of RL in this context, we propose an automated Chain-of-Thought (CoT) data generation framework based on Self-Blended Images, along with an RL-enhanced deepfake detection framework. Extensive experiments validate the effectiveness of our CoT data construction pipeline, tailored reward mechanism, and feedback-driven synthetic data generation approach. Our method achieves performance competitive with state-of-the-art (SOTA) approaches across multiple cross-dataset benchmarks. Implementation details are available at https://github.com/deon1219/rlsbi.


Key findings
The proposed method achieves competitive performance compared to state-of-the-art approaches on multiple cross-dataset benchmarks, notably outperforming previous methods on CDF2 and most MLLM-based methods on the DFD dataset at the frame level. Ablation studies confirm that the keyword verification reward and feedback-guided online data synthesis contribute significantly to detection accuracy and cross-domain generalization.
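The keyword verification reward mentioned above can be illustrated with a minimal sketch: the policy earns credit for the correct real/fake verdict, plus partial credit when its explanation mentions annotated forgery-attribution keywords. The function name, weights, and matching rule below are illustrative assumptions, not the paper's exact reward.

```python
def keyword_reward(response, label, forgery_keywords,
                   verdict_weight=1.0, keyword_weight=0.5):
    """Hypothetical keyword-verification reward for RL fine-tuning.

    response         -- the MLLM's textual answer
    label            -- ground-truth verdict, "real" or "fake"
    forgery_keywords -- annotated attribution keywords (e.g. "blending boundary")

    Returns verdict_weight if the predicted verdict matches the label,
    plus a keyword bonus proportional to the fraction of annotated
    forgery keywords that appear in the explanation.
    """
    text = response.lower()
    # Naive verdict extraction: treat any mention of "fake" as a fake verdict.
    verdict = "fake" if "fake" in text else "real"
    reward = verdict_weight if verdict == label else 0.0
    # Keyword bonus only applies to forged samples with annotations.
    if label == "fake" and forgery_keywords:
        hits = sum(1 for kw in forgery_keywords if kw.lower() in text)
        reward += keyword_weight * hits / len(forgery_keywords)
    return reward
```

A correct verdict with one of two annotated keywords matched would score 1.0 + 0.5 × 1/2 = 1.25 under these assumed weights.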
Approach
The method involves an automated Chain-of-Thought (CoT) data generation pipeline that uses Self-Blended Images to create forged images with precise forgery attribution annotations. This data is used to fine-tune an MLLM in a supervised manner. Reinforcement learning with a tailored reward mechanism, including a text-based forgery-localization reward and adaptive, feedback-driven synthetic data generation, further improves the model's cross-domain generalization.
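The Self-Blended Image idea underlying the pipeline can be sketched as follows: a pseudo-fake is produced by blending a slightly perturbed copy of a face back onto itself under a soft mask, leaving a subtle blending boundary whose location is known by construction (which is what enables automatic attribution annotation). The function name, the jitter used as the perturbation, and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def self_blend(face, mask, jitter_strength=0.1, seed=0):
    """Hypothetical sketch of Self-Blended Image (SBI) generation.

    face -- float image in [0, 1], shape (H, W, C), acting as both
            source and target
    mask -- soft blending mask in [0, 1], shape (H, W)

    A mildly color-jittered copy of the face is alpha-blended onto the
    original inside the mask, creating a known blending boundary.
    """
    rng = np.random.default_rng(seed)
    # Source branch: per-channel intensity jitter mimics the statistical
    # mismatch a real face swap would introduce.
    jitter = 1.0 + rng.uniform(-jitter_strength, jitter_strength,
                               size=(1, 1, face.shape[2]))
    source = np.clip(face * jitter, 0.0, 1.0)
    # Soft alpha blend: the jittered source dominates inside the mask,
    # the original face is kept unchanged outside it.
    alpha = mask[..., None]
    return alpha * source + (1.0 - alpha) * face
```

Because the mask is chosen by the generator, the exact forged region is known, so a CoT annotation describing where and how the image was manipulated can be produced without human labeling.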
Datasets
DeepfakeBench, FaceForensics++ (FF++), CDF2 (Celeb-DF v2), DFD, DFDC, DFDCP
Model(s)
InternVL3-38B (for CoT data generation), LLaVA-1.5-7B (base MLLM for detection), LoRA adapters applied to the ViT and LLM, and the Video-R1 framework for SFT and RL training.
Author countries
China