TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection

Authors: Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Zou, Ling Lo, Sheng-Ping Yang, Yu-Wen Tseng, Kun-Hsiang Lin, Chia-Ling Chen, Yu-Ting Ta, Yan-Tsung Wang, Po-Ching Chen, Hongxia Xie, Hong-Han Shuai, Wen-Huang Cheng

Published: 2025-12-11 14:01:01+00:00

AI Summary

The paper introduces TriDF, a comprehensive multimodal benchmark for interpretable DeepFake detection covering 16 forgery types across image, video, and audio. TriDF evaluates models on three interdependent aspects: Perception (fine-grained artifact identification), Detection (classification accuracy), and Hallucination (explanation reliability). Experiments on MLLMs show that accurate perception is essential for reliable detection, but a tendency to hallucinate severely undermines decision-making.

Abstract

Advances in generative modeling have made it increasingly easy to fabricate realistic portrayals of individuals, creating serious risks for security, communication, and public trust. Detecting such person-driven manipulations requires systems that not only distinguish altered content from authentic media but also provide clear and reliable reasoning. In this paper, we introduce TriDF, a comprehensive benchmark for interpretable DeepFake detection. TriDF contains high-quality forgeries from advanced synthesis models, covering 16 DeepFake types across image, video, and audio modalities. The benchmark evaluates three key aspects: Perception, which measures the ability of a model to identify fine-grained manipulation artifacts using human-annotated evidence; Detection, which assesses classification performance across diverse forgery families and generators; and Hallucination, which quantifies the reliability of model-generated explanations. Experiments on state-of-the-art multimodal large language models show that accurate perception is essential for reliable detection, but hallucination can severely disrupt decision-making, revealing the interdependence of these three aspects. TriDF provides a unified framework for understanding the interaction between detection accuracy, evidence identification, and explanation reliability, offering a foundation for building trustworthy systems that address real-world synthetic media threats.


Key findings
Accurate perception of fine-grained manipulation artifacts is a necessary foundation for reliable DeepFake detection. However, hallucination (fabricating explanations) can severely disrupt detection performance, so detection quality depends jointly on accurate perception and low hallucination. Semantic artifacts that require contextual reasoning are significantly harder for MLLMs to detect than local quality artifacts.
Approach
The authors construct TriDF, a benchmark of 5K real-fake pairs spanning 16 DeepFake types, supported by human annotations that define a detailed taxonomy of quality and semantic artifacts. The evaluation framework poses structured questions (<TFQ>, <MCQ>, <OEQ>) to assess model performance along three dimensions: Perception, Detection, and Hallucination, quantified with metrics such as Cover, Accuracy, and CHAIR; a sketch of the metric computation follows.
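
The paper's scoring code is not reproduced here; the snippet below is a minimal sketch of how Cover- and CHAIR-style scores could be computed once each model explanation has been parsed into a set of claimed artifact labels and matched against the human-annotated evidence set. The function names, exact-match comparison, and data layout are illustrative assumptions, not the authors' implementation.

```python
from typing import Iterable


def chair_rate(claimed: Iterable[str], annotated: Iterable[str]) -> float:
    """CHAIR-style hallucination rate: fraction of claimed artifacts
    absent from the human-annotated evidence set (hypothetical
    exact-match comparison; the paper's matching may differ)."""
    claimed_set = {c.strip().lower() for c in claimed}
    annotated_set = {a.strip().lower() for a in annotated}
    if not claimed_set:
        return 0.0  # no claims means nothing hallucinated
    return len(claimed_set - annotated_set) / len(claimed_set)


def cover_rate(claimed: Iterable[str], annotated: Iterable[str]) -> float:
    """Cover-style recall: fraction of annotated artifacts mentioned."""
    claimed_set = {c.strip().lower() for c in claimed}
    annotated_set = {a.strip().lower() for a in annotated}
    if not annotated_set:
        return 1.0  # nothing to cover
    return len(annotated_set & claimed_set) / len(annotated_set)


# Example: the model claims two artifacts; only one is annotated evidence.
print(chair_rate(["blurry teeth", "extra finger"], ["blurry teeth"]))  # 0.5
print(cover_rate(["blurry teeth", "extra finger"], ["blurry teeth"]))  # 1.0
```

Under this convention, a lower CHAIR score means fewer fabricated artifact mentions, while a higher Cover score means more complete use of the annotated evidence.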
Datasets
TriDF (derived from various public datasets including FaceForensics++, FFHQ, VoxCeleb2, LibriTTS, etc.)
Model(s)
Multimodal Large Language Models (MLLMs), including GPT-5, Gemini 2.5-pro, Claude-Sonnet-4.5, Qwen3-Omni, InternVL, and LLaVA-OV.
Author countries
Taiwan, China