ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

Authors: Peijun Bao, Anwei Luo, Gang Pan, Alex C. Kot, Xudong Jiang

Published: 2026-04-04 18:00:05+00:00

Comment: [CVPR 2026] The first benchmark for action-level deepfake localization

AI Summary

This paper introduces ActivityForensics, the first large-scale benchmark dataset designed for localizing manipulated activities in videos, addressing activity-level forgeries overlooked by existing appearance-level benchmarks. It further proposes Temporal Artifact Diffuser (TADiff), a novel diffusion-based baseline that exposes subtle artifact cues through feature regularization. The benchmark provides comprehensive evaluation protocols, facilitating research in fine-grained video forensics.

Abstract

Temporal forgery localization aims to temporally identify manipulated segments in videos. Most existing benchmarks focus on appearance-level forgeries, such as face swapping and object removal. However, recent advances in video generation have driven the emergence of activity-level forgeries that modify human actions to distort event semantics, resulting in highly deceptive forgeries that critically undermine media authenticity and public trust. To overcome this issue, we introduce ActivityForensics, the first large-scale benchmark for localizing manipulated activity in videos. It contains over 6K forged video segments that are seamlessly blended into the video context, rendering high visual consistency that makes them almost indistinguishable from authentic content to the human eye. We further propose Temporal Artifact Diffuser (TADiff), a simple yet effective baseline that exposes artifact cues through a diffusion-based feature regularizer. Based on ActivityForensics, we introduce comprehensive evaluation protocols covering intra-domain, cross-domain, and open-world settings, and benchmark a wide range of state-of-the-art forgery localizers to facilitate future research. The dataset and code are available at https://activityforensics.github.io.


Key findings

TADiff consistently outperforms state-of-the-art methods on the ActivityForensics benchmark in intra-domain, cross-domain, and open-world settings, with significant Average Precision gains, particularly where precise temporal boundary localization is required. The diffusion-based feature regularizer reduces semantic bias, which increases sensitivity to subtle visual artifacts and yields strong generalization to unseen manipulation types. Models trained on ActivityForensics also generalize to real-world manipulations.
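
To make "precise temporal boundary localization" concrete, here is a minimal sketch of how a predicted segment is usually matched to a ground-truth forged segment in temporal localization. It assumes the common convention that Average Precision is computed at one or more temporal IoU thresholds. The summary does not state the exact protocol, so the function names and the 0.5 threshold are illustrative assumptions, not the benchmark's official evaluation code.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """Assumed matching rule: a prediction counts if its tIoU reaches the threshold."""
    return temporal_iou(pred, gt) >= threshold


# A prediction that overlaps only half of a 4-second forged segment.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))      # overlap 2 s, union 6 s
print(is_true_positive((2.0, 6.0), (4.0, 8.0)))  # fails at the 0.5 threshold
```

Higher thresholds demand tighter boundaries, so a method that only roughly finds the forged region loses precision as the threshold rises.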

Approach

The proposed Temporal Artifact Diffuser (TADiff) enhances manipulated activity localization by injecting stochastic perturbations into the multi-scale feature space of a temporal transformer to mitigate semantic bias. It then iteratively denoises these features using Feature-wise Linear Modulation (FiLM) and Denoising Diffusion Implicit Model (DDIM) updates to amplify subtle forgery-discriminative signals. This process regularizes the feature manifold, making the model more sensitive to artifact cues.
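
The three ingredients named above (noise injection, FiLM, and DDIM updates) can be sketched generically in NumPy. This is not the authors' implementation. The linear noise schedule, the feature shape, and all function names are assumptions, and the learned denoising network is replaced by the true injected noise so that the update rule can be checked in isolation.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention terms for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Forward perturbation: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


def film(h, gamma, beta):
    """Feature-wise Linear Modulation: per-channel scale and shift."""
    return gamma * h + beta


def ddim_step(xt, eps_hat, t, t_prev, alpha_bar):
    """Deterministic DDIM update (eta = 0) from step t to t_prev.

    t_prev = -1 denotes the final step, where no noise remains.
    """
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    x0_hat = (xt - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical single-scale feature map: 8 temporal positions, 16 channels.
features = rng.standard_normal((8, 16))
noisy, eps = add_noise(features, 500, alpha_bar, rng)

# Iterative denoising. A real model would predict eps_hat with a network,
# typically conditioned through FiLM; here the true noise stands in for it.
x = noisy
timesteps = [500, 250, 0]
for t, t_prev in zip(timesteps, timesteps[1:] + [-1]):
    x = ddim_step(x, eps, t, t_prev, alpha_bar)
restored = x

print(np.allclose(restored, features))
```

In a trained model the noise estimate is imperfect, so the denoised features differ from the input. According to the description above, that difference is what regularizes the feature manifold toward artifact cues.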

Datasets

ActivityForensics

Model(s)

UNKNOWN

Author countries

China, Singapore, Vietnam