LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification
Authors: Shuhan Cui, Huy H. Nguyen, Trung-Nghia Le, Chun-Shien Lu, Isao Echizen
Published: 2024-07-26 09:15:29+00:00
Comment: Pages 1-13 are the main body of the paper, and pages 14-16 are the supplementary material
AI Summary
This paper introduces the novel task of image-based automated fact verification, which aims to not only detect forged images but also retrieve their original authentic counterparts. To address this, the authors propose a two-phase open framework integrating forgery identification and fact retrieval components. Additionally, they construct LookupForensics, a large-scale, multi-task dataset tailored for this task, featuring diverse image manipulations and comprehensive annotations to advance research in both forgery detection and fact retrieval.
Abstract
Amid the proliferation of forged images, notably the tsunami of deepfake content, extensive research has been conducted on using artificial intelligence (AI) to identify forged content in the face of continuing advancements in counterfeiting technologies. We have investigated the use of AI to provide the original authentic image after deepfake detection, which we believe is a reliable and persuasive solution. We call this image-based automated fact verification, a name derived from a text-based fact-checking system used by journalists. We have developed a two-phase open framework that integrates detection and retrieval components. Additionally, inspired by a dataset proposed by Meta Fundamental AI Research, we constructed a large-scale dataset specifically designed for this task. This dataset simulates real-world conditions and includes both content-preserving and content-aware manipulations that span a range of difficulty levels and leave room for ongoing research. This multi-task dataset is fully annotated, so it can be used for sub-tasks within the forgery identification and fact retrieval domains. This paper makes two main contributions: (1) We introduce a new task, image-based automated fact verification, and present a novel two-phase open framework combining forgery identification and fact retrieval. (2) We present a large-scale dataset tailored for this new task that features various hand-crafted image edits and machine learning-driven manipulations, with extensive annotations suitable for various sub-tasks. Extensive experimental results validate the dataset's practicality for fact verification research and clarify its difficulty levels for various sub-tasks.
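The two-phase idea in the abstract (forgery identification, then fact retrieval) can be illustrated with a minimal sketch. This is not the authors' framework: the names `verify`, `Verdict`, and `detect`, the 0.5 threshold, and the use of cosine similarity over precomputed embedding vectors are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, Optional, Sequence

Vector = Sequence[float]  # hypothetical image embedding


@dataclass
class Verdict:
    is_forged: bool
    original_id: Optional[str]    # id of the retrieved authentic image, if any
    similarity: Optional[float]   # cosine similarity to that image, if any


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verify(
    query: Vector,
    detect: Callable[[Vector], float],   # assumed: returns a forgery score in [0, 1]
    gallery: Dict[str, Vector],          # assumed: authentic image id -> embedding
    forgery_threshold: float = 0.5,      # illustrative value, not from the paper
) -> Verdict:
    # Phase 1: forgery identification.
    if detect(query) < forgery_threshold:
        return Verdict(False, None, None)

    # Phase 2: fact retrieval, here a nearest-neighbour search by cosine similarity.
    best_id: Optional[str] = None
    best_sim = -1.0
    for image_id, embedding in gallery.items():
        sim = cosine(query, embedding)
        if sim > best_sim:
            best_id, best_sim = image_id, sim
    return Verdict(True, best_id, best_sim if best_id is not None else None)
```

In this sketch the detector and the embedding model are passed in as plain values, so either phase can be swapped independently, which is one way to read the "open" framework the abstract describes.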