LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification

Authors: Shuhan Cui, Huy H. Nguyen, Trung-Nghia Le, Chun-Shien Lu, Isao Echizen

Published: 2024-07-26 09:15:29+00:00

Comment: Pages 1-13 are the main body of the paper, and pages 14-16 are the supplementary material

AI Summary

This paper introduces the novel task of image-based automated fact verification, which aims to not only detect forged images but also retrieve their original authentic counterparts. To address this, the authors propose a two-phase open framework integrating forgery identification and fact retrieval components. Additionally, they construct LookupForensics, a large-scale, multi-task dataset tailored for this task, featuring diverse image manipulations and comprehensive annotations to advance research in both forgery detection and fact retrieval.

Abstract

Amid the proliferation of forged images, notably the tsunami of deepfake content, extensive research has been conducted on using artificial intelligence (AI) to identify forged content in the face of continuing advancements in counterfeiting technologies. We have investigated the use of AI to provide the original authentic image after deepfake detection, which we believe is a reliable and persuasive solution. We call this image-based automated fact verification, a name that originated from a text-based fact-checking system used by journalists. We have developed a two-phase open framework that integrates detection and retrieval components. Additionally, inspired by a dataset proposed by Meta Fundamental AI Research, we further constructed a large-scale dataset that is specifically designed for this task. This dataset simulates real-world conditions and includes both content-preserving and content-aware manipulations that present a range of difficulty levels and have potential for ongoing research. This multi-task dataset is fully annotated, enabling it to be utilized for sub-tasks within the forgery identification and fact retrieval domains. This paper makes two main contributions: (1) We introduce a new task, image-based automated fact verification, and present a novel two-phase open framework combining forgery identification and fact retrieval. (2) We present a large-scale dataset tailored for this new task that features various hand-crafted image edits and machine learning-driven manipulations, with extensive annotations suitable for various sub-tasks. Extensive experimental results validate its practicality for fact verification research and clarify its difficulty levels for various sub-tasks.


Key findings

The proposed two-phase framework significantly outperformed a baseline retrieval-only framework on image-based fact verification. Experiments showed that the LookupForensics dataset poses a considerable challenge for existing state-of-the-art forgery localization and retrieval methods, whose performance drops substantially compared with traditional datasets. EfficientNet-B4 achieved high accuracy (over 90%) in multiclass forgery classification on LookupForensics, while copy-move proved to be the most challenging forgery type to detect.

Approach

The authors propose a two-phase open framework consisting of forgery identification and fact retrieval. The first phase determines if an image is forged, identifies the forgery type (e.g., copy-move, image splicing, object removal, colorization), and localizes the tampered regions. The second phase uses both global retrieval (for the entire image) and local retrieval (for detected forgery segments) to find corresponding original images from a large reference set.
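
The two phases described above can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes images and tampered segments are already represented as embedding vectors, that retrieval ranks references by cosine similarity, and that retrieval is skipped for images judged authentic. The names `ForgeryReport`, `top_k`, and `verify` are hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


@dataclass
class ForgeryReport:
    """Phase 1 output (hypothetical structure)."""
    is_forged: bool
    forgery_type: Optional[str] = None  # e.g. "copy-move", "splicing"
    # Embeddings of the localized tampered regions (assumed representation).
    segment_embeddings: List[Vector] = field(default_factory=list)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: Vector, reference: Dict[str, Vector], k: int) -> List[str]:
    """Return the ids of the k reference embeddings most similar to the query."""
    ranked = sorted(reference, key=lambda rid: cosine(query, reference[rid]), reverse=True)
    return ranked[:k]


def verify(
    image_embedding: Vector,
    identify: Callable[[Vector], ForgeryReport],
    reference: Dict[str, Vector],
    k: int = 1,
) -> tuple:
    """Run phase 1 (forgery identification), then phase 2 (fact retrieval)."""
    report = identify(image_embedding)
    if not report.is_forged:
        # Assumption: nothing is retrieved for images judged authentic.
        return report, []
    # Global retrieval: query with the whole-image embedding.
    candidates = top_k(image_embedding, reference, k)
    # Local retrieval: query with each detected forgery segment.
    for segment in report.segment_embeddings:
        for rid in top_k(segment, reference, k):
            if rid not in candidates:
                candidates.append(rid)
    return report, candidates
```

In this sketch, a spliced image whose background comes from one reference and whose pasted region comes from another would surface both: the global query finds the first and the local query finds the second.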

Datasets

LookupForensics (the authors' newly constructed dataset), Google's Open Images Dataset, Image Similarity Challenge 2021 (ISC2021), CASIA v1.0, CASIA, CoMoFoD, Carvalho, Columbia, GC, SH, LB.

Model(s)

EfficientNet-B4 (used for multiclass forgery classification); other models are not named in this summary.

Author countries

Japan, Vietnam, Taiwan