UniShield: An Adaptive Multi-Agent Framework for Unified Forgery Image Detection and Localization

Authors: Qing Huang, Zhipei Xu, Xuanyu Zhang, Jian Zhang

Published: 2025-10-03 16:33:05+00:00

AI Summary

UniShield is a novel multi-agent framework designed for unified forgery image detection and localization across diverse domains, including image manipulation, document manipulation, DeepFake, and AI-generated images. It integrates a perception agent that dynamically selects suitable detection models and a detection agent that consolidates expert detectors and generates interpretable reports. Extensive experiments show that UniShield achieves state-of-the-art results, outperforming both existing unified approaches and domain-specific detectors, which the authors present as evidence of its practicality, adaptiveness, and scalability.

Abstract

With the rapid advancements in image generation, synthetic images have become increasingly realistic, posing significant societal risks, such as misinformation and fraud. Forgery Image Detection and Localization (FIDL) thus emerges as essential for maintaining information integrity and societal security. Despite the impressive performance of existing domain-specific detection methods, their practical applicability remains limited, primarily due to their narrow specialization, poor cross-domain generalization, and the absence of an integrated adaptive framework. To address these issues, we propose UniShield, a novel multi-agent-based unified system capable of detecting and localizing image forgeries across diverse domains, including image manipulation, document manipulation, DeepFake, and AI-generated images. UniShield innovatively integrates a perception agent with a detection agent. The perception agent intelligently analyzes image features to dynamically select suitable detection models, while the detection agent consolidates various expert detectors into a unified framework and generates interpretable reports. Extensive experiments show that UniShield achieves state-of-the-art results, surpassing both existing unified approaches and domain-specific detectors, highlighting its superior practicality, adaptiveness, and scalability.


Key findings
UniShield significantly outperforms all baseline methods across the four forgery tasks: image manipulation (IMDL), AI-generated images (AIGCD), DeepFake (DFD), and document manipulation (DMDL). This demonstrates strong cross-domain generalization and detection robustness. The framework's cooperative reasoning mechanism performs better than any single expert model, exhibiting a '1 + 1 > 2' synergy, which supports its practical applicability and cross-domain adaptability for visual content authentication.
Approach
UniShield employs a multi-agent system consisting of a perception agent and a detection agent. The perception agent analyzes image features to route the image to the correct forgery domain, then selects the most suitable expert detection model (either LLM-based or non-LLM-based). The detection agent runs the chosen expert model for forgery detection and localization, then generates a structured and interpretable report.
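The route, select, detect, and report flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class names, the `domain_hint` field, the `want_explanation` flag, and the expert-selection rule are all assumptions made for this example, and the routing step is a placeholder for the feature analysis that the real perception agent performs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Detection:
    is_forged: bool
    mask: Optional[List[List[int]]] = None  # binary localization mask, if any
    rationale: str = ""                     # free-text explanation, if any


@dataclass
class Expert:
    name: str
    llm_based: bool
    run: Callable[[dict], Detection]


class PerceptionAgent:
    """Hypothetical stand-in: chooses a forgery domain, then an expert."""

    def __init__(self, registry: Dict[str, List[Expert]]):
        # Maps a domain label (e.g. "IMDL", "DMDL", "DFD", "AIGCD") to experts.
        self.registry = registry

    def route(self, image: dict) -> str:
        # Placeholder: reads a precomputed hint instead of analyzing features.
        domain = image["domain_hint"]
        if domain not in self.registry:
            raise ValueError(f"no experts registered for domain {domain!r}")
        return domain

    def select_expert(self, domain: str, want_explanation: bool) -> Expert:
        # Assumed rule: use an LLM-based expert when an explanation is wanted,
        # a non-LLM expert otherwise; fall back to the first registered expert.
        experts = self.registry[domain]
        for expert in experts:
            if expert.llm_based == want_explanation:
                return expert
        return experts[0]


class DetectionAgent:
    """Hypothetical stand-in: runs the chosen expert and builds a report."""

    def detect(self, image: dict, domain: str, expert: Expert) -> dict:
        result = expert.run(image)
        return {
            "domain": domain,
            "expert": expert.name,
            "is_forged": result.is_forged,
            "mask": result.mask,
            "rationale": result.rationale,
        }


def analyze(image: dict, perception: PerceptionAgent,
            detection: DetectionAgent, want_explanation: bool = False) -> dict:
    domain = perception.route(image)
    expert = perception.select_expert(domain, want_explanation)
    return detection.detect(image, domain, expert)
```

To try it, register one or more `Expert` objects per domain (stub callables are enough), build a `PerceptionAgent` from that registry, and call `analyze` with an image dictionary that carries a `domain_hint`. Keeping routing and expert selection separate from execution means a new expert only needs a registry entry, which matches the scalability argument in the summary.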
Datasets
CASIA1+, IMD2020, RTM, AIGCDetectionBenchmark, DF40 (FS, FR)
Model(s)
Qwen2.5-VL (for the perception agent and the custom DFD/DMDL models), GPT-4o (for report summarization), GLaMM (for localization). Expert detectors include IML-ViT, FakeShield, AscFormer, CLIP, AIDE, FakeVLM, and the custom-trained DMDL-R1 and DFD-R1 (fine-tuned Qwen2.5-VL).
Author countries
China