UniShield: An Adaptive Multi-Agent Framework for Unified Forgery Image Detection and Localization

Authors: Qing Huang, Zhipei Xu, Xuanyu Zhang, Jian Zhang

Published: 2025-10-03 16:33:05+00:00

AI Summary

UniShield is a multi-agent framework for unified Forgery Image Detection and Localization (FIDL) across diverse domains, including image manipulation, document manipulation, DeepFake, and AI-generated content. It addresses the poor cross-domain generalization of domain-specific detectors by dynamically selecting the most suitable expert detection model for each image. Extensive experiments show that UniShield achieves state-of-the-art results, surpassing both existing unified approaches and domain-specific detectors.

Abstract

With the rapid advancements in image generation, synthetic images have become increasingly realistic, posing significant societal risks, such as misinformation and fraud. Forgery Image Detection and Localization (FIDL) thus emerges as essential for maintaining information integrity and societal security. Despite impressive performances by existing domain-specific detection methods, their practical applicability remains limited, primarily due to their narrow specialization, poor cross-domain generalization, and the absence of an integrated adaptive framework. To address these issues, we propose UniShield, the novel multi-agent-based unified system capable of detecting and localizing image forgeries across diverse domains, including image manipulation, document manipulation, DeepFake, and AI-generated images. UniShield innovatively integrates a perception agent with a detection agent. The perception agent intelligently analyzes image features to dynamically select suitable detection models, while the detection agent consolidates various expert detectors into a unified framework and generates interpretable reports. Extensive experiments show that UniShield achieves state-of-the-art results, surpassing both existing unified approaches and domain-specific detectors, highlighting its superior practicality, adaptiveness, and scalability.

Key findings

UniShield achieves state-of-the-art performance across all four tested domains: image manipulation (IMDL), DeepFake (DFD), AI-generated images (AIGCD), and document manipulation (DMDL). It is also more robust than existing methods. Integrating diverse expert tools yields a "1 + 1 > 2" synergy, which supports the effectiveness of the dynamic tool scheduling mechanism.

Approach

The framework uses two collaborating agents: a perception agent and a detection agent. The perception agent employs a task router (fine-tuned via GRPO) to determine the forgery domain, and a tool scheduler (based on Qwen2.5-VL) to choose dynamically between LLM-based and non-LLM-based expert detectors. The detection agent runs the selected expert model from the toolbox and generates a structured, interpretable report, with GPT-4o often used to summarize the results.
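
The control flow described above can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' implementation: the class names, the `detect` and `build_demo` helpers, the toolbox layout, and the stub experts are all hypothetical, and the router and scheduler are plain string rules standing in for the GRPO-tuned router and the Qwen2.5-VL scheduler. The report text is built with a simple template where the real system may call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Domain labels as used in the summary; every other name here is hypothetical.
DOMAINS = ("IMDL", "DFD", "AIGCD", "DMDL")

# An expert takes an image path and returns (is_forged, evidence text).
Expert = Callable[[str], Tuple[bool, str]]


@dataclass
class Report:
    domain: str
    tool: str
    is_forged: bool
    summary: str


class PerceptionAgent:
    """Task router plus tool scheduler, both injected as plain callables."""

    def __init__(self, route: Callable[[str], str],
                 schedule: Callable[[str, str], str]) -> None:
        self.route = route        # image -> forgery domain
        self.schedule = schedule  # (image, domain) -> "llm" or "non_llm"

    def plan(self, image: str) -> Tuple[str, str]:
        domain = self.route(image)
        if domain not in DOMAINS:
            raise ValueError(f"unknown forgery domain: {domain!r}")
        return domain, self.schedule(image, domain)


class DetectionAgent:
    """Runs the selected expert from the toolbox and writes a report."""

    def __init__(self, toolbox: Dict[Tuple[str, str], Tuple[str, Expert]]) -> None:
        self.toolbox = toolbox  # (domain, kind) -> (tool name, expert)

    def run(self, image: str, domain: str, kind: str) -> Report:
        name, expert = self.toolbox[(domain, kind)]
        is_forged, evidence = expert(image)
        verdict = "forged" if is_forged else "authentic"
        # Template summary; a stand-in for an LLM-written report.
        summary = f"[{domain}] {name} judged the image {verdict}: {evidence}"
        return Report(domain, name, is_forged, summary)


def detect(image: str, perception: PerceptionAgent,
           detection: DetectionAgent) -> Report:
    """Perception picks the domain and tool kind; detection executes it."""
    domain, kind = perception.plan(image)
    return detection.run(image, domain, kind)


def build_demo() -> Tuple[PerceptionAgent, DetectionAgent]:
    """Wire both agents with toy rules so the flow can be run end to end."""

    def route(image: str) -> str:
        return "DMDL" if "doc" in image else "IMDL"

    def schedule(image: str, domain: str) -> str:
        return "llm" if domain == "DMDL" else "non_llm"

    def stub_expert(image: str) -> Tuple[bool, str]:
        return ("forged" in image, "stub evidence")

    toolbox = {
        (domain, kind): (f"stub_{kind}_{domain.lower()}", stub_expert)
        for domain in DOMAINS
        for kind in ("llm", "non_llm")
    }
    return PerceptionAgent(route, schedule), DetectionAgent(toolbox)
```

Calling `detect("forged_doc.png", *build_demo())` walks one image through routing, scheduling, expert execution, and reporting. Keeping the router, scheduler, and experts as injected callables mirrors the scalability point in the summary: a new expert can be added to the toolbox without touching the agents.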

Datasets

CASIA1+, IMD2020, RTM, AIGCDetectionBenchmark, DF40

Model(s)

Multi-Agent Framework, Qwen2.5-VL, GPT-4o, GRPO (Group Relative Policy Optimization), integrated expert models (e.g., IML-ViT, FakeShield, CLIP, DFD-R1, DMDL-R1)

Author countries

China
