Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection
Authors: Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo
Published: 2026-03-11 07:58:38+00:00
AI Summary
This paper demonstrates that the naïve exposure of reasoning and image refinement capabilities in generative AI (GAI) systems fundamentally undermines modern deepfake detectors. It studies a realistic scenario in which adversaries use only benign prompts and commercial GAI to semantically refine deepfake images, so that state-of-the-art detectors fail while identity is preserved and perceptual quality is enhanced. The findings reveal a structural mismatch between current deepfake detection threat models and the actual capabilities of real-world GAI, and show that widely accessible commercial chatbot services pose a significant security risk.
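The summary describes a two-step, prompt-only workflow: first elicit the model's own authenticity criteria, then feed those criteria back as a refinement objective. A minimal sketch of that workflow follows; `chat`, `edit_image`, and the prompt wording are placeholders we introduce for illustration, not the paper's actual prompts or any specific vendor's API.

```python
# Hypothetical sketch of the benign, two-step prompting workflow. `chat`
# and `edit_image` stand in for a commercial chatbot's text and
# image-editing endpoints; the prompt text is our assumption, not the
# paper's actual prompts.

from typing import Callable

def refine_deepfake(image_path: str,
                    chat: Callable[[str], str],
                    edit_image: Callable[[str, str], str]) -> str:
    # Step 1: a policy-compliant question that leads the model to
    # externalize its own authenticity criteria through open reasoning.
    criteria = chat(
        "List the visual cues you would use to decide whether a face "
        "photograph is an authentic, unedited camera capture."
    )
    # Step 2: reuse those criteria directly as the refinement objective,
    # requesting identity-preserving edits only.
    instruction = (
        "Subtly adjust this portrait so it better satisfies the following "
        f"criteria, without changing the person's identity:\n{criteria}"
    )
    return edit_image(image_path, instruction)  # path to the refined image
```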
Abstract
Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.
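The abstract reports three success criteria for a refined image: it evades the detector, preserves identity as judged by a face recognition API, and exhibits higher perceptual quality. Below is a hedged sketch of such a check, assuming `detector_score` and `face_embedding` as callables standing in for the unnamed commercial services, with placeholder thresholds that are not values from the paper.

```python
# Illustrative check of the abstract's success criteria. `detector_score`
# (image -> fake probability) and `face_embedding` (image -> embedding
# vector) are assumed interfaces; the thresholds are placeholders.

from typing import Callable, Dict
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def evaluate_refinement(original: np.ndarray,
                        refined: np.ndarray,
                        detector_score: Callable[[np.ndarray], float],
                        face_embedding: Callable[[np.ndarray], np.ndarray],
                        det_threshold: float = 0.5,
                        id_threshold: float = 0.7) -> Dict[str, bool]:
    # Criterion 1: detection evasion (fake probability below the
    # detector's decision threshold).
    evaded = detector_score(refined) < det_threshold
    # Criterion 2: identity preservation (face embeddings of the original
    # and refined images remain close).
    same_id = cosine_similarity(face_embedding(original),
                                face_embedding(refined)) >= id_threshold
    # Criterion 3 (perceptual quality) would require a no-reference image
    # quality metric or a user study; it is omitted from this sketch.
    return {"evaded_detector": evaded, "identity_preserved": same_id}
```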