Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection

Authors: Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo

Published: 2026-03-11 07:58:38+00:00

AI Summary

This paper demonstrates that the naïve exposure of reasoning and image refinement capabilities in generative AI (GAI) systems fundamentally undermines modern deepfake detectors. It studies a realistic scenario in which adversaries use benign, policy-compliant prompts and commercial GAI systems to semantically refine deepfake images, causing state-of-the-art detectors to fail while preserving identity and enhancing perceptual quality. The findings reveal a structural mismatch between current deepfake detection threat models and the actual capabilities of real-world GAI, highlighting that commercial chatbot services pose a significant security risk.

Abstract

Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit substantially higher perceptual quality. Importantly, we find that widely accessible commercial chatbot services pose a significantly greater security risk than open-source models, as their superior realism, semantic controllability, and low-barrier interfaces enable effective evasion by non-expert users. Our findings reveal a structural mismatch between the threat models assumed by current detection frameworks and the actual capabilities of real-world generative AI. While detection baselines are largely shaped by prior benchmarks, deployed systems expose unrestricted authenticity reasoning and refinement despite stringent safety controls in other domains.

Key findings
Semantic-preserving image refinement by commercial GAI systems causes state-of-the-art deepfake detectors to largely fail, dropping detection rates to near zero in many cases, while preserving subject identity and improving visual quality. The study also reveals a structural split: traditional deepfake detectors fail outright, while AI-generated image detectors sometimes respond to the new synthesis signatures yet remain vulnerable under strict operating thresholds on commercial GAI outputs. Commercial GAI services, particularly Gemini, pose a higher security risk than open-source models because of their superior refinement capabilities and the ease with which non-expert users can achieve evasion through adaptive, reasoning-guided prompts.
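
To make the reported evaluation concrete, here is a minimal sketch of how detector evasion and identity preservation can be measured jointly. The decision threshold, the similarity cutoff, and the face-embedding interface mentioned in the comments are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

DETECT_THRESHOLD = 0.5    # assumed decision threshold; real setups calibrate to a fixed FPR
IDENTITY_THRESHOLD = 0.6  # assumed cosine-similarity cutoff for "same identity"

def detection_rate(scores: np.ndarray, threshold: float = DETECT_THRESHOLD) -> float:
    """Fraction of images a detector flags as fake at the given threshold."""
    return float(np.mean(scores >= threshold))

def identity_preserved(emb_orig: np.ndarray, emb_refined: np.ndarray,
                       threshold: float = IDENTITY_THRESHOLD) -> bool:
    """Cosine similarity between face embeddings (e.g., from a commercial face
    recognition API) of the original deepfake and its refined counterpart."""
    sim = float(emb_orig @ emb_refined /
                (np.linalg.norm(emb_orig) * np.linalg.norm(emb_refined)))
    return sim >= threshold

# Usage sketch: scores_before / scores_after are one detector's outputs on the
# same deepfakes before and after GAI refinement. A detection rate that falls
# from near 1.0 to near 0.0 while identity_preserved(...) still holds is the
# failure mode reported in the paper.
```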
Approach
The authors study a realistic scenario in which an adversary uses only benign, policy-compliant prompts with commercial GAI systems. They show that these systems articulate explicit authenticity criteria and point out visual artifacts in deepfake images, and that this self-generated feedback can be reused directly as the refinement objective. The resulting images evade detection, preserve identity as verified by commercial face recognition APIs, and exhibit higher perceptual quality.
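
The loop described above can be sketched as follows, assuming a generic chatbot client with one text endpoint and one image-editing endpoint. The `client` object and its `critique` and `refine_image` methods are hypothetical placeholders; the paper works through commercial chat interfaces with benign prompts rather than any specific API.

```python
from dataclasses import dataclass, field

CRITIQUE_PROMPT = (
    "Does this face photo look authentic? "
    "List any visual artifacts that make it look synthetic."
)
REFINE_PROMPT = "Refine this photo so it looks more authentic. Address: {feedback}"

@dataclass
class RefinementResult:
    image: bytes                                           # final refined image
    feedback_history: list = field(default_factory=list)  # critiques per round

def reasoning_guided_refinement(client, image: bytes, max_rounds: int = 3) -> RefinementResult:
    """Iteratively elicit the GAI's authenticity critique, then reuse that
    critique verbatim as the objective for the next image refinement."""
    history = []
    for _ in range(max_rounds):
        # Step 1: a benign, policy-compliant prompt that externalizes the
        # model's authenticity criteria as a list of visible artifacts.
        feedback = client.critique(image, prompt=CRITIQUE_PROMPT)  # hypothetical call
        if not feedback.strip():
            break  # model reports no remaining artifacts; stop early
        history.append(feedback)
        # Step 2: feed the model's own criteria back as the refinement goal.
        image = client.refine_image(image, prompt=REFINE_PROMPT.format(feedback=feedback))
    return RefinementResult(image=image, feedback_history=history)
```

The key design point is that no adversarial optimization is required: the model's own unrestricted authenticity reasoning supplies the refinement objective.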
Datasets
FaceForensics++, Flickr-Faces-HQ (FFHQ), Tiny-GenImage
Model(s)
UNKNOWN
Author countries
South Korea