An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape

Authors: Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath

Published: 2024-04-24 21:21:50+00:00

Comment: Accepted to IEEE S&P 2024; 19 pages, 10 figures

AI Summary

This paper critically analyzes 8 state-of-the-art deepfake image detectors, revealing their significant performance degradation against deepfakes generated by user-customized generative models and adversarial deepfakes crafted using vision foundation models. It highlights the shortcomings of current evaluation methodologies and proposes strategies such as ensemble modeling, content-agnostic features, and leveraging more powerful foundation models or adversarial training to enhance generalization and robustness. The work underscores the urgent need to rethink deepfake defenses in an evolving threat landscape.

Abstract

Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features and ensemble modeling to improve generalization performance against user-customized models. Second, the emergence of vision foundation models (machine learning models trained on broad data that can be easily adapted to several downstream tasks) can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.
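The attack idea in the abstract (changing what the generator produces instead of adding noise to pixels) can be illustrated with a toy. This is a hypothetical sketch, not the authors' procedure: it assumes a linear stand-in generator G(z) = W z and a logistic stand-in surrogate detector, and every name in it (`latent_attack`, `W`, `v`, `lam`, etc.) is made up for the example. It runs gradient descent on the latent code z to push the surrogate's fake logit below zero, while an L2 penalty keeps z close to its starting value z0. In the setting the abstract describes, these roles would presumably be played by a real generative model and a foundation-model-based classifier.

```python
import numpy as np

# Hypothetical stand-ins, not the paper's models:
#   generator  G(z) = W @ z          (linear, so outputs stay in W's column space)
#   surrogate  logit(x) = v @ x + b  (sigmoid(logit) is read as P(fake))
rng = np.random.default_rng(0)
LATENT, PIXELS = 8, 64
W = rng.normal(size=(PIXELS, LATENT)) / np.sqrt(LATENT)
v = rng.normal(size=PIXELS) / np.sqrt(PIXELS)


def generate(z):
    return W @ z


def detector_logit(x, b):
    return float(v @ x + b)


def latent_attack(z0, b, steps=200, lr=0.5, lam=0.05):
    """Gradient descent on logit(G(z)) + lam * ||z - z0||^2 over the latent z.

    Every iterate is a generator output, so no noise is added in pixel space;
    the L2 penalty limits how far the content drifts from the original z0.
    """
    z = z0.copy()
    grad_logit = W.T @ v  # d logit / d z; constant because G is linear
    for _ in range(steps):
        if detector_logit(generate(z), b) < 0:  # P(fake) < 0.5: evades surrogate
            break
        z = z - lr * (grad_logit + 2 * lam * (z - z0))
    return z


z0 = rng.normal(size=LATENT)
# Choose b so the unedited image starts at logit 1.0, i.e. flagged as fake.
b = 1.0 - float(v @ generate(z0))
z_adv = latent_attack(z0, b)
print("logit before:", detector_logit(generate(z0), b))
print("logit after: ", detector_logit(generate(z_adv), b))
```

With a linear generator the penalised objective has the closed-form minimiser `z0 - (W.T @ v) / (2 * lam)`, so `lam` directly trades how far the content may drift against how strongly the logit can be pushed down.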


Key findings
All 8 state-of-the-art deepfake detectors exhibit significant performance degradation (up to 53.92% Recall drop) against deepfakes from user-customized generative models, with CNN-based defenses being the most vulnerable. These detectors are also highly susceptible to adversarial deepfakes crafted using foundation models via semantic manipulation, showing up to 88.35% Recall degradation. While frequency-based features generalize well, they are the weakest against adversarial attacks; conversely, defenses leveraging more powerful foundation models or adversarial training demonstrate improved resilience.
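Recall here is the share of deepfakes that a detector still flags as fake. The sketch below shows one way to compute a Recall drop. It assumes the drop is the absolute difference in percentage points between Recall on the detector's original test data and Recall on the new deepfakes; the summary does not say whether the reported drops are absolute or relative (a relative drop would divide by the baseline). That definition and the toy predictions are assumptions, not the paper's data.

```python
def recall(labels, preds, positive=1):
    """Recall on the positive class (here: fake = 1), i.e. TP / (TP + FN)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def recall_drop(baseline, shifted):
    """Absolute Recall drop in percentage points (assumed definition)."""
    return 100.0 * (baseline - shifted)


# Toy example with made-up predictions on 10 deepfakes (1 = flagged as fake).
labels = [1] * 10
base = recall(labels, [1] * 9 + [0])         # 9 of 10 caught
shifted = recall(labels, [1] * 4 + [0] * 6)  # 4 of 10 caught
print(f"drop: {recall_drop(base, shifted):.1f} points")  # prints "drop: 50.0 points"
```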
Approach
The authors evaluate 8 state-of-the-art deepfake image detectors against two novel threat vectors: images from 16 user-customized Stable Diffusion models and adversarial deepfakes generated by leveraging vision foundation models for semantic content manipulation. They propose improving generalization by incorporating content-agnostic features and ensemble methods, and suggest using more powerful foundation models or adversarial training for enhanced adversarial robustness.
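Ensemble modeling, one of the generalization strategies named above, can be as simple as averaging the fake-probability scores of several detectors, so that a generator that fools one member is not automatically missed by the whole ensemble. The sketch below shows only that score-averaging variant. The detector callables, the optional weights and the 0.5 threshold are hypothetical choices, and the paper's ensemble design may differ (for example, voting or a learned combiner).

```python
from typing import Callable, Optional, Sequence

# A detector is any callable mapping an image to P(fake) in [0, 1].
Detector = Callable[[object], float]


def ensemble_prob_fake(detectors: Sequence[Detector], image: object,
                       weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of the member detectors' fake probabilities."""
    if weights is None:
        weights = [1.0] * len(detectors)
    if len(weights) != len(detectors) or sum(weights) <= 0:
        raise ValueError("need one weight per detector, with a positive sum")
    total = sum(w * d(image) for w, d in zip(weights, detectors))
    return total / sum(weights)


def ensemble_predict(detectors: Sequence[Detector], image: object,
                     threshold: float = 0.5,
                     weights: Optional[Sequence[float]] = None) -> bool:
    """Flag the image as fake when the averaged score reaches the threshold."""
    return ensemble_prob_fake(detectors, image, weights) >= threshold


# Stub detectors standing in for, e.g., a CLIP-based and a frequency-based model.
stubs = [lambda img: 0.9, lambda img: 0.2, lambda img: 0.7]
print(ensemble_predict(stubs, image=None))  # mean score 0.6, so True
```

Averaging keeps each member's calibrated score, which a hard majority vote would discard; the weights allow trusting one feature family (say, content-agnostic features) more on unseen generators.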
Datasets
SD dataset (Real: LAION-AESTHETICS; Fake: Realistic Vision v1.4 SD model), StyleCLIP dataset (Real: Flickr-Faces-HQ (FFHQ); Fake: StyleGAN2 generated images), LAION-400M, LDM, GLIDE.
Model(s)
Deepfake Detectors: UnivCLIP (CLIP:ViT-L/14), DE-FAKE (CLIP, BLIP), DCT (Logistic Regression), Patch-Forensics (truncated CNN), Gram-Net (CNN-based), Resynthesis, CNN-F (ResNet-50), MesoNet (DNN with Inception modules), UnivConv2B (OpenCLIP-ConvNext-Large). Generative Models: Stable Diffusion (Realistic Vision v1.4 SD model, SDv1.5), StyleCLIP (StyleGAN2). Foundation Models (for attacks/improved defenses): EfficientNet, ViT, CLIP-ResNet, OpenCLIP-ConvNext-Large.
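The DCT detector listed above classifies frequency-domain features rather than image content. As a toy illustration only, the sketch below computes an orthonormal 2-D DCT-II with NumPy, uses the log-magnitude spectrum as a content-agnostic feature vector, and fits a plain gradient-descent logistic regression. The synthetic data ("real" = random smooth ramps with faint noise; "fake" = the same plus a weak checkerboard standing in for generator artifacts), the 16x16 size and all helper names are assumptions; the actual detector's preprocessing and training setup may differ.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def dct_log_features(img, eps=1e-6):
    """Flattened log-magnitude 2-D DCT spectrum of a square grayscale image."""
    c = dct_matrix(img.shape[0])
    return np.log(np.abs(c @ img @ c.T) + eps).ravel()


def train_logreg(X, y, lr=0.1, epochs=300):
    """Batch gradient-descent logistic regression on standardized features (y=1: fake)."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8
    Xs = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))
        w -= lr * Xs.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return lambda Z: 1.0 / (1.0 + np.exp(-(((Z - mu) / sd) @ w + b)))


# Hypothetical toy data: "real" = random smooth ramp + faint noise,
# "fake" = the same kind of image plus a weak checkerboard artifact.
rng = np.random.default_rng(0)
N = 16
ii, jj = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
checker = (-1.0) ** (ii + jj)


def toy_image(fake):
    a, b, c = rng.normal(size=3)
    img = a * ii / N + b * jj / N + c + 0.01 * rng.normal(size=(N, N))
    return img + 0.2 * checker if fake else img


labels = np.arange(200) % 2  # alternate real (0) / fake (1)
X = np.stack([dct_log_features(toy_image(bool(y))) for y in labels])
model = train_logreg(X[:150], labels[:150])
acc = np.mean((model(X[150:]) > 0.5) == labels[150:])
print(f"held-out accuracy: {acc:.2f}")
```

In this toy the checkerboard only excites DCT coefficients whose row and column indices are both odd, where the ramps contribute nothing, so the log-spectrum separates the classes regardless of each ramp's slope. That independence from image content is the property that makes such features attractive for generalizing across generators.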
Author countries
USA