An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape


Authors: Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath

Published: 2024-04-24 21:21:50+00:00

AI Summary

This paper analyzes the generalization and adversarial robustness of eight state-of-the-art deepfake image detectors. It demonstrates that these detectors fail to generalize well against user-customized generative models and are vulnerable to adversarial attacks leveraging vision foundation models, highlighting the need for improved defenses.

Abstract

Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features, and ensemble modeling to improve generalization performance against user-customized models. Second, the emergence of vision foundation models (machine learning models trained on broad data that can be easily adapted to several downstream tasks) can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.

Key findings

Existing deepfake detectors show significant performance degradation against user-customized generative models and adversarial attacks using foundation models. Frequency-based features show better generalization, while foundation model-based defenses exhibit more resilience against adversarial attacks. Content-agnostic features and ensemble methods can improve generalization performance.
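The ensemble idea mentioned above can be illustrated with a minimal sketch. It assumes each detector outputs a probability that an image is fake and that the ensemble is a weighted average of those probabilities. The function names, the averaging rule, and the 0.5 threshold are hypothetical choices for illustration, not the combination method used in the paper.

```python
def ensemble_score(scores, weights=None):
    """Combine per-detector fake probabilities into one weighted-average score.

    Assumption: each entry in `scores` is a probability in [0, 1] from one
    detector. The paper's actual ensemble may combine detectors differently.
    """
    if not scores:
        raise ValueError("need at least one detector score")
    if weights is None:
        weights = [1.0] * len(scores)  # equal weighting by default
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


def is_fake(scores, threshold=0.5, weights=None):
    """Flag an image as fake when the ensemble score reaches the threshold."""
    return ensemble_score(scores, weights) >= threshold


# Hypothetical scores from three detectors for a single image.
example_scores = [0.2, 0.9, 0.7]
print(ensemble_score(example_scores))
print(is_fake(example_scores))
```

Averaging lets a detector that generalizes well to a given generator offset one that does not, which is one plausible reason an ensemble could help against unseen customized models.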

Approach

The authors retrain eight state-of-the-art deepfake detectors on two datasets, then evaluate their performance against two attack vectors: (1) deepfakes generated by user-customized Stable Diffusion models and (2) adversarial examples crafted using vision foundation models without adding noise. They propose using content-agnostic features and ensemble methods to improve generalization.
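The evaluation described above amounts to comparing a detector's performance on fake images from generators seen in training with its performance on fake images from user-customized models. The sketch below shows one way to measure that gap using recall on fake images. The helper names and the prediction lists are hypothetical and purely illustrative; they are not results or metrics taken from the paper.

```python
def fake_recall(predictions):
    """Fraction of fake images that the detector flagged as fake.

    `predictions` holds one boolean per fake image: True means the detector
    labeled that image as fake.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(1 for p in predictions if p) / len(predictions)


def generalization_gap(in_distribution, customized):
    """Recall drop when moving from seen generators to customized ones."""
    return fake_recall(in_distribution) - fake_recall(customized)


# Illustrative placeholder predictions, not data from the paper.
seen_generator_preds = [True, True, True, True, False]
customized_model_preds = [True, False, False, True, False]

print(fake_recall(seen_generator_preds))
print(fake_recall(customized_model_preds))
print(generalization_gap(seen_generator_preds, customized_model_preds))
```

A large positive gap would indicate that a detector does not generalize to the customized generators, which is the failure mode the summary describes.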

Datasets

LAION-AESTHETICS, Flickr-Faces-HQ (FFHQ), StyleGAN2-generated images, and a custom dataset of images generated by user-customized Stable Diffusion models.

Model(s)

UnivCLIP, DE-FAKE, DCT, Patch-Forensics, Gram-Net, Resynthesis, CNN-F, MesoNet, EfficientNet, ViT, CLIP-ResNet, OpenCLIP-ConvNext-Large, MM-BSN, and various Stable Diffusion model variants.

Author countries

USA
