On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models

Authors: Marija Ivanovska, Vitomir Štruc

Published: 2023-07-11 15:57:51+00:00

AI Summary

This paper investigates the vulnerability of deepfake detectors to attacks generated by Denoising Diffusion Models (DDMs). The authors show that a single denoising diffusion step can significantly reduce the detection rate without perceptible image changes, highlighting the need for more robust detection methods.

Abstract

The detection of malicious deepfakes is a constantly evolving problem that requires continuous monitoring of detectors to ensure they can detect image manipulations generated by the latest emerging models. In this paper, we investigate the vulnerability of single-image deepfake detectors to black-box attacks created by the newest generation of generative methods, namely Denoising Diffusion Models (DDMs). Our experiments are run on FaceForensics++, a widely used deepfake benchmark consisting of manipulated images generated with various techniques for face identity swapping and face reenactment. Attacks are crafted through guided reconstruction of existing deepfakes with a proposed DDM approach for face restoration. Our findings indicate that employing just a single denoising diffusion step in the reconstruction process of a deepfake can significantly reduce the likelihood of detection, all without introducing any perceptible image modifications. While training detectors using attack examples demonstrated some effectiveness, it was observed that discriminators trained on fully diffusion-based deepfakes exhibited limited generalizability when presented with our attacks.


Key findings
Even a single denoising diffusion step significantly reduces deepfake detection accuracy. Self-supervised detectors are more robust to these attacks than supervised ones. Training detectors on attack examples improves detection, but the resulting models generalize poorly to attacks crafted with a different number of denoising steps.
Approach
The researchers crafted black-box attacks by using a pre-trained DDM to reconstruct existing deepfakes with varying numbers of denoising steps. This reconstruction subtly alters each deepfake, making it harder for existing detection models to flag while leaving it visually unchanged. The effectiveness of training detectors on these attacks was also evaluated. A rough sketch of the reconstruction step follows.
Datasets
FaceForensics++ (FF++) dataset, including Deepfakes, FaceSwap, FaceShifter, InsightFace, Face2Face, and NeuralTextures deepfakes.
Model(s)
Xception, MesoInception, Capsule, RECCE, F3-Net, SRM, DSP-FWA, Face X-Ray, Self-Blended Images (SBI), and a pre-trained DDM (architecture based on Dhariwal and Nichol, 2021) with SwinIR as the backbone.
Author countries
Slovenia