Protecting Against Image Translation Deepfakes by Leaking Universal Perturbations from Black-Box Neural Networks

Authors: Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff

Published: 2020-06-11 15:02:27+00:00

AI Summary

This work introduces Leaking Universal Perturbations (LUP), an efficient black-box adversarial attack method designed to disrupt image translation deepfake generation systems. LUP reduces the number of queries required per attack (by roughly 30% on GANimation and StarGAN) by leveraging information collected during an initial 'leaking phase' to attack more efficiently in a subsequent 'exploitation phase'. The work presents the first black-box disruptions of image translation models, a step towards protecting individuals' likenesses from unauthorized facial manipulation.

Abstract

In this work, we develop efficient disruptions of black-box image translation deepfake generation systems. We are the first to demonstrate black-box deepfake generation disruption, by presenting image translation formulations of attacks initially proposed for classification models. However, a naive adaptation of black-box classification attacks requires a prohibitive number of queries against real-world image translation systems. We present a frustratingly simple yet highly effective algorithm, Leaking Universal Perturbations (LUP), that significantly reduces the number of queries needed to attack an image. LUP consists of two phases: (1) a short leaking phase, where we attack the network using traditional black-box attacks and gather information on successful attacks on a small dataset, and (2) an exploitation phase, where we leverage said information to subsequently attack the network with improved efficiency. Our attack reduces the total number of queries necessary to attack GANimation and StarGAN by 30%.
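One way to write down what these query-based disruptions optimize, as a hedged sketch: the symbols G (the black-box translation network), x (input image), \eta (perturbation), n (number of output values), \epsilon (step size), \tau (success threshold) and q_t (orthonormal search directions), as well as the choice of mean squared output distortion, are our assumptions rather than definitions taken from this summary. The attacker only observes outputs of G, never its gradients.

```latex
\mathcal{L}(\eta) = \frac{1}{n}\,\bigl\lVert G(x+\eta) - G(x) \bigr\rVert_2^2,
\qquad \text{disruption succeeds when } \mathcal{L}(\eta) \ge \tau,

\eta_0 = 0, \qquad
\eta_{t+1} =
\begin{cases}
\eta_t - \epsilon q_t & \text{if } \mathcal{L}(\eta_t - \epsilon q_t) > \mathcal{L}(\eta_t),\\
\eta_t + \epsilon q_t & \text{else if } \mathcal{L}(\eta_t + \epsilon q_t) > \mathcal{L}(\eta_t),\\
\eta_t & \text{otherwise.}
\end{cases}
```

Each step costs one or two queries, so the total cost depends on how quickly the directions q_t raise \mathcal{L}. Under this reading, a naive SimBA-style adaptation draws q_t from the pixel basis, while LUP draws them from perturbations leaked on a small auxiliary set.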


Key findings
LUP significantly reduces the average number of queries required for successful black-box attacks on image translation networks, achieving approximately a 30% reduction compared to the next best method (IT-SimBA) on both GANimation and StarGAN. This enhanced efficiency makes it more feasible to scale protection mechanisms to a large number of images, thereby strengthening real-world defenses against deepfake generation.
Approach
The Leaking Universal Perturbations (LUP) algorithm consists of two phases. First, a 'leaking phase' performs traditional black-box attacks on a small auxiliary dataset to gather successful perturbations, from which principal components are extracted using PCA. Second, an 'exploitation phase' uses these leaked PCA components as efficient attack vectors with a modified Simple Black-box Attack (SimBA) to disrupt image translation on a larger test set, minimizing query count.
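The two phases can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the black-box G is a hypothetical random tanh layer acting on flattened 64-value "images"; the success test (mean squared output distortion of at least tau), the step size eps, the query budget, the fallback to the pixel basis once the leaked directions run out, and all function names (make_black_box, it_simba, pixel_basis, leak_components, lup_attack) are our own. PCA is computed with numpy.linalg.svd on the centered leaked perturbations.

```python
import itertools
import numpy as np


def make_black_box(dim, seed=0):
    """Hypothetical stand-in for an image translation network G.
    The attack code below only calls G; it never reads W."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    return lambda x: np.tanh(W @ x)


def it_simba(G, x, basis, eps, tau, max_queries):
    """SimBA-style disruption (assumed form): for each direction q, try
    delta - eps*q, then delta + eps*q, and keep the first step that raises
    the output distortion. Stops at distortion >= tau or at the budget."""
    y_clean = G(x)
    queries = 1  # the clean translation costs one query
    delta = np.zeros_like(x)
    best = 0.0
    for q in basis:
        if best >= tau or queries >= max_queries:
            break
        for sign in (-1.0, 1.0):
            if queries >= max_queries:
                break
            cand = delta + sign * eps * q
            # Mean squared difference between perturbed and clean outputs.
            loss = float(np.mean((G(x + cand) - y_clean) ** 2))
            queries += 1
            if loss > best:
                delta, best = cand, loss
                break
    return delta, best >= tau, queries


def pixel_basis(dim, rng):
    """Standard basis vectors in random order (the naive search space)."""
    eye = np.eye(dim)
    return (eye[i] for i in rng.permutation(dim))


def leak_components(G, aux_images, eps, tau, max_queries, k, rng):
    """Leaking phase: attack a small auxiliary set with the pixel basis,
    keep successful perturbations, return their top-k principal directions
    as orthonormal rows."""
    dim = aux_images.shape[1]
    leaked = []
    for x in aux_images:
        delta, ok, _ = it_simba(G, x, pixel_basis(dim, rng), eps, tau, max_queries)
        if ok:
            leaked.append(delta)
    if not leaked:
        raise ValueError("no successful perturbations were leaked")
    P = np.stack(leaked)
    P = P - P.mean(axis=0)  # center before PCA
    _, _, vt = np.linalg.svd(P, full_matrices=False)
    return vt[:k]


def lup_attack(G, x, components, eps, tau, max_queries, rng):
    """Exploitation phase: search the leaked directions first, then fall
    back to the pixel basis (the fallback is our assumption)."""
    basis = itertools.chain(components, pixel_basis(x.shape[0], rng))
    return it_simba(G, x, basis, eps, tau, max_queries)


# Usage on hypothetical toy data: small auxiliary set, separate test set.
rng = np.random.default_rng(0)
dim = 64
G = make_black_box(dim)
aux = rng.uniform(-1, 1, size=(24, dim))
test = rng.uniform(-1, 1, size=(4, dim))
eps, tau, budget = 0.5, 0.02, 300

comps = leak_components(G, aux, eps, tau, budget, k=16, rng=rng)
for x in test:
    _, ok, queries = lup_attack(G, x, comps, eps, tau, budget, rng)
    print(ok, queries)
```

Because this toy map has no structure shared across inputs, it is not expected to reproduce the reported ~30% query reduction; it only shows where the leaked PCA directions enter the SimBA loop.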
Datasets
CelebA dataset
Model(s)
GANimation, StarGAN (attacked models)
Author countries
USA