On Improving Cross-dataset Generalization of Deepfake Detectors

Authors: Aakash Varma Nadimpalli, Ajita Rattani

Published: 2022-04-08 20:34:53+00:00

Comment: 2022 Conference on Computer Vision and Pattern Recognition Workshops | New Orleans, Louisiana

AI Summary

This paper addresses the sharp drop in accuracy that deepfake detectors suffer when they are evaluated on datasets other than the one they were trained on, and proposes a hybrid supervised and reinforcement learning (RL) approach. An RL agent learns to select the top-k augmentations for each test image individually. A CNN classifies each augmented copy, and the resulting scores are averaged to produce the final real/fake decision. The authors report that this improves cross-dataset generalization over prior published methods.
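The inference step summarized above (the agent picks k augmentations for an image, the CNN scores each augmented copy, and the scores are averaged) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the augmentation pool, the 2-D list used as an image, the `policy_scores` dictionary standing in for the RL agent's output, and `toy_classifier` standing in for the CNN are all hypothetical. `k` defaults to 3 only because the study reports top-3 as the best-performing setting.

```python
import math
from typing import Callable, Dict, List

# Hypothetical "image": a 2-D list of grayscale pixels in [0, 1]. A real
# pipeline would use face crops and a CNN such as EfficientNet V2-L.
Image = List[List[float]]

# Hypothetical augmentation pool (the paper's actual pool is not listed here).
AUGMENTATIONS: Dict[str, Callable[[Image], Image]] = {
    "identity": lambda img: [row[:] for row in img],
    "hflip": lambda img: [row[::-1] for row in img],
    "vflip": lambda img: [row[:] for row in img[::-1]],
    "brighten": lambda img: [[min(1.0, p + 0.1) for p in row] for row in img],
    "darken": lambda img: [[max(0.0, p - 0.1) for p in row] for row in img],
}


def top_k_augmentations(policy_scores: Dict[str, float], k: int) -> List[str]:
    """Names of the k augmentations the (hypothetical) agent scores highest."""
    if not 1 <= k <= len(policy_scores):
        raise ValueError("k must be between 1 and the number of augmentations")
    return sorted(policy_scores, key=policy_scores.__getitem__, reverse=True)[:k]


def toy_classifier(img: Image) -> float:
    """Hypothetical stand-in for the CNN: maps mean brightness to P(fake)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return 1.0 / (1.0 + math.exp(-10.0 * (mean - 0.5)))


def predict_fake_probability(
    img: Image,
    policy_scores: Dict[str, float],
    classifier: Callable[[Image], float] = toy_classifier,
    k: int = 3,
) -> float:
    """Average the classifier's scores over the top-k augmented copies."""
    chosen = top_k_augmentations(policy_scores, k)
    scores = [classifier(AUGMENTATIONS[name](img)) for name in chosen]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    scores = {"identity": 0.9, "hflip": 0.8, "vflip": 0.1, "brighten": 0.05, "darken": 0.7}
    image = [[0.5, 0.5], [0.5, 0.5]]
    print(top_k_augmentations(scores, 3))  # ['identity', 'hflip', 'darken']
    print(predict_fake_probability(image, scores) > 0.5)  # False
```

Averaging per-copy probabilities (rather than voting) keeps a continuous score that can be thresholded or used for AUC; whether the authors average probabilities or raw logits is not stated in this summary.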

Abstract

Facial manipulation using deepfakes has created major security risks and raised serious societal concerns. As a countermeasure, a number of deepfake detection methods have recently been proposed. Most of them model deepfake detection as a binary classification problem using a backbone convolutional neural network (CNN) architecture pretrained for the task. These CNN-based methods have demonstrated very high efficacy in deepfake detection, with Area Under the Curve (AUC) values as high as 0.99. However, their performance degrades significantly when they are evaluated across datasets. In this paper, we formulate deepfake detection as a hybrid combination of supervised and reinforcement learning (RL) to improve its cross-dataset generalization performance. The proposed method uses an RL agent to choose the top-k augmentations for each test sample in an image-specific manner. The CNN classification scores of all the augmentations of each test image are averaged together for the final real or fake classification. Through extensive experimental validation, we demonstrate the superiority of our method over existing published research in cross-dataset generalization of deepfake detectors, thus obtaining state-of-the-art performance.
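The abstract measures detectors by AUC. As a reference point, AUC can be computed directly from its rank interpretation (the Mann-Whitney statistic): the probability that a randomly chosen fake receives a higher fake score than a randomly chosen real sample, with ties counted as one half. The sketch assumes label 1 = fake and 0 = real; the scores in the example are made-up toy values, not results from the paper. In cross-dataset evaluation the same computation is simply applied to scores from a detector tested on a dataset it was not trained on.

```python
from itertools import product
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a fake > score of a real); ties count 1/2.

    labels: 1 = fake (positive class), 0 = real. Runs in O(n_fake * n_real),
    which is fine for a sketch; library implementations sort instead.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one fake and one real sample")
    wins = 0.0
    for f, r in product(fake, real):
        if f > r:
            wins += 1.0
        elif f == r:
            wins += 0.5
    return wins / (len(fake) * len(real))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(roc_auc([0.9, 0.8, 0.3, 0.1], labels))  # 1.0 (perfect ranking)
    print(roc_auc([0.9, 0.4, 0.6, 0.1], labels))  # 0.75 (one real outranks a fake)
```

For binary labels, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.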


Key findings
The proposed hybrid supervised and RL approach significantly improved cross-dataset generalization of deepfake detectors, outperforming baseline CNNs and random test-time augmentations. EfficientNet V2-L combined with a PPO-based RL agent achieved state-of-the-art results in cross-dataset evaluation, notably on Celeb-DF. The study also found that selecting the top-3 augmentations gave the best performance.
Approach
The method integrates a pretrained convolutional neural network (CNN) with a reinforcement learning (RL) agent. The RL agent, based on either Proximal Policy Optimization (PPO) or a Deep Q-Network (DQN), learns a policy that selects the top-k data augmentations for each test image from the CNN's feature map and loss. The CNN then classifies the augmented images, and their scores are averaged to make the final real/fake decision.
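The policy-learning half of the approach can be illustrated with a heavily simplified stand-in. This sketch is not the authors' PPO or DQN agent: it swaps in a linear softmax policy trained with REINFORCE (a single-step policy-gradient update with a running-average reward baseline). The one-hot feature vectors are hypothetical stand-ins for the CNN feature map, `TOY_LOSSES` is a hypothetical table replacing real CNN losses, and defining the reward as negative loss is an assumption. After training, `top_k` returns the augmentations that would be applied to a given input.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class AugmentationPolicy:
    """Linear softmax policy over augmentations, trained with REINFORCE.

    A simplified stand-in for the paper's PPO/DQN agent, not a reproduction.
    """

    def __init__(self, n_features: int, n_actions: int, lr: float = 0.2, seed: int = 0):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr
        self.baseline = 0.0  # running mean reward; reduces gradient variance
        self.rng = random.Random(seed)

    def probs(self, features: List[float]) -> List[float]:
        logits = [sum(wi * xi for wi, xi in zip(row, features)) for row in self.w]
        return softmax(logits)

    def update(self, features: List[float], reward_fn: Callable[[int], float]) -> float:
        """Sample one augmentation, observe its reward, take one gradient step."""
        p = self.probs(features)
        action = self.rng.choices(range(len(p)), weights=p)[0]
        reward = reward_fn(action)
        advantage = reward - self.baseline
        self.baseline += 0.05 * (reward - self.baseline)
        # d log pi(action) / d row a = (1[a == action] - p[a]) * features
        for a, row in enumerate(self.w):
            scale = self.lr * advantage * ((1.0 if a == action else 0.0) - p[a])
            for j, x in enumerate(features):
                row[j] += scale * x
        return reward

    def top_k(self, features: List[float], k: int) -> List[int]:
        """Indices of the k most probable augmentations for this input."""
        p = self.probs(features)
        return sorted(range(len(p)), key=lambda a: p[a], reverse=True)[:k]


# Hypothetical per-augmentation CNN losses for two kinds of input; the
# one-hot keys stand in for (pooled) CNN feature maps.
TOY_LOSSES: Dict[Tuple[float, ...], List[float]] = {
    (1.0, 0.0): [0.2, 0.9, 0.5, 0.8],  # augmentation 0 gives the lowest loss
    (0.0, 1.0): [0.9, 0.7, 0.1, 0.6],  # augmentation 2 gives the lowest loss
}


def train(steps: int = 4000, seed: int = 0) -> AugmentationPolicy:
    policy = AugmentationPolicy(n_features=2, n_actions=4, seed=seed)
    contexts = list(TOY_LOSSES)
    for step in range(steps):
        x = contexts[step % len(contexts)]
        # Reward = negative loss, so lower-loss augmentations are reinforced.
        policy.update(list(x), lambda a, x=x: -TOY_LOSSES[x][a])
    return policy


if __name__ == "__main__":
    policy = train()
    print(policy.top_k([1.0, 0.0], 3), policy.top_k([0.0, 1.0], 3))
```

The input-dependent policy is what makes the selection image-specific: the two toy inputs end up preferring different augmentations. PPO would add a clipped surrogate objective and a learned value baseline on top of this basic policy-gradient idea.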
Datasets
FaceForensics++, Celeb-DF, DeeperForensics-1.0
Model(s)
EfficientNet V2-L (best-performing CNN backbone reported); PPO- and DQN-based RL agents
Author countries
USA