Two-branch Recurrent Network for Isolating Deepfakes in Videos

Authors: Iacopo Masi, Aditya Killekar, Royston Marian Mascarenhas, Shenoy Pratik Gurudatt, Wael AbdAlmageed

Published: 2020-08-08 01:38:56+00:00

Comment: To appear in the 16th European Conference on Computer Vision (ECCV 2020) (added links to our demo and to the video presentation)

AI Summary

This paper presents a two-branch recurrent network for video-based deepfake detection, designed to isolate manipulated faces by amplifying artifacts while suppressing high-level face content. One branch propagates the original RGB information, while the other uses a Laplacian of Gaussian (LoG) bottleneck to amplify multi-band frequencies. A novel cost function compresses natural face representations and pushes away manipulated ones, achieving promising results on the FaceForensics++, Celeb-DF, and DFDC preview benchmarks.

Abstract

The current spike of hyper-realistic faces artificially generated using deepfakes calls for media forensics solutions that are tailored to video streams and work reliably with a low false alarm rate at the video level. We present a method for deepfake detection based on a two-branch network structure that isolates digitally manipulated faces by learning to amplify artifacts while suppressing the high-level face content. Unlike current methods that extract spatial frequencies as a preprocessing step, we propose a two-branch structure: one branch propagates the original information, while the other branch suppresses the face content yet amplifies multi-band frequencies using a Laplacian of Gaussian (LoG) as a bottleneck layer. To better isolate manipulated faces, we derive a novel cost function that, unlike regular classification, compresses the variability of natural faces and pushes away the unrealistic facial samples in the feature space. Our two novel components show promising results on the FaceForensics++, Celeb-DF, and Facebook's DFDC preview benchmarks, when compared to prior work. We then offer a full, detailed ablation study of our network architecture and cost function. Finally, although achieving strong detection at a very low false alarm rate remains difficult, our study shows that good video-level AUC can be attained when cross-testing across datasets.


Key findings
The proposed method achieves superior performance on FaceForensics++ and Celeb-DF, significantly improving video-level AUC over state-of-the-art baselines such as XceptionNet. It also generalizes well across datasets, yielding a substantial boost in log-weighted precision at high recall on the challenging DFDC preview dataset, although its precision at very low recall can fall slightly below some baselines.
Approach
The method uses a two-branch network based on DenseBlocks: one processes RGB information, and the other uses a Laplacian of Gaussian (LoG) bottleneck to amplify frequency artifacts. The fused features are then fed into a bi-directional LSTM for temporal modeling of video sequences. A novel loss function optimizes the feature space by compacting natural face representations around a centroid and pushing manipulated faces away for better separation.
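The loss described above can be sketched as a one-class-style objective: natural-face embeddings are pulled toward a centroid, while manipulated-face embeddings are pushed at least a margin away. The squared-hinge formulation, margin value, and label convention below are assumptions for illustration, not the paper's exact cost function.

```python
import numpy as np

def compactness_loss(features, labels, centroid, margin=1.0):
    """Hypothetical sketch of the centroid-based loss: natural faces
    (label 0) are compacted around `centroid`; manipulated faces
    (label 1) are pushed at least `margin` away via a squared hinge."""
    d = np.linalg.norm(features - centroid, axis=1)
    real = labels == 0
    loss_real = np.sum(d[real] ** 2)                             # compactness term
    loss_fake = np.sum(np.maximum(0.0, margin - d[~real]) ** 2)  # push-away term
    return (loss_real + loss_fake) / len(labels)

feats = np.array([[1.0, 0.0],    # natural-face embedding
                  [0.5, 0.0]])   # manipulated-face embedding
labels = np.array([0, 1])
centroid = np.zeros(2)
print(compactness_loss(feats, labels, centroid))  # 0.625
```

Unlike a plain cross-entropy classifier, this kind of objective shapes the feature space directly, which is what allows natural variability to be compressed while unrealistic samples are separated by a margin.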
Datasets
FaceForensics++, Celeb-DF, The Deepfake Detection Challenge (DFDC) Preview Dataset
Model(s)
DenseNet (DenseBlocks), Bi-directional Long Short-Term Memory (LSTM), custom Laplacian of Gaussian (LoG) layer
Author countries
USA