Exposing Deepfake with Pixel-wise AR and PPG Correlation from Faint Signals

Authors: Maoyu Mao, Jun Yang

Published: 2021-10-29 06:05:52+00:00

AI Summary

This paper proposes a Deepfake detection scheme that exploits faint physiological and spatial signals hidden in face videos. It extracts photoplethysmography (PPG) features to capture temporal heart rate fluctuations and auto-regressive (AR) coefficients to reflect inter-pixel correlations from up-sampling artifacts. These features are then classified by an ACBlock-based improved DenseNet to enhance detection accuracy and generalization.

Abstract

Deepfake poses a serious threat to the reliability of judicial evidence and intellectual property protection. In spite of an urgent need for Deepfake identification, existing pixel-level detection methods are increasingly unable to resist the growing realism of fake videos and lack generalization. In this paper, we propose a scheme to expose Deepfake through faint signals hidden in face videos. This scheme extracts two types of minute information hidden between face pixels: photoplethysmography (PPG) features and auto-regressive (AR) features, which are used as the basis for forensics in the temporal and spatial domains, respectively. According to the principle of PPG, tracking the absorption of light by blood cells allows remote estimation of the heart rate (HR) of a face video in the temporal domain, and irregular HR fluctuations can be seen as traces of tampering. On the other hand, AR coefficients reflect the inter-pixel correlation, and can also reveal the traces of smoothing caused by up-sampling in the process of generating fake faces. Furthermore, the scheme combines asymmetric convolution block (ACBlock)-based improved densely connected networks (DenseNets) to achieve face video authenticity forensics. Its asymmetric convolutional structure enhances the robustness of the network to upside-down and left-right flipping of the input feature image, so that the order in which features are stitched does not affect detection results. Simulation results show that our proposed scheme provides more accurate authenticity detection results on multiple deep forgery datasets and has better generalization compared to the benchmark strategy.
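
The PPG principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' extraction pipeline, which this page does not detail. It assumes a hypothetical input trace holding the mean green-channel intensity of a face region per frame, picks the strongest spectral peak inside an assumed plausible pulse band (42 to 240 beats per minute), and repeats the estimate over sliding windows so that HR fluctuations over time become visible. The function names and the band limits are illustrative assumptions.

```python
import cmath
import math


def estimate_heart_rate(green_means, fps, lo_bpm=42.0, hi_bpm=240.0):
    """Return the dominant pulse frequency of a trace, in beats per minute.

    green_means: per-frame mean green intensity of a face region
                 (hypothetical input, not taken from the paper).
    fps:         video frame rate in frames per second.
    lo_bpm/hi_bpm: assumed search band for a human pulse.
    """
    n = len(green_means)
    mean = sum(green_means) / n
    centered = [g - mean for g in green_means]  # remove the DC component

    best_bpm, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        bpm = 60.0 * k * fps / n  # frequency of DFT bin k, in beats per minute
        if not lo_bpm <= bpm <= hi_bpm:
            continue
        # Plain DFT of bin k; slow but dependency-free.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def heart_rate_series(green_means, fps, window, step):
    """Estimate HR on sliding windows so that fluctuations can be inspected."""
    return [estimate_heart_rate(green_means[s:s + window], fps)
            for s in range(0, len(green_means) - window + 1, step)]
```

For a real video, the input trace would come from a face tracker and a skin-region mask, and a detector would look at how the windowed estimates vary rather than at a single value.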


Key findings
The proposed scheme significantly improves Deepfake detection accuracy, achieving 96.13% on FaceForensics++ compared to 70.47% for basic neural networks and 94.65% for a baseline weak signal method. It demonstrates good generalization, with 86.57% accuracy on Celeb-DF when trained on FaceForensics++. The combination of PPG and AR features, coupled with the robust ACBlock-based DenseNet classifier, effectively captures temporal and spatial inconsistencies in Deepfake videos.
Approach
The scheme extracts two types of 'faint signals' from face videos: photoplethysmography (PPG) features for the temporal domain (heart rate) and auto-regressive (AR) features for the spatial domain (inter-pixel correlation). These features are then fed into an improved densely connected network (DenseNet) enhanced with asymmetric convolution blocks (ACBlocks) for authenticity classification.
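
As an illustration of how AR coefficients can summarize inter-pixel correlation, the sketch below fits a small 2D auto-regressive model by least squares. The model order is an assumption made for this example: each pixel is predicted from its left, upper, and upper-left neighbours. The paper's actual neighbourhood and fitting procedure are not given on this page, and the function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_ar_coefficients(image):
    """Least-squares fit of
        x[i][j] ~ a * x[i][j-1] + b * x[i-1][j] + c * x[i-1][j-1]
    over a grayscale image given as a list of rows. Returns [a, b, c].
    """
    # Accumulate the normal equations (A^T A) w = A^T y over all
    # pixels that have the three causal neighbours.
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for i in range(1, len(image)):
        for j in range(1, len(image[0])):
            neighbours = (image[i][j - 1], image[i - 1][j], image[i - 1][j - 1])
            target = image[i][j]
            for r in range(3):
                aty[r] += neighbours[r] * target
                for c in range(3):
                    ata[r][c] += neighbours[r] * neighbours[c]
    return solve_linear_system(ata, aty)
```

In a detector, coefficients fitted on local patches of a face crop could be arranged into a feature map. The intuition from the abstract is that up-sampled regions are smoother, so their fitted coefficients differ from those of camera-captured pixels.
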
Datasets
FaceForensics++, Celeb-DF
Model(s)
ACBlock-based improved Densely Connected Networks (DenseNets), specifically DenseNet121 with ACBlocks.
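
For readers unfamiliar with ACBlocks, the sketch below shows the general idea as introduced in the ACNet work: a square 3x3 convolution is run in parallel with a 1x3 and a 3x1 convolution and the branch outputs are summed. Because convolution is linear, the three kernels can also be added into one 3x3 kernel that gives the same output. This is a single-channel, dependency-free illustration with batch normalization and bias omitted. It is not the authors' improved DenseNet121, whose configuration is not described on this page, and the function names are hypothetical.

```python
def conv2d_same(image, kernel):
    """Cross-correlate image with an odd-sized kernel, zero-padding the borders."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


def acblock_forward(image, square, horizontal, vertical):
    """Sum the 3x3, 1x3 and 3x1 branch outputs of a simplified ACBlock."""
    branches = [conv2d_same(image, k) for k in (square, horizontal, vertical)]
    return [[sum(b[i][j] for b in branches) for j in range(len(image[0]))]
            for i in range(len(image))]


def fuse_kernels(square, horizontal, vertical):
    """Fold the 1x3 and 3x1 kernels into the 3x3 kernel."""
    fused = [row[:] for row in square]
    for v in range(3):
        fused[1][v] += horizontal[0][v]  # the 1x3 kernel sits on the middle row
    for u in range(3):
        fused[u][1] += vertical[u][0]    # the 3x1 kernel sits on the middle column
    return fused
```

Calling conv2d_same(image, fuse_kernels(square, horizontal, vertical)) reproduces acblock_forward(image, square, horizontal, vertical), which is why the extra branches add no cost at inference time in this simplified setting.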
Author countries
China