Investigating the Impact of Pre-processing and Prediction Aggregation on the DeepFake Detection Task

Authors: Polychronis Charitidis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Kompatsiaris

Published: 2020-06-12 11:16:02+00:00

AI Summary

This paper investigates the impact of pre-processing and prediction aggregation on deepfake detection. It proposes a novel pre-processing step to improve training data quality by removing false face detections and evaluates several video-level prediction aggregation approaches.

Abstract

Recent advances in content generation technologies (widely known as DeepFakes) along with the online proliferation of manipulated media content render the detection of such manipulations a task of increasing importance. Even though there are many DeepFake detection methods, only a few focus on the impact of dataset preprocessing and the aggregation of frame-level to video-level prediction on model performance. In this paper, we propose a pre-processing step to improve the training data quality and examine its effect on the performance of DeepFake detection. We also propose and evaluate the effect of video-level prediction aggregation approaches. Experimental results show that the proposed pre-processing approach leads to considerable improvements in the performance of detection models, and the proposed prediction aggregation scheme further boosts the detection efficiency in cases where there are multiple faces in a video.

Key findings

The proposed pre-processing significantly improves the performance of deepfake detection models across multiple datasets. The face-based prediction aggregation method further enhances performance, especially when videos contain multiple faces. The study highlights the limitations of current models in generalizing to unseen manipulations.
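To make the aggregation step concrete, the following is a minimal sketch of how frame-level fake probabilities might be reduced to one video-level score. It is not the authors' code. The function names (`aggregate_frames`, `aggregate_by_face`) and the face-based rule used here (average the scores of each face, then take the highest per-face average) are illustrative assumptions; the summary does not spell out the exact scheme.

```python
from statistics import mean, median

def aggregate_frames(scores, method="mean"):
    """Reduce a flat list of frame-level fake probabilities to one score.

    `method` is one of "mean", "median", or "max".
    """
    reducers = {"mean": mean, "median": median, "max": max}
    return reducers[method](scores)

def aggregate_by_face(scores_by_face):
    """Face-based aggregation (assumed rule, for illustration only).

    `scores_by_face` maps a face identity to the frame-level scores of
    that face. Each face is averaged separately, and the video score is
    the highest per-face average, so a single manipulated face is not
    diluted by other, unmanipulated faces in the same video.
    """
    per_face = [mean(face_scores) for face_scores in scores_by_face.values()]
    return max(per_face)

# Hypothetical video with two faces: "a" looks real, "b" looks fake.
scores_by_face = {"a": [0.1, 0.2, 0.1], "b": [0.9, 0.8, 1.0]}
all_scores = [s for face in scores_by_face.values() for s in face]

video_mean = aggregate_frames(all_scores, "mean")
video_median = aggregate_frames(all_scores, "median")
video_max = aggregate_frames(all_scores, "max")
video_face = aggregate_by_face(scores_by_face)
```

In this toy case the plain average over all frames lands near the middle of the range, while the face-based score stays close to the score of the manipulated face, which is consistent with the reported benefit for videos containing multiple faces.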

Approach

The authors propose a pre-processing step that uses face embeddings and connected components to remove false face detections from the training data. They also evaluate several video-level prediction aggregation methods: averaging, median, maximum, and a face-based approach.
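The pre-processing idea can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: detections are linked when the cosine similarity of their embeddings exceeds a threshold and they come from different frames, connected components of the resulting graph are treated as face identities, and components with too few members are discarded as likely false detections. The function name `filter_false_detections` and the parameters `sim_threshold` and `min_size` (and their default values) are hypothetical.

```python
import numpy as np

def filter_false_detections(embeddings, frame_ids, sim_threshold=0.8, min_size=3):
    """Return the indices of detections that belong to a large enough component.

    embeddings: one non-zero embedding vector per detected face crop.
    frame_ids:  frame index of each detection.
    Both thresholds are assumed values chosen for illustration.
    """
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalise
    sim = emb @ emb.T                                        # cosine similarity
    n = len(emb)

    # Union-find structure to collect connected components.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Link similar detections that come from different frames.
    for i in range(n):
        for j in range(i + 1, n):
            if frame_ids[i] != frame_ids[j] and sim[i, j] >= sim_threshold:
                parent[find(i)] = find(j)

    # Count component sizes and keep only detections in large components.
    roots = [find(i) for i in range(n)]
    sizes = {}
    for r in roots:
        sizes[r] = sizes.get(r, 0) + 1
    return [i for i, r in enumerate(roots) if sizes[r] >= min_size]

# Hypothetical example: four consistent detections of one face plus one
# spurious detection (index 4) that resembles nothing else in the video.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.05], [1.0, 0.02], [0.0, 1.0]]
frame_ids = [0, 1, 2, 3, 1]
kept = filter_false_detections(embeddings, frame_ids)
```

The spurious detection forms a component of size one and is dropped, while the four consistent detections are kept. A real pipeline would obtain the embeddings from a face recognition model and would need thresholds tuned on the data.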

Datasets

DFDC, Celeb-DF, FaceForensics++

Model(s)

MesoInception-4, XceptionNet, EfficientNet-B4

Author countries

Greece
