Beyond Detection: Visual Realism Assessment of Deepfakes

Authors: Luka Dragar, Peter Peer, Vitomir Štruc, Borut Batagelj

Published: 2023-06-09 15:53:01+00:00

AI Summary

This paper presents an effective method for assessing the visual realism of DeepFake videos, securing third place in the DFGC on Visual Realism Assessment 2023. The approach utilizes an ensemble of two Convolutional Neural Network (CNN) models, Eva and ConvNext, trained on the DeepFake Game Competition (DFGC) 2022 dataset. These models predict Mean Opinion Scores (MOS) based on features extracted from sequences of video frames.

Abstract

In the era of rapid digitalization and artificial intelligence advancements, the development of DeepFake technology has posed significant security and privacy concerns. This paper presents an effective measure to assess the visual realism of DeepFake videos. We utilize an ensemble of two Convolutional Neural Network (CNN) models: Eva and ConvNext. These models have been trained on the DeepFake Game Competition (DFGC) 2022 dataset and aim to predict Mean Opinion Scores (MOS) from DeepFake videos based on features extracted from sequences of frames. Our method secured the third place in the recent DFGC on Visual Realism Assessment held in conjunction with the 2023 International Joint Conference on Biometrics (IJCB 2023). We provide an overview of the models, data preprocessing, and training procedures. We also report the performance of our models against the competition's baseline model and discuss the implications of our findings.


Key findings
The ensemble achieved competitive performance, securing third place in the DFGC-VRA 2023 with a final score of 0.8545 and outperforming the competition's baseline model. The Eva model, despite weaker results during the training phase, generalized better than ConvNext on the test sets.
Approach
The authors employ an ensemble of two models, Eva and ConvNext, each equipped with a regression head, to predict Mean Opinion Scores (MOS) for DeepFake videos. Each model extracts features from sequences of 5 frames; the per-frame features are summarized as a mean vector and a standard deviation vector, which are fed into fully connected layers that output the MOS. The predictions of the two models are then combined using a weighted average.
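
The pooling and ensembling steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single linear layer standing in for the fully connected layers, the concatenation of the mean and standard deviation vectors, and the ensemble weight are all assumptions made for illustration, and the backbone feature extractors are replaced by random arrays.

```python
import numpy as np


def pool_frame_features(frame_features: np.ndarray) -> np.ndarray:
    """Summarize per-frame features of shape (num_frames, dim) as one vector.

    The mean and standard deviation are taken over the frame axis and
    concatenated (an assumption), giving a vector of shape (2 * dim,).
    """
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])


def regression_head(pooled: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Hypothetical head: a single linear layer mapping pooled features to a MOS."""
    return float(pooled @ weights + bias)


def ensemble_mos(pred_a: float, pred_b: float, weight_a: float = 0.5) -> float:
    """Weighted average of two MOS predictions; weight_a is an assumed value."""
    return weight_a * pred_a + (1.0 - weight_a) * pred_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_frames, dim = 5, 8  # 5 frames as in the summary; dim is arbitrary here

    # Stand-ins for features produced by the two backbones on the same clip.
    feats_a = rng.normal(size=(num_frames, dim))
    feats_b = rng.normal(size=(num_frames, dim))

    # Untrained, randomly initialized head parameters (illustrative only).
    w_a, w_b = rng.normal(size=2 * dim), rng.normal(size=2 * dim)

    mos_a = regression_head(pool_frame_features(feats_a), w_a, bias=3.0)
    mos_b = regression_head(pool_frame_features(feats_b), w_b, bias=3.0)
    print(ensemble_mos(mos_a, mos_b, weight_a=0.6))
```

In a sketch like this, the ensemble weight would typically be chosen on a validation split; the summary does not state which weights the authors used.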
Datasets
DeepFake Game Competition (DFGC) 2022 dataset
Model(s)
Eva (Vision Transformer), ConvNext (Convolutional Neural Network)
Author countries
Slovenia