Combating Digitally Altered Images: Deepfake Detection

Authors: Saksham Kumar, Rhythm Narang

Published: 2025-08-23 09:59:03+00:00

AI Summary

This research proposes a robust deepfake image detection method using a modified Vision Transformer (ViT) model. The model, trained on a subset of the OpenForensics dataset with augmentation techniques, achieves state-of-the-art results in distinguishing real from deepfake images.

Abstract

The rise of Deepfake technology to generate hyper-realistic manipulated images and videos poses a significant challenge to the public and relevant authorities. This study presents a robust Deepfake detection method based on a modified Vision Transformer (ViT) model, trained to distinguish between real and Deepfake images. The model has been trained on a subset of the OpenForensics Dataset with multiple augmentation techniques to increase robustness against diverse image manipulations. Class imbalance is handled by oversampling, and the train-validation split of the dataset is performed in a stratified manner. Performance is evaluated using the accuracy metric on the training and testing datasets, followed by a prediction score on arbitrary images of people, whether real or fake. The model demonstrates state-of-the-art results on the test dataset, detecting Deepfake images with high precision.


Key findings
The modified ViT model achieved >99% accuracy on the test dataset, demonstrated efficient processing speed, and showed minimal validation loss, indicating effective training and optimization.
Approach
The authors fine-tuned a pre-trained Google ViT-base-patch16-224-in21k model on a subset of the OpenForensics dataset. They employed data augmentation and addressed class imbalance through oversampling. The model's performance was evaluated using accuracy metrics.
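The fine-tuning approach can be sketched as below. This is a minimal sketch assuming the Hugging Face `transformers` ViT checkpoint named in the text; it is not the authors' released code, and the data pipeline (augmentation, loaders) is elided. The checkpoint download is isolated in `load_vit` so the training loop itself works with any classifier.

```python
import torch
from torch import nn

def load_vit(num_labels=2):
    """Load the pre-trained ViT checkpoint named in the paper.

    Requires the `transformers` package and a network connection;
    kept in its own function so the rest of the sketch runs without it.
    """
    from transformers import ViTForImageClassification
    return ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k",
        num_labels=num_labels,  # two classes: real vs. deepfake
    )

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the data: forward, cross-entropy loss, backward, step."""
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    running = 0.0
    for pixel_values, labels in loader:
        pixel_values, labels = pixel_values.to(device), labels.to(device)
        optimizer.zero_grad()
        out = model(pixel_values)
        logits = getattr(out, "logits", out)  # HF models wrap logits in an output object
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        running += loss.item()
    return running / max(len(loader), 1)  # mean loss over the epoch
```

Replacing the classification head via `num_labels=2` and training with cross-entropy is the standard way to adapt an ImageNet-21k-pretrained ViT to a binary task; the paper's specific modifications and hyperparameters are not reproduced here.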
Datasets
OpenForensics Dataset
Model(s)
Modified Vision Transformer (ViT) - Google ViT-base-patch16-224-in21k
Author countries
India