Visual Language Models as Zero-Shot Deepfake Detectors

Authors: Viacheslav Pirogov

Published: 2025-07-30 08:20:02+00:00

AI Summary

This paper proposes a novel deepfake detection approach using Visual Language Models (VLMs) in a zero-shot setting. The method leverages the zero-shot capabilities of VLMs for image classification, achieving performance superior to almost all existing methods on a new high-quality deepfake dataset and competitive results on the standard DFDC-P benchmark.

Abstract

The contemporary phenomenon of deepfakes, utilizing GAN or diffusion models for face swapping, presents a substantial and evolving threat in digital media, identity verification, and a multitude of other systems. The majority of existing methods for detecting deepfakes rely on training specialized classifiers to distinguish between genuine and manipulated images, focusing only on the image domain without incorporating any auxiliary tasks that could enhance robustness. In this paper, inspired by the zero-shot capabilities of Vision Language Models, we propose a novel VLM-based approach to image classification and then evaluate it for deepfake detection. Specifically, we utilize a new high-quality deepfake dataset comprising 60,000 images, on which our zero-shot models demonstrate superior performance to almost all existing methods. Subsequently, we compare the performance of the best-performing architecture, InstructBLIP, on the popular deepfake dataset DFDC-P against traditional methods in two scenarios: zero-shot and in-domain fine-tuning. Our results demonstrate the superiority of VLMs over traditional classifiers.


Key findings
In the zero-shot setting, VLMs outperform almost all traditional deepfake detectors on a new, unseen dataset. InstructBLIP, the best-performing architecture, reaches near-perfect performance on DFDC-P after minimal in-domain fine-tuning. The proposed probabilistic classification method improves accuracy over a simpler binary (argmax) approach.
Approach
The authors recast the VLM's output probabilistically: instead of taking only the argmax answer, they use the probability distribution over the generated answer tokens to classify an image as real or fake. This yields a continuous confidence score and improves on purely binary classification, as sketched in the example below.
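A minimal sketch of how such probabilistic yes/no scoring could be implemented. This is not the authors' code: the Hugging Face InstructBLIP checkpoint, the prompt wording, and the choice of "yes"/"no" answer tokens are illustrative assumptions; only the general idea (reading the answer-token distribution rather than the argmax) comes from the paper.

```python
# Sketch: score an image as fake using the answer-token distribution of a VLM.
# Assumptions: Hugging Face InstructBLIP checkpoint, illustrative prompt,
# and "yes"/"no" as the candidate answers (not the paper's exact setup).
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b", torch_dtype=torch.float16, device_map="auto"
)

def fake_probability(image_path: str,
                     prompt: str = "Is this a real photograph of a person? Answer yes or no.") -> float:
    """Return a confidence score in [0, 1] that the image is fake."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

    # Generate one answer token and keep its full logit distribution,
    # instead of looking only at the argmax token.
    out = model.generate(**inputs, max_new_tokens=1,
                         output_scores=True, return_dict_in_generate=True)
    logits = out.scores[0][0]                      # vocabulary logits for the first answer token
    probs = torch.softmax(logits.float(), dim=-1)

    # Token ids for the two candidate answers (tokenizer-dependent; an assumption here).
    yes_id = processor.tokenizer.encode("yes", add_special_tokens=False)[0]
    no_id = processor.tokenizer.encode("no", add_special_tokens=False)[0]

    p_yes, p_no = probs[yes_id].item(), probs[no_id].item()
    # Renormalize over the two answers so the score behaves like a probability.
    return p_no / (p_yes + p_no + 1e-12)
```

Renormalizing over just the two candidate answers is one simple way to turn the token distribution into a calibrated-looking fake score; thresholding that score then recovers a binary decision when one is needed.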
Datasets
A new high-quality deepfake dataset comprising 60,000 images (30,000 real and 30,000 fake) generated using SimSwap on CelebA-HQ; DFDC-P dataset.
Model(s)
InstructBLIP, Idefics2, LLaVA-1.6, GPT-4o; Several existing deepfake detection models are also used as baselines for comparison (FF, MAT, M2TR, RECCE, CADDM, SBI).
Author countries
Germany