The Alpha Blending Hypothesis: Compositing Shortcut in Deepfake Detection

Authors: Andrii Yermakov, Jan Cech, Mario Fritz, Jiri Matas

Published: 2026-05-11 10:35:39+00:00

AI Summary

This paper introduces the Alpha Blending Hypothesis, positing that state-of-the-art frame-based deepfake detectors primarily detect low-level compositing artifacts introduced by alpha blending. The authors propose BlenD, a deepfake detection method trained exclusively on diverse real facial images augmented with self-blended images (SBI), which achieves superior cross-dataset generalization on compositional deepfake datasets. The study further demonstrates that ensembling BlenD with models less susceptible to blending shortcuts yields state-of-the-art detection performance.

Abstract

Recent deepfake detection methods demonstrate improved cross-dataset generalization, yet the underlying mechanisms remain underexplored. We introduce the Alpha Blending Hypothesis, positing that state-of-the-art frame-based detectors primarily function as alpha blending searchers; rather than learning semantic anomalies or specific generative neural fingerprints, they localize low-level compositing artifacts introduced during the integration of manipulated faces into target frames. We experimentally validate the hypothesis, demonstrating that deepfake detectors exhibit high sensitivity to the so-called self-blended images (SBI) and non-generative manipulations. We propose the method BlenD that leverages a large-scale, diverse dataset of real-only facial images augmented with SBI. This approach achieves the best average cross-dataset generalization on 15 compositional deepfake datasets released between 2019 and 2025 without utilizing explicitly generated deepfakes during training. Furthermore, we show that predictions from explicit blending searchers and models resilient to blending shortcuts are highly complementary, yielding a state-of-the-art AUROC of 94.0% in an ensemble configuration. The code with experiments and the trained model will be publicly released.


Key findings
The study validates the Alpha Blending Hypothesis, showing that state-of-the-art deepfake detectors are highly sensitive to low-level blending artifacts and non-generative manipulations. BlenD, trained without any explicitly generated deepfakes, achieves the best average cross-dataset generalization (91.3% AUROC) on 15 compositional deepfake datasets. Ensembling BlenD with models resilient to blending shortcuts yields a state-of-the-art AUROC of 94.0%, which highlights the complementarity of the two kinds of detection cues. The results also reveal limitations on fully synthetic content.
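The complementarity claim can be illustrated with a toy calculation. The sketch below is not the authors' evaluation code: the scores are invented for illustration, and plain score averaging is an assumption, since this summary does not say how the ensemble combines predictions. It computes AUROC as the probability that a randomly chosen fake receives a higher score than a randomly chosen real image.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (fake, real) pairs ranked correctly; ties count 0.5."""
    fakes = [s for label, s in zip(labels, scores) if label == 1]
    reals = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if f > r else 0.5 if f == r else 0.0
        for f in fakes
        for r in reals
    )
    return wins / (len(fakes) * len(reals))


def average_ensemble(scores_a, scores_b):
    """Combine two detectors by averaging their per-image scores (an assumed rule)."""
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


# Hypothetical data: 1 = fake, 0 = real. All scores below are made up.
labels = [1, 1, 1, 0, 0, 0]
# Stand-in for a blending-focused detector that misses the third fake.
blending_scores = [0.9, 0.8, 0.2, 0.1, 0.3, 0.4]
# Stand-in for a detector relying on other cues, weak on the first fake.
other_scores = [0.3, 0.7, 0.9, 0.2, 0.6, 0.1]

ensemble_scores = average_ensemble(blending_scores, other_scores)

print("blending-only AUROC:", round(auroc(labels, blending_scores), 3))
print("other-only AUROC:   ", round(auroc(labels, other_scores), 3))
print("ensemble AUROC:     ", round(auroc(labels, ensemble_scores), 3))
```

In this toy setup each detector misranks samples that the other ranks correctly, so the averaged score separates the classes better than either detector alone. Real detectors will not necessarily behave this cleanly.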
Approach
The approach builds on the Alpha Blending Hypothesis, which holds that deepfake detectors exploit low-level alpha blending artifacts. The proposed method, BlenD, fine-tunes a pre-trained Vision Foundation Model (PEcoreL, CLIP, or DINOv3) on a large-scale, diverse set of real-only facial images from ScaleDF, augmented with pseudo-fakes generated by the Self-Blended Images (SBI) technique. No explicitly generated deepfakes are used during training.
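Alpha blending composites a source image S onto a target image T with a per-pixel mask M in [0, 1]: output = M * S + (1 - M) * T. A self-blended image applies this to two differently transformed copies of the same real image, so the only "fake" signal is the blending boundary. The following is a minimal sketch of that idea on a tiny grayscale image stored as nested lists. It is heavily simplified and is not the authors' or the original SBI pipeline: the rectangular feathered mask, the brightness shift, and all function names are illustrative assumptions.

```python
def soft_mask(height, width, top, left, bottom, right, feather):
    """Mask equal to 1.0 inside rows [top, bottom) and columns [left, right),
    decaying linearly to 0.0 over `feather` pixels outside that rectangle."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pixel distance outside the rectangle along each axis (0 if inside).
            dy = max(top - y, 0, y - (bottom - 1))
            dx = max(left - x, 0, x - (right - 1))
            row.append(max(0.0, 1.0 - max(dx, dy) / feather))
        mask.append(row)
    return mask


def alpha_blend(source, target, mask):
    """Per-pixel composite: mask * source + (1 - mask) * target."""
    return [
        [m * s + (1.0 - m) * t for s, t, m in zip(s_row, t_row, m_row)]
        for s_row, t_row, m_row in zip(source, target, mask)
    ]


def self_blend(image, mask, brightness_shift):
    """Pseudo-fake from one real image: blend a brightness-shifted copy
    (the source) back onto the untouched image (the target)."""
    source = [
        [min(1.0, max(0.0, p + brightness_shift)) for p in row]
        for row in image
    ]
    return alpha_blend(source, image, mask)


# Hypothetical 8x8 "real" image with uniform intensity 0.5.
real_image = [[0.5] * 8 for _ in range(8)]
mask = soft_mask(8, 8, top=2, left=2, bottom=6, right=6, feather=2)
pseudo_fake = self_blend(real_image, mask, brightness_shift=0.1)

print("inside mask: ", pseudo_fake[4][4])
print("mask border: ", pseudo_fake[1][4])
print("outside mask:", pseudo_fake[0][0])
```

The pseudo-fake differs from the real image only inside the mask and along its soft boundary, which is the kind of low-level cue the hypothesis says detectors learn to find.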
Datasets
Training: ScaleDF (real-only subset), with pseudo-fakes generated by the Self-Blended Images (SBI) technique. Deepfake datasets: FaceForensics++ (FF++), Celeb-DF-v2 (CDFv2), Celeb-DF++ (CDFv3), DeepFake Detection Challenge (DFDC), Face Forensics in the Wild (FFIW), Google’s DFD dataset (DFD), DeepSpeak v1.1 (DSv1), DeepSpeak v2.0 (DSv2), FakeAVCeleb (FAVC), Korean DeepFake Detection Dataset (KoDF), DeepFakes from Different Models (DFDM), PolyGlotFake (PGF), IDForge (IDF), RedFace (RF), and FaceShifter (FSh). CDFv3, FFIW, DSv1, and DSv2 were also used for validation.
Model(s)
BlenD uses Vision Foundation Models as backbones: PEcoreL (default), CLIP ViT-L/14, and DINOv3 ViT-L/16. It is compared against state-of-the-art detectors including Effort, ForAda, FS-VFM, GenD (with PEcoreL, DINO, and CLIP backbones), SBI, and FSBI.
Author countries
Czech Republic, Germany