SELFI: Selective Fusion of Identity for Generalizable Deepfake Detection

Authors: Younghun Kim, Minsuk Jang, Myung-Joon Kwon, Wonjun Lee, Changick Kim

Published: 2025-06-21 05:11:35+00:00

AI Summary

This paper introduces SELFI (SELective Fusion of Identity), a framework for generalizable deepfake detection that adaptively integrates face identity features. It resolves conflicting views on identity's role by dynamically modulating its usage based on per-sample relevance, leveraging it when beneficial and suppressing it when harmful. This explicit and adaptive control significantly improves cross-manipulation generalization, outperforming state-of-the-art methods on challenging benchmarks.

Abstract

Face identity provides a powerful signal for deepfake detection. Prior studies show that even when not explicitly modeled, classifiers often learn identity features implicitly. This has led to conflicting views: some suppress identity cues to reduce bias, while others rely on them as forensic evidence. To reconcile these views, we analyze two hypotheses: (1) whether face identity alone is discriminative for detecting deepfakes, and (2) whether such identity features generalize poorly across manipulation methods. Our experiments confirm that identity is informative but context-dependent. While some manipulations preserve identity-consistent artifacts, others distort identity cues and harm generalization. We argue that identity features should neither be blindly suppressed nor relied upon, but instead be explicitly modeled and adaptively controlled based on per-sample relevance. We propose SELFI (SELective Fusion of Identity), a generalizable detection framework that dynamically modulates identity usage. SELFI consists of: (1) a Forgery-Aware Identity Adapter (FAIA) that extracts identity embeddings from a frozen face recognition model and projects them into a forgery-relevant space via auxiliary supervision; and (2) an Identity-Aware Fusion Module (IAFM) that selectively integrates identity and visual features using a relevance-guided fusion mechanism. Experiments on four benchmarks show that SELFI improves cross-manipulation generalization, outperforming prior methods by an average of 3.1% AUC. On the challenging DFDC dataset, SELFI exceeds the previous best by 6%. Code will be released upon paper acceptance.


Key findings
SELFI improves cross-manipulation generalization by an average of 3.1% frame-level AUC in cross-dataset evaluations, notably achieving a 6% improvement over the previous best on the DFDC dataset. Ablations confirm that the adaptive fusion mechanism, rather than simple feature ensembling, is the source of the performance gains, and SELFI consistently improves results across diverse backbone architectures such as CLIP, ResNet34, and EfficientNet-B4.
Approach
SELFI solves the problem by explicitly modeling and adaptively controlling face identity features. It uses a Forgery-Aware Identity Adapter (FAIA) to extract and project identity embeddings into a forgery-relevant space with auxiliary supervision, and an Identity-Aware Fusion Module (IAFM) that selectively integrates these identity features with visual features based on a learned per-sample relevance score.
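The FAIA/IAFM pipeline described above can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration, not the authors' implementation: the dimensions (512-d identity embeddings, 768-d visual features), the linear-plus-ReLU projection, and the sigmoid relevance head are all assumptions chosen for clarity; in the paper the projection is trained with auxiliary supervision and the relevance score is learned end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions: 512-d frozen face-ID embeddings, 768-d visual features.
ID_DIM, FEAT_DIM = 512, 768

# FAIA sketch: project frozen identity embeddings into a forgery-relevant
# space. (In SELFI these weights are learned with auxiliary supervision;
# random weights here are placeholders.)
W_proj = rng.standard_normal((ID_DIM, FEAT_DIM)) * 0.02

def faia(id_embed):
    return np.maximum(id_embed @ W_proj, 0.0)  # linear + ReLU

# IAFM sketch: a relevance head maps the concatenated features to a
# per-sample score alpha in (0, 1), which gates how much identity is used.
w_rel = rng.standard_normal(2 * FEAT_DIM) * 0.02

def iafm(vis_feat, id_feat):
    alpha = sigmoid(np.concatenate([vis_feat, id_feat], axis=-1) @ w_rel)
    fused = alpha[:, None] * id_feat + (1.0 - alpha[:, None]) * vis_feat
    return fused, alpha

vis = rng.standard_normal((4, FEAT_DIM))   # visual backbone features (e.g. CLIP)
ident = rng.standard_normal((4, ID_DIM))   # frozen face-recognition embeddings
fused, alpha = iafm(vis, faia(ident))
print(fused.shape, alpha.shape)  # (4, 768) (4,)
```

The key design point is the per-sample gate: when alpha is near 0 the detector falls back to purely visual evidence (identity suppressed), and when it is near 1 identity cues dominate, matching the paper's claim that identity should be used adaptively rather than always or never.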
Datasets
FaceForensics++ (FF++), Celeb-DF v2 (CDFv2), DeepfakeDetection (DFD), Deepfake Detection Challenge (DFDC), Deepfake Detection Challenge Preview (DFDCP)
Model(s)
CLIP (as visual backbone), ResNet34, EfficientNet-B4, IResNet100 (frozen face recognition model), Forgery-Aware Identity Adapter (FAIA), Identity-Aware Fusion Module (IAFM), lightweight classifiers.
Author countries
South Korea