Deepfake Detection with Multi-Artifact Subspace Fine-Tuning and Selective Layer Masking

Authors: Xiang Zhang, Wenliang Weng, Daoyong Fu, Ziqiang Li, Zhangjie Fu

Published: 2026-01-03 02:33:18+00:00

AI Summary

The paper introduces MASM (Multi-Artifact Subspace Fine-Tuning and Selective Layer Masking) to improve deepfake detection generalization by explicitly decoupling stable semantic structures from diverse forgery artifact representations. MASM partitions pretrained model weights via SVD into a stable semantic principal subspace and multiple learnable artifact subspaces. A selective layer mask strategy, together with orthogonality and spectral consistency constraints, regulates how strongly the artifact subspaces are fitted, preventing overfitting to any single forgery pattern and improving generalization robustness in cross-dataset scenarios.
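
As a rough illustration of this weight partition, the sketch below shows one way an SVD-based split of a pretrained linear weight into a frozen semantic principal subspace and several learnable artifact subspaces could look. The rank of the semantic part, the number of artifact subspaces, and the even split of the residual spectrum are assumptions for illustration, not the paper's exact construction.

```python
# Sketch only (not the authors' implementation): partition a pretrained linear
# weight via SVD into a frozen semantic principal subspace plus K learnable
# artifact subspaces. `semantic_rank` and `num_artifacts` are assumed
# hyperparameters; the even split of the residual spectrum is also an assumption.
import torch
import torch.nn as nn


class MultiArtifactLinear(nn.Module):
    def __init__(self, pretrained_weight: torch.Tensor, semantic_rank: int, num_artifacts: int):
        super().__init__()
        U, S, Vh = torch.linalg.svd(pretrained_weight, full_matrices=False)

        # Frozen semantic principal subspace: top singular directions, kept fixed.
        W_sem = U[:, :semantic_rank] @ torch.diag(S[:semantic_rank]) @ Vh[:semantic_rank, :]
        self.register_buffer("W_sem", W_sem)

        # Remaining spectrum split into `num_artifacts` learnable artifact subspaces.
        chunks = torch.arange(semantic_rank, S.numel()).chunk(num_artifacts)
        self.U_art = nn.ParameterList([nn.Parameter(U[:, idx].clone()) for idx in chunks])
        self.S_art = nn.ParameterList([nn.Parameter(S[idx].clone()) for idx in chunks])
        self.V_art = nn.ParameterList([nn.Parameter(Vh[idx, :].clone()) for idx in chunks])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight = frozen semantic part + sum of learnable artifact parts.
        W = self.W_sem
        for U_k, S_k, V_k in zip(self.U_art, self.S_art, self.V_art):
            W = W + U_k @ torch.diag(S_k) @ V_k
        return x @ W.T
```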

Abstract

Deepfake detection still faces significant challenges in cross-dataset and complex real-world scenarios. The root cause lies in the high diversity of artifact distributions introduced by different forgery methods, combined with the tendency of pretrained models to disrupt their original general semantic structures when adapting to new artifacts. Existing approaches usually rely on indiscriminate global parameter updates or introduce additional supervision signals, making it difficult to effectively model diverse forgery artifacts while preserving semantic stability. To address these issues, this paper proposes a deepfake detection method based on Multi-Artifact Subspaces and selective layer masks (MASM), which explicitly decouples semantic representations from artifact representations and constrains the fitting strength of artifact subspaces, thereby improving generalization robustness in cross-dataset scenarios. Specifically, MASM applies singular value decomposition to model weights, partitioning pretrained weights into a stable semantic principal subspace and multiple learnable artifact subspaces. This design enables decoupled modeling of different forgery artifact patterns while preserving the general semantic subspace. On this basis, a selective layer mask strategy is introduced to adaptively regulate the update behavior of corresponding network layers according to the learning state of each artifact subspace, suppressing overfitting to any single forgery characteristic. Furthermore, orthogonality constraints and spectral consistency constraints are imposed to jointly regularize multiple artifact subspaces, guiding them to learn complementary and diverse artifact representations while maintaining a stable overall spectral structure.
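
The two regularizers named in the abstract can, in spirit, be expressed as auxiliary losses. The snippet below is only an illustrative sketch under assumed formulations: a pairwise cross-projection penalty for orthogonality between artifact subspaces, and a penalty tying a layer's current singular-value spectrum to its pretrained spectrum for spectral consistency. The paper's exact definitions may differ.

```python
# Illustrative sketch only: plausible instantiations of the two regularizers
# described above. Both formulations are assumptions, not the paper's equations.
import torch


def orthogonality_loss(artifact_bases):
    """Penalize overlap between artifact subspaces (list of [d, r_k] basis matrices)."""
    loss = 0.0
    for i in range(len(artifact_bases)):
        for j in range(i + 1, len(artifact_bases)):
            # Frobenius norm of the cross-projection: zero when column spaces are orthogonal.
            loss = loss + (artifact_bases[i].T @ artifact_bases[j]).pow(2).sum()
    return loss


def spectral_consistency_loss(current_weight, pretrained_singular_values):
    """Keep the layer's overall singular-value spectrum close to the pretrained one.

    Assumes `pretrained_singular_values` has the same length as the current spectrum.
    """
    S_now = torch.linalg.svdvals(current_weight)
    return (S_now - pretrained_singular_values).pow(2).mean()
```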


Key findings
MASM achieves superior cross-dataset generalization, attaining the best average video-level AUC of 0.92985 across all tested benchmarks and significantly outperforming state-of-the-art methods such as Effort and StA. The method also delivers stronger and more stable detection performance than competing approaches under various real-world distortions (e.g., compression, noise).
Approach
The method applies Singular Value Decomposition (SVD) to linear layer weights, separating them into a frozen semantic subspace and multiple learnable artifact subspaces that model different forgery patterns. A Selective Layer Mask (SLM), based on the bias-variance ratio of layer gradients, adaptively controls which layers are fine-tuned. Orthogonality and spectral consistency constraints are imposed to regularize the artifact subspaces, ensuring complementary learning and stability.
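
A minimal sketch of what a gradient-statistics-driven selective layer mask could look like is given below. The running estimate of each layer's gradient mean and variance, the ratio threshold, and zeroing the gradient to skip an update are all assumptions for illustration; the paper's exact statistic and masking rule are not reproduced here.

```python
# Rough sketch (assumed mechanics, not the authors' SLM): track a per-layer
# bias-variance ratio of gradient norms and mask updates for layers whose
# gradients look unstable, i.e., where the ratio falls below a threshold.
import torch


class SelectiveLayerMask:
    def __init__(self, named_params, threshold: float = 1.0, momentum: float = 0.9):
        # One running (mean, variance) pair of gradient norms per named parameter.
        self.stats = {name: {"mean": 0.0, "var": 0.0} for name, _ in named_params}
        self.threshold = threshold
        self.momentum = momentum

    def update_and_mask(self, named_params):
        for name, p in named_params:
            if p.grad is None:
                continue
            g = p.grad.norm().item()
            s = self.stats[name]
            # Exponential moving estimates of the gradient-norm mean and variance.
            s["mean"] = self.momentum * s["mean"] + (1 - self.momentum) * g
            s["var"] = self.momentum * s["var"] + (1 - self.momentum) * (g - s["mean"]) ** 2
            # Low bias-variance ratio: noisy gradients, so skip this layer's update.
            ratio = (s["mean"] ** 2) / (s["var"] + 1e-8)
            if ratio < self.threshold:
                p.grad.zero_()


# Usage sketch: after loss.backward(), mask gradients before optimizer.step().
# mask = SelectiveLayerMask(list(model.named_parameters()))
# mask.update_and_mask(model.named_parameters())
```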
Datasets
FaceForensics++ (FF++), CelebDF (CDF), DFDC Preview (DFDC-P), Deepfake Detection Challenge dataset (DFDC), DeepfakeDetection (DFD).
Model(s)
CLIP ViT-L/14 (main backbone), BEiT ViT-B/16, BEiT ViT-L/16, CLIP ViT-B/16.
Author countries
China