Aletheia: Physics-Conditioned Localized Artifact Attention (PhyLAA-X) for End-to-End Generalizable and Robust Deepfake Video Detection
Authors: Devendra Ghori
Published: 2026-04-13 08:14:50+00:00
Comment: Code: https://github.com/devghori1264/Aletheia (MIT license). Dataset notes: see Data and Code Availability section
AI Summary
This paper introduces PhyLAA-X, a physics-conditioned extension of Localized Artifact Attention (LAA-X) for generalizable and robust deepfake video detection. It integrates end-to-end differentiable physics-derived features, such as optical-flow curl, specular-reflectance skewness, and rPPG power spectra, directly into the LAA-X attention computation via cross-attention gating and a resonance consistency loss. This approach forces the network to learn manipulation boundaries where semantic inconsistencies and physical violations co-occur, regions that generative models find harder to replicate consistently, which improves performance under distribution shift, compression, and adversarial attack.
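The paper does not spell out the gating mechanism in this summary, but one plausible minimal reading of "cross-attention gating" is a per-position sigmoid gate, computed from the physics feature volume, that reweights the semantic attention map before it is renormalized. The sketch below assumes this form; `physics_gated_attention`, `w_gate`, and the shapes are hypothetical illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def physics_gated_attention(sem_q, sem_k, sem_v, phys, w_gate):
    """Semantic self-attention modulated by a physics-derived gate.

    sem_q, sem_k, sem_v : (T, d) semantic queries / keys / values
    phys                : (T, p) physics feature volume at the same positions
    w_gate              : (p, 1) gate projection (assumed learned end-to-end)
    """
    d = sem_q.shape[-1]
    scores = sem_q @ sem_k.T / np.sqrt(d)           # (T, T) semantic logits
    attn = softmax(scores, axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(phys @ w_gate)))   # (T, 1) gate in (0, 1)
    gated = attn * gate.T                           # downweight keys with
    gated /= gated.sum(axis=-1, keepdims=True)      # weak physics evidence
    return gated @ sem_v, gated
```

Because the gate enters before renormalization, positions whose physics features signal a violation can only lose attention mass relative to consistent ones, which matches the stated goal of focusing on regions where semantic and physical anomalies co-occur.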
Abstract
State-of-the-art deepfake detectors achieve near-perfect in-domain accuracy yet degrade under cross-generator shifts, heavy compression, and adversarial perturbations. The core limitation remains the decoupling of semantic artifact learning from physical invariants: optical-flow discontinuities, specular-reflection inconsistencies, and cardiac-modulated reflectance (rPPG) are treated either as post-hoc features or ignored. We introduce PhyLAA-X, a novel physics-conditioned extension of Localized Artifact Attention (LAA-X). PhyLAA-X injects three end-to-end differentiable physics-derived feature volumes - optical-flow curl, specular-reflectance skewness, and spatially-upsampled rPPG power spectra - directly into the LAA-X attention computation via cross-attention gating and a resonance consistency loss. This forces the network to learn manipulation boundaries where semantic inconsistencies and physical violations co-occur - regions inherently harder for generative models to replicate consistently. PhyLAA-X is embedded across an efficient spatiotemporal ensemble (EfficientNet-B4+BiLSTM, ResNeXt-101+Transformer, Xception+causal Conv1D) with uncertainty-aware adaptive weighting. On FaceForensics++ (c23), Aletheia reaches 97.2% accuracy / 0.992 AUC-ROC; on Celeb-DF v2, 94.9% / 0.981; on DFDC, 90.8% / 0.966 - outperforming the strongest published baseline (LAA-Net [1]) by 4.1-7.3% in cross-generator settings and maintaining 79.4% accuracy under epsilon = 0.02 PGD-10 attacks. Single-backbone ablations confirm PhyLAA-X alone delivers a 4.2% cross-dataset AUC gain. The full production system is open-sourced at https://github.com/devghori1264/Aletheia (v1.2, April 2026) with pretrained weights, the adversarial corpus (referred to as ADC-2026 in this work), and complete reproducibility artifacts.