Aletheia: Physics-Conditioned Localized Artifact Attention (PhyLAA-X) for End-to-End Generalizable and Robust Deepfake Video Detection
Authors: Devendra Ghori
Published: 2026-04-13 08:14:50+00:00
Comment: Code: https://github.com/devghori1264/Aletheia (MIT license). Dataset notes: see Data and Code Availability section
AI Summary
This paper introduces PhyLAA-X, a physics-conditioned extension of Localized Artifact Attention (LAA-X) for generalizable and robust deepfake video detection. It integrates end-to-end differentiable physics-derived features, such as optical-flow curl, specular-reflectance skewness, and rPPG power spectra, directly into the LAA-X attention computation via cross-attention gating and a resonance consistency loss. This approach forces the network to learn manipulation boundaries where semantic inconsistencies and physical violations co-occur, regions that generative models find harder to replicate consistently, which improves performance under distribution shift, compression, and adversarial attack.
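The paper does not spell out the gating mechanism in this summary, but one plausible minimal reading of "cross-attention gating" is a per-position sigmoid gate, computed from the physics feature volume, that reweights the semantic attention map before it is renormalized. The sketch below assumes this form; `physics_gated_attention`, `w_gate`, and the shapes are hypothetical illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def physics_gated_attention(sem_q, sem_k, sem_v, phys, w_gate):
    """Semantic self-attention modulated by a physics-derived gate.

    sem_q, sem_k, sem_v : (T, d) semantic queries / keys / values
    phys                : (T, p) physics feature volume at the same positions
    w_gate              : (p, 1) gate projection (assumed learned end-to-end)
    """
    d = sem_q.shape[-1]
    scores = sem_q @ sem_k.T / np.sqrt(d)           # (T, T) semantic logits
    attn = softmax(scores, axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(phys @ w_gate)))   # (T, 1) gate in (0, 1)
    gated = attn * gate.T                           # downweight keys with
    gated /= gated.sum(axis=-1, keepdims=True)      # weak physics evidence
    return gated @ sem_v, gated
```

Because the gate enters before renormalization, positions whose physics features signal a violation can only lose attention mass relative to consistent ones, which matches the stated goal of focusing on regions where semantic and physical anomalies co-occur.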
Abstract
State-of-the-art deepfake detectors achieve near-perfect in-domain accuracy yet degrade under cross-generator shifts, heavy compression, and adversarial perturbations. The core limitation remains the decoupling of semantic artifact learning from physical invariants: optical-flow discontinuities, specular-reflection inconsistencies, and cardiac-modulated reflectance (rPPG) are treated either as post-hoc features or ignored. We introduce PhyLAA-X, a novel physics-conditioned extension of Localized Artifact Attention (LAA-X). PhyLAA-X injects three end-to-end differentiable physics-derived feature volumes - optical-flow curl, specular-reflectance skewness, and spatially-upsampled rPPG power spectra - directly into the LAA-X attention computation via cross-attention gating and a resonance consistency loss. This forces the network to learn manipulation boundaries where semantic inconsistencies and physical violations co-occur - regions inherently harder for generative models to replicate consistently. PhyLAA-X is embedded across an efficient spatiotemporal ensemble (EfficientNet-B4+BiLSTM, ResNeXt-101+Transformer, Xception+causal Conv1D) with uncertainty-aware adaptive weighting. On FaceForensics++ (c23), Aletheia reaches 97.2% accuracy / 0.992 AUC-ROC; on Celeb-DF v2, 94.9% / 0.981; on DFDC, 90.8% / 0.966 - outperforming the strongest published baseline (LAA-Net [1]) by 4.1-7.3% in cross-generator settings and maintaining 79.4% accuracy under epsilon = 0.02 PGD-10 attacks. Single-backbone ablations confirm PhyLAA-X alone delivers a 4.2% cross-dataset AUC gain. The full production system is open-sourced at https://github.com/devghori1264/Aletheia (v1.2, April 2026) with pretrained weights, the adversarial corpus (referred to as ADC-2026 in this work), and complete reproducibility artifacts.