Attack-Aware Deepfake Detection under Counter-Forensic Manipulations
Authors: Noor Fatima, Hasan Faraz Khan, Muzammil Behzad
Published: 2025-12-26 04:05:52+00:00
AI Summary
This work introduces an attack-aware image deepfake and forensics detector designed for robustness, reliability, and transparent evidence generation under realistic counter-forensic manipulations. The method uses a two-stream architecture combining semantic content and forensic residuals, trained with a red-team strategy that applies the worst of K sampled counter-forensic attacks per batch. Robustness is further enhanced by a randomized test-time defense, while a shallow FPN head produces weakly supervised tamper heatmaps concentrated in face regions.
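The worst-of-K red-team step can be sketched as follows. This is a minimal illustration, not the authors' implementation: images are plain lists of floats, the attack functions are toy stand-ins for the counter-forensics named in the paper, and all names (`worst_of_k`, `ATTACKS`, the loss) are assumptions.

```python
import random

# Toy stand-ins for counter-forensic attacks (the paper's versions are
# JPEG realign/recompress, resampling warps, regrain, seam smoothing, ...).
def jpeg_like(img):       # crude stand-in for recompression quantization
    return [round(v, 1) for v in img]

def gamma_shift(img):     # small gamma perturbation
    return [v ** 1.05 for v in img]

def resample_warp(img):   # stand-in for a resampling warp (circular shift)
    return img[1:] + img[:1]

ATTACKS = [jpeg_like, gamma_shift, resample_warp]

def worst_of_k(img, label, loss_fn, k=3, rng=None):
    """Sample k attacks, apply each, and keep the attacked copy that
    maximizes the detector's loss -- the 'worst-of-K' red-team step."""
    rng = rng or random.Random()
    candidates = [rng.choice(ATTACKS)(img) for _ in range(k)]
    return max(candidates, key=lambda c: loss_fn(c, label))
```

During training, the returned worst-case copy would replace the clean sample in the batch, so the detector is always optimized against the strongest of the K sampled manipulations.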
Abstract
This work presents an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment conditions. The method combines red-team training with a randomized test-time defense in a two-stream architecture: one stream encodes semantic content using a pretrained backbone and the other extracts forensic residuals, fused via a lightweight residual adapter for classification, while a shallow Feature Pyramid Network (FPN)-style head produces tamper heatmaps under weak supervision. Red-team training applies worst-of-K counter-forensic attacks per batch, including JPEG realign-and-recompress, resampling warps, denoise-to-regrain operations, seam smoothing, small color and gamma shifts, and social-app transcodes. The test-time defense injects low-cost jitters such as resize and crop phase changes, mild gamma variation, and JPEG phase shifts, and aggregates predictions over the jittered copies. Heatmaps are guided to concentrate within face regions using face-box masks, without pixel-level annotations. Evaluation on existing benchmarks, including standard deepfake datasets and a surveillance-style split with low light and heavy compression, reports clean and attacked performance: AUC, worst-case accuracy, reliability, abstention quality, and weak-localization scores. Results demonstrate near-perfect ranking across attacks, low calibration error, minimal abstention risk, and controlled degradation under regrain, establishing a modular, data-efficient, and practically deployable baseline for attack-aware detection with calibrated probabilities and actionable heatmaps.
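The randomized test-time defense can be sketched as follows. This is a hedged illustration under simplifying assumptions: `gamma_jitter` and `phase_jitter` are toy stand-ins for the low-cost jitters the abstract names (resize/crop phase changes, mild gamma, JPEG phase shifts), the model is any callable returning a score, and mean aggregation is one plausible choice of the unspecified aggregation rule.

```python
import random
import statistics

# Illustrative low-cost jitters; names and ranges are assumptions.
def gamma_jitter(img, rng):
    g = rng.uniform(0.95, 1.05)       # mild gamma variation
    return [v ** g for v in img]

def phase_jitter(img, rng):
    s = rng.randrange(len(img))       # crude stand-in for a crop-phase shift
    return img[s:] + img[:s]

def defend(img, model, n=8, seed=0):
    """Score n randomly jittered copies and aggregate by the mean, so a
    single counter-forensic perturbation cannot pin down the prediction."""
    rng = random.Random(seed)
    jitters = [gamma_jitter, phase_jitter]
    scores = [model(rng.choice(jitters)(img, rng)) for _ in range(n)]
    return statistics.mean(scores)
```

Because the jitters are cheap image-space operations, the defense adds only a constant factor of `n` forward passes at inference time.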