Attack-Aware Deepfake Detection under Counter-Forensic Manipulations

Authors: Noor Fatima, Hasan Faraz Khan, Muzammil Behzad

Published: 2025-12-26 04:05:52+00:00

AI Summary

This work introduces an attack-aware deepfake and image-forensics detector designed for robustness, reliability, and transparent evidence generation under realistic counter-forensic manipulations. The method uses a two-stream architecture combining semantic content and forensic residuals, trained with a red-team strategy that applies the worst of K candidate attacks per batch. Robustness is further enhanced by a randomized test-time defense and a shallow FPN head that produces weakly supervised tamper heatmaps concentrated in face regions.

Abstract

This work presents an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment conditions. The method combines red-team training with randomized test-time defense in a two-stream architecture, where one stream encodes semantic content using a pretrained backbone and the other extracts forensic residuals, fused via a lightweight residual adapter for classification, while a shallow Feature Pyramid Network (FPN)-style head produces tamper heatmaps under weak supervision. Red-team training applies worst-of-K counter-forensics per batch, including JPEG realign and recompress, resampling warps, denoise-to-regrain operations, seam smoothing, small color and gamma shifts, and social-app transcodes, while the test-time defense injects low-cost jitters such as resize and crop phase changes, mild gamma variation, and JPEG phase shifts, and aggregates the resulting predictions. Heatmaps are guided to concentrate within face regions using face-box masks without strict pixel-level annotations. Evaluation on existing benchmarks, including standard deepfake datasets and a surveillance-style split with low light and heavy compression, reports clean and attacked performance, AUC, worst-case accuracy, reliability, abstention quality, and weak-localization scores. Results demonstrate near-perfect ranking across attacks, low calibration error, minimal abstention risk, and controlled degradation under regrain, establishing a modular, data-efficient, and practically deployable baseline for attack-aware detection with calibrated probabilities and actionable heatmaps.
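The test-time defense described above can be illustrated with a minimal sketch. The jitter functions and detector below are toy stand-ins (not the paper's models): `gamma_jitter` mimics mild gamma variation, `phase_jitter` mimics crop/JPEG phase shifts via a pixel roll, and the "detector" is a random linear scorer; the aggregation step (averaging scores over several jittered views) is the part the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(2)

def gamma_jitter(x):
    # Mild gamma variation (hypothetical range, for illustration only).
    return np.clip(x, 0.0, 1.0) ** rng.uniform(0.9, 1.1)

def phase_jitter(x):
    # Crude proxy for crop/JPEG phase changes: shift columns by 0-3 pixels.
    return np.roll(x, rng.integers(0, 4), axis=1)

def score(x, w):
    """Toy detector: sigmoid of a linear projection, returning a fake-probability."""
    return 1.0 / (1.0 + np.exp(-float(x.reshape(-1) @ w)))

def tta_predict(x, w, n=8):
    """Aggregate (average) scores over n randomly jittered views of the input."""
    jitters = [gamma_jitter, phase_jitter]
    views = [jitters[rng.integers(0, 2)](x) for _ in range(n)]
    return float(np.mean([score(v, w) for v in views]))

x = rng.random((4, 4))
w = rng.standard_normal(16) / 4
p = tta_predict(x, w)  # stabilized probability in (0, 1)
```

Averaging over independent low-cost jitters smooths out decisions that would otherwise flip with a one-pixel crop or a small gamma change, which is what the paper credits for improved calibration.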


Key findings
The approach achieved near-perfect threshold-free performance (AUC/AP 1.00) across clean and all six counter-forensic attacks, maintaining a high worst-case accuracy of 0.9917. Regrain manipulations emerged as the most challenging stressor, though performance remained controlled with consistently low calibration error. The weak localization strategy successfully produced evidence heatmaps concentrated within plausible face regions, suitable for audit and triage.
Approach
The detector uses a two-stream architecture: one stream encodes semantic content (via a pretrained backbone) and the other extracts forensic residuals (e.g., high-pass features), fused by a lightweight adapter. Robustness is enforced via red-team training, where each batch is trained on the worst of K counter-forensic transforms (e.g., JPEG realign, regrain, or warp). Test-time aggregation of predictions from low-cost jitters stabilizes decisions and improves calibration, while a shallow FPN-style head generates spatial heatmaps guided by weak face-region priors.
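The worst-of-K selection can be sketched as follows. The attack functions and loss here are toy proxies I am assuming for illustration (the paper's actual transforms are JPEG realign/recompress, regrain, warps, etc.); the core idea is faithful: apply K candidate attacks to a sample and train on the one that maximizes the detector's loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's counter-forensic transforms.
def gamma_shift(x):
    return np.clip(x, 0.0, 1.0) ** 1.1

def quantize(x, levels=32):
    # Crude proxy for JPEG recompression.
    return np.round(x * levels) / levels

def resample_warp(x):
    # Crude proxy for a resampling warp: shift pixels by one column.
    return np.roll(x, shift=1, axis=-1)

ATTACKS = [gamma_shift, quantize, resample_warp]

def detector_loss(x, label, w):
    """Toy logistic loss for a linear 'detector' with weights w."""
    p = 1.0 / (1.0 + np.exp(-float(x.reshape(-1) @ w)))
    eps = 1e-9
    return -(label * np.log(p + eps) + (1 - label) * np.log(1.0 - p + eps))

def worst_of_k(x, label, w, k=3):
    """Apply k candidate attacks and keep the one that maximizes the loss."""
    candidates = rng.choice(len(ATTACKS), size=k, replace=False)
    attacked = [ATTACKS[i](x) for i in candidates]
    losses = [detector_loss(a, label, w) for a in attacked]
    j = int(np.argmax(losses))
    return attacked[j], losses[j]

x = rng.random((4, 4))
w = rng.standard_normal(16)
x_adv, l_adv = worst_of_k(x, label=1, w=w, k=3)  # train on x_adv, not x
```

Training against the per-batch worst case, rather than a fixed augmentation schedule, is what pushes the model toward the reported worst-case accuracy rather than just average-case robustness.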
Datasets
DeepFakeFace (DFF) from OpenRL-Lab, and CelebA (for auxiliary analysis). Evaluation includes a deployment-motivated surveillance-style split.
Model(s)
Two-stream architecture (semantic backbone + residual extractor) fused via a lightweight adapter; localization uses a shallow Feature Pyramid Network (FPN)-style head. InsightFace is used externally for generating weak face priors.
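A minimal sketch of the two-stream fusion, with assumed stand-ins: a fixed random projection plays the pretrained semantic backbone, a 4-neighbour high-pass filter plays the forensic-residual extractor, and the lightweight adapter is a single linear layer over the concatenated features. None of these specific shapes or operators come from the paper; only the two-stream-plus-adapter wiring does.

```python
import numpy as np

rng = np.random.default_rng(1)

def highpass_residual(img):
    """Forensic-residual proxy: image minus a 4-neighbour average (high-pass)."""
    blur = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
            np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
    return img - blur

def features(img, proj):
    """Stand-in for a feature extractor: fixed random projection + tanh."""
    return np.tanh(img.reshape(-1) @ proj)

def fuse_and_classify(img, proj_sem, proj_res, adapter_w, adapter_b=0.0):
    """Two streams (semantic + residual) fused by a lightweight linear adapter."""
    f_sem = features(img, proj_sem)                      # semantic stream
    f_res = features(highpass_residual(img), proj_res)   # residual stream
    fused = np.concatenate([f_sem, f_res])
    logit = fused @ adapter_w + adapter_b
    return 1.0 / (1.0 + np.exp(-logit))  # fake-probability

img = rng.random((8, 8))
proj_sem = rng.standard_normal((64, 16)) / 8
proj_res = rng.standard_normal((64, 16)) / 8
adapter_w = rng.standard_normal(32) / 6
p_fake = fuse_and_classify(img, proj_sem, proj_res, adapter_w)
```

Keeping the adapter small means only the fusion layer need be trained when backbones are frozen, which is consistent with the paper's claim of a modular, data-efficient design.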
Author countries
Saudi Arabia