MS-GAGA: Metric-Selective Guided Adversarial Generation Attack

Authors: Dion J. X. Ho, Gabriel Lee Jun Rong, Niharika Shrivastava, Harshavardhan Abichandani, Pai Chet Ng, Xiaoxiao Miao

Published: 2025-10-14 13:01:40+00:00

AI Summary

MS-GAGA is a two-stage framework for crafting highly transferable and visually imperceptible adversarial examples against black-box deepfake detectors. Stage 1 employs a dual-stream attack (MNTD-PGD and SG-PGD) to expand the adversarial search space for improved transferability. Stage 2 utilizes a metric-aware selection module that jointly optimizes for attack success against black-box models and structural similarity (SSIM) to the original image.

Abstract

We present MS-GAGA (Metric-Selective Guided Adversarial Generation Attack), a two-stage framework for crafting transferable and visually imperceptible adversarial examples against deepfake detectors in black-box settings. In Stage 1, a dual-stream attack module generates adversarial candidates: MNTD-PGD applies enhanced gradient calculations optimized for small perturbation budgets, while SG-PGD focuses perturbations on visually salient regions. This complementary design expands the adversarial search space and improves transferability across unseen models. In Stage 2, a metric-aware selection module evaluates candidates based on both their success against black-box models and their structural similarity (SSIM) to the original image. By jointly optimizing transferability and imperceptibility, MS-GAGA achieves up to 27% higher misclassification rates on unseen detectors compared to state-of-the-art attacks.
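
The Stage 1 attack family described above (momentum, Nesterov look-ahead, translation-invariant smoothing, and diverse-input transforms) can be illustrated with a minimal PyTorch-style sketch. All names, hyper-parameters, and the perturbation budget below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative MNTD-style PGD loop (Momentum, Nesterov look-ahead,
# Translation-invariant gradient smoothing, Diverse-input transform).
# A sketch under assumed hyper-parameters, not the paper's exact code.
import torch
import torch.nn.functional as F

def input_diversity(x, low=224, high=256, prob=0.5):
    """Randomly resize and pad the input (DI transform) with probability `prob`."""
    if torch.rand(1).item() > prob:
        return x
    rnd = torch.randint(low, high, (1,)).item()
    resized = F.interpolate(x, size=(rnd, rnd), mode="nearest")
    pad = high - rnd
    left = torch.randint(0, pad + 1, (1,)).item()
    top = torch.randint(0, pad + 1, (1,)).item()
    return F.pad(resized, (left, pad - left, top, pad - top), value=0.0)

def gaussian_kernel(size=7, sigma=3.0, channels=3):
    """Depthwise Gaussian kernel for translation-invariant gradient smoothing."""
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    k = k / k.sum()
    return k.expand(channels, 1, size, size).clone()

def mntd_pgd(model, x, y, eps=4 / 255, alpha=1 / 255, steps=10, mu=1.0):
    """Untargeted L-inf attack; assumes the detector accepts variable input sizes
    (otherwise set prob=0 in input_diversity)."""
    kernel = gaussian_kernel(channels=x.shape[1]).to(x.device)
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        # Nesterov look-ahead point
        x_nes = (x_adv + alpha * mu * g).detach().requires_grad_(True)
        loss = F.cross_entropy(model(input_diversity(x_nes)), y)
        grad = torch.autograd.grad(loss, x_nes)[0]
        # Translation-invariant smoothing of the gradient
        grad = F.conv2d(grad, kernel, padding=kernel.shape[-1] // 2, groups=x.shape[1])
        # Momentum accumulation with L1-normalised gradient
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
        # Signed ascent step, projected back into the eps-ball and valid pixel range
        x_adv = x_adv + alpha * g.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0).detach()
    return x_adv
```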


Key findings
MS-GAGA achieves misclassification rates of up to 99.6% on unseen black-box detectors, up to 27% higher than state-of-the-art baselines. The metric-aware selection module balances transferability and imperceptibility, yielding a higher overall score than the Carlini-Wagner and Square attacks. These results expose significant vulnerabilities in current deepfake detection pipelines, even those built on CNN architectures.
Approach
The framework uses a dual-stream attack strategy: MNTD-PGD augments PGD with robust optimization techniques (momentum, Nesterov acceleration, translation invariance, and input diversity) plus SSIM regularization, while SG-PGD confines perturbations to visually salient regions. The final adversarial example is then chosen by a joint score that maximizes black-box misclassification while preserving high perceptual similarity (SSIM) to the original image, as sketched below.
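
A minimal sketch of how the Stage 2 metric-aware selection could score candidates, assuming scikit-image's structural_similarity; the equal 0.5/0.5 weighting and the helper names (select_candidate, blackbox_models) are illustrative assumptions, not the paper's exact scoring rule.

```python
# Illustrative Stage-2 selection: score each candidate by the fraction of
# black-box detectors it fools plus its SSIM to the clean image, keep the best.
# The 50/50 weighting and helper names are assumptions, not the paper's formula.
import numpy as np
from skimage.metrics import structural_similarity

def select_candidate(clean, candidates, blackbox_models, true_label):
    """clean, candidates: HxWxC float arrays in [0, 1];
    blackbox_models: callables returning a predicted label for one image."""
    best_score, best_img = -np.inf, None
    for adv in candidates:
        # Fraction of black-box detectors fooled by this candidate
        fooled = np.mean([m(adv) != true_label for m in blackbox_models])
        # Perceptual similarity to the original image
        ssim = structural_similarity(clean, adv, channel_axis=-1, data_range=1.0)
        score = 0.5 * fooled + 0.5 * ssim  # joint transferability/imperceptibility score
        if score > best_score:
            best_score, best_img = score, adv
    return best_img, best_score
```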
Datasets
AADD Challenge (ACM Multimedia 2025)
Model(s)
MNTD-PGD, SG-PGD (attack methods); ResNet-50, DenseNet-121, EfficientNet-B0, Inception-v3 (surrogate/target deepfake detectors)
Author countries
USA, Singapore, China