SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks

Authors: Kutub Uddin, Awais Khan, Muhammad Umar Farooq, Khalid Malik

Published: 2025-07-17 14:33:54+00:00

AI Summary

This paper introduces SHIELD, a collaborative learning method for robust deepfake audio detection under generative anti-forensic (AF) attacks. It integrates an auxiliary defense generative model that exposes AF signatures, together with a triplet model that captures correlations between real and AF-attacked audios and their generated counterparts. SHIELD substantially improves detection accuracy against a range of generative AF attacks, outperforming existing defense methods.

Abstract

Audio plays a crucial role in applications like speaker verification, voice-enabled smart devices, and audio conferencing. However, audio manipulations, such as deepfakes, pose significant risks by enabling the spread of misinformation. Our empirical analysis reveals that existing methods for detecting deepfake audio are often vulnerable to anti-forensic (AF) attacks, particularly attacks crafted using generative adversarial networks. In this article, we propose a novel collaborative learning method called SHIELD to defend against generative AF attacks. To expose AF signatures, we integrate an auxiliary generative model, called the defense (DF) generative model, which facilitates collaborative learning by combining its input and output. Furthermore, we design a triplet model that uses auxiliary generative models to capture correlations of real and AF-attacked audios with their real-generated and attacked-generated counterparts. The proposed SHIELD strengthens the defense against generative AF attacks and achieves robust performance across various generative models. The proposed AF attack significantly reduces the average detection accuracy from 95.49% to 59.77% for ASVspoof2019, from 99.44% to 38.45% for In-the-Wild, and from 98.41% to 51.18% for HalfTruth across three different generative models. The proposed SHIELD mechanism is robust against AF attacks and achieves an average accuracy of 98.13%, 98.58%, and 99.57% in match settings, and 98.78%, 98.62%, and 98.85% in mismatch settings for the ASVspoof2019, In-the-Wild, and HalfTruth datasets, respectively.


Key findings
Existing deepfake audio detection methods are highly vulnerable to generative AF attacks, with detection accuracy dropping sharply (e.g., from 95.49% to 59.77% on ASVspoof2019). The proposed SHIELD mechanism remains robust against AF attacks, maintaining average accuracies above 98% in both match and mismatch settings across all evaluated datasets. SHIELD significantly outperforms state-of-the-art defense mechanisms, with accuracy improvements ranging from 13% to 45%.
Approach
SHIELD integrates an auxiliary defense generative model (GD) that reconstructs each input, thereby exposing anti-forensic (AF) signatures. A triplet model, built on an embedding network such as RawNet3, then captures correlations between original and GD-generated audios (both real and AF-attacked), enabling robust discrimination against generative AF attacks.
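The pairing-plus-triplet idea above can be sketched in a few lines. This is a hedged toy illustration, not the paper's implementation: `g_defense` stands in for the GD reconstruction model, `embed` for a trained embedding network like RawNet3, and the loss is the standard triplet margin loss; the key point shown is that each audio is embedded jointly with its GD reconstruction, so reconstruction residue (where AF signatures are assumed to appear) reaches the triplet model.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the anchor toward the positive
    embedding and push it away from the negative by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def pair_embedding(x, g_defense, embed):
    """Embed an audio jointly with its defense-model reconstruction.
    Concatenating input and GD output lets the embedding network see
    the reconstruction residue where AF signatures may surface."""
    return embed(np.concatenate([x, g_defense(x)]))

# Toy stand-ins (assumptions, not the paper's trained models):
rng = np.random.default_rng(0)
g_defense = lambda x: 0.9 * x                      # placeholder GD reconstruction
embed = lambda v: v / (np.linalg.norm(v) + 1e-8)   # placeholder embedding network

real = rng.standard_normal(16)                     # "real" audio sample
attacked = real + 0.5 * rng.standard_normal(16)    # simulated AF perturbation

z_anchor = pair_embedding(real, g_defense, embed)
z_pos = pair_embedding(real + 0.01 * rng.standard_normal(16), g_defense, embed)
z_neg = pair_embedding(attacked, g_defense, embed)

loss = triplet_margin_loss(z_anchor, z_pos, z_neg, margin=0.5)
```

In training, minimizing this loss over many (real, real, AF-attacked) triplets would drive real and attacked inputs apart in embedding space, which is the discrimination property the Approach section describes.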
Datasets
ASVspoof2019, HalfTruth, In-the-Wild
Model(s)
Auxiliary Generative Model (GD), Triplet Model (TM), RawNet3 (as embedding network for triplet model)
Author countries
USA