Towards Trustworthy Audio Deepfake Detection: A Systematic Framework for Diagnosing and Mitigating Gender Bias

Authors: Aishwarya Fursule, Shruti Kshirsagar, Anderson R. Avila

Published: 2026-05-09 17:53:04+00:00

Comment: Submitted to SMC 2026 conference

AI Summary

This paper introduces the first diagnosis-first framework to systematically identify and mitigate gender bias in audio deepfake detection systems. It evaluates two state-of-the-art models, AASIST and Wav2Vec2+ResNet18, on the ASVSpoof5 dataset, revealing that bias stems from acoustic representation differences, gender leakage in learned features, and structural evaluation asymmetry, rather than imbalanced training data. The study proposes novel mitigation methods and finds that adjusting decision thresholds per gender reduces unfairness significantly without accuracy cost, emphasizing that bias sources must be identified before applying targeted fixes.

Abstract

Audio deepfake detection systems are increasingly deployed in high-stakes security applications, yet their fairness across demographic groups remains critically underexamined. Prior work measures gender disparity but does not investigate where it comes from or how to fix it systematically. We present the first diagnosis-first framework that identifies the source of bias before applying targeted mitigation, evaluated on two models, AASIST and Wav2Vec2+ResNet18, on ASVSpoof5. Our diagnosis shows that bias does not stem from imbalanced training data but from acoustic representation differences, gender leakage in learned features, and structural evaluation asymmetry. We test mitigation strategies across in-processing, post-processing and combined families, including novel methods introduced in this work. Adjusting the decision threshold separately per gender reduces unfairness by 54% to 75% at no cost to detection accuracy, and our new epoch-level fairness regularisation method outperforms existing per-batch approaches. Adversarial debiasing succeeds only when gender leakage is localised, and fails when it is diffuse, an outcome correctly predicted by our diagnosis before training. No single method fully closes the fairness gap, confirming that bias sources must be identified before fixes are applied and that fairer benchmark design is equally important.


Key findings
Gender bias in audio deepfake detection arises from several independent sources: evaluation protocol asymmetry, score distribution shift, gender leakage in embeddings, and the use of a single decision threshold. It does not arise from training data imbalance. Threshold calibration (setting the decision threshold separately per gender) was the most reliable mitigation, reducing the difference in false positive rates by 54% to 75% at no cost to detection accuracy. Adversarial debiasing was effective only when gender leakage was localised. This shows why bias sources need to be diagnosed before mitigation is applied: untargeted interventions can degrade performance.
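The threshold calibration idea can be illustrated with a small sketch. It is not the paper's procedure: the summary does not say how the per-gender thresholds are chosen, so the sketch assumes each group's threshold is set to the quantile of that group's bona fide scores that yields a common target false positive rate. The function names, the synthetic scores, the 0.5 score shift between groups, and the 5% target are all hypothetical.

```python
import numpy as np

def false_positive_rate(scores, labels, threshold):
    """Share of bona fide clips (label 0) whose spoof score exceeds the threshold."""
    bona_fide = scores[labels == 0]
    return float(np.mean(bona_fide > threshold))

def per_group_thresholds(scores, labels, groups, target_fpr):
    """Assumed rule: one threshold per group, set at the (1 - target_fpr)
    quantile of that group's bona fide scores."""
    thresholds = {}
    for g in np.unique(groups):
        bona_fide = scores[(labels == 0) & (groups == g)]
        thresholds[str(g)] = float(np.quantile(bona_fide, 1.0 - target_fpr))
    return thresholds

def fpr_gap(scores, labels, groups, thresholds):
    """Largest minus smallest group FPR under the given per-group thresholds."""
    rates = []
    for g, t in thresholds.items():
        mask = groups == g
        rates.append(false_positive_rate(scores[mask], labels[mask], t))
    return max(rates) - min(rates)

# Synthetic scores (hypothetical): higher means "more likely spoof", and one
# group's scores are shifted upward to mimic a score distribution shift.
rng = np.random.default_rng(0)
n = 2000
groups = np.array(["female"] * n + ["male"] * n)
labels = rng.integers(0, 2, size=2 * n)          # 0 = bona fide, 1 = spoof
shift = np.where(groups == "female", 0.5, 0.0)
scores = rng.normal(loc=2.0 * labels + shift, scale=1.0)

target = 0.05
# Baseline: a single threshold shared by both groups.
single = float(np.quantile(scores[labels == 0], 1.0 - target))
gap_single = fpr_gap(scores, labels, groups, {"female": single, "male": single})

# Calibrated: one threshold per group.
calibrated = per_group_thresholds(scores, labels, groups, target)
gap_calibrated = fpr_gap(scores, labels, groups, calibrated)

print(f"FPR gap, single threshold:     {gap_single:.3f}")
print(f"FPR gap, per-group thresholds: {gap_calibrated:.3f}")
```

On this toy data the per-group thresholds absorb the score shift, so the false positive rate gap shrinks. The sketch does not reproduce the reported 54% to 75% reduction, which depends on the real models and on ASVSpoof5.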
Approach
The proposed approach is a two-stage framework. First, a systematic diagnosis stage identifies bias sources at the data, model, and decision levels using eight distinct checks (for example, gender imbalance, score distribution differences, embedding gender leakage, SHAP analysis, and threshold bias). Second, a mitigation stage applies strategies targeted at the diagnosed sources. These are drawn from pre-processing, in-processing (including the novel epoch-level fairness regularisation), and post-processing (including the novel SHAP-guided feature suppression and gender-neutral embedding alignment), or from combinations of these families.
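One of the diagnosis checks, embedding gender leakage, can be sketched as a linear probe: if a simple classifier can predict speaker gender from the detector's embeddings well above chance, the embeddings carry gender information. This is a common probing technique and only an assumption about how such a check might be run; the summary does not describe the paper's actual procedure. The function name `linear_probe_accuracy`, the synthetic embeddings, and all hyperparameters are hypothetical.

```python
import numpy as np

def linear_probe_accuracy(emb, gender, epochs=300, lr=0.1, seed=0):
    """Train a logistic-regression probe to predict gender from embeddings
    and return its accuracy on a held-out 30% split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(emb))
    split = int(0.7 * len(emb))
    tr, te = idx[:split], idx[split:]

    # Standardise features using training statistics only.
    mu = emb[tr].mean(axis=0)
    sd = emb[tr].std(axis=0) + 1e-8
    x_tr, x_te = (emb[tr] - mu) / sd, (emb[te] - mu) / sd

    w = np.zeros(emb.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x_tr @ w + b)))   # predicted P(gender = 1)
        grad = p - gender[tr]                        # gradient of the log loss
        w -= lr * (x_tr.T @ grad) / len(tr)
        b -= lr * grad.mean()

    pred = ((x_te @ w + b) > 0).astype(int)
    return float(np.mean(pred == gender[te]))

# Synthetic embeddings (hypothetical): "clean" ones are pure noise, while the
# "leaky" ones encode gender in a single dimension.
rng = np.random.default_rng(1)
n, dim = 1000, 16
gender = rng.integers(0, 2, size=n)
clean = rng.normal(size=(n, dim))
leaky = clean.copy()
leaky[:, 0] += 3.0 * gender

acc_clean = linear_probe_accuracy(clean, gender)
acc_leaky = linear_probe_accuracy(leaky, gender)
print(f"probe accuracy, clean embeddings: {acc_clean:.2f}")
print(f"probe accuracy, leaky embeddings: {acc_leaky:.2f}")
```

Probe accuracy near 50% suggests little linearly recoverable gender information, while accuracy well above chance signals leakage. A linear probe only detects linearly encoded information, so a low score does not rule out leakage in a nonlinear form.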
Datasets
ASVSpoof5
Model(s)
AASIST, Wav2Vec2-large+ResNet18
Author countries
USA, Canada