Reliable Audio Deepfake Detection in Variable Conditions via Quantum-Kernel SVMs

Authors: Lisan Al Amin, Vandana P. Janeja

Published: 2025-12-21 16:31:05+00:00

AI Summary

Quantum-Kernel SVMs (QSVMs) are proposed for reliable audio deepfake detection, particularly in conditions where labeled data is scarce and variability is high. By leveraging quantum feature maps simulated classically as a drop-in kernel replacement for standard SVMs, the method enhances feature separability. This approach achieves consistently lower Equal Error Rates (EER) and False Positive Rates (FPR) compared to classical kernels without introducing additional trainable parameters.

Abstract

Detecting synthetic speech is challenging when labeled data are scarce and recording conditions vary. Existing end-to-end deep models often overfit or fail to generalize, and while kernel methods can remain competitive, their performance depends heavily on the chosen kernel. Here, we show that using a quantum kernel in audio deepfake detection reduces false-positive rates without increasing model size. Quantum feature maps embed data into high-dimensional Hilbert spaces, enabling the use of expressive similarity measures and compact classifiers. Building on this motivation, we compare quantum-kernel SVMs (QSVMs) with classical SVMs using identical mel-spectrogram preprocessing and stratified 5-fold cross-validation across four corpora (ASVspoof 2019 LA, ASVspoof 5 (2024), ADD23, and an In-the-Wild set). QSVMs achieve consistently lower equal-error rates (EER): 0.183 vs. 0.299 on ASVspoof 5 (2024), 0.081 vs. 0.188 on ADD23, 0.346 vs. 0.399 on ASVspoof 2019, and 0.355 vs. 0.413 on the In-the-Wild set. At the EER operating point (where FPR equals FNR), these correspond to absolute false-positive-rate reductions of 0.116 (38.8% relative), 0.107 (56.9%), 0.053 (13.3%), and 0.058 (14.0%), respectively. We also report the consistency of the results across cross-validation folds, along with margin-based measures of class separation, using identical settings for both models. The only modification is the kernel; the features and SVM remain unchanged, no additional trainable parameters are introduced, and the quantum kernel is computed on a conventional computer.
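The EER operating point mentioned above is the decision threshold at which the false-positive rate equals the false-negative rate. The sketch below illustrates one simple way to locate it from classifier scores. It is not taken from the paper: the function name `equal_error_rate`, the toy scores, and the convention that spoofed audio is the positive class (label 1) are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Return (eer, threshold) for binary scores.

    Assumed convention: a higher score means "more likely spoof",
    label 1 = spoof (positive class), label 0 = bona fide.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best = None
    # Sweep every observed score as a candidate threshold.
    for t in np.unique(scores):
        pred = scores >= t
        fpr = np.mean(pred[labels == 0])    # bona fide flagged as spoof
        fnr = np.mean(~pred[labels == 1])   # spoof accepted as bona fide
        gap = abs(fpr - fnr)
        # Keep the threshold where FPR and FNR are closest.
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2.0, t)
    return best[1], best[2]

# Toy scores, purely illustrative.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
eer, thr = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.3f} at threshold {thr:.1f}")
```

With finite data the two error rates rarely cross exactly, so this sketch reports their average at the closest threshold; interpolating along the ROC curve is a common alternative.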


Key findings
QSVMs achieved consistently lower Equal Error Rates (EER) than classical SVMs on all four datasets (e.g., an EER reduction from 0.299 to 0.183 on ASVspoof 5). At the EER operating point, these gains correspond to absolute false-positive-rate reductions of 0.053 to 0.116, or relative reductions of up to 56.9% (on ADD23). QSVMs also demonstrated superior security scores and stability across cross-validation folds, suggesting higher resilience to variable conditions.
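The absolute and relative reductions follow directly from the reported EER pairs. The short check below recomputes them; the EER values are copied from the abstract, while the helper name `reductions` is hypothetical.

```python
# Reported EER pairs as (classical SVM, QSVM), copied from the abstract.
reported_eer = {
    "ASVspoof 5 (2024)": (0.299, 0.183),
    "ADD23": (0.188, 0.081),
    "ASVspoof 2019 LA": (0.399, 0.346),
    "In-the-Wild": (0.413, 0.355),
}

def reductions(classical, quantum):
    """Absolute drop in EER and the same drop as a percentage of the classical EER."""
    absolute = classical - quantum
    relative_pct = 100.0 * absolute / classical
    return round(absolute, 3), round(relative_pct, 1)

summary = {name: reductions(c, q) for name, (c, q) in reported_eer.items()}
for name, (abs_drop, rel_pct) in summary.items():
    print(f"{name}: absolute {abs_drop:.3f}, relative {rel_pct:.1f}%")
```

The largest absolute reduction (0.116) is on ASVspoof 5, whereas the largest relative reduction (56.9%) is on ADD23, because the classical baseline there is already low.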
Approach
The authors perform a controlled kernel-swap experiment using identical mel-spectrogram features (standardized and PCA-reduced). Classification is performed by a standard SVM solver, using either classical kernels or quantum kernels computed from parameterized quantum feature maps (e.g., ZZFeatureMap) in classical simulation. Because only the kernel changes, any performance difference can be attributed to the kernel itself, that is, to how well the feature space it induces separates the two classes linearly.
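As a rough illustration of what "a quantum kernel computed in classical simulation" can look like, the sketch below builds a fidelity kernel K(x, y) = |⟨ψ(x)|ψ(y)⟩|² from a ZZ-style feature map using only NumPy statevectors. It is not the authors' implementation: the phase function (x_i on single qubits, (π − x_i)(π − x_j) on pairs), full pairwise entanglement, two repetitions, and the names `zz_feature_state` and `quantum_kernel` are all assumptions chosen for illustration, with one qubit per PCA-reduced feature.

```python
import numpy as np
from itertools import combinations

def zz_feature_state(x, reps=2):
    """Statevector of an assumed ZZ-style feature map (one qubit per feature)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dim = 2 ** n
    # Z eigenvalue (+1 or -1) of each qubit for each basis state, shape (dim, n).
    bits = (np.arange(dim)[:, None] >> np.arange(n)) & 1
    z = 1 - 2 * bits
    # Diagonal phase: single-qubit terms plus pairwise ZZ terms.
    phase = z @ x
    for i, j in combinations(range(n), 2):
        phase = phase + (np.pi - x[i]) * (np.pi - x[j]) * z[:, i] * z[:, j]
    diag = np.exp(1j * phase)
    # Hadamard on every qubit, built as a Kronecker product.
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    hn = np.array([[1.0]])
    for _ in range(n):
        hn = np.kron(hn, h)
    # Start in |0...0> and apply (Hadamards, then diagonal phases) reps times.
    state = np.zeros(dim, dtype=complex)
    state[0] = 1.0
    for _ in range(reps):
        state = diag * (hn @ state)
    return state

def quantum_kernel(A, B, reps=2):
    """Fidelity kernel matrix with entries |<psi(a)|psi(b)>|^2."""
    SA = np.array([zz_feature_state(a, reps) for a in A])
    SB = np.array([zz_feature_state(b, reps) for b in B])
    return np.abs(SA.conj() @ SB.T) ** 2

# Toy 2-feature inputs, purely illustrative.
X_demo = np.array([[0.1, 0.5], [1.2, 0.3], [2.0, 2.5]])
K_demo = quantum_kernel(X_demo, X_demo)
print(np.round(K_demo, 3))
```

A Gram matrix of this kind can be passed to an unchanged SVM solver, for example scikit-learn's `SVC(kernel="precomputed")`, which is the sense in which the kernel is a drop-in replacement. The statevector has 2^n entries, so a simulation like this is only practical for a small number of PCA components.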
Datasets
ASVspoof 2019 LA, ASVspoof 5 (2024), ADD23, In-the-Wild set.
Model(s)
Support Vector Machines (SVMs), Quantum Support Vector Machines (QSVMs), ZZFeatureMap, PauliFeatureMap, ZFeatureMap.
Author countries
USA