Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing

Authors: Manasi Chhibber, Jagabandhu Mishra, Tomi H. Kinnunen

Published: 2025-09-29 12:14:58+00:00

AI Summary

This paper introduces a novel zero-shot source tracing framework for speech deepfakes, adapting the SSL-AASIST system for attack classification. It investigates both zero-shot (cosine similarity, Siamese) and few-shot (MLP, Siamese) backend scoring approaches for attack verification. Experiments show that few-shot learning offers advantages in closed-set scenarios, while zero-shot approaches are more effective for open-set source tracing.

Abstract

We propose a novel zero-shot source tracing framework inspired by advances in speaker verification. Specifically, we adapt the SSL-AASIST system for attack classification, ensuring that the attacks used for training are disjoint from those used to form fingerprint-trial pairs. For backend scoring in attack verification, we explore both zero-shot approaches (cosine similarity and Siamese) and few-shot approaches (MLP and Siamese). Experiments on our recently introduced STOPA dataset suggest that few-shot learning provides advantages in the closed-set scenario, while zero-shot approaches perform better in the open-set scenario. In closed-set trials, few-shot Siamese and MLP achieve equal error rates (EER) of 18.44% and 15.11%, compared to 27.14% for zero-shot cosine scoring. Conversely, in open-set trials, zero-shot cosine scoring reaches 21.70%, outperforming few-shot Siamese and MLP at 27.40% and 22.65%, respectively.


Key findings
Few-shot methods performed best in closed-set attack verification: few-shot MLP and few-shot Siamese reached EERs of 15.11% and 18.44%, against 27.14% for zero-shot cosine scoring. In open-set trials the ranking reversed: zero-shot cosine scoring reached 21.70% EER, ahead of few-shot MLP (22.65%) and few-shot Siamese (27.40%). Training the SSL-AASIST embedding extractor with AAM loss and out-of-domain data improved generalization, particularly for out-of-distribution attacks.
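For readers unfamiliar with the metric, the equal error rate (EER) is the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal NumPy sketch of one common way to estimate it from verification scores. The function name `compute_eer` and the toy scores are illustrative assumptions, not the authors' evaluation code, and the paper may use a different interpolation scheme.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Estimate the EER from two score lists (hypothetical helper).

    target_scores:    scores of trials where trial and fingerprint share an attack
    nontarget_scores: scores of trials where they do not
    Higher scores are assumed to mean "more likely the same attack".
    """
    target_scores = np.asarray(target_scores, dtype=np.float64)
    nontarget_scores = np.asarray(nontarget_scores, dtype=np.float64)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)

if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    print(compute_eer([0.9, 0.6, 0.4], [0.5, 0.3, 0.1]))
```

This brute-force threshold sweep is quadratic in the number of trials, which is acceptable for a sketch; a sorted cumulative-count version would be preferable for large trial lists.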
Approach
The proposed framework adapts the SSL-AASIST system as an attack embedding extractor, trained with additive angular margin (AAM) loss and additional out-of-domain data. The attacks used for training are disjoint from those used to form fingerprint-trial pairs. Backend scoring compares each trial embedding against an enrolled attack fingerprint, using either zero-shot methods (cosine similarity, or a Siamese network trained on the disjoint attacks) or few-shot methods (an MLP or a Siamese network trained on fingerprint data).
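The zero-shot cosine backend described above can be sketched as follows. This is a minimal illustration in the style of speaker verification, not the authors' implementation: it assumes that a fingerprint is the length-normalized mean of a few enrollment embeddings, and the helper names (`l2_normalize`, `enroll_fingerprint`, `cosine_score`) and the 4-dimensional toy vectors are hypothetical. In the paper the embeddings would come from the SSL-AASIST extractor.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    x = np.asarray(x, dtype=np.float64)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def enroll_fingerprint(enroll_embeddings):
    """Build one attack fingerprint from an (n_utterances, dim) array.

    Assumption: the fingerprint is the normalized mean of the
    length-normalized enrollment embeddings.
    """
    return l2_normalize(l2_normalize(enroll_embeddings).mean(axis=0))

def cosine_score(fingerprint, trial_embedding):
    """Cosine similarity between a fingerprint and a single trial embedding."""
    return float(np.dot(l2_normalize(fingerprint), l2_normalize(trial_embedding)))

if __name__ == "__main__":
    # Toy embeddings, invented purely for illustration.
    enrollment = np.array([[1.0, 0.1, 0.0, 0.0],
                           [0.9, 0.0, 0.1, 0.0]])
    fingerprint = enroll_fingerprint(enrollment)
    same_attack_trial = np.array([1.0, 0.05, 0.05, 0.0])
    other_attack_trial = np.array([0.0, 0.0, 0.1, 1.0])
    print(cosine_score(fingerprint, same_attack_trial))
    print(cosine_score(fingerprint, other_attack_trial))
```

Because this backend has no trainable parameters, it needs no data from the attacks being verified, which is what makes it zero-shot. The few-shot MLP and Siamese backends instead fit parameters on fingerprint data.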
Datasets
STOPA dataset, ASVspoof 2019 LA
Model(s)
SSL-AASIST, AASIST, Siamese Network, MLP
Author countries
Finland