Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
Authors: Manasi Chhibber, Jagabandhu Mishra, Tomi H. Kinnunen
Published: 2025-09-29 12:14:58+00:00
AI Summary
This work proposes a zero-shot framework for open-set speech deepfake source tracing, adapting the SSL-AASIST system with AAM loss to extract attack embeddings. It compares zero-shot (cosine, Siamese) and few-shot (MLP, Siamese) backend scoring methods for attributing synthesized speech to its generative source. Experiments on the STOPA dataset suggest that few-shot backends do better in closed-set trials, while zero-shot cosine scoring generalizes best in the harder open-set scenario (21.70% EER).
Abstract
We propose a novel zero-shot source tracing framework inspired by advances in speaker verification. Specifically, we adapt the SSL-AASIST system for attack classification, ensuring that the attacks used for training are disjoint from those used to form fingerprint-trial pairs. For backend scoring in attack verification, we explore both zero-shot approaches (cosine similarity and Siamese) and few-shot approaches (MLP and Siamese). Experiments on our recently introduced STOPA dataset suggest that few-shot learning provides advantages in the closed-set scenario, while zero-shot approaches perform better in the open-set scenario. In closed-set trials, few-shot Siamese and MLP achieve equal error rates (EERs) of 18.44% and 15.11%, respectively, compared to 27.14% for zero-shot cosine scoring. Conversely, in open-set trials, zero-shot cosine scoring reaches an EER of 21.70%, outperforming few-shot Siamese and MLP at 27.40% and 22.65%, respectively.
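
To make the zero-shot cosine backend and the EER metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes attack embeddings have already been extracted (for example by an SSL-AASIST-style model) and are available as NumPy arrays. The function names `make_fingerprint`, `cosine_score`, and `compute_eer` are hypothetical, and averaging length-normalized enrollment embeddings into a fingerprint is an assumption borrowed from common speaker verification practice; the paper may form fingerprints differently.

```python
import numpy as np


def make_fingerprint(embeddings: np.ndarray) -> np.ndarray:
    """Average length-normalized embeddings of one attack into a fingerprint.

    Assumption: `embeddings` has shape (n_utterances, dim) and all rows
    come from the same generative source.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed.mean(axis=0)


def cosine_score(fingerprint: np.ndarray, trial: np.ndarray) -> float:
    """Cosine similarity between an attack fingerprint and a trial embedding."""
    f = fingerprint / np.linalg.norm(fingerprint)
    t = trial / np.linalg.norm(trial)
    return float(np.dot(f, t))


def compute_eer(target_scores: np.ndarray, nontarget_scores: np.ndarray) -> float:
    """Approximate equal error rate over a set of verification trials.

    Target trials: fingerprint and trial share the same attack.
    Non-target trials: they come from different attacks.
    Every observed score is tried as a threshold; the EER is taken at the
    threshold where miss rate and false-alarm rate are closest.
    """
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # Miss rate: fraction of target trials scored below the threshold.
    miss = np.array([(target_scores < th).mean() for th in thresholds])
    # False-alarm rate: fraction of non-target trials at or above the threshold.
    false_alarm = np.array([(nontarget_scores >= th).mean() for th in thresholds])
    idx = int(np.argmin(np.abs(miss - false_alarm)))
    return float((miss[idx] + false_alarm[idx]) / 2)
```

Because this backend has no trainable parameters, it needs no examples of the attacks under test, which is what makes it zero-shot. The few-shot MLP and Siamese backends described in the abstract would replace `cosine_score` with a model fitted on a small number of labeled examples.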