Probabilistic Verification of Voice Anti-Spoofing Models

Authors: Evgeny Kushnir, Alexandr Kozodaev, Dmitrii Korzh, Mikhail Pautov, Oleg Kiriukhin, Oleg Y. Rogov

Published: 2026-03-11 12:40:24+00:00

Comment: The paper was submitted for review to Interspeech 2026

AI Summary

The paper introduces PV-VASM, a probabilistic framework designed to verify the robustness of voice anti-spoofing models (VASMs) against malicious speech synthesis and input perturbations. It estimates the probability of misclassification under text-to-speech (TTS), voice cloning (VC), and parametric signal transformations, offering model-agnostic robustness guarantees. PV-VASM derives a theoretical upper bound on the error probability and is validated as a practical verification tool across diverse experimental settings.

Abstract

Recent advances in generative models have amplified the risk of malicious misuse of speech synthesis technologies, enabling adversaries to impersonate target speakers and access sensitive resources. Although speech deepfake detection has progressed rapidly, most existing countermeasures lack formal robustness guarantees or fail to generalize to unseen generation techniques. We propose PV-VASM, a probabilistic framework for verifying the robustness of voice anti-spoofing models (VASMs). PV-VASM estimates the probability of misclassification under text-to-speech (TTS), voice cloning (VC), and parametric signal transformations. The approach is model-agnostic and enables robustness verification against unseen speech synthesis techniques and input perturbations. We derive a theoretical upper bound on the error probability and validate the method across diverse experimental settings, demonstrating its effectiveness as a practical robustness verification tool.


Key findings
PV-VASM provides robustness certificates for VASMs against a range of attacks and shows that model robustness varies significantly with the perturbation type and the parameter space. Models are strongly robust to simple parametric transformations, but their performance degrades considerably under more challenging transformations and against speech generation models. Fine-tuning the VASM on data from specific TTS/VC models substantially improves its certified robustness against those particular spoofing methods.
Approach
PV-VASM is a model-agnostic probabilistic framework that estimates the misclassification probability of VASMs using the Chernoff inequality. It provides an upper bound on this probability under various input perturbations, including parametric transformations and data generated by TTS and VC models. The framework also estimates the probability that this bound itself fails, which quantifies the reliability of the robustness certificate.
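To illustrate the general idea of a Chernoff-type certificate, the following is a minimal Python sketch, not the authors' implementation. This summary does not state the exact bound derived in the paper, so the sketch assumes a standard Chernoff bound in its KL-divergence form for the mean of i.i.d. Bernoulli misclassification indicators. The names chernoff_upper_bound, certify, is_misclassified, and sample_perturbed_input are hypothetical, and the toy detector is a stand-in for a real VASM applied to perturbed or synthesized audio.

```python
import math
import random


def bernoulli_kl(p, q):
    """KL divergence KL(Bern(p) || Bern(q)); q is clipped away from 0 and 1."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)
    kl = 0.0
    if p > 0.0:
        kl += p * math.log(p / q)
    if p < 1.0:
        kl += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return kl


def chernoff_upper_bound(num_errors, num_samples, delta):
    """Upper bound u on the true error probability p.

    Assumption: samples are i.i.d. Under the Chernoff bound in KL form,
    p <= u holds with probability at least 1 - delta, where u is the largest
    q >= p_hat with KL(p_hat || q) <= ln(1 / delta) / num_samples.
    """
    if num_samples <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need num_samples > 0 and 0 < delta < 1")
    p_hat = num_errors / num_samples
    threshold = math.log(1.0 / delta) / num_samples
    lo, hi = p_hat, 1.0
    for _ in range(100):  # bisection; KL(p_hat || q) increases in q for q >= p_hat
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(p_hat, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return hi  # return the conservative end of the bracket


def certify(is_misclassified, sample_perturbed_input, num_samples, delta, rng):
    """Monte Carlo estimate of the error rate plus its Chernoff-type upper bound."""
    num_errors = sum(
        1 for _ in range(num_samples) if is_misclassified(sample_perturbed_input(rng))
    )
    return num_errors / num_samples, chernoff_upper_bound(num_errors, num_samples, delta)


# Toy stand-in (hypothetical): the "input" is a scalar detector score drawn under
# a random perturbation, and a score below zero counts as a misclassification.
def sample_perturbed_input(rng):
    return rng.gauss(1.0, 0.5)


def is_misclassified(score):
    return score < 0.0


p_hat, upper = certify(
    is_misclassified, sample_perturbed_input, num_samples=2000, delta=0.001,
    rng=random.Random(0),
)
print(f"empirical error rate: {p_hat:.4f}, certified upper bound: {upper:.4f}")
```

In this sketch, delta plays the role of the certificate's own failure probability: a smaller delta or fewer samples yields a looser, more conservative upper bound. In a real setting, sample_perturbed_input would draw transformation parameters or synthesized utterances, and is_misclassified would run the anti-spoofing model on the result.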
Datasets
ASVspoof 19, ASVspoof 21 (LA and DF), ASVspoof 5, ADD 22-23, DFADD, SONAR, CFAD, MLAAD, Speech-to-Latex, Mozilla Common Voice, OpenSLR (for room impulse responses, RIR), MUSAN (for background noise). Generative models evaluated include Vosk, Silero, Coqui XTTS-v2, F5-TTS, CosyVoice, ElevenLabs, and Finevoice.
Model(s)
Wav2Vec2-AASIST
Author countries
Russia, Hong Kong