Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection

Authors: Amir Mohammad Rostami, Mohammad Mehdi Homayounpour, Ahmad Nickabadi

Published: 2021-09-05 12:10:16+00:00

AI Summary

This paper introduces the Efficient Attention Branch Network (EABN) architecture with a combined loss function to improve generalization in Automatic Speaker Verification (ASV) spoof detection. The EABN couples an attention branch with a perception branch, using EfficientNet-A0 or SE-Res2Net50 as the backbone, and a combined loss that incorporates Triplet Center Loss. This approach achieves state-of-the-art results on the ASVspoof 2019 dataset for both logical and physical access scenarios, with the EfficientNet-A0 variant requiring notably fewer parameters.

Abstract

Many endeavors have sought to develop countermeasure techniques as enhancements to Automatic Speaker Verification (ASV) systems, in order to make them more robust against spoof attacks. As evidenced by the latest ASVspoof 2019 countermeasure challenge, models currently deployed for the task of ASV are, at their best, devoid of suitable degrees of generalization to unseen attacks. Upon further investigation of the proposed methods, it appears that a broader three-tiered view of the proposed systems, comprising the classifier, the feature extraction phase, and the model loss function, may to some extent lessen the problem. Accordingly, the present study proposes the Efficient Attention Branch Network (EABN) modular architecture with a combined loss function to address the generalization problem...


Key findings
The EABN achieved state-of-the-art performance on ASVspoof 2019, with an EER of 0.86% and t-DCF of 0.0239 for the Physical Access scenario using EfficientNet-A0. For the Logical Access scenario, it obtained an EER of 1.89% and t-DCF of 0.0507 using SE-Res2Net50, outperforming other state-of-the-art single systems. Notably, the EfficientNet-A0 variant achieved strong results with significantly fewer parameters (95,000) compared to other models.
Approach
The authors propose an Efficient Attention Branch Network (EABN) comprising an attention branch to generate interpretable masks and a perception branch for spoof detection. The perception branch leverages an EfficientNet-A0 architecture, optimized with a novel combined loss function that incorporates weighted Cross-Entropy and Triplet Center Loss to enhance generalization.
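The combined loss described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the weighting factor `lam`, the `margin`, the per-class weights, and the fixed class centers are all illustrative assumptions (in practice the centers would be learned alongside the network).

```python
import math

def weighted_cross_entropy(logits, labels, class_weights):
    """Per-class weighted softmax cross-entropy, averaged over the batch."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)                                   # stabilize the softmax
        exps = [math.exp(v - m) for v in row]
        log_prob = math.log(exps[y] / sum(exps))
        total += -class_weights[y] * log_prob
    return total / len(labels)

def triplet_center_loss(embeddings, labels, centers, margin=1.0):
    """For each sample, pull it toward its own class center and push it
    at least `margin` (in squared distance) from the nearest other center."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    total = 0.0
    for x, y in zip(embeddings, labels):
        d_pos = sq_dist(x, centers[y])
        d_neg = min(sq_dist(x, c) for j, c in enumerate(centers) if j != y)
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(labels)

def combined_loss(logits, embeddings, labels, centers,
                  class_weights, lam=0.1, margin=1.0):
    # Combined objective: weighted cross-entropy plus a scaled
    # triplet-center term that tightens intra-class clusters.
    return (weighted_cross_entropy(logits, labels, class_weights)
            + lam * triplet_center_loss(embeddings, labels, centers, margin))
```

A sample whose embedding already sits on its class center, far from the other center, contributes nothing to the triplet-center term; a sample sitting on the wrong center is penalized by both its distance to its own center and the violated margin.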
Datasets
ASVspoof 2019
Model(s)
Efficient Attention Branch Network (EABN), EfficientNet-A0, SE-Res2Net50
Author countries
Iran