Graph Attention Networks for Anti-Spoofing

Authors: Hemlata Tak, Jee-weon Jung, Jose Patino, Massimiliano Todisco, Nicholas Evans

Published: 2021-04-08 10:18:17+00:00

Comment: Submitted to INTERSPEECH 2021

AI Summary

This paper applies graph attention networks (GATs) to spoofing detection for automatic speaker verification. GATs model the relationships between neighbouring spectral sub-bands or temporal segments, which earlier spectral or temporal self-attention mechanisms did not capture. The proposed model builds its graph from high-level representations produced by a ResNet and, with temporal attention, outperforms all of the authors' baseline single systems on the ASVspoof 2019 logical access database.

Abstract

The cues needed to detect spoofing attacks against automatic speaker verification are often located in specific spectral sub-bands or temporal segments. Previous works show the potential to learn these using either spectral or temporal self-attention mechanisms but not the relationships between neighbouring sub-bands or segments. This paper reports our use of graph attention networks (GATs) to model these relationships and to improve spoofing detection performance. GATs leverage a self-attention mechanism over graph structured data to model the data manifold and the relationships between nodes. Our graph is constructed from representations produced by a ResNet. Nodes in the graph represent information either in specific sub-bands or temporal segments. Experiments performed on the ASVspoof 2019 logical access database show that our GAT-based model with temporal attention outperforms all of our baseline single systems. Furthermore, GAT-based systems are complementary to a set of existing systems. The fusion of GAT-based models with more conventional countermeasures delivers a 47% relative improvement in performance compared to the best performing single GAT system.


Key findings
The GAT-based model with temporal attention (GAT-T) outperformed all baseline single systems on the ASVspoof 2019 LA database. GAT-based systems also proved complementary to existing countermeasures: fusing them with more conventional systems delivered a 47% relative improvement in the minimum tandem detection cost function (min t-DCF) over the best performing single GAT system, a result competitive with state-of-the-art systems.
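For readers unfamiliar with the metric, a relative improvement is the reduction in a cost (here min t-DCF, where lower is better) divided by the starting cost. The sketch below shows only the arithmetic. The function name `relative_improvement` and both input values are hypothetical, chosen to illustrate a 47% reduction, and are not figures reported in the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional reduction of a lower-is-better cost relative to the baseline."""
    return (baseline - improved) / baseline


# Hypothetical min t-DCF values, for illustration only (not from the paper).
single_system_cost = 0.100
fused_system_cost = 0.053

gain = relative_improvement(single_system_cost, fused_system_cost)
print(f"relative improvement: {gain:.0%}")
```

With these made-up inputs the fused cost is 47% lower than the single-system cost, which is the sense in which the summary uses the term.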
Approach
High-level representations are extracted from log-linear filter bank (LFB) features using a ResNet-18 network. These representations form the nodes of a fully-connected graph, where each node represents either a spectral sub-band or a temporal segment. A graph attention network (GAT) is then applied to the graph to learn attention weights between every pair of nodes, modelling their relationships for spoofing detection.
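The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard single-head graph attention formulation (concatenation-based scores with LeakyReLU and a softmax over neighbours), which may differ from the attention variant used in the paper. The feature-map shape, the max-pooling used to form nodes, and the names `build_nodes` and `gat_layer` are all hypothetical.

```python
import numpy as np


def build_nodes(feature_map: np.ndarray, mode: str) -> np.ndarray:
    """Turn a (channels, freq, time) feature map into graph nodes.

    Assumption: nodes are formed by max-pooling over the other axis.
    'spectral' gives one node per sub-band, 'temporal' one per segment.
    """
    if mode == "spectral":
        return feature_map.max(axis=2).T  # (freq, channels)
    if mode == "temporal":
        return feature_map.max(axis=1).T  # (time, channels)
    raise ValueError(f"unknown mode: {mode}")


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def gat_layer(h: np.ndarray, W: np.ndarray, a: np.ndarray):
    """One single-head graph attention layer over a fully-connected graph.

    h: (N, F_in) node features, W: (F_in, F_out) projection,
    a: (2 * F_out,) attention vector.
    Returns the updated nodes (N, F_out) and attention weights (N, N).
    """
    z = h @ W                                     # project every node
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a destination term.
    src = z @ a[:f_out]                           # (N,)
    dst = z @ a[f_out:]                           # (N,)
    scores = leaky_relu(src[:, None] + dst[None, :])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)     # softmax over neighbours
    return alpha @ z, alpha


rng = np.random.default_rng(0)
# Hypothetical stand-in for a ResNet output: 32 channels, 12 sub-bands, 20 segments.
feature_map = rng.standard_normal((32, 12, 20))

temporal_nodes = build_nodes(feature_map, "temporal")   # (20, 32)
spectral_nodes = build_nodes(feature_map, "spectral")   # (12, 32)

out_dim = 16
W = rng.standard_normal((32, out_dim)) * 0.1
a = rng.standard_normal(2 * out_dim) * 0.1

h_out, alpha = gat_layer(temporal_nodes, W, a)
print(h_out.shape, alpha.shape)
```

Because the graph is fully connected, no adjacency mask is needed: each row of `alpha` is a distribution over all nodes, so every sub-band or segment can attend to every other one. A trained system would learn `W` and `a` jointly with the ResNet and add a readout over the updated nodes to produce a spoofing score.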
Datasets
ASVspoof 2019 Logical Access (LA) database
Model(s)
Graph Attention Networks (GATs), ResNet-18
Author countries
France, South Korea