Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

Published: 2024-04-26 09:36:49+00:00

AI Summary

This paper introduces three graph-domain features, GFDCC, GFLC, and GFLDC, for detecting replay speech attacks. The features combine logarithmic processing with a device-related linear transformation derived from the graph frequency domain. Evaluated with GMM and LCNN classifiers, they outperform existing front-ends on the ASVspoof 2017 V2, ASVspoof 2019 PA, and ASVspoof 2021 PA datasets. The approach captures device and environmental noise effects, which are crucial for robust replay speech detection.

Abstract

The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequency device cepstral coefficient, derived from the graph frequency domain using a device-related linear transformation. We also introduce two novel representations: graph frequency logarithmic coefficient and graph frequency logarithmic device coefficient. We evaluate our methods using traditional Gaussian mixture model and light convolutional neural network systems as classifiers. On the ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets, our proposed features outperform known front-ends, demonstrating their effectiveness for replay speech detection.


Key findings
The proposed features, GFLC, GFDCC, and GFLDC, consistently outperform baseline and existing front-ends on the ASVspoof datasets, with GFLDC+GMM achieving the best performance on ASVspoof 2017 V2. GFDCC and GFLDC systems also showed significant improvements on ASVspoof 2021 PA, particularly when there was no severe domain mismatch between training and evaluation data. Adding logarithmic processing and device information made replay attack detection more robust.
Approach
The paper proposes three novel features: Graph Frequency Device Cepstral Coefficient (GFDCC), Graph Frequency Logarithmic Coefficient (GFLC), and Graph Frequency Logarithmic Device Coefficient (GFLDC). These features are derived from the graph frequency domain and incorporate logarithmic processing and a device-related linear transformation. The transformation is trained on parallel genuine and replay speech data that are aligned with dynamic time warping (DTW).
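The alignment step can be illustrated with a minimal dynamic time warping sketch. This is not the authors' implementation: the function name `dtw_path`, the Euclidean frame distance, and the unconstrained step pattern are all assumptions chosen for illustration, and the toy one-dimensional "frames" stand in for real feature vectors.

```python
from math import dist, inf


def dtw_path(a, b):
    """Align two frame sequences with classic DTW (hypothetical helper).

    a, b: lists of equal-dimension feature vectors (lists of floats).
    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching frame a[i] to frame b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])  # Euclidean distance (assumption)
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])

    # Backtrack from the end to recover the frame-to-frame alignment.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy example: the "replay" sequence repeats its first frame.
genuine = [[0.0], [1.0], [2.0]]
replay = [[0.0], [0.0], [1.0], [2.0]]
total_cost, path = dtw_path(genuine, replay)
print(total_cost, path)
```

The aligned index pairs give frame-level genuine/replay correspondences, which is the kind of parallel data a linear transformation could then be fitted on; how the paper performs that fit is not specified in this summary.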
Datasets
ASVspoof 2017 V2, ASVspoof 2019 physical access, ASVspoof 2021 physical access
Model(s)
Gaussian Mixture Model (GMM), Light Convolutional Neural Network (LCNN)
Author countries
China, Singapore