Data Augmentation with Signal Companding for Detection of Logical Access Attacks

Authors: Rohan Kumar Das, Jichen Yang, Haizhou Li

Published: 2021-02-12 02:51:06+00:00

Comment: 5 pages, Accepted for publication in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021

AI Summary

This paper proposes a novel data augmentation technique utilizing a-law and mu-law based signal companding to improve the detection of logical access attacks against automatic speaker verification (ASV) systems. The method aims to enhance the robustness of spoofing countermeasures, particularly against unknown attack types derived from advanced voice conversion and text-to-speech technologies. Experiments show that this companding-based augmentation outperforms traditional data augmentation and state-of-the-art countermeasures in handling unseen logical access attacks.

Abstract

Recent advances in voice conversion (VC) and text-to-speech (TTS) make it possible to produce natural-sounding speech that poses a threat to automatic speaker verification (ASV) systems. Research on spoofing countermeasures has therefore gained attention as a way to protect ASV systems from such attacks. While advanced spoofing countermeasures are able to detect known types of spoofing attacks, they are much less effective against unknown attacks. In this work, we propose a novel data augmentation technique using a-law and mu-law based signal companding. We believe that the proposed method has an edge over traditional data augmentation by adding small perturbations or quantization noise. The studies are conducted on the ASVspoof 2019 logical access corpus using a light convolutional neural network based system. We find that the proposed data augmentation technique based on signal companding outperforms the state-of-the-art spoofing countermeasures, showing its ability to handle attacks of unknown nature.


Key findings
The proposed data augmentation with signal companding (DASC) improves the detection of unknown logical access attacks, achieving a 1.16% absolute reduction in Equal Error Rate (EER) on the ASVspoof 2019 evaluation set compared to the CQT-LCNN baseline trained without augmentation. DASC also outperforms traditional noise-based data augmentation and state-of-the-art single-system spoofing countermeasures on the ASVspoof 2019 logical access corpus.
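For readers unfamiliar with the metric, EER is the error rate at the decision threshold where the false acceptance rate (spoofed speech accepted) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of that definition, not the official ASVspoof scoring tool; the function name and the toy scores are hypothetical and assume that higher scores mean "more likely bona fide".

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores (made up for illustration, not from the paper).
toy_bona_fide = [0.9, 0.8, 0.7, 0.4]
toy_spoof = [0.1, 0.2, 0.3, 0.6]
toy_eer = equal_error_rate(toy_bona_fide, toy_spoof)
```

With a finite set of trials the two error rates rarely cross exactly, so the sketch reports the average of the two at the closest point; evaluation toolkits typically interpolate instead.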
Approach
The authors propose a novel data augmentation technique using a-law and mu-law based signal companding, which compresses and then expands audio signals. This method is applied to the training data to increase its diversity and build more robust spoofing countermeasure models, specifically for detecting unknown logical access attacks. The augmented data is then used to train a Light Convolutional Neural Network (LCNN) system.
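The compress-then-expand round trip can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the constants mu = 255 and A = 87.6 are the values commonly used in telephony and are not given in this summary, and the 8-bit uniform quantization between compression and expansion is also an assumption (without some lossy step in the companded domain, the round trip would return the input almost unchanged). All function names are hypothetical.

```python
import math

MU = 255.0  # assumed mu-law constant (common telephony value)
A = 87.6    # assumed A-law constant (common telephony value)


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] through the mu-law compression curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def a_law_compress(x, a=A):
    """Map a sample in [-1, 1] through the A-law compression curve."""
    ax = abs(x)
    denom = 1.0 + math.log(a)
    if ax < 1.0 / a:
        y = a * ax / denom                    # linear segment near zero
    else:
        y = (1.0 + math.log(a * ax)) / denom  # logarithmic segment
    return math.copysign(y, x)


def a_law_expand(y, a=A):
    """Inverse of a_law_compress."""
    ay = abs(y)
    denom = 1.0 + math.log(a)
    if ay < 1.0 / denom:
        x = ay * denom / a
    else:
        x = math.exp(ay * denom - 1.0) / a
    return math.copysign(x, y)


def quantize(y, bits=8):
    """Uniformly quantize a value in [-1, 1] to 2**bits - 1 steps (assumed)."""
    steps = 2 ** bits - 1
    return round((y + 1.0) / 2.0 * steps) / steps * 2.0 - 1.0


def compand(signal, compress, expand, bits=8):
    """Compress, quantize, and expand each sample of a waveform."""
    return [expand(quantize(compress(s), bits)) for s in signal]


# Toy waveform (made up); each companded copy could be added to the training set.
waveform = [0.0, 0.01, -0.5, 0.9]
mu_copy = compand(waveform, mu_law_compress, mu_law_expand)
a_copy = compand(waveform, a_law_compress, a_law_expand)
```

Under these assumptions, each training utterance yields two additional copies (one per companding law) that differ from the original only by small, amplitude-dependent distortion.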
Datasets
ASVspoof 2019 logical access corpus, VCTK corpus (for bona fide examples), NoiseX-92 database (for comparative noise data augmentation studies).
Model(s)
Light Convolutional Neural Network (LCNN) system, with long-term Constant-Q Transform (CQT) based log power spectrum (LPS) as input features.
Author countries
Singapore