Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

Authors: Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

Published: 2021-06-11 13:03:33+00:00

Comment: Accepted to Interspeech 2021. Example code available at https://github.com/asvspoof-challenge/classifier-adjacency

AI Summary

This paper introduces a method to visualize classifier adjacency relations in a 2D space by computing distances between binary classifiers based on their detection scores on a common dataset. The approach utilizes Kendall's τ rank correlation to define these distances, which are then mapped using classical multidimensional scaling (MDS) to facilitate visual comparison and complement traditional ROC/DET analyses. The method is demonstrated through case studies in automatic speaker verification (ASV) and voice anti-spoofing.

Abstract

Whether it be for results summarization or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity. We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers in response to a common dataset. Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores and with close relation to receiver operating characteristic (ROC) and detection error trade-off (DET) analyses. While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems. The former are produced by a Gaussian mixture model system trained with VoxCeleb data whereas the latter stem from submissions to the ASVspoof 2019 challenge.


Key findings
In a controlled ASV setting, the visualization exposed trends tied to classifier design parameters (e.g., training data, number of filters/components), offering insights beyond traditional performance metrics. For the uncontrolled anti-spoofing challenge submissions, it revealed significant diversity among systems, including the top performers, which suggests potential for complementary fusions. Analyses based on raw scores and on grouped (averaged) scores yielded different insights, and neither variant was universally preferable.
Approach
The proposed method computes pairwise distances between binary classifiers from the Kendall's τ rank correlation of the detection scores they produce on a common evaluation dataset. These correlation-based distances are then passed to classical multidimensional scaling (MDS) to generate a 2D visualization of classifier adjacencies.
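
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation (their example code is at the repository linked above). Several things here are assumptions: the mapping from correlation to distance, d = (1 − τ)/2, the use of the tau-a variant (which ignores tie corrections), the function names, and the synthetic scores, which are hypothetical and stand in for real detection scores.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a between two score vectors (O(n^2) memory and time)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Sign of every pairwise difference: the product is +1 for a concordant
    # pair, -1 for a discordant pair, and 0 when either classifier ties.
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    # Each unordered pair is counted twice, hence n * (n - 1) in the denominator.
    return float((sx * sy).sum() / (n * (n - 1)))


def tau_distance_matrix(scores):
    """scores has shape (n_classifiers, n_trials); returns distances in [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[0]
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # Assumed mapping: tau = +1 gives distance 0, tau = -1 gives 1.
            d = (1.0 - kendall_tau(scores[i], scores[j])) / 2.0
            dist[i, j] = dist[j, i] = d
    return dist


def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: eigendecompose the double-centred squared distances."""
    k = dist.shape[0]
    centering = np.eye(k) - np.ones((k, k)) / k
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]
    # Clip negative eigenvalues, which can appear for non-Euclidean distances.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Hypothetical scores from four classifiers on the same 200 trials.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
scores = np.stack([
    latent,                          # classifier A
    3.0 * latent + 1.0,              # B: monotone rescaling of A (same ranking)
    latent + rng.normal(size=200),   # C: A plus independent noise
    latent + rng.normal(size=200),   # D: A plus different independent noise
])

dist = tau_distance_matrix(scores)
coords = classical_mds(dist)  # one 2D point per classifier
```

Because the distance depends only on how each classifier ranks the trials, A and B land on the same point even though their score scales differ, which is the sense in which the method handles "arbitrary scores". For real score files with many trials or ties, a tie-aware and faster routine such as scipy.stats.kendalltau would be a more practical choice than the quadratic implementation above.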
Datasets
VoxCeleb, VoxCeleb1, VoxCeleb2, LibriSpeech, ASVspoof 2019 (Logical Access (LA) and Physical Access (PA) scenarios)
Model(s)
GMM-UBM (for ASV); various ASVspoof 2019 challenge submissions characterized by frontends (Raw DFT, CQT, Mixed, Other) and backends (DNN, CNN, GMM, LSTM, TDNN)
Author countries
Finland, France, Japan, Spain, Singapore