Generalizable Audio Deepfake Detection via Hierarchical Structure Learning and Feature Whitening in Poincaré sphere

Authors: Mingru Yang, Yanmei Gu, Qianhua He, Yanxiong Li, Peirong Zhang, Yongqiang Chen, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang

Published: 2025-08-03 19:16:29+00:00

Comment: Accepted for publication at Interspeech 2025

AI Summary

This paper introduces Poin-HierNet, a novel framework for generalizable audio deepfake detection (ADD) that addresses critical generalization challenges due to diverse spoofing attacks and domain variations. Poin-HierNet constructs domain-invariant hierarchical representations in the Poincaré sphere, moving beyond traditional Euclidean distance-based methods. It achieves this through three key components: Poincaré Prototype Learning (PPL), Hierarchical Structure Learning (HSL), and Poincaré Feature Whitening (PFW).
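To illustrate why a hyperbolic metric behaves differently from a Euclidean one, here is a minimal sketch. It assumes that "Poincaré sphere" refers to the standard unit Poincaré ball model with curvature -1; the function name `poincare_distance` and the sample points are illustrative and not taken from the paper.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincaré ball.

    Assumes curvature -1; the paper may use a different curvature.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    # The denominator shrinks as either point approaches the boundary.
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))

# Two pairs of points, each separated by a Euclidean distance of 0.1.
origin = np.array([0.0, 0.0])
near_center = np.array([0.1, 0.0])
edge_a = np.array([0.85, 0.0])
edge_b = np.array([0.95, 0.0])

d_center = poincare_distance(origin, near_center)  # roughly 0.20
d_edge = poincare_distance(edge_a, edge_b)         # roughly 1.15
print(d_center, d_edge)
```

The same Euclidean gap costs far more distance near the boundary than near the origin. This exponential growth of available space is what makes the ball a natural fit for tree-like structures, with coarse categories near the center and fine-grained ones toward the edge.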

Abstract

Audio deepfake detection (ADD) faces critical generalization challenges due to diverse real-world spoofing attacks and domain variations. Existing methods primarily rely on Euclidean distances and fail to adequately capture the intrinsic hierarchical structures associated with attack categories and domain factors. To address these issues, we design a novel framework, Poin-HierNet, to construct domain-invariant hierarchical representations in the Poincaré sphere. Poin-HierNet includes three key components: 1) Poincaré Prototype Learning (PPL), which uses several data prototypes to align sample features and capture multilevel hierarchies beyond human labels; 2) Hierarchical Structure Learning (HSL), which leverages top prototypes to establish a tree-like hierarchical structure from the data prototypes; and 3) Poincaré Feature Whitening (PFW), which enhances domain invariance by applying feature whitening to suppress domain-sensitive features. We evaluate our approach on four datasets: ASVspoof 2019 LA, ASVspoof 2021 LA, ASVspoof 2021 DF, and In-The-Wild. Experimental results demonstrate that Poin-HierNet exceeds state-of-the-art methods in Equal Error Rate.
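The abstract does not say how sample features are aligned with data prototypes. The sketch below shows one plausible reading, assuming a unit Poincaré ball with curvature -1, features projected into the ball with the exponential map at the origin, and a softmax over negative distances as the alignment loss. The names `expmap0`, `pairwise_poincare_distance`, and `prototype_loss`, along with the shapes and temperature, are hypothetical and not the authors' implementation.

```python
import numpy as np

def expmap0(v, eps=1e-9):
    """Map Euclidean vectors into the unit Poincaré ball (exponential map at 0)."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.tanh(norm) * v / np.maximum(norm, eps)

def pairwise_poincare_distance(x, p, eps=1e-9):
    """Distances between rows of x (n, d) and rows of p (k, d), all inside the ball."""
    sq_diff = np.sum((x[:, None, :] - p[None, :, :]) ** 2, axis=-1)
    x_gap = 1.0 - np.sum(x ** 2, axis=-1)
    p_gap = 1.0 - np.sum(p ** 2, axis=-1)
    denom = np.maximum(x_gap[:, None] * p_gap[None, :], eps)
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)

def prototype_loss(dist, target, temperature=1.0):
    """Cross-entropy over negative distances (an assumed, generic alignment loss)."""
    logits = -dist / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(target)), target].mean())

rng = np.random.default_rng(0)
features = expmap0(rng.normal(scale=0.5, size=(6, 4)))    # 6 samples, 4-dim
prototypes = expmap0(rng.normal(scale=0.5, size=(3, 4)))  # 3 data prototypes

dist = pairwise_poincare_distance(features, prototypes)
nearest = dist.argmin(axis=1)               # index of the closest prototype per sample
loss_nearest = prototype_loss(dist, nearest)
print(nearest, loss_nearest)
```

In a trainable version the prototypes would be learnable parameters updated with a Riemannian or reparameterized optimizer. Here they are fixed random points, purely to show the distance-based assignment.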


Key findings

Poin-HierNet outperforms state-of-the-art methods in Equal Error Rate (EER, lower is better) on the evaluated datasets. It reports EERs of 0.11% on ASVspoof 2019 LA, 1.40% on ASVspoof 2021 DF, and 4.91% on In-The-Wild, indicating robust generalization to unseen attacks and domains. Ablation studies confirm that each proposed component (PPL, HSL, PFW) contributes to the performance gains.
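For readers unfamiliar with the metric, the following is a minimal sketch of how an EER can be computed from detection scores. It assumes higher scores mean "more likely bona fide" and labels of 1 for bona fide and 0 for spoof. The function name and toy scores are illustrative, and official ASVspoof scoring tools may interpolate differently.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Return the error rate at the threshold where false acceptance
    and false rejection rates are closest (simple threshold sweep)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona_fide = scores[labels == 1]
    spoof = scores[labels == 0]
    thresholds = np.concatenate(([-np.inf], np.unique(scores), [np.inf]))
    # A trial is accepted as bona fide when its score >= threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])     # spoof accepted
    frr = np.array([(bona_fide < t).mean() for t in thresholds])  # bona fide rejected
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0)

# Toy example: one spoof trial (0.6) scores above one bona fide trial (0.4).
scores = [0.9, 0.7, 0.4, 0.8, 0.1, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer = equal_error_rate(scores, labels)
print(f"EER = {eer:.2%}")  # EER = 25.00%
```

An EER of 0.11% therefore means that, at the balanced operating point, about 1 in 900 trials of each class is misclassified.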

Approach

Poin-HierNet constructs domain-invariant hierarchical representations in the Poincaré sphere using three components. Poincaré Prototype Learning (PPL) aligns sample features with data prototypes to capture multilevel hierarchies. Hierarchical Structure Learning (HSL) leverages top prototypes to establish a tree-like hierarchical structure from these data prototypes. Poincaré Feature Whitening (PFW) enhances domain invariance by applying feature whitening to suppress domain-sensitive features.
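The summary does not describe how PFW is computed in hyperbolic space. As background only, the sketch below shows ordinary Euclidean ZCA whitening, which is a stand-in technique and not the paper's Poincaré variant. It illustrates the general idea of feature whitening: removing correlations between feature dimensions so that their covariance is approximately the identity. The function name `zca_whiten` and the synthetic data are assumptions.

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """Euclidean ZCA whitening of a feature matrix x with shape (n_samples, n_dims)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (x.shape[0] - 1)
    # Symmetric eigendecomposition of the covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rescale each principal direction to unit variance, then rotate back.
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ whitening

rng = np.random.default_rng(0)
# Synthetic correlated features (hypothetical stand-in for model embeddings).
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.5, 1.0, 0.0],
                   [0.5, -0.7, 0.3]])
features = rng.normal(size=(500, 3)) @ mixing.T

whitened = zca_whiten(features)
whitened_cov = np.cov(whitened, rowvar=False)
print(np.round(whitened_cov, 3))  # close to the 3x3 identity matrix
```

In domain generalization work, whitening is often applied as a training-time loss on selected covariance entries rather than as a fixed transform. Which form PFW takes is not stated here.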

Datasets

ASVspoof 2019 LA, ASVspoof 2021 LA, ASVspoof 2021 DF, In-The-Wild

Model(s)

wav2vec 2.0 XLS-R (0.3B) (frontend feature extractor), AASIST (backend model)

Author countries

China