Generalizable Audio Deepfake Detection via Hierarchical Structure Learning and Feature Whitening in Poincaré sphere


Authors: Mingru Yang, Yanmei Gu, Qianhua He, Yanxiong Li, Peirong Zhang, Yongqiang Chen, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang

Published: 2025-08-03 19:16:29+00:00

AI Summary

This paper introduces Poin-HierNet, a novel framework for generalizable audio deepfake detection. Poin-HierNet leverages the Poincaré sphere to construct domain-invariant hierarchical representations, outperforming state-of-the-art methods in Equal Error Rate.

Abstract

Audio deepfake detection (ADD) faces critical generalization challenges due to diverse real-world spoofing attacks and domain variations. However, existing methods primarily rely on Euclidean distances, failing to adequately capture the intrinsic hierarchical structures associated with attack categories and domain factors. To address these issues, we design a novel framework Poin-HierNet to construct domain-invariant hierarchical representations in the Poincaré sphere. Poin-HierNet includes three key components: 1) Poincaré Prototype Learning (PPL) with several data prototypes aligning sample features and capturing multilevel hierarchies beyond human labels; 2) Hierarchical Structure Learning (HSL) leverages top prototypes to establish a tree-like hierarchical structure from data prototypes; and 3) Poincaré Feature Whitening (PFW) enhances domain invariance by applying feature whitening to suppress domain-sensitive features. We evaluate our approach on four datasets: ASVspoof 2019 LA, ASVspoof 2021 LA, ASVspoof 2021 DF, and In-The-Wild. Experimental results demonstrate that Poin-HierNet exceeds state-of-the-art methods in Equal Error Rate.

Key findings

Poin-HierNet reports state-of-the-art Equal Error Rate on the four evaluation datasets, indicating stronger generalization than existing methods. Ablation studies support the contribution of each component of the framework, and performance is sensitive to the number of prototypes used.
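Since every result here is stated in Equal Error Rate (EER), a small sketch of how that metric can be computed may help. EER is the operating point at which the false acceptance rate equals the false rejection rate. The function name `equal_error_rate`, the score polarity (higher means more likely bona fide), and the simple threshold sweep are assumptions for illustration; the paper's own scoring tools are not described in this summary and may interpolate differently.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means "more likely bona fide".
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Hypothetical scores, not taken from the paper.
    bonafide = [0.9, 0.6, 0.4, 0.8]
    spoof = [0.1, 0.5, 0.3, 0.7]
    print(f"EER = {equal_error_rate(bonafide, spoof):.2%}")
```

A lower EER is better. With finite score sets the two error rates rarely cross exactly, which is why this sketch averages them at the closest threshold.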

Approach

Poin-HierNet uses three components: Poincaré Prototype Learning (PPL) to align sample features with prototypes, Hierarchical Structure Learning (HSL) to establish a tree-like structure from prototypes, and Poincaré Feature Whitening (PFW) to enhance domain invariance. These are combined with a wav2vec 2.0 XLS-R frontend and an AASIST backend.
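To make the contrast with Euclidean distance concrete, the sketch below implements the standard geodesic distance in the unit Poincaré ball (curvature -1), d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), and uses it to pick the nearest prototype for a feature. This summary does not give the paper's exact formulation, so the curvature, the helper names `poincare_distance` and `nearest_prototype`, and the nearest-prototype rule are illustrative assumptions rather than the authors' implementation.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # The denominator shrinks near the boundary, so distances grow rapidly there.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def nearest_prototype(feature, prototypes):
    """Index of the prototype closest to `feature` (hypothetical assignment rule)."""
    return min(
        range(len(prototypes)),
        key=lambda i: poincare_distance(feature, prototypes[i]),
    )


if __name__ == "__main__":
    # Hypothetical 2-D points, chosen only for illustration.
    print(poincare_distance([0.0, 0.0], [0.5, 0.0]))   # distance from the origin
    print(poincare_distance([0.9, 0.0], [-0.9, 0.0]))  # far larger than Euclidean 1.8
    print(nearest_prototype([0.1, 0.0], [[0.0, 0.5], [0.2, 0.0]]))
```

Because distances expand toward the boundary of the ball, points near the origin can act as coarse "parent" nodes and points near the boundary as fine-grained "leaves", which is the usual motivation for embedding tree-like hierarchies in this geometry.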

Datasets

ASVspoof 2019 LA, ASVspoof 2021 LA, ASVspoof 2021 DF, In-The-Wild

Model(s)

wav2vec 2.0 XLS-R (0.3B), AASIST, Poin-HierNet (custom framework)

Author countries

China
