Audio Deepfake Verification

Authors: Li Wang, Junyi Ao, Linyong Gan, Yuancheng Wang, Xueyao Zhang, Zhizheng Wu

Published: 2025-09-10 10:27:41+00:00

AI Summary

This paper introduces the Audio Deepfake Verification (ADV) task, which aims to determine if two audio samples originate from the same deepfake method, enabling open-set source tracing. A novel dual-branch architecture, Audity, is proposed to extract deepfake features from both audio structure and generation artifacts, outperforming single-branch approaches.
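The verification task described here can be read as pairwise scoring: embed both audio samples, compare the embeddings, and accept the pair as "same deepfake method" when the score clears a threshold. The sketch below is only an illustration under assumptions. The summary does not state the scoring function or the threshold, so cosine similarity, the `threshold=0.5` default, the function names, and the three-dimensional toy embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_deepfake_method(emb_a, emb_b, threshold=0.5):
    """Accept the pair when the similarity clears the threshold.

    The threshold value is a placeholder (assumption); in practice it
    would be tuned on held-out trials.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy embeddings (hypothetical values, not model outputs).
emb_seen = [0.9, 0.1, 0.0]
emb_same = [0.8, 0.2, 0.1]
emb_other = [0.0, 0.1, 0.95]

print(same_deepfake_method(emb_seen, emb_same))
print(same_deepfake_method(emb_seen, emb_other))
```

Because the decision depends only on a pair of embeddings, a deepfake method that was never seen during training can still be enrolled and compared, which is what makes this formulation open-set.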

Abstract

With the rapid development of deepfake technology, simply making a binary judgment of true or false on audio is no longer sufficient to meet practical needs. Accurately determining the specific deepfake method has become crucial. This paper introduces the Audio Deepfake Verification (ADV) task, effectively addressing the limitations of existing deepfake source tracing methods in closed-set scenarios, aiming to achieve open-set deepfake source tracing. Meanwhile, the Audity dual-branch architecture is proposed, extracting deepfake features from two dimensions: audio structure and generation artifacts. Experimental results show that the dual-branch Audity architecture outperforms any single-branch configuration, and it can simultaneously achieve excellent performance in both deepfake detection and verification tasks.

Key findings

Audity outperforms single-branch architectures in audio deepfake verification. While performance varies across datasets, using multiple enrollment and verification samples significantly improves results, especially on challenging datasets. Audity also exhibits strong performance in audio deepfake detection.
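One common way to use several enrollment and verification samples is to average their length-normalized embeddings before scoring. The summary does not say how the paper combines multiple samples, so the averaging strategy, the function names, and the toy two-dimensional vectors below are assumptions for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def mean_embedding(embeddings):
    """Average a list of embeddings after normalizing each one."""
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    count = len(normalized)
    return [sum(e[i] for e in normalized) / count for i in range(dim)]


def multi_sample_score(enroll_embs, verify_embs):
    """Cosine similarity between the averaged enrollment and verification embeddings."""
    a = l2_normalize(mean_embedding(enroll_embs))
    b = l2_normalize(mean_embedding(verify_embs))
    return sum(x * y for x, y in zip(a, b))


# Toy embeddings (hypothetical values): two enrollment and two verification samples.
enroll = [[1.0, 0.2], [0.8, -0.1]]
verify = [[0.9, 0.1], [1.0, 0.0]]
score = multi_sample_score(enroll, verify)
print(round(score, 4))
```

Averaging tends to cancel per-utterance noise, which is one plausible reason additional samples would help most on the harder datasets.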

Approach

The authors propose Audity, a dual-branch network. One branch extracts structural features of speech using w2v-BERT 2.0, while the other branch extracts generation artifacts using architectures like CAM++, ECAPA-TDNN, or ResNet293. These features are fused to produce discriminative deepfake embeddings.
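The fusion step can be pictured as concatenating the two branch outputs and projecting them into a single embedding. This is a minimal sketch under assumptions: the summary only says the features are "fused", so concatenation followed by a linear projection is a stand-in, and the function names, weights, and tiny vectors are hypothetical placeholders for the outputs of the structural branch (w2v-BERT 2.0) and the artifact branch (for example CAM++).

```python
def concat_fuse(structure_emb, artifact_emb):
    """Concatenate the structural and artifact branch embeddings."""
    return list(structure_emb) + list(artifact_emb)


def linear_project(vector, weights, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    if any(len(row) != len(vector) for row in weights):
        raise ValueError("each weight row must match the input length")
    if len(weights) != len(bias):
        raise ValueError("bias length must match the number of weight rows")
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def fuse_branches(structure_emb, artifact_emb, weights, bias):
    """Fuse both branches into one embedding (assumed concat + linear layer)."""
    return linear_project(concat_fuse(structure_emb, artifact_emb), weights, bias)


# Toy inputs (hypothetical): a 2-dim structural and a 1-dim artifact embedding.
structure_emb = [1.0, 2.0]
artifact_emb = [3.0]
weights = [[1.0, 0.0, 1.0], [0.0, 0.5, -1.0]]  # would be learned in a real model
bias = [0.0, 1.0]

fused = fuse_branches(structure_emb, artifact_emb, weights, bias)
print(fused)
```

In a trained network the projection weights would be learned jointly with both branches; here they are fixed by hand so the arithmetic is easy to follow.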

Datasets

ASVspoof2019-LA, CodecFake (UCAS), CodecFake+, DFADD, GigaSpeech (M), LibriSeVoc, MLAAD, SpoofCeleb, WaveFake, and ADD2023 Track3. Samples from various demo pages and commercial TTS systems were also used for testing.

Model(s)

Audity (dual-branch architecture using w2v-BERT 2.0, CAM++, ECAPA-TDNN, or ResNet293)

Author countries

China