Audio Deepfake Attribution: An Initial Dataset and Investigation

Authors: Xinrui Yan, Jiangyan Yi, Jianhua Tao, Jie Chen

Published: 2022-08-21 05:15:40+00:00

Comment: 13 pages, 5 figures. arXiv admin note: text overlap with arXiv:2208.10489v3

AI Summary

This paper introduces Audio Deepfake Attribution (ADA), a novel task for identifying the source generation tools of deepfake audio, moving beyond binary detection. It presents the first dataset for this purpose, also named ADA, and proposes the Class-Representation Multi-Center Learning (CRML) method to tackle the challenge of open-set attribution, particularly for unknown audio generation tools. The CRML method effectively addresses real-world open-set risks by learning discriminative representations.

Abstract

The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious manipulation of content. This has led to an increase in studies aimed at detecting so-called deepfake audio. However, existing works focus on the binary detection of real audio and fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, binary classification alone is insufficient. It is essential to identify the source of deepfake audio. Therefore, audio deepfake attribution has emerged as a new challenge. To this end, we designed the first deepfake audio dataset for the attribution of audio generation tools, called Audio Deepfake Attribution (ADA), and conducted a comprehensive investigation on system fingerprints. To address the challenges of attribution of continuously emerging unknown audio generation tools in the real world, we propose the Class-Representation Multi-Center Learning (CRML) method for open-set audio deepfake attribution (OSADA). CRML enhances the global directional variation of representations, ensuring the learning of discriminative representations with strong intra-class similarity and inter-class discrepancy among known classes. Finally, the strong class discrimination capability learned from known classes is extended to both known and unknown classes. Experimental results demonstrate that the CRML method effectively addresses open-set risks in real-world scenarios. The dataset is publicly available at: https://zenodo.org/records/13318702, and https://zenodo.org/records/13340666.


Key findings
The proposed CRML method significantly outperforms classical baselines for open-set audio deepfake attribution, effectively mitigating open-set risks. Among pipeline models, WavLM-SENet achieved the best attribution performance on clean audio, while wav2vec 2.0 XLS-R-SENet adapted best to compressed environments. Among end-to-end models, RawBMamba performed strongly on both the clean and compressed sets.
Approach
The authors propose the Class-Representation Multi-Center Learning (CRML) method for Open-Set Audio Deepfake Attribution (OSADA). CRML learns discriminative representations by enhancing global directional variation, ensuring strong intra-class similarity and inter-class discrepancy among known classes. This allows the model to effectively distinguish between known and unknown audio generation tools.
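The core open-set mechanism described above, attributing a sample to a known generation tool only when its representation lies close enough to that tool's learned class center, and rejecting it as unknown otherwise, can be illustrated with a generic nearest-center sketch. This is not the authors' CRML implementation; the single-center-per-class design, cosine distance, and rejection threshold are simplifying assumptions for illustration.

```python
import numpy as np

def fit_class_centers(embeddings, labels):
    """Compute one center per known class from training embeddings.
    (CRML uses richer multi-center class representations; one center
    per class is a simplification for this sketch.)"""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def attribute(embedding, centers, threshold):
    """Assign the sample to the nearest known-class center, or reject it
    as 'unknown' when even the nearest center is farther than the
    (illustrative) cosine-distance threshold."""
    best_class, best_dist = None, np.inf
    for c, center in centers.items():
        cos = embedding @ center / (
            np.linalg.norm(embedding) * np.linalg.norm(center)
        )
        dist = 1.0 - cos  # cosine distance in [0, 2]
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class if best_dist <= threshold else "unknown"
```

The sketch shows why strong intra-class similarity and inter-class discrepancy matter: tight, well-separated clusters for known tools make a single distance threshold sufficient to flag samples from unseen generation tools.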
Datasets
Audio Deepfake Attribution (ADA) dataset (clean and compressed subsets), AISHELL-1, AISHELL-3, THCHS-30, Aidatatang 200zh.
Model(s)
X-vector (TDNN), SE-ResNet, ResNet (18-layer), LCNN, RawNet2, RawGAT-ST, AASIST, wav2vec2.0-AASIST, RawFormer, RawBMamba.
Author countries
China