Audios Don't Lie: Multi-Frequency Channel Attention Mechanism for Audio Deepfake Detection

Authors: Yangguang Feng

Published: 2024-12-12 17:15:49+00:00

AI Summary

This study introduces an audio deepfake detection method leveraging a multi-frequency channel attention mechanism (MFCA) and the 2D discrete cosine transform (DCT). The approach converts audio into mel-spectrograms, extracts deep features with MobileNet V2, and employs MFCA to weight frequency channels, enhancing the capture of fine-grained frequency-domain features. Experimental results demonstrate significant improvements in detection accuracy, robustness, and generalization compared with traditional methods.

Abstract

With the rapid development of artificial intelligence technology, deepfake technology is increasingly applied to audio, creating a wide range of security risks; its misuse has raised serious concerns, especially in the financial and social security fields. To address this challenge, this study proposes an audio deepfake detection method based on a multi-frequency channel attention mechanism (MFCA) and the 2D discrete cosine transform (DCT). By converting the audio signal into a mel-spectrogram, extracting deep features with MobileNet V2, and using the MFCA module to weight the different frequency channels of the signal, the method effectively captures fine-grained frequency-domain features and strengthens the classification of fake audio. Experimental results show that, compared with traditional methods, the proposed model offers significant advantages in accuracy, precision, recall, F1 score, and other indicators. In complex audio scenarios in particular, the method exhibits stronger robustness and generalization, providing a new direction for audio deepfake detection with practical application value. Future work will explore more advanced audio detection techniques and optimization strategies to further improve the accuracy and generalization of audio deepfake detection.
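The first stage of the pipeline, converting raw audio into a mel-spectrogram, can be sketched in plain NumPy. This is a minimal illustration of the standard transform (framed magnitude STFT projected onto a triangular mel filterbank, then log-compressed), not the paper's exact preprocessing; the frame sizes, hop length, and number of mel bands below are assumed defaults, not values from the study.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power STFT.
    frames = [signal[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2
    # Project onto the mel filterbank and log-compress.
    return np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)

# 1 s of synthetic audio: a 440 Hz tone standing in for real speech.
sr = 16000
t = np.arange(sr) / sr
mel = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(mel.shape)  # (frames, mel bands)
```

In the paper's pipeline, the resulting 2D time-frequency map is what MobileNet V2 consumes as an image-like input.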


Key findings
The proposed MFCMNet model improved accuracy by 4.5% over the baseline model, demonstrating higher recognition capability and robustness in complex deepfake audio scenarios. It also performed strongly on recall and F1 score, indicating a good balance between precision and recall and superior overall effectiveness for audio deepfake detection.
Approach
The proposed method preprocesses audio signals into mel-spectrograms. MobileNet V2 is then used to extract deep features, which are further enhanced by a Multi-Frequency Channel Attention Mechanism (MFCA) module. MFCA separates features into distinct frequency bands (low, medium, high) and applies 2D Discrete Cosine Transform (DCT) to assign dynamic weights, thereby emphasizing critical frequency domain features for improved deepfake detection.
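The band-splitting and DCT-weighting step above can be sketched as a small squeeze-and-gate module. This is a hypothetical reconstruction, not the authors' implementation: it splits backbone channels into three groups (low, medium, high), squeezes each channel with a different 2D DCT frequency component (in the spirit of frequency channel attention), and gates the channels with a sigmoid. The choice of DCT components and the three-way split are assumptions for illustration.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1.0 / np.sqrt(n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def dct_2d(x):
    # Separable 2D DCT-II of a single feature map.
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T

def mfca(features):
    """features: (C, H, W) feature maps from the backbone.
    Channels are split into low/mid/high groups; each group is
    squeezed with a different 2D-DCT component, and a sigmoid
    gate re-weights the channels. All component choices are
    illustrative assumptions, not the paper's configuration."""
    c = features.shape[0]
    # Hypothetical components: (0,0) for low, (0,1) for mid,
    # (1,1) for high frequency channel groups.
    comps = [(0, 0), (0, 1), (1, 1)]
    groups = np.array_split(np.arange(c), 3)
    gate = np.zeros(c)
    for grp, (u, v) in zip(groups, comps):
        for ch in grp:
            gate[ch] = dct_2d(features[ch])[u, v]
    weights = 1.0 / (1.0 + np.exp(-gate))  # sigmoid gate in (0, 1)
    return features * weights[:, None, None]

feats = np.random.default_rng(0).standard_normal((6, 8, 8))
out = mfca(feats)
print(out.shape)  # same shape as the input feature maps
```

The design mirrors squeeze-and-excitation attention, with the global-average-pool squeeze replaced by per-band DCT coefficients so that each channel group attends to a different part of the frequency spectrum.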
Datasets
Fake or Real dataset (for-norm version)
Model(s)
MobileNet V2, Multi-Frequency Channel Attention Mechanism (MFCA), 2D Discrete Cosine Transform (DCT)
Author countries
China