Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms

Authors: Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Published: 2024-03-18 13:35:10+00:00

AI Summary

This study investigates the feasibility of deploying static deepfake audio detection models in real-time communication platforms. It implements ResNet and LCNN models, training them on the ASVspoof 2019 dataset, and develops cross-platform software to assess their real-time performance in actual communication scenarios. The work highlights challenges for static models in dynamic real-time environments and proposes future strategies for enhancement.

Abstract

Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. Executable software is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on ResNet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving performance that compares favorably with the ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, aiming for robust detection capabilities in dynamic, real-time communication scenarios.


Key findings
The implemented ResNet and LCNN models outperformed the challenge baselines on the ASVspoof 2019 evaluation data. However, when deployed in real-time communication scenarios (Microsoft Teams meetings), these static models performed poorly (e.g., F-scores of 0.40 and 0.45). The drop indicates that models trained on static recordings do not transfer well to the dynamic audio streams of a live call.
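For readers unfamiliar with the metric, the following is a minimal sketch of how an F-score can be computed from binary predictions. It assumes the balanced F1 form (harmonic mean of precision and recall) and that label 1 marks the class of interest; the summary does not state which F-score variant or positive class the authors used, and the labels in the usage lines are toy values, not data from the paper.

```python
from typing import Sequence


def f_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Balanced F-score (F1) for one class; an assumed form, not confirmed by the source."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (hypothetical, not taken from the paper).
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 1, 0]
print(f_score(toy_true, toy_pred))
```

An F-score near 0.4 means that precision, recall, or both are low, so a detector at that level either misses many fake segments or flags many genuine ones.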
Approach
The authors implement two static deepfake audio detection models based on ResNet and LCNN architectures. These models are trained on the ASVspoof 2019 dataset, and executable software is developed to deploy them and test their performance in real-time communication platforms, specifically Microsoft Teams meetings.
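One common way to apply a static, fixed-input model to a live stream is to buffer incoming audio and score overlapping windows. The sketch below illustrates that general idea only; the summary does not describe how the authors' software segments audio, so the function names, window and hop sizes, and the `placeholder_score` function (a stand-in for a trained ResNet or LCNN, not a detector) are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def stream_windows(
    chunks: Iterable[Sequence[float]], window_size: int, hop_size: int
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_sample, window) pairs from an incoming stream of audio chunks."""
    if not 0 < hop_size <= window_size:
        raise ValueError("hop_size must be in (0, window_size]")
    buffer: List[float] = []
    start = 0  # position of buffer[0] within the whole stream
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every full window currently available, then slide forward by one hop.
        while len(buffer) >= window_size:
            yield start, buffer[:window_size]
            del buffer[:hop_size]
            start += hop_size


def placeholder_score(window: Sequence[float]) -> float:
    """Hypothetical stand-in for a model: mean absolute amplitude, NOT a deepfake score."""
    return sum(abs(x) for x in window) / len(window)


def score_stream(
    chunks: Iterable[Sequence[float]],
    scorer: Callable[[Sequence[float]], float],
    window_size: int,
    hop_size: int,
) -> List[Tuple[int, float]]:
    """Apply a scorer to each window and return (start_sample, score) pairs."""
    return [(s, scorer(w)) for s, w in stream_windows(chunks, window_size, hop_size)]


# Toy stream delivered in uneven chunks, as a capture callback might deliver it.
toy_chunks = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0]]
print(score_stream(toy_chunks, placeholder_score, window_size=4, hop_size=2))
```

In a design like this, the window length trades latency against context: shorter windows give faster decisions but show the model less audio than the utterance-length clips it was trained on.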
Datasets
ASVspoof 2019 (LA and PA challenges), Teams Meeting dataset (curated by authors for real-time testing)
Model(s)
ResNet, LCNN
Author countries
USA