Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms

Authors: Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Published: 2024-03-18 13:35:10+00:00

AI Summary

This research explores the feasibility of using static deepfake audio detection models in real-time communication platforms. Two models (ResNet and LCNN) were implemented and tested on the ASVspoof 2019 dataset and a new dataset from Microsoft Teams meetings, demonstrating the challenges of adapting static models to dynamic real-time scenarios.

Abstract

Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. A cross-platform executable is developed so that the models can run in real time. Two deepfake audio detection models based on ResNet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performance relative to the ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.

Key findings

Both models achieved benchmark performance on the ASVspoof 2019 dataset relative to the challenge baselines. Performance degraded significantly, however, when the same models were run on audio from live Microsoft Teams meetings, which highlights the limits of applying static models to dynamic, real-world scenarios. The authors propose data augmentation and generative-model approaches as future work to address this limitation.
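The summary does not say which augmentations the authors have in mind. As one common illustration of data augmentation for audio, the sketch below mixes Gaussian noise into a clip at a chosen signal-to-noise ratio. The function name `add_noise_at_snr`, the use of Gaussian noise, and the 20 dB value are all assumptions made for this example, not details from the paper.

```python
import math
import random
from typing import List, Sequence


def add_noise_at_snr(signal: Sequence[float], snr_db: float,
                     rng: random.Random) -> List[float]:
    """Return a copy of `signal` with Gaussian noise mixed in at `snr_db` dB.

    Hypothetical augmentation for illustration; not the paper's method.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Draw unit-variance noise, then rescale it to the power the SNR requires.
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


# Usage: a 440 Hz tone sampled at 16 kHz, degraded to 20 dB SNR.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noisy = add_noise_at_snr(clean, snr_db=20.0, rng=rng)
```

Because the noise is rescaled after sampling, the realized SNR of each augmented clip matches the requested value exactly rather than only on average.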

Approach

The study implemented two deepfake audio detection models based on ResNet and LCNN architectures, both trained on the ASVspoof 2019 dataset. A software application was developed to run the models in real time during Microsoft Teams meetings, and their performance was evaluated on a new dataset collected from those meetings.
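One way to apply a static, clip-level model to a live audio stream is to buffer incoming audio and score overlapping fixed-length windows. The sketch below shows that pattern only; the summary does not describe how the authors' application does it. The names `stream_scores` and `placeholder_score`, and the window and hop sizes, are hypothetical, and the scoring function is a stand-in for a trained ResNet or LCNN model.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def stream_scores(
    chunks: Iterable[Sequence[float]],
    score_fn: Callable[[Sequence[float]], float],
    window_size: int,
    hop_size: int,
) -> Iterator[Tuple[int, float]]:
    """Yield (start_sample, score) for each full window of the stream."""
    if not 0 < hop_size <= window_size:
        raise ValueError("require 0 < hop_size <= window_size")
    buffer: List[float] = []
    start = 0  # Index of the first buffered sample within the whole stream.
    for chunk in chunks:
        buffer.extend(chunk)
        # Score every complete window that the buffered audio now covers.
        while len(buffer) >= window_size:
            yield start, score_fn(buffer[:window_size])
            del buffer[:hop_size]  # Slide forward, keeping the overlap.
            start += hop_size


def placeholder_score(window: Sequence[float]) -> float:
    """Stand-in for a trained detector: mean absolute amplitude."""
    return sum(abs(x) for x in window) / len(window)


# Usage: three chunks arriving from a (hypothetical) audio capture callback.
incoming = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 1.0]]
results = list(stream_scores(incoming, placeholder_score,
                             window_size=4, hop_size=2))
print(results)  # [(0, 0.25), (2, 0.75), (4, 1.0)]
```

A smaller hop gives more frequent decisions at a higher compute cost, and samples left over after the last full window are never scored. Both are trade-offs that any real-time deployment of a static model would need to tune.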

Datasets

ASVspoof 2019 (logical access (LA) and physical access (PA) subsets); a new dataset created from Microsoft Teams meeting sessions.

Model(s)

ResNet, LCNN

Author countries

USA
