Listening for Expert Identified Linguistic Features: Assessment of Audio Deepfake Discernment among Undergraduate Students

Authors: Noshaba N. Bhalli, Nehal Naqvi, Chloe Evered, Christine Mallinson, Vandana P. Janeja

Published: 2024-11-21 20:52:02+00:00

AI Summary

This paper evaluates the effectiveness of training undergraduate students to improve their ability to discern audio deepfakes by listening for expert-defined linguistic features (EDLFs). Using a pre-/post-experimental design, the study assesses whether familiarizing listeners with English language variation can enhance their perceptual awareness and discernment of fake audio. The research aims to improve human discernment as a key factor in cybersecurity solutions against audio misinformation.

Abstract

This paper evaluates the impact of training undergraduate students to improve their audio deepfake discernment ability by listening for expert-defined linguistic features. Such features have been shown to improve the performance of AI algorithms; here, we ascertain whether this improvement in AI algorithms also translates to improved perceptual awareness and discernment ability among listeners. With humans as the weakest link in any cybersecurity solution, we propose that listener discernment is a key factor for improving the trustworthiness of audio content. In this study we determine whether training that familiarizes listeners with English language variation can improve their ability to discern audio deepfakes. We focus on undergraduate students, as this demographic group is constantly exposed to social media and the potential for deception and misinformation online. To the best of our knowledge, our work is the first study to uniquely address English audio deepfake discernment through such techniques. Our research goes beyond informational training by introducing targeted linguistic cues to listeners as a deepfake discernment mechanism, via a training module. In a pre-/post-experimental design, we evaluated the impact of the training across 264 students, a representative cross section of all students at the University of Maryland, Baltimore County, divided into experimental and control sections. Findings show that the experimental group had a statistically significant decrease in unsurety when evaluating audio clips and an improvement in their ability to correctly identify clips they were initially unsure about. While results are promising, future research will explore more robust and comprehensive training for greater impact.


Key findings
The experimental group showed a statistically significant decrease in their 'unsurety' when evaluating audio clips and an improvement in correctly identifying clips they were initially unsure about, particularly fake clips. However, this decrease in unsurety did not always directly translate to increased accuracy, and the training may have made students more skeptical of real audio. The control group, which received general deepfake information, also showed a significant improvement in accuracy when identifying real audio clips.
Approach
The researchers employed a pre-/post-experimental design with 264 undergraduate students, divided into experimental and control groups. The experimental group received a training module teaching them to identify five Expert-Defined Linguistic Features (EDLFs) in audio to discern deepfakes, while the control group read a general article about deepfakes. Students' abilities to identify real, fake, or uncertain audio clips were measured before and after the intervention.
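The paper does not specify which statistical test produced the significance results, so as an illustration only, here is a minimal stdlib-Python sketch of one standard way to check whether a paired pre-/post decrease (e.g., in per-student counts of "unsure" responses) is significant: an exact two-sided sign test. The student data below is hypothetical, not from the study.

```python
from math import comb

def sign_test_p(n_decrease: int, n_increase: int) -> float:
    """Two-sided exact sign test for paired pre/post changes.

    Ties (no change) are excluded before calling. Under the null
    hypothesis, decreases and increases are equally likely, so the
    larger count follows Binomial(n, 0.5).
    """
    n = n_decrease + n_increase
    if n == 0:
        return 1.0
    k = max(n_decrease, n_increase)
    # P(X >= k) under Binomial(n, 0.5), doubled for two-sidedness
    p_one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * p_one_sided)

# Hypothetical per-student counts of "unsure" responses (pre vs. post training)
pre  = [4, 5, 3, 6, 2, 5, 4, 3, 5, 4]
post = [2, 3, 3, 4, 1, 2, 4, 2, 3, 2]

decreases = sum(a > b for a, b in zip(pre, post))
increases = sum(a < b for a, b in zip(pre, post))
p_value = sign_test_p(decreases, increases)
print(f"decreases={decreases}, increases={increases}, p={p_value:.4f}")
```

With these toy numbers, 8 of 10 students decreased and none increased, giving p ≈ 0.008, i.e., a significant drop in unsurety at the 0.05 level. A real analysis of three-way responses (real/fake/unsure) might instead use McNemar's test on the paired response categories.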
Datasets
Audio clips for student assessment were randomly selected from several commonly used machine learning datasets, including ASVspoof 2017, FoR, LJ Speech Dataset, MelGAN, Assem-VC, and WaveNet, to create a hybrid dataset.
Model(s)
Not specified (the study evaluates human discernment rather than a particular model).
Author countries
United States