Does Cognitive Load Affect Human Accuracy in Detecting Voice-Based Deepfakes?

Authors: Marcel Gohsen, Nicola Libera, Johannes Kiesel, Jan Ehlers, Benno Stein

Published: 2026-01-15 13:37:39+00:00

Comment: Accepted as full paper to CHIIR'26

AI Summary

This paper investigates how cognitive load affects human accuracy in detecting voice-based deepfakes through an empirical study with 30 participants. The findings suggest that low cognitive load does not generally impair detection abilities. Interestingly, simultaneous exposure to a secondary stimulus can actually benefit human performance in the deepfake detection task.

Abstract

Deepfake technologies are powerful tools that can be misused for malicious purposes such as spreading disinformation on social media. The effectiveness of such malicious applications depends on the ability of deepfakes to deceive their audience. Therefore, researchers have investigated human abilities to detect deepfakes in various studies. However, most of these studies were conducted with participants who focused exclusively on the detection task; hence the studies may not provide a complete picture of human abilities to detect deepfakes under realistic conditions: Social media users are exposed to cognitive load on the platform, which can impair their detection abilities. In this paper, we investigate the influence of cognitive load on human detection abilities of voice-based deepfakes in an empirical study with 30 participants. Our results suggest that low cognitive load does not generally impair detection abilities, and that the simultaneous exposure to a secondary stimulus can actually benefit people in the detection task.


Key findings
The study found that low cognitive load (induced by a 1-back task) does not significantly impair human accuracy in detecting voice deepfakes. Moreover, exposure to a secondary visual stimulus (B-roll video) significantly improved participants' detection accuracy. Detection ability varied considerably among individuals, and self-reported decision confidence correlated with actual accuracy, especially under dual-task conditions.
Approach
The researchers conducted an empirical study with 30 participants across two experiments. Experiment 1 used a dual-task scenario (audio deepfake detection + 1-back task for cognitive load), while Experiment 2 involved audio deepfake detection while concurrently watching symbolic B-roll video footage.
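In a 1-back task, participants flag whenever the current stimulus matches the immediately preceding one. The matching and scoring logic can be sketched as follows (illustrative only; the paper's actual task implementation and stimuli are not specified here, and all function names are hypothetical):

```python
def one_back_targets(sequence):
    """Indices where the current item equals the previous one --
    the '1-back' matches a participant is asked to report."""
    return [i for i in range(1, len(sequence))
            if sequence[i] == sequence[i - 1]]

def score_responses(sequence, responses):
    """Score button presses (one bool per stimulus) against the
    1-back ground truth; returns (hits, false_alarms)."""
    targets = set(one_back_targets(sequence))
    hits = sum(1 for i, pressed in enumerate(responses)
               if pressed and i in targets)
    false_alarms = sum(1 for i, pressed in enumerate(responses)
                       if pressed and i not in targets)
    return hits, false_alarms
```

For example, for the stimulus stream `["A", "B", "B", "C", "C", "C"]` the 1-back targets are at indices 2, 4, and 5, and a participant who presses only at indices 2 and 4 scores two hits with no false alarms.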
Datasets
Custom-collected news videos from YouTube for four renowned newsreaders (Andrea Mitchell, Carl Nasman, Lester Holt, Sophie Raworth) were used to create bona fide and spoofed audio stimuli. These videos were manually filtered and edited for quality and length.
Model(s)
UNKNOWN
Author countries
Germany