On-Device Voice Authentication with Paralinguistic Privacy

Authors: Ranya Aloufi, Hamed Haddadi, David Boyle

Published: 2022-05-27 14:56:01+00:00

AI Summary

This research paper presents a novel on-device voice authentication system that prioritizes user privacy while maintaining high accuracy. The system locally derives token-based credentials from voice data, allowing selective filtering of sensitive information before transmission to service providers, thereby mitigating privacy risks associated with cloud-based voice authentication.

Abstract

Using our voices to access, and interact with, online services raises concerns about the trade-offs between convenience, privacy, and security. The conflict between maintaining privacy and ensuring input authenticity has often been hindered by the need to share raw data, which contains all the paralinguistic information required to infer a variety of sensitive characteristics. Users of voice assistants put their trust in service providers; however, this trust is potentially misplaced considering the emergence of first-party 'honest-but-curious' or 'semi-honest' threats. A further security risk is presented by imposters gaining access to systems by pretending to be the user, leveraging replay or 'deepfake' attacks. Our objective is to design and develop a new voice input-based system that offers the following specifications: local authentication to reduce the need for sharing raw voice data; local privacy preservation based on user preferences, allowing more flexibility in integrating such a system given target applications' privacy constraints; and good performance in these targeted applications. The key idea is to locally derive token-based credentials based on uniquely identifying attributes obtained from the user's voice and offer selective sensitive information filtering before transmitting raw data. Our system consists of (i) 'VoiceID', boosted with a liveness detection technology to thwart replay attacks; (ii) a flexible privacy filter that allows users to select the level of privacy protection they prefer for their data. The system yields 98.68% accuracy in verifying legitimate users with cross-validation and runs in tens of milliseconds on a CPU and single-core ARM processor without specialized hardware. Our system demonstrates the feasibility of filtering raw voice input closer to users, in accordance with their privacy preferences, while maintaining their authenticity.

Key findings

The system achieves 98.68% accuracy in verifying legitimate users with cross-validation and runs in tens of milliseconds on a CPU and single-core ARM processor. The privacy-preserving filter maintains utility with minimal performance penalties (approximately 6% WER compared to cloud-based systems). The system demonstrates the feasibility of secure and private on-device voice authentication.
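For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition; the function name and the example sentences are illustrative and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts: one dropped word and one substituted word.
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(f"WER: {wer:.2f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.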

Approach

The proposed system uses a 'VoiceID' module combining identity, spoofing, and liveness classifiers to authenticate users locally. A flexible privacy filter allows users to control the level of data shared, generating token-based credentials for authentication without revealing raw voice data.
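The flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' implementation: the score names, the thresholds, the rule that all three checks must pass, and the HMAC-based token format are assumptions made for the example. The source states only that identity, spoofing, and liveness classifiers are combined and that token-based credentials are derived locally.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceScores:
    """Hypothetical per-utterance classifier outputs, each in [0, 1]."""
    identity: float  # confidence that the speaker is the enrolled user
    bonafide: float  # confidence that the audio is not synthesized
    liveness: float  # confidence that the audio is live, not replayed


# Assumed thresholds, chosen only for illustration.
THRESHOLDS = VoiceScores(identity=0.8, bonafide=0.5, liveness=0.5)


def passes_all_checks(scores: VoiceScores,
                      thresholds: VoiceScores = THRESHOLDS) -> bool:
    """Accept only if every classifier clears its threshold (assumed rule)."""
    return (scores.identity >= thresholds.identity
            and scores.bonafide >= thresholds.bonafide
            and scores.liveness >= thresholds.liveness)


def issue_token(device_key: bytes, user_id: str, nonce: bytes) -> str:
    """Derive a credential from a device-held key; no audio is included."""
    message = user_id.encode("utf-8") + b"|" + nonce
    return hmac.new(device_key, message, hashlib.sha256).hexdigest()


def authenticate(scores: VoiceScores, device_key: bytes,
                 user_id: str, nonce: bytes) -> Optional[str]:
    """Return a token for a verified live speaker, otherwise None."""
    if not passes_all_checks(scores):
        return None
    return issue_token(device_key, user_id, nonce)
```

In this sketch the service provider would receive only the token and could check it against a value it computes itself with `hmac.compare_digest`, provided it shares the (hypothetical) device key; the raw voice recording never leaves the device.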

Datasets

A new dataset was collected from 20 recruited participants, each of whom repeated 12 commands three times. A subset of the LibriSpeech test set was also used for ASR evaluation.

Model(s)

Deep residual CNN (ResCNN) for speaker embedding extraction, AASIST-L for spoofing detection, VOID for liveness detection, CPC features with k-means clustering for phonetic content extraction, and various machine learning models (SVM, MLP, kNN, SGD, logistic regression) for classification.
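As a rough illustration of how clustering can turn continuous speech features into discrete content units, the sketch below assigns each feature frame to its nearest centroid and optionally collapses consecutive repeats. The two-dimensional frames and the centroids are made-up values: in practice the frames would come from a CPC encoder and the centroids from k-means training. The collapsing step is an assumption for the example, since the source does not say how the units are post-processed.

```python
import math
from itertools import groupby
from typing import List, Sequence


def nearest_centroid(frame: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to the frame (Euclidean)."""
    distances = [math.dist(frame, centroid) for centroid in centroids]
    return distances.index(min(distances))


def quantize(frames: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]],
             collapse_repeats: bool = True) -> List[int]:
    """Map each frame to a discrete unit ID, optionally merging repeats."""
    units = [nearest_centroid(frame, centroids) for frame in frames]
    if collapse_repeats:
        # Keep one ID per run of identical consecutive units.
        units = [unit for unit, _ in groupby(units)]
    return units


# Made-up centroids and frames, for illustration only.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, 0.0), (0.2, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
print(quantize(frames, centroids))
```

The resulting unit sequence carries which cluster each stretch of speech fell into rather than the waveform itself, which is the general idea behind using such units as a content representation.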

Author countries

United Kingdom
