On-Device Voice Authentication with Paralinguistic Privacy

Authors: Ranya Aloufi, Hamed Haddadi, David Boyle

Published: 2022-05-27 14:56:01+00:00

Comment: 15 pages

AI Summary

This paper presents 'VoiceID', an on-device voice authentication system designed to offer local privacy preservation and robust security against replay and deepfake attacks. It locally derives token-based credentials from unique voice attributes, incorporating liveness detection and a flexible privacy filter to selectively remove paralinguistic information before data transmission. The system achieves high authentication accuracy while ensuring user sovereignty over their voice data on edge devices.

Abstract

Using our voices to access, and interact with, online services raises concerns about the trade-offs between convenience, privacy, and security. The conflict between maintaining privacy and ensuring input authenticity has often been hindered by the need to share raw data, which contains all the paralinguistic information required to infer a variety of sensitive characteristics. Users of voice assistants put their trust in service providers; however, this trust is potentially misplaced considering the emergence of first-party 'honest-but-curious' or 'semi-honest' threats. A further security risk is presented by imposters gaining access to systems by pretending to be the user, leveraging replay or 'deepfake' attacks. Our objective is to design and develop a new voice input-based system that offers the following specifications: local authentication to reduce the need for sharing raw voice data, local privacy preservation based on user preferences, allowing more flexibility in integrating such a system given target applications' privacy constraints, and achieving good performance in these targeted applications. The key idea is to locally derive token-based credentials based on uniquely identifying attributes obtained from the user's voice and offer selective sensitive information filtering before transmitting raw data. Our system consists of (i) 'VoiceID', boosted with a liveness detection technology to thwart replay attacks; (ii) a flexible privacy filter that allows users to select the level of privacy protection they prefer for their data. The system yields 98.68% accuracy in verifying legitimate users with cross-validation and runs in tens of milliseconds on a CPU and single-core ARM processor without specialized hardware. Our system demonstrates the feasibility of filtering raw voice input closer to users, in accordance with their privacy preferences, while maintaining their authenticity.


Key findings
The system achieved 98.68% accuracy in verifying legitimate users under cross-validation, which supports its robustness against impersonation and replay attacks. It runs in tens of milliseconds on a CPU and on a single-core ARM processor, without specialized hardware. The flexible privacy filter reduces inference of paralinguistic attributes to levels comparable to random guessing while keeping the audio useful for transcription, with a small performance penalty (~6% word error rate, WER).
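For readers unfamiliar with the transcription metric cited above, WER is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of five.
    print(word_error_rate("turn on the kitchen lights",
                          "turn on the kitchen light"))  # 0.2
```

Under this definition, a filter that costs a few points of WER changes only a small fraction of the transcribed words.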
Approach
The system derives token-based credentials on the device from uniquely identifying voice attributes and selectively filters sensitive information before voice data is transmitted. It has two components: (i) 'VoiceID', which includes liveness detection to thwart replay attacks, and (ii) a flexible privacy filter that lets users select their preferred level of privacy protection. Authentication fuses the decisions of the identity, spoofing, and liveness classifiers and then generates public-private key credentials.
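The decision-fusion step can be pictured as a single linear decision function over the three classifier scores, which is the form a trained linear SVM takes at inference time. The sketch below is only an illustration under that assumption: the `Scores` type, the weights, the bias, and the score ranges are all hypothetical and are not taken from the paper, and the subsequent key-credential generation is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Hypothetical per-utterance classifier outputs, each in [0, 1]."""
    identity: float  # how well the voice matches the enrolled speaker
    spoofing: float  # confidence that the audio is bona fide, not synthetic
    liveness: float  # confidence that the audio comes from a live human


# Assumed weights and bias; in practice these would be learned from data.
W_IDENTITY = 2.0
W_SPOOFING = 1.5
W_LIVENESS = 1.5
BIAS = -3.5


def fusion_margin(scores: Scores) -> float:
    """Signed distance-like margin of a linear decision function."""
    return (
        W_IDENTITY * scores.identity
        + W_SPOOFING * scores.spoofing
        + W_LIVENESS * scores.liveness
        + BIAS
    )


def authenticate(scores: Scores) -> bool:
    """Accept only when the fused margin is positive."""
    return fusion_margin(scores) > 0.0


if __name__ == "__main__":
    genuine = Scores(identity=0.95, spoofing=0.90, liveness=0.90)
    replayed = Scores(identity=0.95, spoofing=0.90, liveness=0.05)
    print(authenticate(genuine))   # True
    print(authenticate(replayed))  # False
```

With these assumed weights, a strong identity match alone is not enough: a low liveness or spoofing score pulls the margin below zero, which is the behaviour one would want from fusing the three decisions.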
Datasets
A custom dataset (20 participants) and a subset of the LibriSpeech test set
Model(s)
AASIST-L for spoofing detection and an SVM trained on VOID features for liveness detection. 'VoiceID' fuses these with a 'deep speaker' (ResCNN) model for identity verification; an SVM classifier performs the final fusion.
Author countries
United Kingdom