Selfie-Capture Dynamics as an Auxiliary Signal Against Deepfakes and Injection Attacks for Mobile Identity Verification

Authors: Erkka Rantahalvari, Olli Silvén, Zinelabidine Boulkenafet, Constantino Álvarez Casado

Published: 2026-04-30 20:46:18+00:00

Comment: 12 pages, 5 figures, 8 tables, 51 references, conference

AI Summary

This paper explores the use of passive motion traces captured during mobile selfie verification as an auxiliary signal to combat deepfakes and injection attacks in remote identity verification (RIdV). The authors introduce CanSelfie, a multi-sensor dataset, and benchmark various time-series classifiers and anomaly detectors for spoof screening and user verification. Their findings indicate that short selfie-capture motion dynamics contain measurable spoof-related and identity-related information, providing a low-friction complementary evidence channel.

Abstract

Mobile remote identity verification (RIdV) systems are exposed to attacks that manipulate or replace the facial video stream, including presentation attacks, real-time deepfakes, and video injection. Recent European requirements, including ETSI TS 119 461 and CEN/TS 18099, motivate complementary evidence channels beyond camera-based presentation-attack detection. This paper investigates whether passive motion traces recorded during selfie capture provide auxiliary evidence for spoof screening and user verification. We introduce CanSelfie, a dataset of 375 bona fide multi-sensor sequences collected at 50 Hz from 30 participants using a commercial mobile RIdV application, together with stationary, handheld, and temporally shifted attack-proxy scenarios. We benchmark 7 multivariate time-series classifiers and 8 whole-series anomaly detectors across sensor configurations and temporal windows. For spoof screening, accelerometer-only ROCKAD obtains 0.00% false rejection rate (FRR) and 43.8% false acceptance rate (FAR), while QUANT+3-NN obtains the lowest overall FAR of 32.0% at 2.37% FRR; both reject all stationary attack proxies. For same-device and same-session user verification, WEASEL+MUSE reaches 1.07% equal error rate (EER) using 9 sensor channels. The analysis shows that raw accelerometer data, preserving gravity and orientation cues, is the most informative modality, and that closed-set classification accuracy alone does not imply good verification performance because threshold calibration depends on score distributions. The findings suggest that short selfie-capture motion traces contain measurable spoof-related and identity-related information, supporting their use as a low-friction auxiliary signal while also identifying the need for cross-device, cross-session, and real injection-attack evaluation.

Key findings

For spoof screening, accelerometer-only ROCKAD achieved 0.00% FRR at 43.8% overall FAR, and QUANT+3-NN achieved the lowest overall FAR of 32.0% at 2.37% FRR; both rejected all stationary attack proxies. For same-device, same-session user verification, WEASEL+MUSE reached a 1.07% equal error rate (EER) using 9 sensor channels. Raw accelerometer data, which preserves gravity and orientation cues, was the most informative modality. Closed-set classification accuracy alone did not imply good verification performance, because threshold calibration depends on score distributions.
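The error rates quoted above all follow from comparing scores against a threshold. The sketch below is not the paper's evaluation code; it is a minimal illustration, assuming similarity scores where a sample is accepted when its score is at or above the threshold. The function names and the toy score lists are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) for similarity scores, accepting when score >= threshold."""
    # FRR: fraction of genuine samples wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    # FAR: fraction of impostor or spoof samples wrongly accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep every observed score as a threshold and return (EER, threshold)
    at the point where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only (not taken from CanSelfie).
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]

eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2f} at threshold {eer_threshold}")
```

Because FAR and FRR depend only on where the threshold falls relative to the two score distributions, two models with the same closed-set accuracy can yield quite different EERs, which is consistent with the calibration point made in the findings.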

Approach

The researchers collected a novel multi-sensor dataset, CanSelfie, comprising bona fide selfie capture sequences and various attack-proxy scenarios (stationary, handheld, and temporally shifted). They benchmarked 7 multivariate time-series classifiers and 8 whole-series anomaly detectors across different sensor configurations and temporal windows. The evaluation focused on genuine-versus-spoof screening, 10-shot one-class user verification, and classification-based user verification using only motion data.

Datasets

CanSelfie

Model(s)

Time Series Classifiers: MR-HYDRA, WEASEL+MUSE, QUANT, r-STSF, RDST, ResNet, catch22.

Whole-Series Anomaly Detectors: ROCKAD, Isolation Forest (on raw series), Isolation Forest (on QUANT features), one-class SVM (RBF), Euclidean k-NN, DTW k-NN, QUANT-based k-NN, LSTM autoencoder.
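To give a feel for the simplest detector in this list, the following sketch shows one way a Euclidean k-NN whole-series anomaly detector might be set up when only bona fide traces are available for fitting. It is an illustration under stated assumptions, not the authors' configuration: the traces are synthetic single-channel signals (100 samples, about 2 s at 50 Hz), the value k = 3, the enrollment size of 10, and the leave-one-out maximum used as threshold are all choices made here for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_score(series, references, k=3):
    """Anomaly score: mean distance to the k nearest reference series."""
    dists = sorted(euclidean(series, r) for r in references)
    return sum(dists[:k]) / k


def fit_threshold(enrolled, k=3):
    """Assumed rule: largest leave-one-out score among the enrolled traces."""
    scores = [
        knn_score(s, enrolled[:i] + enrolled[i + 1:], k)
        for i, s in enumerate(enrolled)
    ]
    return max(scores)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
N = 100                 # assumed window: 2 s at 50 Hz


def synthetic_capture():
    """Synthetic handheld trace: gravity plus a slow drift and hand tremor."""
    return [9.81 + 1.5 * t / (N - 1) + rng.gauss(0, 0.05) for t in range(N)]


def synthetic_stationary():
    """Synthetic stationary proxy: gravity plus sensor noise only."""
    return [9.81 + rng.gauss(0, 0.01) for _ in range(N)]


enrolled = [synthetic_capture() for _ in range(10)]
threshold = fit_threshold(enrolled)

genuine_score = knn_score(synthetic_capture(), enrolled)
stationary_score = knn_score(synthetic_stationary(), enrolled)

print(f"threshold={threshold:.2f} genuine={genuine_score:.2f} "
      f"stationary={stationary_score:.2f}")
```

In this toy setup the stationary trace lacks the drift present in every enrolled capture, so its distance to the nearest references is far larger than the threshold. Real sequences are multivariate and much less regular, so this shows only the mechanism and says nothing about the reported numbers.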

Author countries

Finland
