Virtual camera detection: Catching video injection attacks in remote biometric systems

Authors: Daniyar Kurmankhojayev, Andrei Shadrikov, Dmitrii Gordin, Mikhail Shkorin, Danijar Gabdullin, Aigerim Kambetbayeva, Kanat Kuatov

Published: 2025-12-11 14:01:06+00:00

AI Summary

This study introduces a machine learning-based approach for Virtual Camera Detection (VCD) to counter video injection attacks, such as deepfakes, in remote facial biometric systems. The approach trains models exclusively on system metadata collected from camera interactions during authentication sessions. Empirical results demonstrate the method's effectiveness in identifying virtual camera usage, thereby significantly reducing the risk of malicious users bypassing Face Anti-Spoofing (FAS) systems.

Abstract

Face anti-spoofing (FAS) is a vital component of remote biometric authentication systems based on facial recognition, increasingly used across web-based applications. Among emerging threats, video injection attacks -- facilitated by technologies such as deepfakes and virtual camera software -- pose significant challenges to system integrity. While virtual camera detection (VCD) has shown potential as a countermeasure, existing literature offers limited insight into its practical implementation and evaluation. This study introduces a machine learning-based approach to VCD, with a focus on its design and validation. The model is trained on metadata collected during sessions with authentic users. Empirical results demonstrate its effectiveness in identifying video injection attempts and reducing the risk of malicious users bypassing FAS systems.


Key findings
All trained models achieved strong discriminative performance, with AUC-ROC scores above 0.9. At a moderate security threshold (Attack Presentation Classification Error Rate, APCER, of 10^-1), the system offered a balanced trade-off with good usability (Bona Fide Presentation Classification Error Rate, BPCER, of 14.6%). However, enforcing maximum security (APCER of 10^-3) caused severe usability degradation, with the BPCER reaching 91.7%.
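For orientation, APCER and BPCER are the standard presentation-attack-detection error rates (in the style of ISO/IEC 30107-3). The formulas below are a conventional formalization; the notation is chosen here and not taken from the paper.

```latex
% Conventional definitions of the reported error rates (not quoted from the paper).
\[
\mathrm{APCER} = \frac{\#\{\text{attack sessions classified as bona fide}\}}{\#\{\text{attack sessions}\}},
\qquad
\mathrm{BPCER} = \frac{\#\{\text{bona fide sessions classified as attack}\}}{\#\{\text{bona fide sessions}\}}
\]
```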
Approach
The detection system employs machine learning models trained on metadata collected by systematically challenging the camera API (Application Programming Interface). Each challenge requests changes to frame height and frame rate and records system response metrics -- the reported values, the values actually applied, and the response times -- exploiting behavioral differences between physical and virtual camera drivers.
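The paper does not publish its collection code; the following is a minimal browser-side sketch, assuming the challenges are issued through the standard MediaDevices API (getUserMedia, applyConstraints, getSettings). The ChallengeRecord fields and the specific height and frame-rate values are illustrative, not taken from the paper.

```typescript
// Minimal sketch of a camera-API challenge sequence, assuming client-side
// collection with the standard MediaDevices API. Field names and challenge
// values are illustrative.

interface ChallengeRecord {
  requestedHeight: number;
  requestedFrameRate: number;
  appliedHeight?: number;       // value the driver reports after the request
  appliedFrameRate?: number;
  responseTimeMs: number;       // how long applyConstraints took to resolve
  failed: boolean;
}

async function probeCamera(): Promise<ChallengeRecord[]> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const track = stream.getVideoTracks()[0];
  const records: ChallengeRecord[] = [];

  // Illustrative challenge set: request several height / frame-rate
  // combinations and observe how the driver responds.
  const challenges = [
    { height: 480, frameRate: 15 },
    { height: 720, frameRate: 30 },
    { height: 1080, frameRate: 60 },
  ];

  for (const c of challenges) {
    const start = performance.now();
    let failed = false;
    try {
      await track.applyConstraints({
        height: { ideal: c.height },
        frameRate: { ideal: c.frameRate },
      });
    } catch {
      failed = true;
    }
    const elapsed = performance.now() - start;
    const settings = track.getSettings();
    records.push({
      requestedHeight: c.height,
      requestedFrameRate: c.frameRate,
      appliedHeight: settings.height,
      appliedFrameRate: settings.frameRate,
      responseTimeMs: elapsed,
      failed,
    });
  }

  track.stop();
  return records; // sent to the server as tabular metadata for the VCD model
}
```

Physical camera drivers typically clamp requests to supported modes and take measurable time to reconfigure, whereas virtual camera software may echo back requested values or respond almost instantly; discrepancies of this kind are the sort of behavioral signal the approach exploits.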
Datasets
A proprietary, real-world dataset comprising over 30,000 bona fide sessions and 2,812 attack sessions, collected across various platforms (Android, iOS, Linux, MacIntel, Win32) and browsers (Chrome, Firefox).
Model(s)
Histogram Gradient Boosting (HGB), Categorical Boosting (CatBoost), and an Ensemble model combining both.
Author countries
Kazakhstan