Modeling Biomechanical Constraint Violations for Language-Agnostic Lip-Sync Deepfake Detection
Authors: Hao Chen, Junnan Xu
Published: 2026-04-18 03:32:40+00:00
Comment: 8 pages, 4 figures. Keywords: deepfake detection, lip-sync forgery, biomechanical constraints, temporal kinematics, cross-lingual generalization, privacy-preserving detection, geometric features
AI Summary
This paper introduces BioLip, a lightweight framework for language-agnostic lip-sync deepfake detection. It operates by identifying violations of biomechanical constraints in synthetic videos, specifically an elevated temporal lip variance termed 'temporal lip jitter', which is consistent across language, ethnicity, and recording conditions. The framework processes 64 perioral landmark coordinates to detect these physics-grounded anomalies.
Abstract
Current lip-sync deepfake detectors rely on pixel-level artifacts or audio-visual correspondence, failing to generalize across languages because these cues encode data-dependent patterns rather than universal physical laws. We identify a more fundamental principle: generative models do not enforce the biomechanical constraints of authentic orofacial articulation, producing measurably elevated temporal lip variance -- a signal we term temporal lip jitter -- that is empirically consistent across the speaker's language, ethnicity, and recording conditions. We instantiate this principle through BioLip, a lightweight framework operating on 64 perioral landmark coordinates extracted by MediaPipe.
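The abstract's core signal, elevated frame-to-frame variance in lip landmark positions, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the paper does not specify its jitter statistic, and the `temporal_lip_jitter` function, the synthetic trajectory generator, and the noise model standing in for generator artifacts are all hypothetical. A real pipeline would feed in per-frame perioral landmarks from MediaPipe rather than synthetic data.

```python
# Illustrative sketch only: the jitter statistic and the synthetic data
# generator are assumptions, not the paper's actual BioLip implementation.
import math
import random
from typing import List, Tuple

Frame = List[Tuple[float, float]]  # one (x, y) coordinate per perioral landmark


def temporal_lip_jitter(frames: List[Frame]) -> float:
    """Mean squared frame-to-frame landmark displacement.

    The premise: generative models do not enforce biomechanical smoothness,
    so this variance-like statistic should be elevated for synthetic video.
    """
    if len(frames) < 2:
        return 0.0
    n_landmarks = len(frames[0])
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            total += (x1 - x0) ** 2 + (y1 - y0) ** 2
    return total / ((len(frames) - 1) * n_landmarks)


def make_frames(noise: float, n_frames: int = 50, n_landmarks: int = 64) -> List[Frame]:
    """Synthetic demo trajectories: a smooth mouth open/close cycle, with
    optional per-frame Gaussian noise standing in for generator jitter."""
    rng = random.Random(0)
    frames = []
    for t in range(n_frames):
        phase = math.sin(2 * math.pi * t / n_frames)  # slow articulation cycle
        frames.append([
            (k * 0.01 + rng.gauss(0, noise), 0.1 * phase + rng.gauss(0, noise))
            for k in range(n_landmarks)
        ])
    return frames


# A noiseless (smooth) trajectory should score lower than a jittery one.
real_score = temporal_lip_jitter(make_frames(noise=0.0))
fake_score = temporal_lip_jitter(make_frames(noise=0.01))
```

Under this toy model, `fake_score` exceeds `real_score`, mirroring the paper's claim that jitter separates authentic from synthetic lip motion; the actual framework presumably applies a learned model on top of such geometric features rather than a single scalar threshold.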