MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection

Authors: Zihan Pan, Sailor Hardik Bhupendra, Jinyang Wu

Published: 2025-09-11 06:18:29+00:00

AI Summary

This paper proposes MoLEx (Mixture of LoRA Experts), a parameter-efficient framework for audio deepfake detection that combines Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) router. MoLEx efficiently finetunes only selected experts of pre-trained Self-Supervised Learning (SSL) models, preserving core knowledge while reducing computational costs. Evaluated on the ASVSpoof 5 dataset, MoLEx achieves a state-of-the-art Equal Error Rate (EER) of 5.56% on the evaluation set without augmentation.

Abstract

While self-supervised learning (SSL)-based models have boosted audio deepfake detection accuracy, fully finetuning them is computationally expensive. To address this, we propose a parameter-efficient framework that combines Low-Rank Adaptation with a Mixture-of-Experts router, called Mixture of LoRA Experts (MoLEx). It preserves pre-trained knowledge of SSL models while efficiently finetuning only selected experts, reducing training costs while maintaining robust performance. The observed utility of experts during inference shows the router reactivates the same experts for similar attacks but switches to other experts for novel spoofs, confirming MoLEx's domain-aware adaptability. MoLEx additionally offers flexibility for domain adaptation by allowing extra experts to be trained without modifying the entire model. We mainly evaluate our approach on the ASVSpoof 5 dataset and achieve the state-of-the-art (SOTA) equal error rate (EER) of 5.56% on the evaluation set without augmentation.


Key findings
MoLEx achieved a state-of-the-art EER of 5.56% on the ASVSpoof 5 evaluation set, significantly outperforming non-MoLEx baselines and remaining competitive with finetuned models trained with data augmentation. The proposed orthogonality regularization loss was critical to expert expressiveness and overall performance, ensuring that each expert's low-rank subspace is effectively utilized. Expert-utilization analysis showed domain-aware adaptability: the router reactivates the same experts for known attacks but switches to other experts for novel or out-of-domain spoofs, underlining the framework's robustness and its flexibility for domain adaptation.
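The headline metric, equal error rate (EER), is the operating point where the false-acceptance rate (spoofs accepted as bona fide) equals the false-rejection rate (bona fide rejected as spoof). A minimal sketch of computing it from detection scores; the scores below are synthetic toy data, not from the paper:

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """EER: sweep thresholds over all scores and return the point where
    false-acceptance rate (FAR) and false-rejection rate (FRR) cross."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))       # closest FAR/FRR crossing
    return (far[idx] + frr[idx]) / 2

# Toy scores (higher = more bona fide); illustrative only.
rng = np.random.default_rng(0)
bona = rng.normal(2.0, 1.0, 1000)
spoof = rng.normal(-2.0, 1.0, 1000)
eer = compute_eer(bona, spoof)
```

An EER of 5.56% means that at the crossing threshold, 5.56% of spoofed utterances are accepted and 5.56% of genuine ones are rejected.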
Approach
MoLEx addresses the high computational cost of finetuning large SSL models by integrating Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) router within the transformer layers of a pre-trained SSL model (WavLM). It employs multiple LoRA adapters as experts, with a gating network dynamically selecting a subset of these experts for activation, along with an orthogonality regularization loss to enhance expert expressiveness. The framework also allows for efficient domain adaptation by adding and training new experts without modifying the entire model.
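The mechanism described above can be sketched in a few lines. This is a minimal NumPy illustration of the MoLEx idea applied to one frozen projection matrix: several LoRA adapters act as experts, a gating network routes each input to a top-k subset, and an orthogonality penalty on each expert's low-rank factor discourages rank collapse. All shapes, names, and the exact form of the regularizer are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts, top_k = 16, 4, 4, 2   # hidden size, LoRA rank, experts, active experts (assumed)

# Frozen pre-trained weight of one transformer projection (e.g., inside WavLM).
W = rng.normal(size=(d, d))

# Each LoRA expert e contributes a low-rank update B[e] @ A[e]; only A, B, and
# the router are trained, so the SSL backbone's knowledge is preserved.
A = rng.normal(scale=0.1, size=(n_experts, r, d))
B = np.zeros((n_experts, d, r))        # LoRA convention: B starts at zero
W_gate = rng.normal(scale=0.1, size=(d, n_experts))  # MoE router / gating network

def molex_forward(x):
    """x: (d,) token representation. Route to top-k LoRA experts."""
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]                       # selected expert indices
    gate = np.exp(logits[top]) / np.exp(logits[top]).sum()  # renormalized softmax gates
    out = W @ x                                             # frozen backbone path
    for g, e in zip(gate, top):
        out = out + g * (B[e] @ (A[e] @ x))                 # weighted expert updates
    return out

def orthogonality_loss(A):
    """One plausible reading of the paper's regularizer: push each expert's
    factor toward row-orthonormality, ||A_e A_e^T - I||_F^2, so the full
    rank r is actually used."""
    loss = 0.0
    for A_e in A:
        G = A_e @ A_e.T
        loss += ((G - np.eye(r)) ** 2).sum()
    return loss / len(A)

x = rng.normal(size=d)
y = molex_forward(x)
reg = orthogonality_loss(A)
```

Because only k of the experts run per input, adding experts (e.g., for a new spoofing domain) grows capacity without retraining the backbone or proportionally increasing inference cost, which matches the domain-adaptation flexibility the paper claims.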
Datasets
ASVSpoof 5, ASVSpoof 2019, ASVSpoof 2021 LA, ASVSpoof 2021 DF, DFADD, FakeOrReal, In the Wild, LibriSeVoc
Model(s)
WavLM (large model as backbone), LoRA adapters, Mixture-of-Experts (MoE) router/gating network, Attentive Merging Mechanism, LSTM layer, Fully-connected layer
Author countries
Singapore