Sparse deepfake detection promotes better disentanglement

Authors: Antoine Teissier, Marie Tahon, Nicolas Dugué, Aghilas Sini

Published: 2025-10-07 09:03:39+00:00

AI Summary

The paper introduces a novel method for deepfake detection by applying a TopK activation layer to the embeddings of the AASIST architecture to enforce sparsity. This approach demonstrates that sparse latent representations not only improve detection performance but also enhance the interpretability of the model by promoting better disentanglement. The study evaluates detection performance and quantifies disentanglement with mutual-information-based metrics computed against known attack types.
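
A minimal sketch of the core mechanism, assuming a per-sample top-k selection over the embedding dimensions; the module and variable names below are illustrative, not taken from the authors' code:

```python
# Minimal sketch (not the authors' code): a TopK activation that keeps only the
# k largest entries of each embedding vector and zeroes the rest, used here to
# sparsify the last embedding layer before classification.
import torch
import torch.nn as nn


class TopKActivation(nn.Module):
    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Keep the k largest activations per sample, zero the rest.
        values, indices = torch.topk(x, self.k, dim=-1)
        sparse = torch.zeros_like(x)
        sparse.scatter_(-1, indices, values)
        return sparse


# Illustrative usage with the sparsest configuration reported (k=20, D=320).
topk = TopKActivation(k=20)
embeddings = torch.randn(4, 320)               # stand-in for AASIST's last-layer embeddings
sparse_embeddings = topk(embeddings)
logits = nn.Linear(320, 2)(sparse_embeddings)  # illustrative bona fide / spoof head
```

In SAE-style setups such a selection is typically applied after the layer's nonlinearity, so only the strongest activations propagate to the classification head.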

Abstract

Due to the rapid progress of speech synthesis, deepfake detection has become a major concern in the speech processing community. Because it is a critical task, systems must not only be efficient and robust, but also provide interpretable explanations. Among the different approaches to explainability, we focus on the interpretation of latent representations. In this paper, we focus on the last layer of embeddings of AASIST, a deepfake detection architecture. We use a TopK activation inspired by SAEs on this layer to obtain sparse representations which are used in the decision process. We demonstrate that sparse deepfake detection can improve detection performance, with an EER of 23.36% on the ASVspoof5 test set at 95% sparsity. We then show that these representations provide better disentanglement, using completeness and modularity metrics based on mutual information. Notably, some attacks are directly encoded in the latent space.


Key findings
The sparse deepfake detection model achieved improved performance, obtaining its best result, an EER of 23.36% on the ASVspoof5 test set, when using 95% sparsity (k=20, D=320). TopK activation significantly promotes better disentanglement, improving both completeness and modularity scores compared to the baseline. Analysis confirmed that dimensions with high modularity directly encode information related to specific deepfake attacks.
Approach
The authors modify the AASIST deepfake detection network by integrating a TopK activation function into the last hidden layer to enforce controlled sparsity. This results in sparse representations that are then used for classification. Disentanglement is evaluated using completeness and modularity metrics based on the mutual information between the sparse dimensions and known deepfake attack factors.
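
A rough illustration of how such mutual-information-based scores can be computed is given below. This is our own sketch under common definitions of modularity (each dimension should inform at most one factor) and completeness (each factor should be captured by few dimensions), with binary per-attack factors; it is not the paper's exact implementation, and the function names are ours.

```python
# Minimal sketch (our reading, not the authors' protocol): MI-based modularity and
# completeness over sparse embedding dimensions, with binary "is attack A_i" factors.
# Each dimension is discretized before estimating mutual information.
import numpy as np
from sklearn.metrics import mutual_info_score


def mi_matrix(z, attack_labels, n_bins=20):
    """MI between each latent dimension and each attack-type factor.
    z: (n_samples, n_dims) sparse embeddings; attack_labels: (n_samples,) attack ids."""
    attacks = np.unique(attack_labels)
    mi = np.zeros((z.shape[1], len(attacks)))
    for j in range(z.shape[1]):
        edges = np.histogram_bin_edges(z[:, j], bins=n_bins)
        z_binned = np.digitize(z[:, j], edges)
        for a, attack in enumerate(attacks):
            mi[j, a] = mutual_info_score(z_binned, (attack_labels == attack).astype(int))
    return mi


def modularity(mi):
    """Close to 1 when each dimension carries information about a single factor."""
    scores = []
    for row in mi:
        if row.max() <= 0:
            continue
        template = np.zeros_like(row)
        template[np.argmax(row)] = row.max()
        deviation = ((row - template) ** 2).sum() / (row.max() ** 2 * (len(row) - 1))
        scores.append(1.0 - deviation)
    return float(np.mean(scores)) if scores else 0.0


def completeness(mi):
    """Close to 1 when each factor is captured by few dimensions (low entropy over dims)."""
    p = mi / (mi.sum(axis=0, keepdims=True) + 1e-12)            # distribution over dimensions
    ent = -(p * np.log(p + 1e-12)).sum(axis=0) / np.log(mi.shape[0])
    return float(np.mean(1.0 - ent))
```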
Datasets
ASVspoof5 (ASVspoof 2024), MLS
Model(s)
AASIST (modified with TopK activation)
Author countries
France