SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning

Authors: Hessen Bougueffa Eutamene, Abdellah Zakaria Sellam, Abdelmalik Taleb-Ahmed, Abdenour Hadid

Published: 2026-04-04 19:19:33+00:00

AI Summary

This paper introduces SPARK-IL, a retrieval-augmented framework designed to enhance deepfake detection by improving generalization to unseen generative models. It combines dual-path spectral analysis with incremental learning, leveraging consistent frequency-domain signatures to identify AI-generated images. SPARK-IL achieves a state-of-the-art 94.6% mean accuracy on the UniversalFakeDetect benchmark across 19 diverse generative models.

Abstract

Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models. However, while pixel-level artifacts vary across models, frequency-domain signatures exhibit greater consistency, providing a promising foundation for cross-generator detection. To address this, we propose SPARK-IL, a retrieval-augmented framework that combines dual-path spectral analysis with incremental learning. It uses a partially frozen ViT-L/14 encoder for semantic representations alongside a parallel path for raw RGB pixel embeddings. Both paths undergo multi-band Fourier decomposition into four frequency bands, each of which is processed by Kolmogorov-Arnold Networks (KAN) with mixture-of-experts for band-specific transformations. The resulting spectral embeddings are then fused via cross-attention with residual connections. During inference, this fused embedding retrieves the k nearest labeled signatures from a Milvus database using cosine similarity, and the prediction is made by majority voting. An incremental learning strategy expands the database and employs elastic weight consolidation to preserve previously learned transformations. Evaluated on the UniversalFakeDetect benchmark across 19 generative models (including GANs, face-swapping, and diffusion methods), SPARK-IL achieves 94.6% mean accuracy. The code will be publicly released at https://github.com/HessenUPHF/SPARK-IL.
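
To make the multi-band Fourier decomposition step more concrete, here is a minimal sketch in NumPy. It is not the authors' implementation. It assumes a 1-D real FFT over the embedding dimension and four equal-width, contiguous frequency bands; the abstract does not specify how the bands are defined. The function name `split_into_bands` and the embedding size of 1024 are hypothetical and chosen only for illustration.

```python
import numpy as np

def split_into_bands(embedding, num_bands=4):
    """Split a 1-D real embedding into band-limited components.

    Assumption: equal-width contiguous bands over the rFFT bins.
    """
    spectrum = np.fft.rfft(embedding)
    # Integer bin edges that partition the spectrum into num_bands slices.
    edges = np.linspace(0, spectrum.shape[0], num_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]  # keep only this band's bins
        bands.append(np.fft.irfft(masked, n=embedding.shape[0]))
    return np.stack(bands)

rng = np.random.default_rng(0)
embedding = rng.standard_normal(1024)  # stand-in for an encoder output
bands = split_into_bands(embedding)

print(bands.shape)                                # (4, 1024)
print(np.allclose(bands.sum(axis=0), embedding))  # True
```

Because the band masks partition the spectrum and the inverse FFT is linear, the four components sum back to the original embedding. In the described framework each band would then go to its own band-specific transformation; that stage is not sketched here.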


Key findings
SPARK-IL achieved 94.6% mean accuracy on the UniversalFakeDetect benchmark, outperforming existing state-of-the-art methods; its margin over REVEAL is 0.7%. The dual spectral analysis and retrieval-augmented inference were the most significant contributors to its performance gains, demonstrating strong generalization to diverse and unseen generative models. The incremental learning strategy also proved effective in adapting to new generators without performance degradation on previously learned techniques.
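
For readers unfamiliar with elastic weight consolidation (EWC), the sketch below shows the standard quadratic penalty from the EWC literature: parameters that were important for earlier data (high diagonal Fisher information) are penalized more strongly for drifting from their previous values. This is the generic formulation, not necessarily the exact variant used in SPARK-IL. The function name `ewc_penalty`, the toy parameter values, and the weight `lam` are all hypothetical.

```python
import numpy as np

def ewc_penalty(params, anchor_params, fisher_diag, lam=1.0):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * np.sum(fisher_diag * (params - anchor_params) ** 2)

# Toy values (illustrative only).
anchor = np.array([1.0, -2.0, 0.5])   # parameters after the previous stage
fisher = np.array([10.0, 0.1, 1.0])   # diagonal Fisher importance estimates
moved = np.array([1.5, -1.0, 0.5])    # parameters after adapting to new data

penalty = ewc_penalty(moved, anchor, fisher, lam=2.0)
print(penalty)
```

In training, this term would be added to the loss on the new generator's data, so a small shift in a high-importance parameter can cost more than a larger shift in an unimportant one.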
Approach
SPARK-IL employs a dual-path architecture in which a partially frozen ViT-L/14 extracts semantic features and a parallel path processes raw RGB pixels. Both paths undergo multi-band Fourier decomposition, with each frequency band processed by Kolmogorov-Arnold Networks (KAN) using mixture-of-experts, before the spectral embeddings are fused via cross-attention. During inference, the fused embedding retrieves the k nearest labeled signatures from a Milvus database, and the class is decided by majority voting. Incremental learning expands the database and uses elastic weight consolidation to preserve previously learned transformations.
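
The retrieval-and-vote step can be sketched as follows. This is a minimal in-memory stand-in written with NumPy, not the Milvus API and not the authors' code. It assumes binary labels (0 = real, 1 = fake) and brute-force cosine similarity. The function name `knn_majority_vote`, the toy 2-D signatures, and k = 3 are hypothetical.

```python
import numpy as np

def knn_majority_vote(query, signatures, labels, k=3):
    """Return the majority label among the k most cosine-similar signatures."""
    q = query / np.linalg.norm(query)
    db = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity to every stored signature
    top_k = np.argsort(-sims)[:k]    # indices of the k most similar entries
    votes = np.bincount(labels[top_k], minlength=2)
    return int(np.argmax(votes)), top_k

# Toy database: three "real" (0) and three "fake" (1) signatures.
signatures = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
])
labels = np.array([0, 0, 0, 1, 1, 1])

query = np.array([0.3, 0.7])
prediction, neighbours = knn_majority_vote(query, signatures, labels, k=3)
print(prediction)  # 1
```

With an odd k and two classes the vote cannot tie. In this sketch, expanding the database for a new generator would amount to appending rows to `signatures` and `labels`, which is the sense in which retrieval lets the detector adapt without retraining the whole model.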
Datasets
UniversalFakeDetect benchmark
Model(s)
ViT-L/14 encoder, Kolmogorov-Arnold Networks (KAN), Milvus database
Author countries
France, Italy, United Arab Emirates