Age-Diverse Deepfake Dataset: Bridging the Age Gap in Deepfake Detection

Authors: Unisha Joshi

Published: 2025-08-06 05:18:01+00:00

Comment: 11 pages, 4 figures, and 7 tables

AI Summary

This paper introduces an age-diverse deepfake dataset to address age-related demographic bias in deepfake detection. The dataset is built by integrating existing datasets with synthetic data generated to fill gaps in the age distribution. Evaluation shows that models trained on the new dataset perform more fairly across age groups, achieve higher overall accuracy, and generalize better to other datasets.

Abstract

The challenges associated with deepfake detection are increasing significantly with the latest advancements in technology and the growing popularity of deepfake videos and images. Despite the presence of numerous detection models, demographic bias in deepfake datasets remains largely unaddressed. This paper focuses on mitigating age-specific bias by introducing an age-diverse deepfake dataset that improves fairness across age groups. The dataset is constructed through a modular pipeline that incorporates the existing Celeb-DF, FaceForensics++, and UTKFace datasets and creates synthetic data to fill the age distribution gaps. The effectiveness and generalizability of this dataset are evaluated using three deepfake detection models: XceptionNet, EfficientNet, and LipForensics. Evaluation metrics, including AUC, pAUC, and EER, revealed that models trained on the age-diverse dataset demonstrated fairer performance across age groups, improved overall accuracy, and higher generalization across datasets. This study contributes a reproducible, fairness-aware deepfake dataset and model pipeline that can serve as a foundation for future research in fairer deepfake detection. The complete dataset and implementation code are available at https://github.com/unishajoshi/age-diverse-deepfake-detection.
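
To make the evaluation metrics named in the abstract concrete, here is a minimal pure-Python sketch of computing AUC, pAUC, and EER separately for each age group. It is not the paper's implementation: the record layout `(age_group, label, score)`, the function names, the group labels, and the pAUC cutoff of FPR ≤ 0.1 are all assumptions made for illustration, and the pAUC here is left unnormalized.

```python
from collections import defaultdict

def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), from (0, 0) to (1, 1). Label 1 = fake."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("each group needs both real and fake samples")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def area_under_roc(points, max_fpr=1.0):
    """Trapezoidal area for FPR in [0, max_fpr]; max_fpr < 1 gives a partial AUC."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def equal_error_rate(points):
    """FPR at the point where FPR equals FNR (1 - TPR), linearly interpolated."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = x0 - (1 - y0)  # FPR - FNR at the segment start (negative so far)
        d1 = x1 - (1 - y1)  # FPR - FNR at the segment end
        if d1 >= 0:
            t = -d0 / (d1 - d0)
            return x0 + t * (x1 - x0)
    return 1.0

def per_group_metrics(records, max_fpr=0.1):
    """records: iterable of (age_group, label, score). Returns metrics per group."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    result = {}
    for group, (labels, scores) in grouped.items():
        pts = roc_points(labels, scores)
        result[group] = {
            "auc": area_under_roc(pts),
            "pauc": area_under_roc(pts, max_fpr),
            "eer": equal_error_rate(pts),
        }
    return result

# Toy scores for two hypothetical age groups (not data from the paper).
toy = [
    ("19-35", 1, 0.9), ("19-35", 1, 0.8), ("19-35", 0, 0.2), ("19-35", 0, 0.1),
    ("51+", 1, 0.9), ("51+", 0, 0.8), ("51+", 1, 0.7), ("51+", 0, 0.1),
]
metrics = per_group_metrics(toy)
auc_gap = max(m["auc"] for m in metrics.values()) - min(m["auc"] for m in metrics.values())
```

A simple fairness indicator under these assumptions is the spread of a metric across groups, such as `auc_gap` above: the smaller the gap, the more evenly the detector performs across ages.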


Key findings
Models trained on the age-diverse dataset achieved higher overall accuracy, performed more fairly across age groups, and generalized better to unseen datasets. In contrast, models trained solely on the original, age-skewed datasets overfit and generalized poorly to other datasets.
Approach
The author addresses age-specific bias in deepfake detection by constructing a new age-diverse deepfake dataset. The dataset combines the existing Celeb-DF, FaceForensics++, and UTKFace datasets with synthetic deepfake videos generated for underrepresented age groups using SimSwap and InsightFace. The new dataset is then used to train and evaluate deepfake detection models.
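The gap-filling step can be sketched as a quota calculation: bin the available samples by age, then count how many synthetic samples each bin would need to reach a target size. This is only an illustration under stated assumptions. The bin edges, the labels, the helper names `age_group` and `synthetic_quota`, and the choice of the largest bin as the default target are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in years; the paper's actual age groups may differ.
AGE_EDGES = [10, 19, 36, 51]
AGE_LABELS = ["0-9", "10-18", "19-35", "36-50", "51+"]

def age_group(age):
    """Map an age in years to its bin label."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

def synthetic_quota(ages, target=None):
    """Return how many synthetic samples each age bin needs to reach `target`.

    If `target` is None, the size of the largest bin is used (an assumption).
    """
    counts = Counter({label: 0 for label in AGE_LABELS})
    counts.update(age_group(a) for a in ages)
    if target is None:
        target = max(counts.values())
    return {label: max(0, target - counts[label]) for label in AGE_LABELS}

# Toy ages, skewed toward young adults (illustrative only, not paper data).
quota = synthetic_quota([25, 30, 22, 40, 8, 70])
```

In a pipeline like the one described, each nonzero quota would then drive the generation of synthetic videos for that bin.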
Datasets
Celeb-DF, FaceForensics++, UTKFace
Model(s)
XceptionNet, EfficientNet-B0, LipForensics
Author countries
USA