Welcome to MLAAD-tiny
MLAAD-tiny is a very small subset of the full
MLAAD dataset, designed for education,
prototyping, and debugging.
Many teaching environments (e.g. Colab, Kaggle, university notebooks)
impose strict storage limits, which makes large-scale audio deepfake
datasets impractical to use. To address this, we provide
MLAAD-tiny, a compact yet representative version of
MLAAD.
Dataset composition
Bona-fide
- Source: M-AILABS
- ~6,000 audio files
- ~1.9 GB
- Language: English
Spoof
- 64 TTS systems
- 100 samples per system (randomly selected from MLAAD)
- ~6,400 audio files
- ~2.3 GB
- Languages: English (for training) and German (for testing)
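After downloading, it can be useful to check that the files on disk roughly match the counts listed above (about 6,000 bona-fide files, and 64 systems × 100 samples = 6,400 spoofed files). The following is a minimal sketch, not part of the dataset tooling. It assumes a hypothetical layout in which each immediate subdirectory of the extracted folder groups related files (for example, one folder per TTS system), and it assumes the audio uses one of the listed extensions. The function name `count_audio_per_subdir` and the extension set are illustrative choices, so adjust both to the actual archive structure.

```python
from collections import Counter
from pathlib import Path

# Assumed audio extensions; adjust to match the files in the archive.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def count_audio_per_subdir(root):
    """Count audio files under each immediate subdirectory of `root`.

    Files located directly in `root` are grouped under the key ".".
    The directory layout is an assumption, not a documented structure.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            rel = path.relative_to(root)
            key = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[key] += 1
    return counts
```

Calling `count_audio_per_subdir("path/to/spoof")` (a placeholder path) returns a `Counter` mapping each subdirectory name to its file count. If the layout matches the assumption, each of the 64 entries should be close to 100 and the total close to 6,400.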
Downloads
License
- Bona-fide audio is redistributed from M-AILABS under its original license.
- Spoofed audio is redistributed under the MLAAD v8 license (CC BY-NC 4.0).
See the Hugging Face repository for full license information.