MLAAD-tiny

Welcome to MLAAD-tiny

MLAAD-tiny is a very small subset of the full MLAAD dataset, designed for education, prototyping, and debugging.

Many teaching environments (e.g. Colab, Kaggle, university notebooks) impose strict storage limits, which makes large-scale audio deepfake datasets impractical to use. To address this, we provide MLAAD-tiny, a compact version of MLAAD (about 4.2 GB in total) that still covers 64 TTS systems with 100 spoofed samples each.

Dataset composition

Bona-fide

Source: M-AILABS
~6,000 audio files
~1.9 GB
Language: English

Spoof

64 TTS systems
100 samples per system (randomly selected from MLAAD)
6,400 audio files (64 systems × 100 samples)
~2.3 GB
Languages: English (for training) and German (for testing)
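The spoof subset was built by randomly selecting 100 samples per TTS system, with English systems intended for training and German systems for testing. Below is a minimal sketch of that kind of per-system sampling and language split. It is not the actual MLAAD-tiny build script. The record fields (`system`, `language`, `path`), the language codes `en`/`de`, the fixed seed, and the function names `sample_per_system` and `split_by_language` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def sample_per_system(records, n_per_system=100, seed=0):
    """Reproducibly draw up to n_per_system records from each TTS system.

    Hypothetical record format: {"system": str, "language": str, "path": str}.
    """
    by_system = defaultdict(list)
    for rec in records:
        by_system[rec["system"]].append(rec)

    rng = random.Random(seed)  # fixed seed: the same subset on every run
    subset = []
    for system in sorted(by_system):  # sorted so the draw order is deterministic
        pool = sorted(by_system[system], key=lambda r: r["path"])
        # Systems with fewer than n_per_system files contribute all of them.
        subset.extend(rng.sample(pool, min(n_per_system, len(pool))))
    return subset


def split_by_language(records, train_lang="en", test_lang="de"):
    """Split records into train (train_lang) and test (test_lang) lists."""
    train = [r for r in records if r["language"] == train_lang]
    test = [r for r in records if r["language"] == test_lang]
    return train, test


# Toy usage with synthetic records: two English systems and one German system,
# 250 files each (made-up names, not real MLAAD systems).
toy = [
    {"system": f"{lang}_tts_{i}", "language": lang, "path": f"{lang}_tts_{i}/{j:04d}.wav"}
    for lang, i in [("en", 0), ("en", 1), ("de", 0)]
    for j in range(250)
]
subset = sample_per_system(toy)
train, test = split_by_language(subset)
print(len(subset), len(train), len(test))  # 300 200 100
```

Sorting both the system names and each system's file list before sampling means the result depends only on the seed, not on filesystem listing order, which keeps the subset reproducible across machines.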

Downloads

Dataset available via Hugging Face Datasets

License

Bona-fide audio is redistributed from M-AILABS under its original license.
Spoofed audio is redistributed under the MLAAD v8 license (CC BY-NC 4.0).

See the Hugging Face repository for full license information.