Welcome to MLAAD-tiny

MLAAD-tiny is a very small subset of the full MLAAD dataset, designed for education, prototyping, and debugging.

Many teaching environments (e.g. Colab, Kaggle, university notebooks) impose strict storage limits, which makes large-scale audio deepfake datasets impractical to use. To address this, we provide MLAAD-tiny, a compact yet representative version of MLAAD.

Dataset composition
Bona-fide
  • Source: M-AILABS
  • ~6,000 audio files
  • ~1.9 GB
  • Language: English
Spoof
  • 64 TTS systems
  • 100 samples per system (randomly selected from MLAAD)
  • ~6,400 audio files
  • ~2.3 GB
  • Languages: English (for training) and German (for testing)
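The per-system random selection described above can be illustrated with a minimal sketch. This is not the script used to build MLAAD-tiny; the function name `sample_per_system`, the `(file_path, system_name)` record layout, the toy file names, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_per_system(records, n_per_system=100, seed=0):
    """Pick up to n_per_system files at random for each TTS system.

    records: iterable of (file_path, system_name) pairs.
    This layout is hypothetical, not the actual MLAAD metadata format.
    """
    by_system = defaultdict(list)
    for path, system in records:
        by_system[system].append(path)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = {}
    for system in sorted(by_system):
        paths = sorted(by_system[system])  # stable order before sampling
        k = min(n_per_system, len(paths))  # tolerate systems with few files
        subset[system] = rng.sample(paths, k)
    return subset


# Toy input: 64 made-up systems with 250 made-up files each.
records = [
    (f"sys{s}/clip_{i:04d}.wav", f"sys{s}")
    for s in range(64)
    for i in range(250)
]
subset = sample_per_system(records)
total = sum(len(paths) for paths in subset.values())
print(len(subset), total)  # 64 6400
```

With 64 systems and 100 samples each, the sketch yields 6,400 files, matching the spoof file count listed above.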
Downloads
License
See the Hugging Face repository for full license information.