Atonal Data

We're a data provider for large-scale symbolic music datasets.

If you have an audio-domain dataset, we can transcribe it into a symbolic-domain dataset using the tokenization method of your choice, or supply the raw MIDI data for the music in the audio. We work with monophonic, polyphonic, and percussive source material.

Open Source Datasets

Symbolic Jazz Standards

A dataset of public-domain jazz standards that have been transcribed into the symbolic domain stem by stem, representing 10,000 minutes of audio.

Download it from the HuggingFace hub

If you're interested in working with us, please reach out at hello@atonaldata.com.

© Atonal Data.