The Secret Sauce Datasets

The aim of the Secret Sauce project is to automatically reverse engineer studio sounds, with a focus on guitar effects and synthesizers. Ultimately, we wish to answer the question: “Wow, how did they make that sound?!” This post presents the task and the datasets, and makes them available to anyone who would like to try.

The Goal

Whether it’s a light echo on a rockabilly guitar, a thick pad on a prog rock record or a heavy sub on a hip hop track, electronic effects and instruments have become a crucial aspect of modern music. And yet, as Web forums testify, turning ideas into sounds is a difficult craft, which takes years to master.

The aim of the Secret Sauce project is to make it easier to create sounds in the studio. Currently, tuning machines is a difficult process, which relies heavily on trial and error; my neighbors will probably never forgive me for the hours spent trying to sound like the records that I enjoy. Wouldn’t it be great if musicians could simply provide an example of what they want, from a record or real life, and the machines would automatically “get it” and tune themselves?

To this end, we are developing statistical models to reverse-engineer guitar and synthesizer sounds — give the model a reference sound, and it will tell you how to reproduce it.

The first step is to collect data. Many sound banks exist, but few, if any, give the details of how the sounds were produced. This post is our attempt to fill this gap. We are releasing three datasets, based on three instruments: a guitar with 5 effects, a subtractive synth plug-in and a Moog Sub Phatty. For each instrument, we provide 10,000 sound samples based on the same note, as well as the settings that we used to produce them. The inference task is to guess the settings from the sounds.

Datasets

For each dataset, we chose one note (C3), one instrument and one set of parameters. To generate the sounds, we assigned random values to the parameters, sent them through MIDI and recorded the note for a bit less than a second. The code to generate the tracks and clean the data is available on Github.
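The generation loop itself is short. As a rough sketch, not the actual script on Github, and assuming a mido MIDI output plus a sounddevice/soundfile recording chain, one step could look like the following; the port name, controller numbers and the C3 = MIDI note 48 convention are placeholders.

```python
import random
import mido
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 22050
NOTE_C3 = 48                        # assumes the "C3 = MIDI note 48" convention
CC_NUMBERS = [74, 71, 73, 72, 91]   # placeholder controller numbers, one per parameter

out = mido.open_output("Synth Port")  # hypothetical MIDI port name

def render_one(index):
    # Draw a random value for each parameter and send it as a control change.
    settings = {cc: random.randint(0, 127) for cc in CC_NUMBERS}
    for cc, value in settings.items():
        out.send(mido.Message("control_change", control=cc, value=value))

    # Record slightly less than a second while the note plays.
    recording = sd.rec(int(0.9 * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    out.send(mido.Message("note_on", note=NOTE_C3, velocity=100))
    sd.wait()
    out.send(mido.Message("note_off", note=NOTE_C3))

    sf.write(f"sample_{index:05d}.wav", recording, SAMPLE_RATE, subtype="PCM_16")
    return settings
```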

Guitar Effects

The Guitar Effects dataset is based on a guitar simulator (Ableton Tension) routed to an amp simulator, a distortion, a delay, a chorus/flanger and a reverb. All the effects come from Guitar Rig 5 Player edition, which is free.

The dataset comes in two flavors. The tiny version contains 32 sounds, one for every possible subset of the effects (5 binary variables). The full version contains 10,000 samples; we varied both the effects (5 binary variables) and their settings (14 continuous variables).
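To make the task concrete, here is a minimal baseline sketch, assuming the archive unpacks to a folder of WAV files plus a settings table with one row per sample; the file names and column names below are assumptions, not the actual layout. Each clip is summarized with mean MFCCs and a random forest predicts the five effect switches.

```python
import numpy as np
import pandas as pd
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def features(path):
    # Mean MFCCs as a crude fixed-length summary of each ~0.9 s clip.
    audio, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical layout: settings.csv with a "file" column and one column per parameter.
settings = pd.read_csv("guitar_effects/settings.csv")
X = np.stack([features(f"guitar_effects/{name}") for name in settings["file"]])
y = settings[["amp", "distortion", "delay", "chorus", "reverb"]].values  # the 5 switches

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
# Fraction of test clips where all five switches are predicted correctly.
print("exact-match accuracy:", model.score(X_test, y_test))
```

Predicting the 14 continuous settings works the same way with a regressor instead of a classifier.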

TAL-NoiseMaker

The Noisemaker dataset is based on TAL’s free plugin, which emulates a subtractive synthesizer with two oscillators, a sub, two LFOs and separate envelopes for the filter and the amp.

The light version contains 1,000 samples, for which we varied 1 switch and 4 knobs. The full version contains 10,000 samples, for which we varied 3 switches and 13 knobs.

Moog Sub Phatty

The Sub Phatty dataset is based on a Moog Sub Phatty, an analog synth with two oscillators, a sub and a very aggressive sound. In contrast to the Noisemaker dataset, we varied almost every possible parameter on the front panel, leading to extreme (and sometimes silent) sounds.

The dataset contains 10,000 samples, obtained by tweaking 24 knobs and 5 switches. It is available here.

Details

We sampled the parameter values uniformly over their entire domain (0-127 for 7-bit controllers or 0-16383 for 14-bit ones, as per the MIDI specification) and rescaled them to the range 0-10.
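The rescaling is a plain linear map; the helper below only illustrates that arithmetic.

```python
def rescale(value, midi_max):
    """Map a raw MIDI value in 0..midi_max to the 0-10 range used in the datasets."""
    return 10.0 * value / midi_max

rescale(64, 127)      # ~5.04 for a 7-bit controller
rescale(8192, 16383)  # ~5.00 for a 14-bit controller
```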

The samples last between 840 ms and 900 ms; they are mono, encoded at 16 bits / 22,050 Hz. We did not normalize the volume, but we did use a limiter to avoid saturation. For more information, see the README files in the archives.
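For a quick sanity check on any of the archives (the file name below is made up), a sample can be inspected with soundfile:

```python
import numpy as np
import soundfile as sf

audio, sr = sf.read("sample_00000.wav")  # hypothetical file name
print(sr)                     # expected: 22050
print(len(audio) / sr)        # expected: roughly 0.84 to 0.90 seconds
print(np.abs(audio).max())    # peak level varies, since the volume is not normalized
```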

What’s next?

The first objective is to go through the tasks with standard ML techniques — I will present the first results in a separate blog post in the near future.

The next step is to vary the melodic patterns. Currently, we always play the same note, which greatly simplifies the inference. But what happens when we start playing melodies and chords? I suspect that much bigger datasets will be needed to learn robust feature representations.

Finally, in the long term, we can imagine reversing the process and inferring sounds from the settings, in the spirit of NSynth, or perhaps learning to imitate effects as a style transfer task.

References

The Secret Sauce project is in the lineage of Yee King’s pioneering thesis (2011), which showed that it was possible to program subtractive and additive synthesizers with feed-forward nets and genetic algorithms to imitate sounds. We hope to reproduce those results, generalize them to other settings and try many other models. If you know of any other relevant reference, please reach out!

Acknowledgements

Many thanks to Léo Sellam for the synthesizer, the recording, the advice and the food. I also thank Adrien Durand for his insights.