Musical examples from Towards realistic MIDI instrument synthesizers
Rodrigo Castellon, Chris Donahue, Percy Liang
This web repository contains musical examples from our paper Towards realistic MIDI instrument synthesizers. Our paper proposes a system that learns to synthesize instrument audio from MIDI by building on the DDSP method proposed by Engel et al. (https://arxiv.org/abs/2001.04643). Specifically, we introduce a module called MIDI2Params, which makes the DDSP model MIDI-controllable by predicting its synthesis parameters directly from MIDI. Code can be found here.
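At a high level, synthesis proceeds in two stages: MIDI2Params maps a MIDI performance to the framewise conditioning signals that DDSP expects (fundamental frequency and loudness), and a pretrained DDSP model renders those signals to audio. The sketch below illustrates this dataflow only; the function names (`midi2params`, `ddsp`) and the frame rate are illustrative assumptions, not the actual interface from our code repository.

```python
def synthesize(midi_notes, midi2params, ddsp, seconds, frame_rate=250):
    """Illustrative two-stage pipeline (all names here are hypothetical).

    1) `midi2params` maps MIDI notes to the framewise conditioning
       signals DDSP consumes: f0 in Hz and loudness in dB.
    2) A pretrained `ddsp` synthesizer renders those signals to audio.
    """
    f0_hz, loudness_db = midi2params(midi_notes, seconds, frame_rate)
    return ddsp(f0_hz, loudness_db)  # 1-D array of audio samples
```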
We present samples comparing our proposed system, DDSP(MIDI2Params_{aligned}(MIDI)), to several baselines. Below is a list of all systems we compared:
Human Recordings: The gold standard; the original human recording of the test piece.
DDSP(MIDI2Params_{transcribed}(MIDI)): Our MIDI2Params system, which was trained on MIDI that was transcribed directly from the human recordings.
DDSP(MIDI2Params_{aligned}(MIDI)): Our MIDI2Params system, which was trained on MIDI obtained by aligning pre-existing sheet music to the human recordings.
DDSP: The upper bound of realism that our MIDI2Params model could conceivably reach. For this baseline, we feed the audio parameters extracted from the human recording directly into the DDSP model trained on our violin dataset.
Concatenative (freely-available): The "low bar" for instrument synthesis. This baseline uses a freely-available concatenative synthesizer called FluidSynth.
Concatenative (commercially-available): The "state-of-the-art" for instrument synthesis; what a modern professional musician would employ to get the best sound short of hiring a real musician.
DDSP(Heuristic(MIDI)): The "low bar" for our MIDI2Params model. This baseline uses a simple heuristic to convert MIDI into DDSP-style audio parameters (a sketch of one such heuristic follows this list).
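For concreteness, here is a minimal sketch of such a heuristic. It is an assumption for illustration, not necessarily the exact rule used in the paper: hold each note's pitch as a constant f0 and use a fixed loudness while the note sounds, and silence otherwise. A learned MIDI2Params model instead predicts expressive contours (vibrato, dynamics, articulation) that a fixed rule like this cannot capture.

```python
import numpy as np

def heuristic_midi2params(notes, total_seconds, frame_rate=250,
                          note_loudness_db=-30.0, silence_db=-120.0):
    """Convert MIDI notes to framewise DDSP-style parameters.

    One plausible heuristic (illustrative assumption): constant pitch
    and constant loudness for the duration of each note, silence
    elsewhere. `notes` is a list of (midi_pitch, onset_sec, offset_sec).
    Returns (f0_hz, loudness_db), each of shape (total_frames,).
    """
    n_frames = int(total_seconds * frame_rate)
    f0_hz = np.zeros(n_frames)
    loudness_db = np.full(n_frames, silence_db)
    for pitch, onset_s, offset_s in notes:
        i, j = int(onset_s * frame_rate), int(offset_s * frame_rate)
        f0_hz[i:j] = 440.0 * 2.0 ** ((pitch - 69) / 12.0)  # MIDI pitch -> Hz
        loudness_db[i:j] = note_loudness_db
    return f0_hz, loudness_db

# Example: A4 then B4, half a second each.
f0, loud = heuristic_midi2params([(69, 0.0, 0.5), (71, 0.5, 1.0)], 1.0)
```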