Thesis defence: Lenny RENAULT

Neural Audio Synthesis of Realistic Piano Performances


Lenny Renault, a doctoral student at Sorbonne University, completed his thesis entitled "Neural Audio Synthesis of Realistic Piano Performances" at the STMS laboratory (Ircam - Sorbonne University - CNRS - Ministry of Culture), as part of the Sound Analysis and Synthesis team, under the supervision of Axel Roebel, head of the team, and the co-supervision of Rémi Mignot, researcher.

His thesis was funded by the European Horizon 2020 project n°951911 - AI4Media.

The defence will be in English, and will take place in the Petite Salle at the Centre Pompidou, Paris, on Monday 8 July 2024 at 3:00pm. It will be broadcast live on

The public is asked to enter via the plaza of the Centre Pompidou, at the "File rouge" entrance.

The jury will be made up of:

  • Mark Sandler, Queen Mary University of London, Rapporteur
  • Mathieu Lagrange, CNRS, Laboratoire des Sciences du Numérique de Nantes (LS2N), Rapporteur
  • Gaël Richard, Laboratoire Traitement et Communication de l'Information (LTCI) - Télécom Paris, Examiner
  • Jesse Engel, Google DeepMind, Examiner
  • Juliette Chabassier, Modartt, Examiner
  • Axel Roebel, STMS Lab, Thesis supervisor

Abstract: Musician and instrument form a central duo in the musical experience. Inseparable, they are the key actors of a musical performance, transforming a composition into an emotional auditory experience. The instrument is a sound device that the musician controls to convey and share their understanding of a musical work. Access to the sound of such instruments, which are often the product of advanced craftsmanship, and to the mastery needed to play them can demand extensive resources, which limits composers' creative exploration.

This thesis explores the use of deep neural networks to reproduce the subtleties, introduced by both the musician's playing and the sound of the instrument, that make music sound realistic and alive. Focusing on piano music, this work has led to a piano sound synthesis model and an expressive performance rendering model.

DDSP-Piano, the piano synthesis model, is built upon the hybrid approach of Differentiable Digital Signal Processing (DDSP), which enables the inclusion of traditional signal processing tools into a deep learning model. The model takes symbolic performances as input and explicitly includes instrument-specific knowledge, such as inharmonicity, tuning, and polyphony. This modular, lightweight, and interpretable approach synthesizes sounds of realistic quality while separating the various components that make up the piano sound.
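Inharmonicity, one of the piano-specific properties mentioned above, is the progressive sharpening of a string's upper partials caused by its stiffness. The sketch below illustrates, in plain Python, the textbook stiff-string formula f_n = n · f0 · √(1 + B·n²), equal-tempered tuning from MIDI pitch, and polyphony as a simple sum of voices. It is not DDSP-Piano's implementation: the function names, the fixed value B = 4e-4, the 1/n partial amplitudes and the exponential decay are illustrative assumptions, whereas in DDSP-Piano such quantities are handled within the trained model.

```python
import math


def midi_to_hz(pitch: int, a4: float = 440.0) -> float:
    """Equal-tempered fundamental frequency of a MIDI pitch (A4 = MIDI 69)."""
    return a4 * 2.0 ** ((pitch - 69) / 12.0)


def partial_frequencies(f0: float, b: float, n_partials: int) -> list[float]:
    """Stiff-string partials f_n = n * f0 * sqrt(1 + b * n^2); b = 0 gives a harmonic series."""
    return [n * f0 * math.sqrt(1.0 + b * n * n) for n in range(1, n_partials + 1)]


def render_note(pitch: int, b: float = 4e-4, n_partials: int = 16,
                duration: float = 1.0, sr: int = 16000, decay: float = 3.0) -> list[float]:
    """Additive synthesis of one note (hypothetical helper, illustrative parameters).

    Sums inharmonic sinusoids with 1/n amplitudes under an exponential decay,
    dropping partials at or above the Nyquist frequency to avoid aliasing.
    """
    f0 = midi_to_hz(pitch)
    nyquist = sr / 2
    # With b >= 0 the partials increase monotonically, so this keeps partials 1..k.
    freqs = [f for f in partial_frequencies(f0, b, n_partials) if f < nyquist]
    amps = [1.0 / n for n in range(1, len(freqs) + 1)]
    norm = sum(amps)  # normalise so the peak amplitude never exceeds 1
    out = []
    for i in range(round(duration * sr)):
        t = i / sr
        env = math.exp(-decay * t)
        s = sum(a * math.sin(2.0 * math.pi * f * t) for a, f in zip(amps, freqs))
        out.append(env * s / norm)
    return out


def render_chord(pitches: list[int], **kwargs) -> list[float]:
    """Polyphony as a plain average of independently rendered voices."""
    voices = [render_note(p, **kwargs) for p in pitches]
    return [sum(samples) / len(voices) for samples in zip(*voices)]
```

For example, `partial_frequencies(midi_to_hz(69), 0.0, 3)` returns `[440.0, 880.0, 1320.0]`, a purely harmonic series; any positive `b` pushes each partial above its harmonic position, increasingly so for higher partials, which is the effect a piano-specific model has to account for.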

As for the performance rendering model, the proposed approach enables the transformation of MIDI compositions into symbolic expressive interpretations.
In particular, thanks to unsupervised adversarial training, it stands out from previous work by not relying on aligned score-performance training pairs to reproduce expressive qualities.

Combining the sound synthesis and performance rendering models would make it possible to synthesize expressive audio interpretations of scores, while still allowing the generated interpretations to be edited in the symbolic domain.

Lenny Renault's thesis
