Generative Adversarial Networks for Synthesis and Control of Drum Sounds
Antoine Lavault, Research Engineer at Apeira Technologies, has completed a CIFRE thesis entitled "Generative Adversarial Networks for Synthesis and Control of Drum Sounds", under the supervision of Axel Roebel, Head of the Sound Analysis and Synthesis team at the STMS laboratory (UMR 9912 - Ircam - Sorbonne University - CNRS - French Ministry of Culture).
He will present his thesis in English at Ircam on December 8, 2023, at 2:30 p.m. The defense can also be followed live on Ircam's YouTube channel: https://youtube.com/live/JaFBLmW51jU
The jury will be composed of:
- Prof. Philippe Depalle - McGill University (Canada) - Rapporteur
- Prof. Vesa Välimäki - Aalto University (Finland) - Rapporteur
- Prof. Slim Essid - LTCI - Télécom Paris - Institut Polytechnique de Paris - Examiner
- Dr. Sølvi Ystad - Laboratoire Prism, Université Aix-Marseille - Examiner
- Dr. Stefan Lattner - Sony Computer Science Laboratories, Paris - Examiner
- Dr. Axel Roebel - Research Director (HDR) - Ircam, STMS Lab - Thesis Director
Audio synthesizers are electronic systems capable of generating artificial sounds under a set of parameters dependent on their architecture. Despite various developments transforming synthesizers from mere sonic curiosities in the 1960s and earlier to primary instruments in modern music production, two major challenges remain: developing a synthesis system that aligns with human perception and designing a universal synthesis method capable of modeling any source and surpassing it within an artistic process.
This thesis explores the application of Generative Adversarial Networks (GANs) to these two challenges. The main objective is to propose a neural synthesizer that generates realistic drum sounds, can be controlled through a set of predefined timbre parameters, and offers velocity control of the synthesis.
The first step of the project was to introduce a GAN-based approach for generating realistic drum sounds. To this neural synthesis method we added timbre control by taking a different path from existing solutions: the use of differentiable descriptors. For experimental validation, we conducted evaluations using both statistics-based objective metrics and subjective and psychophysical assessments of perceived quality and of perceived control error. To make the synthesizer suitable for musical performance, we also added dynamic control, using a new dataset we created for this purpose. The explicit goal of this dataset was to provide a comprehensive foundation of sounds applicable in the vast majority of conditions encountered in music production. From this dataset, we present experimental results on dynamic control, a key aspect of musical performance.