Théis BAZIN's thesis defense

Designing Novel Time-Frequency Scales for Interactive Music Creation with Hierarchical Statistical Modelling

Théis Bazin will defend his CIFRE thesis entitled "Conception of new scales of musical creation using hierarchical statistical learning". This thesis was carried out under the academic supervision of Dr. Mikhail MALT within the Musical Representations team of the STMS laboratory (Ircam-CNRS-Sorbonne University-Ministry of Culture) and the industrial supervision of Dr. Gaëtan Hadjeres (SonyAI) within the Sony CSL Paris laboratory. It also benefited from the co-supervision of Dr. Philippe Esling. This work was supported by the ANRT under the CIFRE grant n° 2019.009.

The defense will take place at Ircam. It can also be followed live on Ircam's YouTube channel.

Prof. Wendy MACKAY - Rapporteur - Research Director, INRIA-Saclay, ex-situ research group
Prof. Geoffroy PEETERS - Rapporteur - Professor, Image-Data-Signal (IDS) department, LTCI, Telecom Paris, Institut Polytechnique de Paris
Prof. Cheng-Zhi Anna HUANG - Examiner - Associate Professor, MILA, University of Montreal - Researcher, Google Magenta - Canada-CIFAR Chair in AI
Dr. Jean BRESSON - Examiner - Research Director, RepMus, STMS, IRCAM, Sorbonne University, CNRS - Team Product Owner, Ableton
Dr. Mikhail MALT - Supervisor - Research Fellow, RepMus, STMS, IRCAM, Sorbonne University, CNRS
Dr. Gaëtan HADJERES - Thesis co-supervisor - Senior Research Scientist, SonyAI

Summary of the thesis

Modern musical creation unfolds on many different time scales: from the millisecond scale of a vibrating string or a resonating electronic instrument, through the few seconds typical of an instrumental note, to the tens of minutes of operas or DJ sets. The intermingling of these scales has led to the development of numerous technical and theoretical tools that make manipulating time effective. These abstractions, such as scales, rhythmic notations, or common models of audio synthesis, pervade the current tools of musical creation, both software and hardware. However, these abstractions, which emerged for the most part during the 20th century in the West on the basis of classical theories of written music, are not free of cultural assumptions. They reflect principles aimed at erasing certain aspects of the music (e.g. micro-deviations from a metronomic beat, or micro-deviations in frequency from an idealised pitch) whose high degree of physical variability typically makes them inconvenient for music writing. These compromises are relevant when the written music is intended for performance by musicians, who can reintroduce variation and physical and musical richness. They prove limiting in computer-assisted music creation, where the machine renders these abstractions coldly and they tend to restrict the diversity of music that can be produced.

Through the presentation of several typical interfaces for music creation, I show that an essential factor is the scale of human-machine interaction offered by these abstractions. At their most flexible level, such as audio representations or piano rolls over unquantised time, they prove difficult to manipulate: they require a high degree of precision, which is particularly unsuitable for modern mobile and touch devices. Conversely, many commonly used abstractions that operate in discretised time, such as scores or sequencers, prove constraining for the creation of culturally diverse music.

In this thesis, I argue that artificial intelligence, through its ability to construct high-level representations of complex objects, allows for the construction of new scales of musical creation designed for interaction, and thus offers radically new approaches to musical creation. I present and illustrate this idea through the design and development of three AI-assisted music creation web prototypes, one of which is based on a novel neural model for the inpainting of musical instrument sounds, also designed as part of this thesis. These high-level representations (for scores, piano rolls and spectrograms) are deployed at a coarser time-frequency scale than the original data, but one better suited to interaction. By allowing localised transformations on these representations, while also capturing, through statistical modelling, the aesthetic specificities and micro-variations of the musical training data, these tools make it possible to obtain musically rich results in an easy and controllable way. Through the evaluation of these three prototypes in real conditions by several artists, I show that these new scales of interactive creation are useful for both experts and novices. Because the AI assists with technical aspects that normally require precision and expertise, they are also suitable for use on touch screens and mobile devices.
