Thesis defence: Daniel BEDOYA

Capturing Musical Prosody Through Interactive Audio/Visual Annotations

Daniel Bedoya, a doctoral student at Sorbonne University and holder of an EDITE contract, completed his thesis on "Capturing musical prosody through interactive audiovisual annotations" within the Musical Representations team of the STMS Laboratory (Ircam-Sorbonne University-CNRS-Ministère de la Culture). His thesis was funded by the ERC project COSMOS (Computational Shaping and Modeling of Musical Structures) within the STMS Laboratory and by an ATER contract at CNAM (LMSSC Laboratory).
The defence will take place at Ircam, in English, before a jury comprising:

  • Roberto Bresin, KTH Royal Institute of Technology (Sweden), Rapporteur
  • Pierre Couprie, University of Paris-Saclay, Rapporteur
  • Jean-Julien Aucouturier, FEMTO-ST, Examiner
  • Louis Bigo, Université de Lille, Examiner
  • Muki Haklay, University College London, United Kingdom, Examiner
  • Anja Volk, University of Utrecht, Netherlands, Examiner
  • Carlos Agón, Sorbonne University, Director
  • Elaine Chew, King's College London, United Kingdom, Co-Director

You can also follow it live on Ircam's YouTube channel.

Participatory science (PS) projects have stimulated research in several disciplines in recent years. Citizen scientists contribute to this research by performing cognitive tasks, fostering learning, innovation and inclusion. Although crowdsourcing has been used to collect structural annotations in music, PS remains underused for studying musical expressivity.
We introduce a new annotation protocol to capture musical prosody, that is, the acoustic variations that performers introduce to make music expressive. Our top-down, human-centred method prioritises the listener in producing annotations of the prosodic features of music. We focus on segmentation and prominence, which convey structure and affect. This protocol provides a PS framework and an experimental approach for conducting systematic and scalable studies.
We implement this annotation protocol in CosmoNote, a customizable web-based software designed to facilitate the annotation of expressive musical structures. CosmoNote allows users to interact with visual layers, including the waveform, recorded notes, extracted audio attributes and score features. Users can place boundaries of different levels, regions, comments and groups of notes.
We have conducted two studies to improve the protocol and the platform. The first examines the impact of simultaneous auditory and visual stimuli on segmentation boundaries. We compare the differences in boundary distributions derived from cross-modal (auditory and visual) and unimodal (auditory or visual) information. The distances between the unimodal-visual and cross-modal distributions are smaller than those between the unimodal-auditory and cross-modal distributions. We show that adding visuals accentuates key information and provides cognitive scaffolding that helps to mark prosodic boundaries clearly, although the visuals may distract attention from specific structures. Conversely, without audio, the annotation task becomes difficult, and subtle cues are masked. Despite their exaggeration or inaccuracy, visual cues are essential for guiding boundary annotations in performed music, improving overall results.
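As an illustration only, the comparison of boundary distributions described above can be sketched in a few lines of Python. The abstract does not say which distance measure the thesis uses; the one-dimensional Wasserstein (earth mover's) distance below is an assumption chosen for simplicity, and the boundary times are made-up values, not data from the study.

```python
from bisect import bisect_right


def wasserstein_1d(a, b):
    """Earth mover's distance between two empirical 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at `left`.
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical boundary times (in seconds) for one excerpt, per condition.
audio_only = [4.1, 9.8, 15.2, 21.0]
visual_only = [4.0, 10.1, 15.0, 20.4]
audio_visual = [4.0, 10.0, 15.1, 20.5]

d_visual = wasserstein_1d(visual_only, audio_visual)
d_audio = wasserstein_1d(audio_only, audio_visual)
print(f"visual-only vs audio-visual: {d_visual:.3f}")
print(f"audio-only  vs audio-visual: {d_audio:.3f}")
```

With these invented values, the visual-only condition lies closer to the combined condition than the audio-only one does, which mirrors the direction of the reported result without reproducing it.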
The second study uses all types of CosmoNote annotations and analyses how participants annotate musical prosody, with minimal or detailed instructions, in a free annotation setting. We compare the quality of annotations between musicians and non-musicians. We evaluate the PS component in an ecological setting where participants are completely autonomous in a task where time, attention and patience are valued. We present three methods based on annotation labels, categories and common properties to analyse and aggregate the data. The results show convergence in the types of annotations and descriptions used to mark recurrent musical elements, regardless of experimental condition and musical ability. We propose strategies for improving the protocol, data aggregation and analysis in large-scale applications.
This thesis enriches the representation and understanding of structures in performed music by introducing an annotation protocol and platform, adaptable experiments, and aggregation and analysis methods. We demonstrate the importance of the trade-off between obtaining data that is simpler to analyse and richer content that captures complex musical thinking. Our protocol can be generalised to studies of performance decisions to improve understanding of expressive choices in musical performance.
