Introduction to two ongoing theses. Yann TEYTAUT and Clément LE MOINE-VEILLON, Ph.D. candidates in the Sound Analysis/Synthesis team, STMS (Ircam, Sorbonne Université, CNRS, Ministère de la Culture), present their work as well as a collaborative study:

The seminar will take place in the Stravinsky room.

1) Automatic phoneme-to-audio alignment [Yann TEYTAUT]

Listening, responding, coordinating, following, adapting, synchronizing, aligning... The vocabulary of musical performance, like that of any oral conversation, is closely tied to temporal structure.
As a result, speech and singing analysis depends heavily on our ability to determine precisely when each event occurs. To that end, this work develops models for the automatic alignment (or temporal synchronization) of voice signals. We focus on phoneme-to-audio alignment, which is on the one hand a challenging task because of the temporal precision it requires, and on the other hand opens the way to many applications (e.g., sound synthesis, singing style study).

2) Conversion of vocal attitudes [Clément LE MOINE-VEILLON]

Humans have an outstanding ability to communicate social signals, notably their attitudes. Enabling machines to understand, reproduce and interpret these signals is a crucial issue. This research aims to develop a system dedicated to the conversion of speech attitudes. We also intend to validate this system using both objective criteria (assessing its ability to reproduce the production strategies of speech attitudes) and subjective criteria (evaluating how listeners perceive the converted utterances).

3) Production strategies of vocal attitudes [Yann & Clément]

The seminar concludes with a concrete application: the phonetic alignment of “Att-HACK”, an expressive voice dataset recorded at Ircam. Speech analysis methods, combined with the resulting temporal synchronization, are used to highlight the production strategies of diverse vocal attitudes.

