The "vocal filter" object, from the laboratory to the clinic: towards the anthropotechnics of our social cognitions
Nadia GUEROUAOU, a guest member of the STMS Perception and Sound Design team (Ircam, Sorbonne University, CNRS, French Ministry of Culture), is a doctoral student in Neuroscience (Ecole Doctorale de Biologie Santé, Université de Lille). Her thesis, entitled "L'objet « filtre vocal », du laboratoire à la clinique : vers l'anthropotechnie de nos cognitions sociales" (The "vocal filter" object, from the laboratory to the clinic: towards the anthropotechnics of our social cognitions), was supervised by Jean-Julien Aucouturier (Institut FEMTO-ST, Besançon) and Guillaume Vaiva (Lille Neuroscience & Cognition Centre). Her research was funded by the Lille CHRU and the ANR REFLETS project. She has worked in collaboration with the Plasticity and Subjectivity team (Lille Neuroscience & Cognition Centre lab/INSERM/CHRU de Lille) and Neuroteam-FEMTO (Université de Franche-Comté, SUPMICROTECH, CNRS, Institut FEMTO-ST, Besançon).
She also works as a psychologist at the Consultation Régionale Psychotrauma des Hauts-de-France, where she treats patients suffering from Post-Traumatic Stress Disorder (PTSD).
She defended her thesis in French at Ircam on January 26, 2024 at 2pm before a jury composed of:
- Nicolas Baumard - Institut Jean-Nicod, Paris - Rapporteur
- Baptiste Caramiaux - Institut ISIR, Paris - Rapporteur
- Anahita Basirat - SCALab, Université de Lille - Examiner
- Mathieu Triclot - FEMTO-ST Institute, Belfort - Examiner
- Mélanie Voyer - CeRCA, Poitiers - Guest
- Jean-Julien Aucouturier - Institut FEMTO-ST, Besançon - Thesis co-director
- Guillaume Vaiva - Lille Neuroscience & Cognition Centre - Thesis co-director
A recording of the defense is available here: https://medias.ircam.fr/x156933_lobjet-filtre-vocal-du-laboratoire-a
Between Zoom calls and deepfakes, we now live in a world marked by the growing digitization of our social interactions, in which we are increasingly confronted with the possibility of artificially controlling how we look and sound during those interactions. This thesis examines the effect of such transformative technologies - specifically here, of "voice filters"¹ capable of controlling the expressivity of our voices - on the cognitive processes underlying our perceptions during emotional interactions.
We adopt a dual theoretical and clinical framework. On the one hand, we situate our question within the theory of predictive processing, and ask what happens when associations between emotional states and expressive cues (e.g. I'm happy, my voice is smiling), hitherto taken for granted, can be arbitrarily controlled. On the other hand, we take as our starting point a particular clinical situation, that of imaginal exposure therapy to the traumatic event in patients suffering from Post-Traumatic Stress Disorder (PTSD), a situation with intense emotional content in which the patient's voice comes to the fore.
Our three patient and laboratory studies show that the pitch of the voice carries information about a patient's psychological state, as well as about an individual's heart rate (HR) - information that can be inferred from listening to the voice alone. By artificially manipulating the pitch of voice recordings, we demonstrated in two additional experiments that it was possible to steer the perceptual judgement of individuals (caregivers and healthy participants respectively) regarding these two pieces of information conveyed by the voice, to the point of completely reversing their inferences and thus misleading them.
Our experimental ethics work shows that, as early as 2020 (the start of the acceleration of the digitization of our interactions), the population judged these technologies for transforming vocal emotions to be highly morally acceptable. Taken as a whole, and in the light of these data, the results of these five studies confirm the influence of the "vocal pitch filter" on the perceptual inference processes underlying our fine-grained social cognition in interaction situations. We therefore discuss the anthropotechnical potential of this new technological object, and the need for further study of the effects of new self-shaping technologies on cognition, with a view to developing critical thinking commensurate with the philosophical, scientific and technological challenges they pose.
¹ Such feats rely on a variety of voice processing techniques. However, in order to emphasize their proximity to visual objects that are already widely deployed and known in our society, we use the term "filter" in this work to refer to these realistic voice transformations or voice deepfakes, even if their implementation does not necessarily fall under the concept of "filtering" in signal processing.