Methodological Advances for Audio Augmented Reality and its Applications


As part of the ANR project HAIKUS (ANR-19-CE23-0023), IRCAM, LORIA and IJLRA are organising a one-day workshop on methodological advances for audio augmented reality and its applications.

Audio augmented reality (AAR) consists in embedding pre-recorded or computer-generated sound content into the listener's real environment. Hearing plays an essential role in understanding our spatial environment and interacting with it. The auditory modality increases user engagement and enriches the experience in augmented reality (AR) applications, in particular in the fields of artistic creation, cultural mediation, entertainment and communication.

Sound spatialisation algorithms are key components of the AAR processing chain. Their role is to control, in real time, the position and orientation of virtual sources and to synthesise the reverberation effects applied to them. These tools have now reached maturity and can drive systems as diverse as three-dimensional binaural rendering over headphones or massively multichannel loudspeaker arrays. The accuracy of the spatial processing applied to virtual sound events is nevertheless essential to ensure their integration into the listener's real environment without perceptual discontinuity. To reach this level of integration and transparency, methods are needed to identify the acoustic properties of the environment and to adjust the parameters of the spatialisation engine accordingly. Ideally, these methods should automatically infer the characteristics of the acoustic channel, based solely on the sound activity of the real sources present in the environment (e.g. voices, noises, ambient sounds, moving sources). These topics are receiving growing attention, particularly in the light of recent progress in machine-learning approaches in acoustics. In addition, perceptual studies help define the level of accuracy required to guarantee a coherent listening experience.
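
As an illustration of what such a spatialisation engine does in its simplest form, the Python sketch below renders one virtual source binaurally and adds a crude exponential reverberation tail. It is not the software discussed at the workshop: the HRIR pair, the dry signal and the target reverberation time are all placeholder assumptions.

    # Minimal binaural spatialisation sketch (illustrative only).
    import numpy as np
    from scipy.signal import fftconvolve

    fs = 48000                       # sample rate (Hz)
    dry = np.random.randn(fs)        # 1 s of a dry source signal (placeholder)

    # Hypothetical HRIR pair for the desired source direction (left, right ears)
    hrir_left = np.zeros(256);  hrir_left[10] = 1.0
    hrir_right = np.zeros(256); hrir_right[14] = 0.8

    # Very crude late-reverberation model: exponentially decaying noise whose
    # decay is set from an assumed reverberation time T60 of the real room.
    t60 = 0.6                                        # seconds (assumption)
    t = np.arange(int(t60 * fs)) / fs
    reverb = np.random.randn(t.size) * 10 ** (-3 * t / t60)   # -60 dB at t = T60

    n_out = dry.size + 255
    left = fftconvolve(dry, hrir_left) + 0.2 * fftconvolve(dry, reverb)[:n_out]
    right = fftconvolve(dry, hrir_right) + 0.2 * fftconvolve(dry, reverb)[:n_out]
    binaural = np.stack([left, right])               # 2 x N signal for headphones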

[N.B. Talks and discussions will be held in English]

Organising committee:

Antoine Deleforge (INRIA), François Ollivier (MPIA-IJLRA), Olivier Warusfel (IRCAM)

Provisional programme (times and order of speakers subject to change)
09:15 Welcome (coffee)
09:50 Introduction
10:00 Toon van Waterschoot (KU Leuven)
11:00 Cagdas Tuna (Fraunhofer IIS)
11:50 Break
12:00 Antoine Deleforge (INRIA)
12:40 Brief presentation of the posters(*)
12:50 Lunch and posters(*)
14:20 François Ollivier (IJLRA - Sorbonne Université)
15:00 Annika Neidhardt (Surrey University)
15:50 Break
16:00 Sebastian Schlecht (Friedrich-Alexander-Universität)
16:50 Olivier Warusfel (IRCAM)
17:30 Refreshments (and posters)
18:30 Closing

(*) Participants wishing to present a poster related to the themes of the day are invited to send an abstract of this ongoing or recent work and its context, before 27 November, to postersubmissionhaikus@ircam.fr

Speakers:
Toon van Waterschoot (KU Leuven - B)
Title: Spatial interpolation of room acoustics models for dynamic audio rendering
Summary: The room impulse response (RIR) provides a fundamental representation of room acoustics for a spatially invariant source-observer combination. Dynamic audio rendering in extended reality (XR) applications, however, requires a room acoustics modelling framework that is capable of representing movements of (virtual) sources and observers. In this talk, we will highlight our recent research efforts in developing such a modelling framework by introducing various methods for spatial interpolation and extrapolation of RIRs. In addition, we will present a novel and unique dataset that is specifically targeted at the training and evaluation of room acoustics models with moving observers, as well as a real-life XR case study of dynamic audio rendering.
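
To give a concrete flavour of the interpolation problem, here is a minimal Python sketch. It is not the speaker's method: the distances, sample rate and the simple align-and-crossfade strategy are illustrative assumptions.

    # Naive RIR interpolation between two measurement positions (sketch only).
    import numpy as np

    def interpolate_rir(rir_a, rir_b, alpha, fs=48000, c=343.0, d_a=1.0, d_b=2.0):
        """alpha in [0, 1]: 0 -> position A, 1 -> position B.
        d_a, d_b: assumed source-microphone distances at the two positions (m)."""
        # Shift both RIRs so their direct-path delays match the interpolated distance
        d_interp = (1 - alpha) * d_a + alpha * d_b
        shift_a = int(round((d_interp - d_a) / c * fs))
        shift_b = int(round((d_interp - d_b) / c * fs))
        aligned_a = np.roll(rir_a, shift_a)   # circular shift, adequate for a sketch
        aligned_b = np.roll(rir_b, shift_b)
        # Cross-fade; a 1/d law could additionally rescale the direct-sound level
        return (1 - alpha) * aligned_a + alpha * aligned_b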

Cagdas Tuna (Fraunhofer IIS - D)
Title: Data-driven room geometry inference using smart speakers
Summary: Knowledge of geometric properties of a room may be very beneficial for many audio applications, including sound source localization, sound reproduction, and augmented and virtual reality. Room geometry inference (RGI) deals with the problem of acoustic reflector localization based on room impulse responses recorded between loudspeakers and microphones.

Rooms with highly absorptive walls or walls at large distances from the measurement setup pose challenges for RGI methods. In the first part of the talk, we present a data-driven method to jointly detect and localize acoustic reflectors that correspond to nearby and/or reflective walls. We employ a multi-branch convolutional recurrent neural network whose input consists of a time-domain acoustic beamforming map, obtained via Radon transform from multi-channel room impulse responses. We propose a modified loss function forcing the network to pay more attention to walls that can be estimated with a small error. Simulation results show that the proposed method can detect nearby and/or reflective walls and improve the localization performance for the detected walls.

Data-driven RGI methods generally rely on simulated data, since measuring RIRs in a diverse set of rooms may be prohibitively time-consuming and labor-intensive. In the second part of the talk, we explore regularization methods to improve RGI accuracy when deep neural networks are trained with simulated data and tested on measured data. We use a smart speaker prototype equipped with multiple microphones and directional loudspeakers for real-world RIR measurements. The results indicate that applying dropout at the network's input layer improves generalization compared to using it solely in the hidden layers. Moreover, RGI using multiple directional loudspeakers leads to increased estimation accuracy when compared to the single-loudspeaker case, mitigating the impact of source directivity.
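
As a purely illustrative companion to the dropout result mentioned above, the toy PyTorch model below applies dropout to the input beamforming map as well as in the hidden layers. Its architecture, dimensions and output format are placeholder assumptions, not the multi-branch CRNN used in the talk.

    # Toy network with input-layer dropout (sketch, not the speaker's model).
    import torch
    import torch.nn as nn

    class ToyRGINet(nn.Module):
        def __init__(self, n_walls=4, p_input=0.2, p_hidden=0.5):
            super().__init__()
            self.input_dropout = nn.Dropout(p_input)     # dropout on input features
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((8, 8)),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Dropout(p_hidden),                     # usual hidden-layer dropout
                nn.Linear(16 * 8 * 8, n_walls),           # e.g. one distance per wall
            )

        def forward(self, beamforming_map):               # shape: (batch, 1, H, W)
            x = self.input_dropout(beamforming_map)
            return self.head(self.conv(x))

    model = ToyRGINet()
    dummy = torch.randn(2, 1, 64, 64)                     # fake Radon-domain maps
    print(model(dummy).shape)                             # -> torch.Size([2, 4])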

Antoine Deleforge (INRIA - FR)
Title: On the Impact of Simulation Realism in Virtually-Supervised Acoustic Parameter Estimation
Summary: Estimating acoustic parameters, such as the localization of a sound source, the geometry, or the acoustical properties of an environment from audio recordings, is a crucial component of audio augmented reality systems. These tasks become especially challenging in the blind setting, e.g., when using noisy recordings of human speakers. Significant progress has been made in recent years thanks to the advent of supervised machine learning. However, these methods are often hindered by the limited availability of real-world annotated data for such tasks. A common strategy has been to use acoustic simulators to train such models, a framework we refer to as "Virtually Supervised Learning." In this talk, we will explore how the realism of simulation impacts the generalizability of virtually-supervised models to real-world data. We will focus on the tasks of sound source localization, room geometry estimation, and reverberation time estimation from noisy multichannel speech recordings. Our results suggest that enhancing the realism of the source, microphone, and wall responses during simulated training by making them frequency- and angle-dependent significantly improves generalization performance.
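
For context, one of the parameters targeted by these blind estimators, the reverberation time, is classically obtained from a measured RIR by Schroeder backward integration. The short sketch below shows that non-blind reference; the decay range and implementation details are assumptions, not the speaker's method.

    # Classical T60 estimate from a measured RIR (reference baseline, sketch only).
    import numpy as np

    def schroeder_t60(rir, fs, decay_db=(-5.0, -25.0)):
        """Fit a line to the Schroeder decay curve between decay_db[0] and
        decay_db[1] dB, then extrapolate to -60 dB (a T20-style estimate)."""
        energy = np.cumsum(rir[::-1] ** 2)[::-1]           # backward integration
        edc_db = 10 * np.log10(energy / energy[0] + 1e-12) # energy decay curve (dB)
        t = np.arange(rir.size) / fs
        mask = (edc_db <= decay_db[0]) & (edc_db >= decay_db[1])
        slope, _ = np.polyfit(t[mask], edc_db[mask], 1)    # dB per second
        return -60.0 / slope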

François Ollivier (IJLRA - Sorbonne Université - FR)
Title: Design and applications of a high-order spherical microphone array
Summary: This presentation covers the design, characteristics and implementation of a high-order spherical microphone array (HOSMA) using 256 MEMS cells. The HOSMA is designed for directional analysis of room acoustics at order 15. The array uses advanced techniques to capture spatial audio with high accuracy, enabling 3D acoustic analysis and sound field decomposition in the spherical harmonics (SH) domain. Design considerations include optimal microphone placement on the spherical surface, ensuring uniform spatial sampling and minimizing aliasing effects. The characteristics of the HOSMA are evaluated using simulations and real experiments. Implementation challenges, such as calibration and signal processing, are discussed. Applications in room acoustics, such as the estimation of directional room impulse responses (DRIRs) and sound source localization, are presented; they allow us to assess the HOSMA's potential in both research and practical scenarios. The first developments of a research project using the HOSMA for machine-learning-based DRIR interpolation are also presented.
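
The sketch below illustrates, in simplified form, the kind of spherical-harmonics encoding such an array enables. Sampling angles, quadrature weights and the maximum order are placeholders, and rigid-sphere radial equalisation is omitted for brevity.

    # Spherical-harmonics encoding of spherical-array signals (simplified sketch).
    import numpy as np
    from scipy.special import sph_harm

    def sh_encode(mic_signals, azimuth, colatitude, weights, order):
        """mic_signals: (n_mics, n_samples); azimuth/colatitude/weights: (n_mics,).
        Returns SH-domain signals of shape ((order + 1)**2, n_samples)."""
        coeffs = []
        for n in range(order + 1):
            for m in range(-n, n + 1):
                # sph_harm(m, n, azimuth, polar_angle) per SciPy's convention
                y = sph_harm(m, n, azimuth, colatitude)
                coeffs.append((weights * np.conj(y)) @ mic_signals)
        return np.stack(coeffs)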

Annika Neidhardt (Surrey University - UK)
Title: Perceptual matching of room acoustics and perceptual optimization of room acoustic rendering for AR/XR audio
Summary: Systems for Augmented and Extended Reality (AR/XR) aim to render virtual content into the user's natural environment or to seemingly modify the properties of the actual environment. A future vision, for example, is to replace a person's speech with the same text spoken in a foreign language, or to offer users more control over which parts of the actual environment they want to hear. Such ideas require analysing the natural acoustic environment and rendering the content accordingly, ideally without any noticeable delay. Achieving high physical accuracy of the simulated content remains challenging under these circumstances. How accurate does it need to be?
The expectations and perceptual requirements depend strongly on the specific content and application.
How can we make use of that? Is there a simple technical solution?
This presentation will discuss different technical approaches that seem very promising.

Sebastian Schlecht (Friedrich-Alexander-Universität - D)
Title: Common-slope modelling for 6DoF Audio
Summary: In spatial audio, accurately modelling sound field decay is critical for realistic 6DoF audio experiences. This talk introduces the common-slope model, a compact approach that utilizes an energetic sound field description to represent spatial energy decay smoothly and efficiently. We will explore the derivation of this model, demonstrating estimation techniques based on measured or simulated impulse responses (IRs). Particular focus will be given to applications in complex environments, such as coupled room systems, and unique phenomena like fade-in behaviour at the onset of reverberation. Additionally, we’ll discuss how common-slope parameters can be directly derived from room acoustic geometry using acoustic radiance transfer, offering insights into practical implementations in virtual and augmented reality audio.
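
To make the model concrete, the following sketch fits position-dependent amplitudes for a small set of shared ("common") decay times to an energy decay curve. The decay times and the non-negative least-squares fit are illustrative choices, not the speaker's implementation.

    # Fitting common-slope amplitudes to an energy decay curve (sketch only).
    import numpy as np
    from scipy.optimize import nnls

    def fit_common_slope_amplitudes(edc, fs, t60s=(0.3, 1.2)):
        """edc: energy decay curve at one position (linear scale).
        Returns one non-negative amplitude per common decay time, plus a noise term."""
        t = np.arange(edc.size) / fs
        # Each column is one common slope: exp(-ln(1e6) * t / T60) decays 60 dB in T60 s
        basis = np.column_stack(
            [np.exp(-np.log(1e6) * t / t60) for t60 in t60s] + [np.ones_like(t)]
        )
        amplitudes, _ = nnls(basis, edc)
        return amplitudes          # last entry models the noise floor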

Olivier Warusfel (IRCAM - FR)
Title: Rendering methods and evaluation methodology for audio augmented reality
Summary: AAR aims to seamlessly merge virtual sound events into the listener’s real environment. To this end, various audio rendering models can be used to spatialise virtual sound events in real time and apply reverberation effects that match the acoustic properties of the real environment. The quality of experience of the listeners will strongly depend on the coherence between the acoustical cues conveyed by the real sound sources and the perception of the spatial processing applied to the virtual events. On the basis of an experiment simulating an AAR use case, the presentation compares the respective merits and constraints of two rendering approaches and analyses the objective and perceptual factors that contribute to the overall quality of the experience, particularly from the point of view of ‘plausibility’.

This workshop is supported by the ANR and the Ministère de la Culture.

            
