
Today's seminar welcomes Sascha HORNAUER, researcher at MINES Paristech, for a discussion on "3D-Scene Reconstruction based on Audio-Visual Data".

The seminar will take place at Ircam. For those who cannot attend in person, a Zoom link is provided:


I am a researcher in computer vision and robotics at the CAOR, MINES Paristech. I graduated from the University of Oldenburg, Germany, with work on trajectory negotiation for autonomous ships. Afterwards I did a postdoc in computer vision at UC Berkeley under Stella Yu. There I became active in research on using sound to solve typical vision tasks better. My current efforts focus on using sound for robotic navigation when visual sensors fail. While still at Berkeley, I added a binaural microphone and an RGB-Depth sensor to a robot to collect an audio-visual dataset. I then predicted rough depth information from stereo sound using the echolocation principle.

While working towards a robust sound sensor, I realized the value of having geometrically correct room impulse responses (RIRs), which do not just sound plausible to a human being but also contain accurate spatial information. With such RIRs I could quickly prototype polling sounds, such as the frequency sweeps bats use to visualize rooms. Generating plausible RIRs is a novel research direction in the computer vision domain, which has recently aimed to improve, for example, virtual backgrounds in video chat software. I think that using images and videos to generate visually grounded RIRs that are also geometrically plausible, and that ideally can be generated for each individual position within a single room, is within reach of current methods. I would like to discuss how we might collaborate on this or a similar topic, also with a view to improving the experience in augmented reality use cases.

Related citations:

Christensen, Jesper Haahr, Sascha Hornauer, and Stella X. Yu. "BatVision: Learning to See 3D Spatial Layout with Two Ears." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.
Christensen, Jesper Haahr, Sascha Hornauer, and Stella Yu. "BatVision with GCC-PHAT Features for Better Sound to Vision Predictions." Sight & Sound 2020 (2020).
Hornauer, Sascha, et al. "Unsupervised Discriminative Learning of Sounds for Audio Event Classification." ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.
