Today this seminar welcomes Sascha HORNAUER, researcher at MINES ParisTech, for a discussion on "3D-Scene Reconstruction based on Audio-Visual Data".
This seminar took place at Ircam. If you missed it or want to watch it again,
click on https://medias.ircam.fr/xdf4ddf_3d-scene-reconstruction-based-on-audio-vis
I am a researcher in computer vision and robotics at the CAOR, MINES ParisTech. I graduated from the University of Oldenburg, Germany, in trajectory negotiation for autonomous ships. Afterwards, I did a postdoc in
computer vision at UC Berkeley under Stella Yu. There I became active in research on using sound to better solve typical vision tasks. My current efforts focus on using sound for robotic navigation when
visual sensors fail. While still at Berkeley, I added a binaural microphone and an RGB-Depth sensor to a robot to collect an audio-visual dataset. I then predicted rough depth information from stereo sound using the echolocation principle.
Towards the goal of developing a robust sound sensor, I realized the value of having geometrically correct room impulse responses (RIRs), which do not just sound plausible to a human being but also contain accurate spatial
information. With those RIRs I could quickly prototype polling sounds, such as the frequency sweeps bats use to visualize rooms. Generating plausible RIRs is a novel research direction in the computer vision domain, which has recently aimed to improve, e.g., virtual backgrounds in video chat software. Using images and videos to generate visually grounded RIRs that are also geometrically plausible, and that ideally can be
generated for each individual position within a single room, is, I think, within reach of current methods. I would like to discuss how to potentially collaborate on this or a similar topic, also to improve the experience in augmented reality use cases.
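As a rough illustration of the echolocation principle and of prototyping a polling sound against an RIR, the following minimal sketch convolves a frequency sweep with a toy impulse response and recovers the wall distance with a matched filter. It is not the learned approach used in the BatVision work. The sample rate, the speed of sound, the sweep range, and the helper names (`linear_chirp`, `toy_rir`, `estimate_distance`) are all assumptions made for this example, and the single-reflection RIR is a deliberate simplification of a real room.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
FS = 44_100             # sample rate in Hz (assumed)


def linear_chirp(f0, f1, duration, fs=FS):
    """Linear frequency sweep from f0 to f1 Hz lasting `duration` seconds."""
    t = np.arange(int(duration * fs)) / fs
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))


def toy_rir(wall_distance_m, length_s=0.05, fs=FS, reflection_gain=0.5):
    """Hypothetical RIR containing one reflection from a single wall."""
    rir = np.zeros(int(length_s * fs))
    # The sound travels to the wall and back, hence the factor 2.
    delay = int(round(2 * wall_distance_m / SPEED_OF_SOUND * fs))
    rir[delay] = reflection_gain
    return rir


def estimate_distance(recording, probe, fs=FS):
    """Matched filter: the lag of the strongest correlation gives the echo delay."""
    corr = np.correlate(recording, probe, mode="full")
    lag = int(np.argmax(np.abs(corr))) - (len(probe) - 1)
    return lag / fs * SPEED_OF_SOUND / 2


probe = linear_chirp(2_000, 20_000, 0.003)     # 3 ms sweep, 2 kHz to 20 kHz
recording = np.convolve(probe, toy_rir(3.0))   # simulated echo from a wall 3 m away
estimated = estimate_distance(recording, probe)
print(f"estimated wall distance: {estimated:.3f} m")
```

The estimate lands close to 3 m, limited by the one-sample quantization of the delay. With a geometrically accurate RIR in place of `toy_rir`, the same convolution step would let different polling sounds be compared without recording in the real room.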
Christensen, Jesper Haahr, Sascha Hornauer, and Stella X. Yu. "BatVision: Learning to see 3D spatial layout with two ears." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.
Christensen, Jesper Haahr, Sascha Hornauer, and Stella X. Yu. "BatVision with GCC-PHAT Features for Better Sound to Vision Predictions." Sight & Sound 2020 (2020).
Hornauer, Sascha, et al. "Unsupervised Discriminative Learning of Sounds for Audio Event Classification." 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.