This project focuses on an issue common to two apparently distinct research domains:
- The automatic analysis of a sound scene through a dictionary-based learning process. In audio signal processing, this problem arises in several major applications, such as computational auditory scene analysis (CASA), automatic indexing, source separation, and the detection and localization of sound objects (see the sketch after this list).
- Artificial hearing, a recent field of study in robotics, in which the analysis of a sound scene is gradually becoming a prerequisite for any modern application (e.g. monitoring the elderly, or studying human-robot interaction).
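To make the dictionary-based analysis concrete, the following is a minimal illustrative sketch, not the project's actual method: it decomposes the magnitude spectrogram of a toy signal with non-negative matrix factorization (NMF), one common way to learn a dictionary of spectral atoms. The signal, sampling rate, and number of atoms are assumptions chosen for the example.

```python
# Minimal sketch: dictionary-based decomposition of a sound scene via NMF.
# All numerical choices (fs, nperseg, n_components) are illustrative assumptions.
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

fs = 16000                                    # assumed sampling rate (Hz)
t = np.arange(2 * fs) / fs                    # 2 s of toy audio
# toy "scene": two tones plus broadband noise standing in for speech + background
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
x += 0.1 * np.random.randn(x.size)

_, _, Z = stft(x, fs=fs, nperseg=512)         # complex STFT
V = np.abs(Z)                                 # magnitude spectrogram (freq x time)

# Learn a small dictionary of spectral atoms W and their activations H, V ≈ W @ H
nmf = NMF(n_components=4, init="nndsvd", max_iter=500, random_state=0)
W = nmf.fit_transform(V)                      # dictionary atoms: (n_freq, n_atoms)
H = nmf.components_                           # activations over time: (n_atoms, n_frames)

print(W.shape, H.shape)
```

Each column of W is a learned spectral atom; the rows of H indicate when each atom is active, which is the kind of intermediate representation the applications listed above build on.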
The central objective of the project is the design and development of a new method for detecting and localizing the main speaker in a sound scene. The method is intended to enable a robot to identify a vocal signal in the presence of noise and to locate the main speaker's position when several speakers are present. The problem is closely related to the field of computational auditory scene analysis (CASA), whose objective is to design automatic systems whose perception mimics human hearing in both its physical and psycho-acoustical aspects. This project takes a different approach: while the audio processing tools are comparable (machine learning, source separation), hearing is treated from the robot's point of view, and the interest lies in the analysis of the audio scene.
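The project text does not prescribe a particular localization algorithm; as an illustration only, a classical baseline for locating a speaker with a pair of microphones is time-difference-of-arrival estimation via GCC-PHAT. The sketch below, including the `gcc_phat` helper, the sampling rate, and the simulated 10-sample delay, is an assumption-laden example rather than the method developed in this project.

```python
# Illustrative baseline: estimate the delay between two microphone channels
# with GCC-PHAT, from which a bearing toward the speaker can be derived.
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay of `sig` relative to `ref` using GCC-PHAT."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                    # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)                 # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                         # delay in seconds

# Toy two-microphone capture: the "right" channel lags the "left" by 10 samples.
fs = 16000
x = np.random.randn(fs)
delay = 10
left = x
right = np.concatenate((np.zeros(delay), x[:-delay]))
tau = gcc_phat(right, left, fs)
print(f"estimated delay: {tau * fs:.1f} samples")   # expected: about 10
```

In a multi-speaker scene, such a delay estimate would only be one ingredient; separating the target voice from noise and competing speakers is precisely what the project's dictionary-based analysis aims to address.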