DocumentCode :
3570431
Title :
Real-time speaker localization and speech separation by audio-visual integration
Author :
Nakadai, Kazuhiro ; Hidai, Ken-ichi ; Okuno, Hiroshi G. ; Kitano, Hiroaki
Author_Institution :
ERATO, Japan Sci. & Tech. Corp., Tokyo, Japan
Volume :
1
fYear :
2002
fDate :
May 2002
Firstpage :
1043
Abstract :
Robot audition in the real world must cope with motor and other noises caused by the robot's own movements, in addition to environmental noise and reverberation. This paper reports how auditory processing is improved by audio-visual integration with active movements. The key idea is the hierarchical integration of auditory and visual streams to disambiguate auditory or visual processing. The system runs in real time using distributed processing on four PCs connected by Gigabit Ethernet. Implemented in an upper-torso humanoid, the system tracks multiple talkers and extracts speech from a mixture of sounds. The performance of epipolar-geometry-based sound source localization and of sound source separation by active and adaptive direction-pass filtering is also reported.
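The direction-pass filtering mentioned in the abstract keeps only those spectral components whose interaural phase difference (IPD) matches the phase a source at the target direction would produce. The following single-frame sketch illustrates the idea; the function name, binary mask, and tolerance are illustrative assumptions, not the authors' implementation, which works on streaming frames and adapts the pass band:

```python
import numpy as np

def direction_pass_filter(left, right, fs, target_itd, tol=0.2):
    """Pass only frequency bins whose measured IPD matches the IPD
    expected for a source with the target interaural time delay (ITD).
    Illustrative sketch, not the paper's implementation."""
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(len(left), 1.0 / fs)
    ipd = np.angle(L * np.conj(R))                  # measured IPD per bin
    expected = 2.0 * np.pi * freqs * target_itd     # IPD of the target direction
    # wrap the mismatch into (-pi, pi] and keep bins within the tolerance
    mismatch = np.angle(np.exp(1j * (ipd - expected)))
    mask = np.abs(mismatch) < tol
    return np.fft.irfft(L * mask, n=len(left))
```

A fixed tolerance is the simplest choice; the "active and adaptive" filtering of the paper instead adjusts the pass band to the current localization accuracy. Above roughly 1.5 kHz (for a human-like microphone baseline) the IPD wraps, so a practical system needs additional disambiguation at high frequencies.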
Keywords :
distributed processing; filtering theory; mobile robots; position control; real-time systems; robot vision; sensor fusion; speech recognition; adaptive filtering; audio-visual integration; direction-pass filtering; epipolar geometry; humanoid robot; multiple speaker tracking; real time system; reverberation; robot audition; sound source localization; sound source separation; speech separation; Acoustic noise; Distributed processing; Ethernet networks; Personal communication networks; Real time systems; Reverberation; Robots; Speech; Streaming media; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the 2002 IEEE International Conference on Robotics and Automation (ICRA '02)
Print_ISBN :
0-7803-7272-7
Type :
conf
DOI :
10.1109/ROBOT.2002.1013493
Filename :
1013493