DocumentCode :
3570431
Title :
Real-time speaker localization and speech separation by audio-visual integration
Author :
Nakadai, Kazuhiro ; Hidai, Ken-ichi ; Okuno, Hiroshi G. ; Kitano, Hiroaki
Author_Institution :
ERATO, Japan Sci. & Tech. Corp., Tokyo, Japan
Volume :
1
fYear :
2002
fDate :
May 2002
Firstpage :
1043
Abstract :
Robot audition in the real world must cope with motor and other noises caused by the robot's own movements, in addition to environmental noise and reverberation. This paper reports how auditory processing is improved by audio-visual integration with active movements. The key idea is the hierarchical integration of auditory and visual streams to disambiguate auditory or visual processing. The system runs in real time using distributed processing on four PCs connected by Gigabit Ethernet. Implemented in an upper-torso humanoid, the system tracks multiple talkers and extracts speech from a mixture of sounds. The performance of epipolar-geometry-based sound source localization and of sound source separation by active and adaptive direction-pass filtering is also reported.
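The direction-pass filtering mentioned in the abstract keeps only those spectral components whose interaural phase difference (IPD) matches the phase a source at the target direction would produce. The following single-frame sketch illustrates the idea; the function name, binary mask, and tolerance are illustrative assumptions, not the authors' implementation, which works on streaming frames and adapts the pass band:

```python
import numpy as np

def direction_pass_filter(left, right, fs, target_itd, tol=0.2):
    """Pass only frequency bins whose measured IPD matches the IPD
    expected for a source with the target interaural time delay (ITD).
    Illustrative sketch, not the paper's implementation."""
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    freqs = np.fft.rfftfreq(len(left), 1.0 / fs)
    ipd = np.angle(L * np.conj(R))                  # measured IPD per bin
    expected = 2.0 * np.pi * freqs * target_itd     # IPD of the target direction
    # wrap the mismatch into (-pi, pi] and keep bins within the tolerance
    mismatch = np.angle(np.exp(1j * (ipd - expected)))
    mask = np.abs(mismatch) < tol
    return np.fft.irfft(L * mask, n=len(left))
```

A fixed tolerance is the simplest choice; the "active and adaptive" filtering of the paper instead adjusts the pass band to the current localization accuracy. Above roughly 1.5 kHz (for a human-like microphone baseline) the IPD wraps, so a practical system needs additional disambiguation at high frequencies.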
Keywords :
distributed processing; filtering theory; mobile robots; position control; real-time systems; robot vision; sensor fusion; speech recognition; adaptive filtering; audio-visual integration; direction-pass filtering; epipolar geometry; humanoid robot; multiple speaker tracking; real time system; reverberation; robot audition; sound source localization; sound source separation; speech separation; Acoustic noise; Distributed processing; Ethernet networks; Personal communication networks; Real time systems; Reverberation; Robots; Speech; Streaming media; Working environment noise;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the 2002 IEEE International Conference on Robotics and Automation (ICRA '02)
Print_ISBN :
0-7803-7272-7
Type :
conf
DOI :
10.1109/ROBOT.2002.1013493
Filename :
1013493