DocumentCode :
3285846
Title :
Robust localization and tracking of multiple speakers in real environments for binaural robot audition
Author :
Kim, Ui-Hyun; Okuno, Hiroshi G.
Author_Institution :
Dept. of Intell. Sci. & Technol., Kyoto Univ., Kyoto, Japan
fYear :
2013
fDate :
3-5 July 2013
Firstpage :
1
Lastpage :
4
Abstract :
This paper presents a multisource sound localization method based on generalized cross-correlation (GCC) weighted by the phase transform (PHAT), together with a novel multisource speech tracking method that combines voice activity detection (VAD) with a K-means clustering algorithm, for binaural robot audition. The standard K-means clustering algorithm is extended with two additional steps to suit multisource speech tracking. Experiments conducted on the SIG-2 humanoid robot in a real environment show that our method can track multiple speakers in real time with a tracking error below 4.35°.
Keywords :
correlation methods; humanoid robots; learning (artificial intelligence); mobile robots; pattern clustering; speech processing; transforms; GCC method; PHAT; SIG-2 humanoid robot; VAD; binaural robot audition; generalized cross-correlation method; multiple speaker tracking; multisource sound localization method; multisource speech tracking method; phase transform; standard K-means clustering algorithm; voice activity detection; Clustering algorithms; Direction-of-arrival estimation; Estimation; Microphones; Robots; Speech; Standards;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Image Analysis for Multimedia Interactive Services (WIAMIS), 2013 14th International Workshop on
Conference_Location :
Paris
ISSN :
2158-5873
Type :
conf
DOI :
10.1109/WIAMIS.2013.6616137
Filename :
6616137